public class OLS extends java.lang.Object implements Regression<double[]>
The OLS estimator is consistent when the independent variables are exogenous and there is no perfect multicollinearity, and it is optimal in the class of linear unbiased estimators when the errors are homoscedastic and serially uncorrelated. Under these conditions, OLS provides minimum-variance mean-unbiased estimation when the errors have finite variances.
There are several different frameworks in which the linear regression model can be cast in order to make the OLS technique applicable. Each of these settings produces the same formulas and the same results; the only difference is the interpretation and the assumptions that must be imposed for the method to give meaningful results. The choice of framework depends mostly on the nature of the data at hand and on the inference task to be performed.
Least squares corresponds to the maximum likelihood criterion if the experimental errors have a normal distribution, and it can also be derived as a method-of-moments estimator.
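The least-squares criterion can be made concrete with a small, self-contained sketch (the class name `OlsSketch` is hypothetical and this is not smile's internal implementation): for a single predictor, minimizing the residual sum of squares yields the familiar closed-form slope and intercept.

```java
// Illustrative sketch, not smile's internals: ordinary least squares for a
// single predictor, using the closed-form normal-equation solution.
public class OlsSketch {
    // Fits y = intercept + slope * x by minimizing the residual sum of squares.
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx  += x[i];
            sy  += y[i];
            sxx += x[i] * x[i];
            sxy += x[i] * y[i];
        }
        double slope = (n * sxy - sx * sy) / (n * sxx - sx * sx);
        double intercept = (sy - slope * sx) / n;
        return new double[] { intercept, slope };
    }

    public static void main(String[] args) {
        double[] x = { 1, 2, 3, 4, 5 };
        double[] y = { 2.1, 3.9, 6.0, 8.1, 9.9 };  // roughly y = 2x
        double[] beta = fit(x, y);
        System.out.printf("intercept=%.3f slope=%.3f%n", beta[0], beta[1]);
    }
}
```

The multivariate case solves the same minimization with matrix decompositions (QR or SVD, as the constructors below expose), but the one-predictor formula shows the criterion itself.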
Once a regression model has been constructed, it may be important to confirm the goodness of fit of the model and the statistical significance of the estimated parameters. Commonly used checks of goodness of fit include R-squared, analysis of the pattern of residuals, and hypothesis testing. Statistical significance can be checked by an F-test of the overall fit, followed by t-tests of individual parameters.
Interpretations of these diagnostic tests rest heavily on the model assumptions. Although examination of the residuals can be used to invalidate a model, the results of a t-test or F-test are sometimes more difficult to interpret if the model's assumptions are violated. For example, if the error term does not have a normal distribution, then in small samples the estimated parameters will not follow normal distributions, which complicates inference. With relatively large samples, however, a central limit theorem can be invoked such that hypothesis testing may proceed using asymptotic approximations.
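The residual-based diagnostics above reduce to simple arithmetic on the response and fitted values. This sketch (the class name `FitDiagnostics` is hypothetical, not part of the OLS class) mirrors what `RSS()` and `RSquared()` report:

```java
// Illustrative helper, not part of smile: RSS and R^2 computed from the
// responses and the fitted values.
public class FitDiagnostics {
    // RSS = sum of squared residuals (response minus fitted value).
    static double rss(double[] y, double[] fitted) {
        double s = 0;
        for (int i = 0; i < y.length; i++) {
            double r = y[i] - fitted[i];
            s += r * r;
        }
        return s;
    }

    // R^2 = 1 - RSS / TSS, where TSS is the total sum of squares about the mean.
    static double rSquared(double[] y, double[] fitted) {
        double mean = 0;
        for (double v : y) mean += v;
        mean /= y.length;
        double tss = 0;
        for (double v : y) tss += (v - mean) * (v - mean);
        return 1.0 - rss(y, fitted) / tss;
    }

    public static void main(String[] args) {
        double[] y = { 1.0, 2.0, 3.0 };
        double[] fitted = { 1.1, 2.0, 2.9 };
        System.out.printf("RSS=%.4f R2=%.4f%n", rss(y, fitted), rSquared(y, fitted));
    }
}
```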
Modifier and Type | Class and Description |
---|---|
static class | OLS.Trainer: Trainer for linear regression by ordinary least squares. |
Constructor and Description |
---|
OLS(smile.data.AttributeDataset data): Constructor. |
OLS(smile.data.AttributeDataset data, boolean SVD): Constructor. |
OLS(double[][] x, double[] y): Constructor. |
OLS(double[][] x, double[] y, boolean SVD): Constructor. |
Modifier and Type | Method and Description |
---|---|
double | adjustedRSquared(): Returns the adjusted R2 statistic. |
double[] | coefficients(): Returns the linear coefficients (without intercept). |
int | df(): Returns the degrees of freedom of the residual standard error. |
double | error(): Returns the residual standard error. |
double[] | fittedValues(): Returns the fitted values. |
double | ftest(): Returns the F-statistic of the goodness-of-fit test. |
double | intercept(): Returns the intercept. |
double | predict(double[] x): Predicts the dependent variable of an instance. |
double | pvalue(): Returns the p-value of the goodness-of-fit test. |
double[] | residuals(): Returns the residuals, that is, the response minus the fitted values. |
double | RSquared(): Returns the R2 statistic. |
double | RSS(): Returns the residual sum of squares. |
java.lang.String | toString() |
double[][] | ttest(): Returns the t-tests of the coefficients (including intercept). |
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Methods inherited from interface Regression: predict
public OLS(double[][] x, double[] y)
Parameters:
x - a matrix containing the explanatory variables. NO NEED to include a constant column of 1s for bias.
y - the response values.

public OLS(double[][] x, double[] y, boolean SVD)
Parameters:
x - a matrix containing the explanatory variables. NO NEED to include a constant column of 1s for bias.
y - the response values.
SVD - If true, use SVD to fit the model; otherwise, use QR decomposition. SVD is slower than QR but can handle rank-deficient matrices.

public OLS(smile.data.AttributeDataset data)
Parameters:
data - the dataset containing the explanatory and response variables and their attributes. NO NEED to include a constant column of 1s for bias.

public OLS(smile.data.AttributeDataset data, boolean SVD)
Parameters:
data - the dataset containing the explanatory and response variables and their attributes. NO NEED to include a constant column of 1s for bias.
SVD - If true, use SVD to fit the model; otherwise, use QR decomposition. SVD is slower than QR but can handle rank-deficient matrices.

public double[][] ttest()
public double[] coefficients()
public double intercept()
public double[] residuals()
public double[] fittedValues()
public double RSS()
public double error()
public int df()
public double RSquared()
In ordinary least-squares regression, R2 increases as more variables are added to the model (it never decreases). This is a drawback of one possible use of R2, where one might keep including variables until "there is no more improvement". This motivates the alternative approach of looking at the adjusted R2.
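The standard adjusted-R2 definition penalizes R2 by the number of predictors; this sketch uses that usual formula (the class name `AdjustedR2` is hypothetical, and smile's exact implementation is not shown in this doc):

```java
// Standard adjusted-R^2 formula: penalizes R^2 by the number of predictors p,
// so adding uninformative variables can lower the statistic.
public class AdjustedR2 {
    // n = number of observations, p = number of predictors (excluding intercept).
    static double adjustedRSquared(double r2, int n, int p) {
        return 1.0 - (1.0 - r2) * (n - 1) / (double) (n - p - 1);
    }

    public static void main(String[] args) {
        // Same R^2 = 0.90 with n = 20: one predictor vs five predictors.
        System.out.println(adjustedRSquared(0.90, 20, 1));
        System.out.println(adjustedRSquared(0.90, 20, 5)); // smaller: more predictors penalized
    }
}
```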
public double adjustedRSquared()
public double ftest()
public double pvalue()
public double predict(double[] x)
Description copied from interface: Regression
Specified by:
predict in interface Regression<double[]>
Parameters:
x - the instance.

public java.lang.String toString()
Overrides:
toString in class java.lang.Object