public class OLS extends java.lang.Object implements Regression<double[]>
The OLS estimator is consistent when the independent variables are exogenous and there is no perfect multicollinearity, and it is optimal in the class of linear unbiased estimators when the errors are homoscedastic and serially uncorrelated. Under these conditions, OLS provides minimum-variance mean-unbiased estimation when the errors have finite variances.
There are several different frameworks in which the linear regression model can be cast in order to make the OLS technique applicable. Each of these settings produces the same formulas and the same results; the only difference is the interpretation and the assumptions that must be imposed for the method to give meaningful results. The choice of framework depends mostly on the nature of the data at hand and on the inference task to be performed.
Least squares corresponds to the maximum likelihood criterion if the experimental errors have a normal distribution, and it can also be derived as a method-of-moments estimator.
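The least-squares criterion can be made concrete with a small, self-contained sketch (the class name `OlsSketch` is hypothetical and this is not smile's internal implementation): for a single predictor, minimizing the residual sum of squares yields the familiar closed-form slope and intercept.

```java
// Illustrative sketch, not smile's internals: ordinary least squares for a
// single predictor, using the closed-form normal-equation solution.
public class OlsSketch {
    // Fits y = intercept + slope * x by minimizing the residual sum of squares.
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx  += x[i];
            sy  += y[i];
            sxx += x[i] * x[i];
            sxy += x[i] * y[i];
        }
        double slope = (n * sxy - sx * sy) / (n * sxx - sx * sx);
        double intercept = (sy - slope * sx) / n;
        return new double[] { intercept, slope };
    }

    public static void main(String[] args) {
        double[] x = { 1, 2, 3, 4, 5 };
        double[] y = { 2.1, 3.9, 6.0, 8.1, 9.9 };  // roughly y = 2x
        double[] beta = fit(x, y);
        System.out.printf("intercept=%.3f slope=%.3f%n", beta[0], beta[1]);
    }
}
```

The multivariate case solves the same minimization with matrix decompositions (QR or SVD, as the constructors below expose), but the one-predictor formula shows the criterion itself.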
Once a regression model has been constructed, it may be important to confirm the goodness of fit of the model and the statistical significance of the estimated parameters. Commonly used checks of goodness of fit include R-squared, analysis of the pattern of residuals, and hypothesis testing. Statistical significance can be checked by an F-test of the overall fit, followed by t-tests of individual parameters.
Interpretations of these diagnostic tests rest heavily on the model assumptions. Although examination of the residuals can be used to invalidate a model, the results of a t-test or F-test are sometimes more difficult to interpret if the model's assumptions are violated. For example, if the error term does not have a normal distribution, then in small samples the estimated parameters will not follow normal distributions, which complicates inference. With relatively large samples, however, a central limit theorem can be invoked such that hypothesis testing may proceed using asymptotic approximations.
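The residual-based diagnostics above reduce to simple arithmetic on the response and fitted values. This sketch (the class name `FitDiagnostics` is hypothetical, not part of the OLS class) mirrors what `RSS()` and `RSquared()` report:

```java
// Illustrative helper, not part of smile: RSS and R^2 computed from the
// responses and the fitted values.
public class FitDiagnostics {
    // RSS = sum of squared residuals (response minus fitted value).
    static double rss(double[] y, double[] fitted) {
        double s = 0;
        for (int i = 0; i < y.length; i++) {
            double r = y[i] - fitted[i];
            s += r * r;
        }
        return s;
    }

    // R^2 = 1 - RSS / TSS, where TSS is the total sum of squares about the mean.
    static double rSquared(double[] y, double[] fitted) {
        double mean = 0;
        for (double v : y) mean += v;
        mean /= y.length;
        double tss = 0;
        for (double v : y) tss += (v - mean) * (v - mean);
        return 1.0 - rss(y, fitted) / tss;
    }

    public static void main(String[] args) {
        double[] y = { 1.0, 2.0, 3.0 };
        double[] fitted = { 1.1, 2.0, 2.9 };
        System.out.printf("RSS=%.4f R2=%.4f%n", rss(y, fitted), rSquared(y, fitted));
    }
}
```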
Modifier and Type | Class and Description |
---|---|
static class | OLS.Trainer: Trainer for linear regression by ordinary least squares. |
Constructor and Description |
---|
OLS(smile.data.AttributeDataset data): Constructor. |
OLS(smile.data.AttributeDataset data, boolean SVD): Constructor. |
OLS(double[][] x, double[] y): Constructor. |
OLS(double[][] x, double[] y, boolean SVD): Constructor. |
Modifier and Type | Method and Description |
---|---|
double | adjustedRSquared(): Returns the adjusted R2 statistic. |
double[] | coefficients(): Returns the linear coefficients (without intercept). |
int | df(): Returns the degrees of freedom of the residual standard error. |
double | error(): Returns the residual standard error. |
double[] | fittedValues(): Returns the fitted values. |
double | ftest(): Returns the F-statistic of the goodness-of-fit test. |
double | intercept(): Returns the intercept. |
double | predict(double[] x): Predicts the dependent variable of an instance. |
double | pvalue(): Returns the p-value of the goodness-of-fit test. |
double[] | residuals(): Returns the residuals, that is, the response minus the fitted values. |
double | RSquared(): Returns the R2 statistic. |
double | RSS(): Returns the residual sum of squares. |
java.lang.String | toString() |
double[][] | ttest(): Returns the t-tests of the coefficients (including intercept). |
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Methods inherited from interface Regression: predict
public OLS(double[][] x, double[] y)
Parameters:
x - a matrix containing the explanatory variables. NO NEED to include a constant column of 1s for bias.
y - the response values.

public OLS(double[][] x, double[] y, boolean SVD)
Parameters:
x - a matrix containing the explanatory variables. NO NEED to include a constant column of 1s for bias.
y - the response values.
SVD - If true, use SVD to fit the model; otherwise, use QR decomposition. SVD is slower than QR but can handle rank-deficient matrices.

public OLS(smile.data.AttributeDataset data)
Parameters:
data - the dataset containing the explanatory and response variables and their attributes. NO NEED to include a constant column of 1s for bias.

public OLS(smile.data.AttributeDataset data, boolean SVD)
Parameters:
data - the dataset containing the explanatory and response variables and their attributes. NO NEED to include a constant column of 1s for bias.
SVD - If true, use SVD to fit the model; otherwise, use QR decomposition. SVD is slower than QR but can handle rank-deficient matrices.

public double[][] ttest()
public double[] coefficients()
public double intercept()
public double[] residuals()
public double[] fittedValues()
public double RSS()
public double error()
public int df()
public double RSquared()
In ordinary least-squares regression, R2 increases as more variables are added to the model (it never decreases). This is a drawback of one possible use of R2, where one might keep including variables until "there is no more improvement". This motivates the alternative approach of looking at the adjusted R2.
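The standard adjusted-R2 definition penalizes R2 by the number of predictors; this sketch uses that usual formula (the class name `AdjustedR2` is hypothetical, and smile's exact implementation is not shown in this doc):

```java
// Standard adjusted-R^2 formula: penalizes R^2 by the number of predictors p,
// so adding uninformative variables can lower the statistic.
public class AdjustedR2 {
    // n = number of observations, p = number of predictors (excluding intercept).
    static double adjustedRSquared(double r2, int n, int p) {
        return 1.0 - (1.0 - r2) * (n - 1) / (double) (n - p - 1);
    }

    public static void main(String[] args) {
        // Same R^2 = 0.90 with n = 20: one predictor vs five predictors.
        System.out.println(adjustedRSquared(0.90, 20, 1));
        System.out.println(adjustedRSquared(0.90, 20, 5)); // smaller: more predictors penalized
    }
}
```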
public double adjustedRSquared()
public double ftest()
public double pvalue()
public double predict(double[] x)
Description copied from interface: Regression
Specified by:
predict in interface Regression<double[]>
Parameters:
x - the instance.

public java.lang.String toString()
Overrides:
toString in class java.lang.Object