public class GLM
extends java.lang.Object
implements java.io.Serializable
In a GLM, each outcome Y of the dependent variable is assumed
to be generated from a particular distribution in an exponential family.
The mean, μ, of the distribution depends on the independent variables, X, through:
E(Y) = μ = g⁻¹(Xβ)
where E(Y) is the expected value of Y; Xβ is the linear predictor, a linear
combination of the independent variables with the unknown parameters β; and g is
the link function, which is monotonic and differentiable. The link function that
transforms the mean to the natural parameter is called the canonical link.
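The relation between the mean and the linear predictor can be illustrated with a small, self-contained sketch. It uses the logit link as an example. The class `LinkSketch` and its methods are hypothetical helpers written for this illustration and are not part of this class; placing the intercept in `beta[0]` is also an assumption of the sketch.

```java
/** Hypothetical helper (not part of GLM): the logit link and its inverse. */
final class LinkSketch {
    /** Logit link: g(mu) = log(mu / (1 - mu)). */
    static double logit(double mu) {
        return Math.log(mu / (1.0 - mu));
    }

    /** Inverse logit: g^-1(eta) = 1 / (1 + exp(-eta)). */
    static double inverseLogit(double eta) {
        return 1.0 / (1.0 + Math.exp(-eta));
    }

    /** Linear predictor eta = beta[0] + sum_j beta[j+1] * x[j] (intercept first, an assumed layout). */
    static double linearPredictor(double[] beta, double[] x) {
        double eta = beta[0];
        for (int j = 0; j < x.length; j++) {
            eta += beta[j + 1] * x[j];
        }
        return eta;
    }

    /** Mean response mu = g^-1(x beta) under the logit link. */
    static double mean(double[] beta, double[] x) {
        return inverseLogit(linearPredictor(beta, x));
    }

    public static void main(String[] args) {
        double[] beta = {-1.0, 2.0}; // illustrative values only
        double[] x = {0.5};
        // eta = -1 + 2 * 0.5 = 0, so the mean is 0.5
        System.out.println(mean(beta, x));
    }
}
```

Because the link is monotonic, `logit` and `inverseLogit` undo each other, so a prediction on the scale of the linear predictor can always be mapped back to the scale of the mean.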
In this framework, the variance is typically a function, V, of the mean:
Var(Y) = V(μ) = V(g⁻¹(Xβ))
It is convenient if V follows from an exponential family
of distributions, but it may simply be that the variance is a function
of the predicted value, such as V(μᵢ) = μᵢ for the Poisson,
V(μᵢ) = μᵢ(1 - μᵢ) for the Bernoulli, and V(μᵢ) = σ²
(i.e., constant) for the normal.
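The three variance functions named above translate directly into code. The following is a minimal sketch; `VarianceSketch` is a hypothetical class written for illustration and does not correspond to any type in this package.

```java
/** Hypothetical helper (not part of GLM): variance functions V(mu) for three families. */
final class VarianceSketch {
    /** Poisson: V(mu) = mu. */
    static double poisson(double mu) {
        return mu;
    }

    /** Bernoulli: V(mu) = mu * (1 - mu). */
    static double bernoulli(double mu) {
        return mu * (1.0 - mu);
    }

    /** Normal: V(mu) = sigma^2, constant in mu. */
    static double normal(double mu, double sigma2) {
        return sigma2;
    }

    public static void main(String[] args) {
        System.out.println(poisson(3.0));     // 3.0
        System.out.println(bernoulli(0.25));  // 0.1875
        System.out.println(normal(5.0, 2.0)); // 2.0
    }
}
```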
The unknown parameters, β, are typically estimated
with maximum likelihood, maximum quasi-likelihood, or Bayesian techniques.
Modifier and Type | Field and Description |
---|---|
protected double[] | beta - The linear weights. |
protected double | deviance - The deviance = 2 * (LogLikelihood(Saturated Model) - LogLikelihood(Proposed Model)). |
protected double[] | devianceResiduals - The deviance residuals. |
protected int | df - The degrees of freedom of the residual deviance. |
protected smile.data.formula.Formula | formula - The symbolic description of the model to be fitted. |
protected double | loglikelihood - Log-likelihood. |
protected Model | model - The model specifications (link function, deviance, etc.). |
protected double[] | mu - The fitted mean values. |
protected double | nullDeviance - The null deviance = 2 * (LogLikelihood(Saturated Model) - LogLikelihood(Null Model)). |
protected double[][] | ztest - The coefficients, their standard errors, z-scores, and p-values. |
Constructor and Description |
---|
GLM(smile.data.formula.Formula formula, java.lang.String[] predictors, Model model, double[] beta, double loglikelihood, double deviance, double nullDeviance, double[] mu, double[] residuals, double[][] ztest) - Constructor. |
Modifier and Type | Method and Description |
---|---|
double | AIC() - Returns the AIC score. |
double | BIC() - Returns the BIC score. |
double[] | coefficients() - Returns an array of size (p+1) containing the linear weights of the model, where p is the dimension of the feature vectors. |
double | deviance() - Returns the deviance of the model. |
double[] | devianceResiduals() - Returns the deviance residuals. |
static GLM | fit(smile.data.formula.Formula formula, smile.data.DataFrame data, Model model) - Fits the generalized linear model with IWLS (iteratively reweighted least squares). |
static GLM | fit(smile.data.formula.Formula formula, smile.data.DataFrame data, Model model, double tol, int maxIter) - Fits the generalized linear model with IWLS (iteratively reweighted least squares). |
static GLM | fit(smile.data.formula.Formula formula, smile.data.DataFrame data, Model model, java.util.Properties prop) - Fits the generalized linear model with IWLS (iteratively reweighted least squares). |
double[] | fittedValues() - Returns the fitted mean values. |
double | loglikelihood() - Returns the log-likelihood of the model. |
double[] | predict(smile.data.DataFrame df) - Predicts the mean response. |
double | predict(smile.data.Tuple x) - Predicts the mean response. |
java.lang.String | toString() |
double[][] | ztest() - Returns the z-test of the coefficients (including intercept). |
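For orientation, the AIC and BIC scores listed above are conventionally defined from the log-likelihood as AIC = 2k - 2 log L and BIC = k log(n) - 2 log L, where k is the number of estimated parameters and n is the number of observations. The sketch below implements these textbook definitions. `InformationCriteriaSketch` is a hypothetical class, and whether this class counts k in exactly the same way is an assumption not confirmed by this page.

```java
/** Hypothetical helper (not part of GLM): textbook AIC and BIC from a log-likelihood. */
final class InformationCriteriaSketch {
    /** AIC = 2k - 2 log L, with k the number of estimated parameters. */
    static double aic(double logLikelihood, int k) {
        return 2.0 * k - 2.0 * logLikelihood;
    }

    /** BIC = k log(n) - 2 log L, with n the number of observations. */
    static double bic(double logLikelihood, int k, int n) {
        return k * Math.log(n) - 2.0 * logLikelihood;
    }

    public static void main(String[] args) {
        double logLikelihood = -10.0; // illustrative value only
        System.out.println(aic(logLikelihood, 3)); // 2 * 3 + 20 = 26.0
        System.out.println(bic(logLikelihood, 3, 100));
    }
}
```

Both scores penalize model size. Lower values indicate a better trade-off between fit and complexity, and BIC penalizes extra parameters more heavily than AIC once n exceeds about 7 (log(n) > 2).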
protected smile.data.formula.Formula formula
protected Model model
protected double[] beta
protected double[][] ztest
protected double[] mu
protected double nullDeviance
The saturated model, also referred to as the full model or maximal model, allows a different mean response for each group of replicates. One can think of the saturated model as having the most general possible mean structure for the data since the means are unconstrained.
The null model assumes that all observations have the same distribution with a common parameter. Like the saturated model, the null model does not depend on predictor variables. While the saturated model is the most general model, the null model is the most restricted model.
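To make the saturated and null models concrete, here is a sketch for the Poisson family. The saturated model sets each mean equal to its observation, and the null model sets every mean to the sample mean. `DevianceSketch` is a hypothetical class written for this illustration; it drops the log(y!) term of the Poisson log-likelihood because that term cancels in every deviance.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of GLM): Poisson deviance and null deviance. */
final class DevianceSketch {
    /** Poisson log-likelihood without the log(y!) term, which cancels in any deviance. */
    static double poissonKernel(double[] y, double[] mu) {
        double sum = 0.0;
        for (int i = 0; i < y.length; i++) {
            // Convention 0 * log(0) = 0, needed when the saturated model has mu = y = 0.
            double yLogMu = (y[i] == 0.0) ? 0.0 : y[i] * Math.log(mu[i]);
            sum += yLogMu - mu[i];
        }
        return sum;
    }

    /** Deviance = 2 * (logLik(saturated) - logLik(proposed)); the saturated model has mu = y. */
    static double deviance(double[] y, double[] mu) {
        return 2.0 * (poissonKernel(y, y) - poissonKernel(y, mu));
    }

    /** Null deviance: the proposed model is the null model, with every mean equal to mean(y). */
    static double nullDeviance(double[] y) {
        double mean = Arrays.stream(y).sum() / y.length;
        double[] mu = new double[y.length];
        Arrays.fill(mu, mean);
        return deviance(y, mu);
    }

    public static void main(String[] args) {
        double[] y = {1.0, 2.0, 3.0}; // illustrative counts only
        System.out.println(deviance(y, y));   // 0.0: the saturated model fits perfectly
        System.out.println(nullDeviance(y));
    }
}
```

A fitted model with predictors falls between the two extremes: its deviance is at least zero and, when it includes an intercept, no larger than the null deviance.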
protected double deviance
protected double[] devianceResiduals
protected int df
protected double loglikelihood
public GLM(smile.data.formula.Formula formula, java.lang.String[] predictors, Model model, double[] beta, double loglikelihood, double deviance, double nullDeviance, double[] mu, double[] residuals, double[][] ztest)
public double[] coefficients()
public double[][] ztest()
public double[] devianceResiduals()
public double[] fittedValues()
public double deviance()
public double loglikelihood()
public double AIC()
public double BIC()
public double predict(smile.data.Tuple x)
public double[] predict(smile.data.DataFrame df)
public java.lang.String toString()
Overrides: toString in class java.lang.Object
public static GLM fit(smile.data.formula.Formula formula, smile.data.DataFrame data, Model model)
formula - a symbolic description of the model to be fitted.
data - the data frame of the explanatory and response variables.
public static GLM fit(smile.data.formula.Formula formula, smile.data.DataFrame data, Model model, java.util.Properties prop)
formula - a symbolic description of the model to be fitted.
data - the data frame of the explanatory and response variables.
public static GLM fit(smile.data.formula.Formula formula, smile.data.DataFrame data, Model model, double tol, int maxIter)
formula - a symbolic description of the model to be fitted.
data - the data frame of the explanatory and response variables.
tol - the tolerance for stopping iterations.
maxIter - the maximum number of iterations.
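The roles of tol and maxIter are easiest to see in a small IWLS loop. The sketch below fits a Poisson model with a log link, one predictor, and an intercept, using plain arrays. It is an independent illustration, not the implementation behind fit: `IwlsSketch` is a hypothetical class, and stopping when the deviance changes by less than tol is an assumption of this sketch.

```java
/** Hypothetical helper (not part of GLM): IWLS for log(mu) = b0 + b1 * x with Poisson errors. */
final class IwlsSketch {
    /** Poisson deviance 2 * sum(y * log(y / mu) - (y - mu)), with 0 * log(0) = 0. */
    static double deviance(double[] x, double[] y, double b0, double b1) {
        double dev = 0.0;
        for (int i = 0; i < x.length; i++) {
            double mu = Math.exp(b0 + b1 * x[i]);
            double yLog = (y[i] == 0.0) ? 0.0 : y[i] * Math.log(y[i] / mu);
            dev += 2.0 * (yLog - (y[i] - mu));
        }
        return dev;
    }

    /** Returns {b0, b1}; stops when the deviance changes by less than tol or after maxIter steps. */
    static double[] fitPoisson(double[] x, double[] y, double tol, int maxIter) {
        int n = x.length;
        double total = 0.0;
        for (double v : y) total += v;
        double b0 = Math.log(total / n); // start from the null model
        double b1 = 0.0;
        double previous = Double.POSITIVE_INFINITY;

        for (int iter = 0; iter < maxIter; iter++) {
            // Accumulate the weighted normal equations (X'WX) b = X'Wz.
            double s = 0, sx = 0, sxx = 0, sz = 0, sxz = 0;
            for (int i = 0; i < n; i++) {
                double eta = b0 + b1 * x[i];
                double mu = Math.exp(eta);
                double w = mu;                     // weight 1 / (V(mu) * g'(mu)^2) for the log link
                double z = eta + (y[i] - mu) / mu; // working response eta + (y - mu) * g'(mu)
                s += w;
                sx += w * x[i];
                sxx += w * x[i] * x[i];
                sz += w * z;
                sxz += w * x[i] * z;
            }
            // Solve the 2x2 system by Cramer's rule.
            double det = s * sxx - sx * sx;
            b0 = (sxx * sz - sx * sxz) / det;
            b1 = (s * sxz - sx * sz) / det;

            double current = deviance(x, y, b0, b1);
            if (Math.abs(previous - current) < tol) break;
            previous = current;
        }
        return new double[]{b0, b1};
    }

    public static void main(String[] args) {
        double[] x = {0, 1, 2, 3, 4};
        double[] y = {1, 3, 4, 9, 14}; // illustrative counts only
        double[] b = fitPoisson(x, y, 1e-10, 50);
        System.out.println("b0 = " + b[0] + ", b1 = " + b[1]);
    }
}
```

At convergence the estimates satisfy the likelihood equations: the residuals y - μ sum to zero, both unweighted and weighted by x. A smaller tol tightens that agreement, while maxIter bounds the work when the iteration is slow to settle.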