public class Maxent extends java.lang.Object implements SoftClassifier<int[]>, java.io.Serializable
Basically, maximum entropy classifier is another name of multinomial logistic regression applied to categorical independent variables, which are converted to binary dummy variables. Maximum entropy models are widely used in natural language processing. Here, we provide an implementation which assumes that binary features are stored in a sparse array, of which entries are the indices of nonzero features.
Modifier and Type | Class and Description |
---|---|
static class |
Maxent.Trainer
Trainer for maximum entropy classifier.
|
Constructor and Description |
---|
Maxent(int p,
int[][] x,
int[] y)
Learn maximum entropy classifier from samples of binary sparse features.
|
Maxent(int p,
int[][] x,
int[] y,
double lambda)
Learn maximum entropy classifier from samples of binary sparse features.
|
Maxent(int p,
int[][] x,
int[] y,
double lambda,
double tol,
int maxIter)
Learn maximum entropy classifier from samples of binary sparse features.
|
Modifier and Type | Method and Description |
---|---|
int |
getDimension()
Returns the dimension of input space.
|
double |
loglikelihood()
Returns the log-likelihood of model.
|
int |
predict(int[] x)
Predicts the class label of an instance.
|
int |
predict(int[] x,
double[] posteriori)
Predicts the class label of an instance and also calculate a posteriori
probabilities.
|
public Maxent(int p, int[][] x, int[] y)
p
- the dimension of feature space.x
- training samples. Each sample is represented by a set of sparse
binary features. The features are stored in an integer array, of which
are the indices of nonzero features.y
- training labels in [0, k), where k is the number of classes.public Maxent(int p, int[][] x, int[] y, double lambda)
p
- the dimension of feature space.x
- training samples. Each sample is represented by a set of sparse
binary features. The features are stored in an integer array, of which
are the indices of nonzero features.y
- training labels in [0, k), where k is the number of classes.lambda
- λ > 0 gives a "regularized" estimate of linear
weights which often has superior generalization performance, especially
when the dimensionality is high.public Maxent(int p, int[][] x, int[] y, double lambda, double tol, int maxIter)
p
- the dimension of feature space.x
- training samples. Each sample is represented by a set of sparse
binary features. The features are stored in an integer array, of which
are the indices of nonzero features.y
- training labels in [0, k), where k is the number of classes.lambda
- λ > 0 gives a "regularized" estimate of linear
weights which often has superior generalization performance, especially
when the dimensionality is high.tol
- tolerance for stopping iterations.maxIter
- maximum number of iterations.public int getDimension()
public double loglikelihood()
public int predict(int[] x)
Classifier
predict
in interface Classifier<int[]>
x
- the instance to be classified.public int predict(int[] x, double[] posteriori)
SoftClassifier
predict
in interface SoftClassifier<int[]>
x
- the instance to be classified.posteriori
- the array to store a posteriori probabilities on output.