Package opennlp.tools.ml.maxent
Class GISTrainer
java.lang.Object
opennlp.tools.ml.AbstractTrainer
opennlp.tools.ml.AbstractEventTrainer
opennlp.tools.ml.maxent.GISTrainer
- All Implemented Interfaces:
EventTrainer
An implementation of Generalized Iterative Scaling (GIS). The reference paper
for this implementation is Adwait Ratnaparkhi's technical report from the
University of Pennsylvania's Institute for Research in Cognitive Science,
available at
ftp://ftp.cis.upenn.edu/pub/ircs/tr/97-08.ps.Z
The slack parameter used in the reference paper is omitted from the computation
by default, and an update method with Gaussian smoothing has been added,
following "Investigating GIS and Smoothing for Maximum Entropy Taggers", Clark and Curran (2002):
http://acl.ldc.upenn.edu/E/E03/E03-1071.pdf
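The basic GIS update without the slack parameter can be illustrated with a small, self-contained sketch. This is not the GISTrainer source: the class name GisSketch, its methods, and the array-based data layout are assumptions made for illustration. Each event is an array of active binary predicate indices plus an outcome index, and each parameter is moved by log(observed / expected) divided by the correction constant C.

```java
/** Illustrative GIS sketch; not the GISTrainer implementation. */
final class GisSketch {

    /** Returns p(outcome | context) for the active predicates in context. */
    static double[] eval(int[] context, double[][] lambda, int numOutcomes) {
        double[] p = new double[numOutcomes];
        for (int pred : context) {
            for (int o = 0; o < numOutcomes; o++) {
                p[o] += lambda[pred][o];
            }
        }
        double norm = 0.0;
        for (int o = 0; o < numOutcomes; o++) {
            p[o] = Math.exp(p[o]);
            norm += p[o];
        }
        for (int o = 0; o < numOutcomes; o++) {
            p[o] /= norm;
        }
        return p;
    }

    /** Runs a fixed number of GIS iterations and returns lambda[predicate][outcome]. */
    static double[][] train(int[][] contexts, int[] outcomes,
                            int numPreds, int numOutcomes, int iterations) {
        // Correction constant C: the largest number of active predicates in any event.
        int c = 1;
        for (int[] ctx : contexts) {
            c = Math.max(c, ctx.length);
        }
        // Observed (empirical) count of each (predicate, outcome) pair.
        double[][] observed = new double[numPreds][numOutcomes];
        for (int i = 0; i < contexts.length; i++) {
            for (int pred : contexts[i]) {
                observed[pred][outcomes[i]] += 1.0;
            }
        }
        double[][] lambda = new double[numPreds][numOutcomes];
        for (int it = 0; it < iterations; it++) {
            // Expected count of each pair under the current model.
            double[][] expected = new double[numPreds][numOutcomes];
            for (int[] ctx : contexts) {
                double[] p = eval(ctx, lambda, numOutcomes);
                for (int pred : ctx) {
                    for (int o = 0; o < numOutcomes; o++) {
                        expected[pred][o] += p[o];
                    }
                }
            }
            // GIS update; pairs never observed are skipped, so their weight stays 0.
            for (int pred = 0; pred < numPreds; pred++) {
                for (int o = 0; o < numOutcomes; o++) {
                    if (observed[pred][o] > 0.0) {
                        lambda[pred][o] += Math.log(observed[pred][o] / expected[pred][o]) / c;
                    }
                }
            }
        }
        return lambda;
    }

    public static void main(String[] args) {
        int[][] contexts = {{0}, {0, 1}, {1}, {1}};
        int[] outcomes = {0, 0, 1, 1};
        double[][] lambda = train(contexts, outcomes, 2, 2, 100);
        System.out.println(java.util.Arrays.toString(eval(new int[] {0}, lambda, 2)));
    }
}
```

Skipping unobserved pairs keeps the logarithm finite; it is one simple choice for this sketch and says nothing about how the library treats such pairs.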
The slack parameter can be used by setting useSlackParameter to true.
Gaussian smoothing can be used by setting useGaussianSmoothing to true.
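With a Gaussian prior the parameter update no longer has a closed form. A common formulation solves observed = expected * exp(C * delta) + (lambda + delta) / sigma2 for delta, for example with Newton's method. The sketch below works under that assumption: the class name, the method name, the variance argument sigma2, and the stopping tolerance are all hypothetical, and it is not taken from the GISTrainer source.

```java
/** Illustrative Newton solver for a Gaussian-smoothed GIS update; names are hypothetical. */
final class GaussianUpdateSketch {

    /**
     * Solves expected * exp(c * delta) + (lambda + delta) / sigma2 = observed for delta.
     * Assumes expected >= 0, c > 0 and sigma2 > 0, so the derivative is always positive.
     */
    static double solveDelta(double observed, double expected,
                             double lambda, double c, double sigma2) {
        double delta = 0.0;
        for (int i = 0; i < 50; i++) {
            double scaled = expected * Math.exp(c * delta);
            double f = scaled + (lambda + delta) / sigma2 - observed;
            double fPrime = scaled * c + 1.0 / sigma2;
            double next = delta - f / fPrime;
            if (Math.abs(next - delta) < 1e-9) {
                return next;
            }
            delta = next;
        }
        return delta;
    }

    public static void main(String[] args) {
        // Compare with the unsmoothed step log(observed / expected) / c.
        double smoothed = solveDelta(2.0, 1.0, 0.0, 1.0, 1.0);
        double unsmoothed = Math.log(2.0 / 1.0) / 1.0;
        System.out.println("smoothed=" + smoothed + " unsmoothed=" + unsmoothed);
    }
}
```

In this formulation the prior term pulls the parameter toward zero, so the smoothed step is smaller than the unsmoothed one whenever the latter is positive and lambda is non-negative.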
A prior can be used to train models which converge to the distribution which minimizes the relative entropy between the distribution specified by the empirical constraints of the training data and the specified prior. By default, the uniform distribution is used as the prior.
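The relative entropy mentioned above is the Kullback-Leibler divergence, KL(p || q) = sum over i of p[i] * log(p[i] / q[i]). The following sketch computes it against a uniform prior; the class and method names are made up for illustration and are unrelated to the library's Prior interface.

```java
/** Illustrative relative-entropy computation; names are hypothetical. */
final class RelativeEntropySketch {

    /** KL(p || q) in nats; terms with p[i] == 0 contribute nothing. */
    static double klDivergence(double[] p, double[] q) {
        double kl = 0.0;
        for (int i = 0; i < p.length; i++) {
            if (p[i] > 0.0) {
                kl += p[i] * Math.log(p[i] / q[i]);
            }
        }
        return kl;
    }

    /** Uniform distribution over n outcomes. */
    static double[] uniform(int n) {
        double[] q = new double[n];
        java.util.Arrays.fill(q, 1.0 / n);
        return q;
    }

    public static void main(String[] args) {
        double[] model = {0.7, 0.2, 0.1};
        System.out.println(klDivergence(model, uniform(3)));
    }
}
```

Against a uniform prior, minimizing relative entropy is the same as maximizing entropy, which is why the uniform default corresponds to the standard maximum entropy model.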
-
Field Summary
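Fields
Modifier and Type, Field, Description
static final double LOG_LIKELIHOOD_THRESHOLD_DEFAULT
static final String LOG_LIKELIHOOD_THRESHOLD_PARAM
static final String MAXENT_VALUE
static final String OLD_LL_THRESHOLD_PARAM - Deprecated.
Fields inherited from class opennlp.tools.ml.AbstractEventTrainer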
DATA_INDEXER_ONE_PASS_REAL_VALUE, DATA_INDEXER_ONE_PASS_VALUE, DATA_INDEXER_PARAM, DATA_INDEXER_TWO_PASS_VALUE
Fields inherited from class opennlp.tools.ml.AbstractTrainer
ALGORITHM_PARAM, CUTOFF_DEFAULT, CUTOFF_PARAM, ITERATIONS_DEFAULT, ITERATIONS_PARAM, TRAINER_TYPE_PARAM, VERBOSE_DEFAULT, VERBOSE_PARAM
Fields inherited from interface opennlp.tools.ml.EventTrainer
EVENT_VALUE
-
Constructor Summary
Constructors
Constructor, Description
GISTrainer() - Creates a new GISTrainer instance which does not print progress messages about training to STDOUT.
-
Method Summary
Modifier and Type, Method, Description
doTrain(DataIndexer indexer)
void init(TrainingParameters trainingParameters, Map<String, String> reportMap)
boolean isSortAndMerge()
void setGaussianSigma(double sigmaValue) - Sets the sigma value used for Gaussian smoothing.
void setSmoothing(boolean smooth) - Sets whether this trainer will use smoothing while training the model.
void setSmoothingObservation(double timesSeen) - Sets the number of times the trainer should imagine it saw a feature that it did not actually see.
GISModel trainModel(int iterations, DataIndexer di) - Train a model using the GIS algorithm.
GISModel trainModel(int iterations, DataIndexer di, int threads) - Train a model using the GIS algorithm.
GISModel trainModel(int iterations, DataIndexer di, Prior modelPrior, int threads) - Train a model using the GIS algorithm.
GISModel trainModel(ObjectStream<Event> eventStream) - Train a model using the GIS algorithm, assuming 100 iterations and no cutoff.
GISModel trainModel(ObjectStream<Event> eventStream, int iterations, int cutoff) - Trains a GIS model on the events in the specified event stream, using the specified number of iterations and the specified count cutoff.
Methods inherited from class opennlp.tools.ml.AbstractEventTrainer
getDataIndexer, isValid, train, train, validate
Methods inherited from class opennlp.tools.ml.AbstractTrainer
getAlgorithm, getCutoff, getIterations, init
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface opennlp.tools.ml.EventTrainer
init
-
Field Details
-
OLD_LL_THRESHOLD_PARAM
Deprecated.
-
LOG_LIKELIHOOD_THRESHOLD_PARAM
-
LOG_LIKELIHOOD_THRESHOLD_DEFAULT
public static final double LOG_LIKELIHOOD_THRESHOLD_DEFAULT
-
MAXENT_VALUE
-
Constructor Details
-
GISTrainer
public GISTrainer()
Creates a new GISTrainer instance which does not print progress messages about training to STDOUT.
-
-
Method Details
-
isSortAndMerge
public boolean isSortAndMerge()
- Specified by:
isSortAndMerge in class AbstractEventTrainer
-
init
- Specified by:
init in interface EventTrainer
- Overrides:
init in class AbstractTrainer
-
doTrain
- Specified by:
doTrain in class AbstractEventTrainer
- Throws:
IOException
-
setSmoothing
public void setSmoothing(boolean smooth)
Sets whether this trainer will use smoothing while training the model. This can improve model accuracy, though training will potentially take longer and use more memory. Model size will also be larger.
- Parameters:
smooth - true if smoothing is desired, false if not
-
setSmoothingObservation
public void setSmoothingObservation(double timesSeen)
Sets the number of observations the trainer should assume for a feature that it did not actually see, for use in smoothing. Smoothing can improve model accuracy, though training will potentially take longer and use more memory. Model size will also be larger.
- Parameters:
timesSeen - the "number" of times we want the trainer to imagine it saw a feature that it actually didn't see
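One way to picture the effect of timesSeen is as a small imagined count that replaces a zero in the observed counts, which keeps log(observed) finite in a GIS-style update. The sketch below illustrates that idea only; the class name, the method, and the count layout are assumptions, not the trainer's internals.

```java
/** Illustrative "imagined observation" smoothing; names are hypothetical. */
final class SmoothingObservationSketch {

    /** Returns a copy of observed in which every zero count is replaced by timesSeen. */
    static double[][] smooth(double[][] observed, double timesSeen) {
        double[][] result = new double[observed.length][];
        for (int pred = 0; pred < observed.length; pred++) {
            result[pred] = observed[pred].clone();
            for (int o = 0; o < result[pred].length; o++) {
                if (result[pred][o] == 0.0) {
                    result[pred][o] = timesSeen;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] observed = {{2.0, 0.0}, {0.0, 3.0}};
        System.out.println(java.util.Arrays.deepToString(smooth(observed, 0.1)));
    }
}
```

Every (predicate, outcome) pair then carries a parameter, which is consistent with the note above that smoothing increases model size and memory use.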
-
setGaussianSigma
public void setGaussianSigma(double sigmaValue)
Sets the sigma value used for Gaussian smoothing. Smoothing can improve model accuracy, though training will potentially take longer and use more memory. Model size will also be larger.
-
trainModel
Train a model using the GIS algorithm, assuming 100 iterations and no cutoff.
- Parameters:
eventStream - The EventStream holding the data on which this model will be trained.
- Returns:
The newly trained model, which can be used immediately or saved to disk using an opennlp.tools.ml.maxent.io.GISModelWriter object.
- Throws:
IOException
-
trainModel
public GISModel trainModel(ObjectStream<Event> eventStream, int iterations, int cutoff) throws IOException
Trains a GIS model on the events in the specified event stream, using the specified number of iterations and the specified count cutoff.
- Parameters:
eventStream - A stream of all events.
iterations - The number of iterations to use for GIS.
cutoff - The number of times a feature must occur to be included.
- Returns:
A GIS model trained with specified
- Throws:
IOException
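The count cutoff can be read as a frequency filter over features: a feature is kept only if it occurs at least cutoff times in the training events. The sketch below shows such a filter over string predicates. The class, method, and data layout are hypothetical; in the library this filtering belongs to data indexing rather than to a helper like this one.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative count-cutoff filter; names are hypothetical. */
final class CutoffSketch {

    /** Returns the predicates that occur at least cutoff times across all contexts. */
    static Set<String> keptPredicates(List<String[]> contexts, int cutoff) {
        Map<String, Integer> counts = new HashMap<>();
        for (String[] context : contexts) {
            for (String pred : context) {
                counts.merge(pred, 1, Integer::sum);
            }
        }
        Set<String> kept = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= cutoff) {
                kept.add(entry.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String[]> contexts = List.of(
                new String[] {"w=the", "suffix=e"},
                new String[] {"w=the"},
                new String[] {"w=cat"});
        System.out.println(keptPredicates(contexts, 2));
    }
}
```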
-
trainModel
Train a model using the GIS algorithm.
- Parameters:
iterations - The number of GIS iterations to perform.
di - The data indexer used to compress events in memory.
- Returns:
The newly trained model, which can be used immediately or saved to disk using an opennlp.tools.ml.maxent.io.GISModelWriter object.
-
trainModel
Train a model using the GIS algorithm.
- Parameters:
iterations - The number of GIS iterations to perform.
di - The data indexer used to compress events in memory.
threads
- Returns:
The newly trained model, which can be used immediately or saved to disk using an opennlp.tools.ml.maxent.io.GISModelWriter object.
-
trainModel
Train a model using the GIS algorithm.
- Parameters:
iterations - The number of GIS iterations to perform.
di - The data indexer used to compress events in memory.
modelPrior - The prior distribution used to train this model.
- Returns:
The newly trained model, which can be used immediately or saved to disk using an opennlp.tools.ml.maxent.io.GISModelWriter object.
-