Package opennlp.tools.tokenize
Class TokenizerFactory
java.lang.Object
opennlp.tools.util.BaseToolFactory
opennlp.tools.tokenize.TokenizerFactory
The factory that provides Tokenizer default implementations and resources. Users can extend this class if their application requires overriding the TokenContextGenerator, Dictionary, etc.
Constructor Summary
Constructors
TokenizerFactory()
    Creates a TokenizerFactory that provides the default implementation of the resources.
TokenizerFactory(String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern)
    Creates a TokenizerFactory.
Method Summary
static TokenizerFactory create(String subclassName, String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern)
    Factory method the framework uses to create a new TokenizerFactory.
createArtifactMap()
    Creates a Map with pairs of keys and objects.
createManifestEntries()
    Creates the manifest entries that will be added to the model manifest.
getAbbreviationDictionary()
    Gets the abbreviation dictionary.
getAlphaNumericPattern()
    Gets the alpha numeric pattern.
getContextGenerator()
    Gets the context generator.
getLanguageCode()
    Retrieves the language code.
boolean isUseAlphaNumericOptmization()
    Gets whether to use alphanumeric optimization.
void validateArtifactMap()
    Validates the parsed artifacts.

Methods inherited from class opennlp.tools.util.BaseToolFactory
create, create, createArtifactSerializersMap
Constructor Details
TokenizerFactory
public TokenizerFactory()
Creates a TokenizerFactory that provides the default implementation of the resources.

TokenizerFactory
public TokenizerFactory(String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern)
Creates a TokenizerFactory. Use this constructor to programmatically create a factory.
- Parameters:
languageCode - the language of the natural text
abbreviationDictionary - an abbreviations dictionary
useAlphaNumericOptimization - if true, alpha numerics are skipped
alphaNumericPattern - null or a custom alphanumeric pattern (default is: "^[A-Za-z0-9]+$", provided by Factory.DEFAULT_ALPHANUMERIC)
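To make the default alphanumeric pattern quoted above concrete, the following minimal sketch compiles the same expression with java.util.regex and checks a few tokens against it. It does not use OpenNLP; the class name AlphaNumericPatternSketch and the method isAlphaNumeric are hypothetical names chosen for illustration, and how the tokenizer acts on a match is not shown here.

```java
import java.util.regex.Pattern;

// Illustration only: a stand-alone check against the expression that the
// documentation quotes as the default alphanumeric pattern.
final class AlphaNumericPatternSketch {

    // Same expression as the documented default; the constant name is local to this sketch.
    static final Pattern DEFAULT_ALPHANUMERIC = Pattern.compile("^[A-Za-z0-9]+$");

    // Returns true when the whole token consists of one or more ASCII letters or digits.
    static boolean isAlphaNumeric(String token) {
        return DEFAULT_ALPHANUMERIC.matcher(token).matches();
    }

    public static void main(String[] args) {
        String[] tokens = {"OpenNLP", "2024", "can't", "e-mail", ""};
        for (String token : tokens) {
            System.out.println("\"" + token + "\" -> " + isAlphaNumeric(token));
        }
    }
}
```

A custom Pattern passed as alphaNumericPattern could be checked the same way before handing it to the constructor, for example to confirm that it accepts the characters of the target language.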
Method Details
validateArtifactMap
Description copied from class: BaseToolFactory
Validates the parsed artifacts. If something is not valid, subclasses should throw an InvalidFormatException.
Note: Subclasses should generally invoke super.validateArtifactMap at the beginning of this method.
- Specified by:
validateArtifactMap in class BaseToolFactory
- Throws:
InvalidFormatException
createArtifactMap
Description copied from class: BaseToolFactory
Creates a Map with pairs of keys and objects. The model implementation should call this method from the constructor that creates a model programmatically.
The base implementation will return a HashMap that should be populated by subclasses.
- Overrides:
createArtifactMap in class BaseToolFactory
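The "base returns a HashMap, subclasses populate it" contract described above can be sketched without OpenNLP as follows. Every name in this block (SketchBaseFactory, SketchDictionaryFactory, the key "example.dictionary") is a hypothetical stand-in invented for illustration; the real classes, key names, and artifact types may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a base factory; this is not the OpenNLP class.
class SketchBaseFactory {
    // Base implementation: returns an empty, mutable map for subclasses to fill.
    public Map<String, Object> createArtifactMap() {
        return new HashMap<>();
    }
}

// Hypothetical subclass that contributes one optional artifact.
class SketchDictionaryFactory extends SketchBaseFactory {
    // Illustrative key only; not taken from the library.
    static final String ENTRY_NAME = "example.dictionary";

    private final Object dictionary;

    SketchDictionaryFactory(Object dictionary) {
        this.dictionary = dictionary;
    }

    @Override
    public Map<String, Object> createArtifactMap() {
        // Start from the base map so entries added by superclasses are kept.
        Map<String, Object> artifacts = super.createArtifactMap();
        if (dictionary != null) {
            artifacts.put(ENTRY_NAME, dictionary);
        }
        return artifacts;
    }

    public static void main(String[] args) {
        System.out.println(new SketchDictionaryFactory("abbreviations").createArtifactMap());
    }
}
```

Calling super.createArtifactMap() first keeps the subclass additive: it only adds its own entries and leaves anything contributed further up the hierarchy in place.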
createManifestEntries
Description copied from class: BaseToolFactory
Creates the manifest entries that will be added to the model manifest.
- Overrides:
createManifestEntries in class BaseToolFactory
- Returns:
the manifest entries to be added to the model manifest
create
public static TokenizerFactory create(String subclassName, String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern) throws InvalidFormatException
Factory method the framework uses to create a new TokenizerFactory.
- Parameters:
subclassName - the name of the class implementing the TokenizerFactory
languageCode - the language code the tokenizer should use
abbreviationDictionary - an optional dictionary containing abbreviations, or null if not present
useAlphaNumericOptimization - indicates whether the alpha numeric optimization should be enabled or disabled
alphaNumericPattern - the pattern the alpha numeric optimization should use
- Returns:
the instance of the TokenizerFactory
- Throws:
InvalidFormatException - if one of the input parameters doesn't comply with the expected format
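A factory method that selects its implementation by class name, as the subclassName parameter above suggests, can be sketched with plain Java reflection. This is an illustration under stated assumptions, not the library's implementation: SketchFactory, its nested Custom class, and init are hypothetical names; treating a null subclassName as "use the default factory" is an assumption; and IllegalArgumentException stands in for InvalidFormatException so the block compiles without OpenNLP.

```java
// Hypothetical stand-in for a tool factory that can be created by class name.
class SketchFactory {
    private String languageCode;

    public SketchFactory() { }

    // Stand-in for post-construction initialization with the supplied parameters.
    protected void init(String languageCode) {
        this.languageCode = languageCode;
    }

    public String getLanguageCode() {
        return languageCode;
    }

    // A custom implementation that callers can request by its class name.
    public static class Custom extends SketchFactory {
        public Custom() { }
    }

    // Assumption: a null subclassName selects the default factory.
    public static SketchFactory create(String subclassName, String languageCode) {
        SketchFactory factory;
        if (subclassName == null) {
            factory = new SketchFactory();
        } else {
            try {
                // Load the named class, verify its type, and call its no-argument constructor.
                factory = Class.forName(subclassName)
                        .asSubclass(SketchFactory.class)
                        .getDeclaredConstructor()
                        .newInstance();
            } catch (ReflectiveOperationException | ClassCastException e) {
                // Stand-in for the checked exception a real factory method would declare.
                throw new IllegalArgumentException("Cannot instantiate factory: " + subclassName, e);
            }
        }
        factory.init(languageCode);
        return factory;
    }

    public static void main(String[] args) {
        System.out.println(create(null, "en").getClass().getName());
        System.out.println(create(Custom.class.getName(), "de").getClass().getName());
    }
}
```

Instantiating through a no-argument constructor and then initializing is one common way to let a framework load implementations that it does not know at compile time; it requires every custom subclass to expose such a constructor.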
getAlphaNumericPattern
Gets the alpha numeric pattern.
- Returns:
the user-specified alpha numeric pattern or a default

isUseAlphaNumericOptmization
public boolean isUseAlphaNumericOptmization()
Gets whether to use alphanumeric optimization.
- Returns:
true if the alpha numeric optimization is enabled, otherwise false

getAbbreviationDictionary
Gets the abbreviation dictionary.
- Returns:
null or the abbreviation dictionary

getLanguageCode
Retrieves the language code.
- Returns:
the language code

getContextGenerator
Gets the context generator.
- Returns:
a new instance of the context generator