Class TokenizerFactory

java.lang.Object
opennlp.tools.util.BaseToolFactory
opennlp.tools.tokenize.TokenizerFactory

public class TokenizerFactory extends BaseToolFactory
The factory that provides Tokenizer default implementations and resources. Users can extend this class if their application requires overriding the TokenContextGenerator, Dictionary, etc.
  • Constructor Details

    • TokenizerFactory

      public TokenizerFactory()
      Creates a TokenizerFactory that provides the default implementation of the resources.
    • TokenizerFactory

      public TokenizerFactory(String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern)
      Creates a TokenizerFactory. Use this constructor to programmatically create a factory.
      Parameters:
      languageCode - the language of the natural text
      abbreviationDictionary - an abbreviations dictionary
      useAlphaNumericOptimization - if true, alphanumeric tokens are skipped
      alphaNumericPattern - null or a custom alphanumeric pattern (default is "^[A-Za-z0-9]+$", provided by Factory.DEFAULT_ALPHANUMERIC)
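The default pattern quoted in the parameter description can be exercised with plain `java.util.regex`. The following is a minimal sketch that uses only the JDK. The class `AlphaNumericPatternDemo` and the helper `isAlphaNumeric` are hypothetical, and the pattern string is copied from the description rather than read from `Factory.DEFAULT_ALPHANUMERIC`. It shows which tokens the pattern matches; it does not show how the tokenizer applies the optimization internally.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; it is not part of OpenNLP.
class AlphaNumericPatternDemo {

    // Pattern string copied from the parameter description above.
    static final Pattern DEFAULT_ALPHANUMERIC = Pattern.compile("^[A-Za-z0-9]+$");

    // Returns true if the whole token consists of ASCII letters and digits only.
    static boolean isAlphaNumeric(String token) {
        return DEFAULT_ALPHANUMERIC.matcher(token).matches();
    }

    public static void main(String[] args) {
        String[] tokens = {"OpenNLP", "2024", "don't", "U.S.", ""};
        for (String token : tokens) {
            System.out.println("'" + token + "' -> " + isAlphaNumeric(token));
        }
    }
}
```

Tokens containing punctuation, such as `don't` or `U.S.`, do not match, so a custom pattern is only needed when a language requires a different notion of "alphanumeric" (for example, non-ASCII letters).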
  • Method Details

    • validateArtifactMap

      public void validateArtifactMap() throws InvalidFormatException
      Description copied from class: BaseToolFactory
      Validates the parsed artifacts. If something is not valid, subclasses should throw an InvalidFormatException. Note: Subclasses should generally invoke super.validateArtifactMap at the beginning of this method.
      Specified by:
      validateArtifactMap in class BaseToolFactory
      Throws:
      InvalidFormatException
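The call-order convention in the note above can be sketched with stand-in classes. Everything in this block is hypothetical: `SketchBaseFactory` and `SketchInvalidFormatException` only mimic the shape of `BaseToolFactory` and `InvalidFormatException`, and the artifact key and type check are invented for illustration. It is not the validation OpenNLP actually performs.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for opennlp.tools.util.InvalidFormatException.
class SketchInvalidFormatException extends Exception {
    SketchInvalidFormatException(String message) {
        super(message);
    }
}

// Hypothetical stand-in for BaseToolFactory; it only holds an artifact map.
class SketchBaseFactory {
    protected final Map<String, Object> artifactMap = new HashMap<>();

    public void validateArtifactMap() throws SketchInvalidFormatException {
        // The base class has nothing to check in this sketch.
    }
}

// Hypothetical subclass showing the recommended call order.
class SketchValidatingFactory extends SketchBaseFactory {

    // Assumed artifact key, chosen for illustration only.
    static final String ABBREVIATIONS_KEY = "abbreviations";

    @Override
    public void validateArtifactMap() throws SketchInvalidFormatException {
        // Let the superclass validate its own artifacts first.
        super.validateArtifactMap();

        // Then check the artifact this subclass cares about; a missing entry is allowed.
        Object abbreviations = artifactMap.get(ABBREVIATIONS_KEY);
        if (abbreviations != null && !(abbreviations instanceof Set)) {
            throw new SketchInvalidFormatException(
                "Artifact '" + ABBREVIATIONS_KEY + "' has an unexpected type");
        }
    }

    public static void main(String[] args) throws SketchInvalidFormatException {
        SketchValidatingFactory factory = new SketchValidatingFactory();
        factory.artifactMap.put(ABBREVIATIONS_KEY, new HashSet<String>());
        factory.validateArtifactMap(); // passes: the entry has the expected type
        System.out.println("artifacts are valid");
    }
}
```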
    • createArtifactMap

      public Map<String,Object> createArtifactMap()
      Description copied from class: BaseToolFactory
      Creates a Map with pairs of keys and objects. The model's implementation should call this method from the constructor that creates a model programmatically.

      The base implementation will return a HashMap that should be populated by sub-classes.

      Overrides:
      createArtifactMap in class BaseToolFactory
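The "base returns a HashMap, subclasses populate it" pattern described above might look like the following sketch. The classes `ArtifactBaseSketch` and `ArtifactTokenizerSketch`, the key `"abbreviations"`, and the use of a plain `Object` for the dictionary are all assumptions made for illustration; they are not the names or types OpenNLP uses.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for BaseToolFactory.
class ArtifactBaseSketch {
    // The base implementation returns an empty, mutable map.
    public Map<String, Object> createArtifactMap() {
        return new HashMap<>();
    }
}

// Hypothetical subclass that adds its own artifact to the map.
class ArtifactTokenizerSketch extends ArtifactBaseSketch {

    // Assumed artifact key, chosen for illustration only.
    static final String ABBREVIATIONS_KEY = "abbreviations";

    private final Object abbreviationDictionary; // may be null

    ArtifactTokenizerSketch(Object abbreviationDictionary) {
        this.abbreviationDictionary = abbreviationDictionary;
    }

    @Override
    public Map<String, Object> createArtifactMap() {
        // Start from the map provided by the superclass.
        Map<String, Object> artifactMap = super.createArtifactMap();

        // Only register the dictionary when one was supplied.
        if (abbreviationDictionary != null) {
            artifactMap.put(ABBREVIATIONS_KEY, abbreviationDictionary);
        }
        return artifactMap;
    }

    public static void main(String[] args) {
        Object dictionary = new Object();
        Map<String, Object> artifacts =
            new ArtifactTokenizerSketch(dictionary).createArtifactMap();
        System.out.println("artifact keys: " + artifacts.keySet());
    }
}
```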
    • createManifestEntries

      public Map<String,String> createManifestEntries()
      Description copied from class: BaseToolFactory
      Creates the manifest entries that will be added to the model manifest.
      Overrides:
      createManifestEntries in class BaseToolFactory
      Returns:
      the manifest entries to be added to the model manifest
    • create

      public static TokenizerFactory create(String subclassName, String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern) throws InvalidFormatException
      Factory method the framework uses to create a new TokenizerFactory.
      Parameters:
      subclassName - the name of the class implementing the TokenizerFactory
      languageCode - the language code the tokenizer should use
      abbreviationDictionary - an optional dictionary containing abbreviations, or null if not present
      useAlphaNumericOptimization - indicate if the alpha numeric optimization should be enabled or disabled
      alphaNumericPattern - the pattern the alpha numeric optimization should use
      Returns:
      the TokenizerFactory instance
      Throws:
      InvalidFormatException - if one of the input parameters doesn't comply with the expected format
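Creating a factory from a class name, as the `subclassName` parameter suggests, is commonly done through reflection. The sketch below shows that general technique with hypothetical classes (`ReflectiveFactorySketch`, `CustomFactorySketch`). The `null` fallback to the default factory and the use of `IllegalArgumentException` are assumptions of this sketch; the mechanism and exception type that OpenNLP uses may differ.

```java
// Hypothetical stand-in for a factory that can be created by class name.
class ReflectiveFactorySketch {

    public ReflectiveFactorySketch() {
    }

    // Instantiates the named subclass through its no-argument constructor.
    // Assumption of this sketch: a null name selects the default factory.
    static ReflectiveFactorySketch create(String subclassName) {
        if (subclassName == null) {
            return new ReflectiveFactorySketch();
        }
        try {
            Class<?> loaded = Class.forName(subclassName);
            // asSubclass rejects classes that do not extend the factory type.
            return loaded.asSubclass(ReflectiveFactorySketch.class)
                         .getDeclaredConstructor()
                         .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException(
                "Could not instantiate factory " + subclassName, e);
        }
    }

    public static void main(String[] args) {
        ReflectiveFactorySketch factory = create(CustomFactorySketch.class.getName());
        System.out.println(factory.getClass().getSimpleName());
    }
}

// Hypothetical user-defined subclass with a public no-argument constructor.
class CustomFactorySketch extends ReflectiveFactorySketch {
    public CustomFactorySketch() {
    }
}
```

A design consequence of this technique is that a custom factory needs an accessible no-argument constructor, which matches the presence of the public no-argument `TokenizerFactory()` constructor documented above.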
    • getAlphaNumericPattern

      public Pattern getAlphaNumericPattern()
      Gets the alpha numeric pattern.
      Returns:
      the user specified alpha numeric pattern or a default.
    • isUseAlphaNumericOptmization

      public boolean isUseAlphaNumericOptmization()
      Gets whether to use alphanumeric optimization.
      Returns:
      true if the alpha numeric optimization is enabled, otherwise false
    • getAbbreviationDictionary

      public Dictionary getAbbreviationDictionary()
      Gets the abbreviation dictionary.
      Returns:
      null or the abbreviation dictionary
    • getLanguageCode

      public String getLanguageCode()
      Retrieves the language code.
      Returns:
      the language code
    • getContextGenerator

      public TokenContextGenerator getContextGenerator()
      Gets the context generator.
      Returns:
      a new instance of the context generator