Interface TextProcessor

All Known Subinterfaces:
Tokenizer
All Known Implementing Classes:
BertFullTokenizer, BertTokenizer, HyphenNormalizer, LambdaProcessor, LowerCaseConvertor, PunctuationSeparator, SimpleTokenizer, TextCleaner, TextTerminator, TextTruncator, UnicodeNormalizer, WordpieceTokenizer

public interface TextProcessor
TextProcessor applies a pre-processing step to input tokens for natural language applications. Multiple TextProcessor implementations can be applied in sequence to the same input, and the order in which they are applied can change the final output.
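To illustrate why ordering matters, here is a minimal sketch. It assumes a local stand-in interface with the same preprocess signature documented on this page, so that it compiles without the library. The two processors (SPLIT_TRAILING_PUNCT and keepFirst) and the applyAll helper are hypothetical names made up for this example; they are not the library's PunctuationSeparator or TextTruncator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ProcessorOrderDemo {

    // Local stand-in mirroring the documented signature (an assumption made
    // so the sketch is self-contained; it is not the library's interface).
    interface TextProcessor {
        List<String> preprocess(List<String> tokens);
    }

    // Hypothetical processor: splits one trailing punctuation mark off each token.
    static final TextProcessor SPLIT_TRAILING_PUNCT = tokens -> {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            int last = t.length() - 1;
            if (last > 0 && ",.!?".indexOf(t.charAt(last)) >= 0) {
                out.add(t.substring(0, last));
                out.add(t.substring(last));
            } else {
                out.add(t);
            }
        }
        return out;
    };

    // Hypothetical processor: keeps at most the first n tokens.
    static TextProcessor keepFirst(int n) {
        return tokens -> new ArrayList<>(tokens.subList(0, Math.min(n, tokens.size())));
    }

    // Applies each processor in list order, feeding each output into the next.
    static List<String> applyAll(List<String> tokens, List<TextProcessor> processors) {
        List<String> current = tokens;
        for (TextProcessor p : processors) {
            current = p.preprocess(current);
        }
        return current;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("Hello,", "world", "again");
        // Split first, then truncate: [Hello, ,]
        System.out.println(applyAll(input, Arrays.asList(SPLIT_TRAILING_PUNCT, keepFirst(2))));
        // Truncate first, then split: [Hello, ,, world]
        System.out.println(applyAll(input, Arrays.asList(keepFirst(2), SPLIT_TRAILING_PUNCT)));
    }
}
```

The same two processors produce two tokens in one order and three in the other, because truncation counts tokens before or after the punctuation has been split off.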
  • Method Summary

    Modifier and Type    Method                             Description
    List<String>         preprocess(List<String> tokens)    Applies the preprocessing defined to the given input tokens.
  • Method Details

    • preprocess

      List<String> preprocess(List<String> tokens)
      Applies the preprocessing defined to the given input tokens.
      Parameters:
      tokens - the tokens created after the input text is tokenized
      Returns:
      the preprocessed tokens
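
A custom implementation only needs to supply preprocess. The sketch below is one possible shape, again using a local stand-in interface with the documented signature so it runs on its own. The LowerCaser class and its Locale parameter are hypothetical and are not the library's LowerCaseConvertor. Returning a new list instead of mutating the argument is a design choice of this sketch, not a requirement stated on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class LowerCaseSketch {

    // Local stand-in mirroring the documented signature (assumption for self-containment).
    interface TextProcessor {
        List<String> preprocess(List<String> tokens);
    }

    // Hypothetical implementation: lower-cases every token using a fixed locale.
    static class LowerCaser implements TextProcessor {
        private final Locale locale;

        LowerCaser(Locale locale) {
            this.locale = locale;
        }

        @Override
        public List<String> preprocess(List<String> tokens) {
            // Build a fresh list so the caller's input is left untouched.
            List<String> out = new ArrayList<>(tokens.size());
            for (String token : tokens) {
                out.add(token.toLowerCase(locale));
            }
            return out;
        }
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("Deep", "Java", "LIBRARY");
        // Prints [deep, java, library]
        System.out.println(new LowerCaser(Locale.ROOT).preprocess(tokens));
    }
}
```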