Class TextCleaner

  • All Implemented Interfaces:
    TextProcessor

    public class TextCleaner
    extends java.lang.Object
    implements TextProcessor
    Removes or replaces characters that meet a supplied condition.
    • Constructor Summary

      Constructors 
      Constructor Description
      TextCleaner(java.util.function.Function<java.lang.Character,java.lang.Boolean> condition)
      Removes each character that meets the supplied condition.
      TextCleaner(java.util.function.Function<java.lang.Character,java.lang.Boolean> condition, char replace)
      Replaces each character that meets the supplied condition with the given character.
    • Method Summary

      Modifier and Type Method Description
      java.util.List<java.lang.String> preprocess(java.util.List<java.lang.String> tokens)
      Applies the configured character removal or replacement to the given input tokens.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • TextCleaner

        public TextCleaner(java.util.function.Function<java.lang.Character,java.lang.Boolean> condition)
        Removes each character that meets the supplied condition.
        Parameters:
        condition - function that returns true for each character to be removed
      • TextCleaner

        public TextCleaner(java.util.function.Function<java.lang.Character,java.lang.Boolean> condition,
                           char replace)
        Replaces each character that meets the supplied condition with the given character.
        Parameters:
        condition - function that returns true for each character to be replaced
        replace - the character substituted for each character that meets the condition
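
Both constructors take the condition as a java.util.function.Function<Character, Boolean>. The following is a minimal sketch of how such conditions might be written using only JDK classes. The class name ConditionExamples and the constant names are hypothetical and chosen purely for illustration.

```java
import java.util.function.Function;

public class ConditionExamples {
    // True for any character that Character.isDigit classifies as a digit.
    public static final Function<Character, Boolean> IS_DIGIT =
            c -> Character.isDigit(c);

    // True for anything that is neither a letter nor a digit
    // (punctuation, symbols, whitespace).
    public static final Function<Character, Boolean> NOT_ALPHANUMERIC =
            c -> !Character.isLetterOrDigit(c);

    public static void main(String[] args) {
        System.out.println(IS_DIGIT.apply('7'));         // prints "true"
        System.out.println(NOT_ALPHANUMERIC.apply('-')); // prints "true"
        System.out.println(NOT_ALPHANUMERIC.apply('a')); // prints "false"
    }
}
```

Going by the signatures above, either function could be passed as the condition argument, for example new TextCleaner(IS_DIGIT) to remove digits, or new TextCleaner(NOT_ALPHANUMERIC, ' ') to replace every non-alphanumeric character with a space.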
    • Method Detail

      • preprocess

        public java.util.List<java.lang.String> preprocess(java.util.List<java.lang.String> tokens)
        Applies the configured character removal or replacement to the given input tokens.
        Specified by:
        preprocess in interface TextProcessor
        Parameters:
        tokens - the tokens produced by tokenizing the input text
        Returns:
        the preprocessed tokens
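
To illustrate the behavior described above, here is a self-contained stand-in written against the documented signatures. It is not the library's implementation. The class name TextCleanerSketch is hypothetical, and two details are assumptions that this page does not confirm: the condition is checked once per character of each token, and tokens left empty after cleaning are kept in the result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in that mirrors the documented constructors and method.
public class TextCleanerSketch {
    private final Function<Character, Boolean> condition;
    private final Character replacement; // null means "remove matching characters"

    // Remove mode: matching characters are dropped.
    public TextCleanerSketch(Function<Character, Boolean> condition) {
        this.condition = condition;
        this.replacement = null;
    }

    // Replace mode: matching characters are swapped for 'replace'.
    public TextCleanerSketch(Function<Character, Boolean> condition, char replace) {
        this.condition = condition;
        this.replacement = replace;
    }

    public List<String> preprocess(List<String> tokens) {
        List<String> cleaned = new ArrayList<>();
        for (String token : tokens) {
            StringBuilder sb = new StringBuilder();
            for (char c : token.toCharArray()) {
                if (!condition.apply(c)) {
                    sb.append(c);                         // keep non-matching characters
                } else if (replacement != null) {
                    sb.append(replacement.charValue());   // replace mode
                }                                         // remove mode: append nothing
            }
            cleaned.add(sb.toString()); // assumption: empty results are kept
        }
        return cleaned;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("it's", "2021-05", "ok");
        Function<Character, Boolean> notAlphanumeric = c -> !Character.isLetterOrDigit(c);

        // prints "[its, 202105, ok]"
        System.out.println(new TextCleanerSketch(notAlphanumeric).preprocess(tokens));
        // prints "[it_s, 2021_05, ok]"
        System.out.println(new TextCleanerSketch(notAlphanumeric, '_').preprocess(tokens));
    }
}
```

The sketch returns a new list and leaves the input untouched. Whether the real class does the same is not stated on this page.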