Interface Tokenizer

    • Method Summary

      Modifier and Type | Method | Description
      java.lang.String | buildSentence(java.util.List<java.lang.String> tokens) | Combines a list of tokens to form a sentence.
      default java.util.List<java.lang.String> | preprocess(java.util.List<java.lang.String> tokens) | Applies this processor's preprocessing to the given input tokens.
      java.util.List<java.lang.String> | tokenize(java.lang.String sentence) | Breaks down the given sentence into a list of tokens that can be represented by embeddings.
    • Method Detail

      • preprocess

        default java.util.List<java.lang.String> preprocess(java.util.List<java.lang.String> tokens)
        Applies this processor's preprocessing to the given input tokens.
        Specified by:
        preprocess in interface TextProcessor
        Parameters:
        tokens - the tokens produced by tokenizing the input text
        Returns:
        the preprocessed tokens
      • tokenize

        java.util.List<java.lang.String> tokenize(java.lang.String sentence)
        Breaks down the given sentence into a list of tokens that can be represented by embeddings.
        Parameters:
        sentence - the sentence to tokenize
        Returns:
        a List of tokens
      • buildSentence

        java.lang.String buildSentence(java.util.List<java.lang.String> tokens)
        Combines a list of tokens to form a sentence.
        Parameters:
        tokens - the List of tokens
        Returns:
        the sentence built from the given tokens
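
The three methods documented above can be illustrated with a minimal, self-contained sketch. It does not use the library itself: the nested `Tokenizer` interface is a local stand-in that only mirrors the documented signatures, and `WhitespaceTokenizer` and `TokenizerSketch` are hypothetical names introduced for illustration. The body of the default `preprocess` (rebuild the sentence, then tokenize it again) is an assumption about one plausible default, not something this page confirms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TokenizerSketch {

    /** Local stand-in mirroring the three documented signatures (not the library type). */
    interface Tokenizer {
        List<String> tokenize(String sentence);

        String buildSentence(List<String> tokens);

        // Assumption: the default rebuilds the sentence and tokenizes it again.
        default List<String> preprocess(List<String> tokens) {
            return tokenize(buildSentence(tokens));
        }
    }

    /** Hypothetical implementation: splits on whitespace, joins with a single space. */
    static class WhitespaceTokenizer implements Tokenizer {
        @Override
        public List<String> tokenize(String sentence) {
            String trimmed = sentence.trim();
            if (trimmed.isEmpty()) {
                return new ArrayList<>(); // no tokens for blank input
            }
            return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
        }

        @Override
        public String buildSentence(List<String> tokens) {
            return String.join(" ", tokens);
        }
    }

    public static void main(String[] args) {
        Tokenizer tokenizer = new WhitespaceTokenizer();

        List<String> tokens = tokenizer.tokenize("  Deep   learning is fun ");
        System.out.println(tokens); // [Deep, learning, is, fun]

        System.out.println(tokenizer.buildSentence(tokens)); // Deep learning is fun

        // Under the assumed default, coarse input tokens are split further.
        System.out.println(tokenizer.preprocess(Arrays.asList("Deep learning", "is fun")));
        // [Deep, learning, is, fun]
    }
}
```

In this sketch `buildSentence` acts as the inverse of `tokenize` only up to whitespace normalization: the rebuilt sentence drops the leading, trailing, and repeated spaces of the original input. A real tokenizer may define the round trip differently.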