Interface Tokenizer


  • public interface Tokenizer
    A tokenizer for morphological analysis.
    • Method Detail

      • tokenize

        List<Morpheme> tokenize(Tokenizer.SplitMode mode,
                                String text)
        Tokenizes a text. This method treats the whole input text as a single sentence, so a long text requires a lot of memory.
        Parameters:
        mode - a mode of splitting
        text - input text
        Returns:
        the morphemes of the tokenized text
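
A minimal sketch of calling tokenize and reading the result. The nested SplitMode, Morpheme, and Tokenizer types are hypothetical stand-ins that mirror only the signature shown above so that the sketch compiles on its own; the SplitMode constant names and the surface() accessor are assumptions not documented in this section, and the whitespace splitter merely stands in for a real analyzer.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenizeSketch {
    // Hypothetical stand-ins: they mirror only the documented signature.
    enum SplitMode { SHORT, LONG } // placeholder constant names

    interface Morpheme {
        String surface(); // assumed accessor, not shown in this section
    }

    interface Tokenizer {
        List<Morpheme> tokenize(SplitMode mode, String text);
    }

    // Toy implementation that splits on whitespace and ignores the mode;
    // a real analyzer would consult a dictionary and honor the mode.
    static Tokenizer whitespaceTokenizer() {
        return (mode, text) -> {
            List<Morpheme> result = new ArrayList<>();
            for (String piece : text.trim().split("\\s+")) {
                if (!piece.isEmpty()) {
                    result.add(() -> piece);
                }
            }
            return result;
        };
    }

    // Passes the whole input as one sentence and collects the surfaces.
    static List<String> surfaces(Tokenizer tokenizer, SplitMode mode, String text) {
        List<String> out = new ArrayList<>();
        for (Morpheme m : tokenizer.tokenize(mode, text)) {
            out.add(m.surface());
        }
        return out;
    }

    public static void main(String[] args) {
        Tokenizer tokenizer = whitespaceTokenizer();
        System.out.println(surfaces(tokenizer, SplitMode.SHORT, "a short sentence"));
    }
}
```

Because the whole text is handled as one sentence, callers with long inputs may prefer one of the tokenizeSentences variants.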
      • tokenizeSentences

        Iterable<List<Morpheme>> tokenizeSentences(Tokenizer.SplitMode mode,
                                                   String text)
        Tokenizes sentences. This method divides an input text into sentences and tokenizes each of them.
        Parameters:
        mode - a mode of splitting
        text - input text
        Returns:
        the morphemes of each sentence
      • tokenizeSentences

        Iterable<List<Morpheme>> tokenizeSentences(Tokenizer.SplitMode mode,
                                                   Reader input)
                                            throws IOException
        Tokenizes sentences. This method reads the input text from input, divides it into sentences, and tokenizes each of them.
        Parameters:
        mode - a mode of splitting
        input - a reader of input text
        Returns:
        the morphemes of each sentence
        Throws:
        IOException - if reading from the input fails
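
A minimal sketch of consuming the Reader variant one sentence at a time and handling the declared IOException. As before, the nested types are hypothetical stand-ins that mirror only the signature shown above; the toy implementation treats each non-blank line as a sentence, which is an assumption made for illustration and not how a real analyzer detects sentence boundaries.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class SentenceStreamSketch {
    // Hypothetical stand-ins: they mirror only the documented signature.
    enum SplitMode { SHORT, LONG } // placeholder constant names

    interface Morpheme {
        String surface(); // assumed accessor, not shown in this section
    }

    interface Tokenizer {
        Iterable<List<Morpheme>> tokenizeSentences(SplitMode mode, Reader input)
                throws IOException;
    }

    // Toy implementation: every non-blank line is one "sentence",
    // split on whitespace. The mode is ignored.
    static Tokenizer lineTokenizer() {
        return (mode, input) -> {
            List<List<Morpheme>> sentences = new ArrayList<>();
            BufferedReader reader = new BufferedReader(input);
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue;
                }
                List<Morpheme> sentence = new ArrayList<>();
                for (String piece : line.trim().split("\\s+")) {
                    sentence.add(() -> piece);
                }
                sentences.add(sentence);
            }
            return sentences;
        };
    }

    // Counts the morphemes of each sentence, converting the checked
    // IOException into an unchecked one for the caller's convenience.
    static List<Integer> countsFromString(Tokenizer tokenizer, String text) {
        List<Integer> counts = new ArrayList<>();
        try (Reader input = new StringReader(text)) {
            for (List<Morpheme> sentence : tokenizer.tokenizeSentences(SplitMode.SHORT, input)) {
                counts.add(sentence.size());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints [2, 3]
        System.out.println(countsFromString(lineTokenizer(), "one two\nthree four five\n"));
    }
}
```

In real use the Reader would typically wrap a file or network stream, which is where the IOException becomes relevant.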
      • setDumpOutput

        void setDumpOutput(PrintStream output)
        Sets the stream to which the lattice structure built during analysis is printed.
        Parameters:
        output - the stream to print to
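
A minimal sketch of capturing dump output in memory by handing the tokenizer a PrintStream backed by a ByteArrayOutputStream. The DumpingTokenizer interface is a hypothetical stand-in: only setDumpOutput mirrors the signature above, while analyze and the printed line are invented placeholders and do not reflect the real lattice format or when the library actually writes to the stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class DumpCaptureSketch {
    // Hypothetical stand-in: only setDumpOutput mirrors the documented method.
    interface DumpingTokenizer {
        void setDumpOutput(PrintStream output);

        void analyze(String text); // invented for illustration only
    }

    static DumpingTokenizer toyTokenizer() {
        return new DumpingTokenizer() {
            private PrintStream dump; // null until a dump stream is set

            @Override
            public void setDumpOutput(PrintStream output) {
                this.dump = output;
            }

            @Override
            public void analyze(String text) {
                if (dump != null) {
                    // Placeholder line, not a real lattice format.
                    dump.println("lattice for: " + text);
                }
            }
        };
    }

    // Routes the dump into a byte buffer and returns it as a string.
    static String captureDump(DumpingTokenizer tokenizer, String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(buffer, true, "UTF-8")) {
            tokenizer.setDumpOutput(out);
            tokenizer.analyze(text);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.print(captureDump(toyTokenizer(), "sample"));
    }
}
```

Passing System.err instead of an in-memory stream would print the dump to the console while debugging.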
      • dumpInternalStructures

        String dumpInternalStructures(String text)
        Tokenizes a text and dumps the internal structures into a JSON string. This method treats the whole input text as a single sentence.
        Parameters:
        text - input text
        Returns:
        a JSON string