package epic.preprocess

TODO

Linear Supertypes
AnyRef, Any

Type Members

  1. class JavaSentenceSegmenter extends SentenceSegmenter

    A sentence segmenter backed by Java's BreakIterator.

  2. class JavaWordTokenizer extends Tokenizer

    A word tokenizer backed by Java's BreakIterator.

  3. class MLSentenceSegmenter extends SentenceSegmenter with Serializable

    Annotations: @SerialVersionUID(1L)

  4. class NewLineSentenceSegmenter extends SentenceSegmenter

    TODO move to chalk

  5. case class RegexSearchTokenizer(pattern: String) extends Tokenizer with Product with Serializable

    Finds all occurrences of the given pattern in the document.

  6. case class RegexSplitTokenizer(pattern: String) extends Tokenizer with Product with Serializable

    Splits the input document according to the given pattern.

  7. class SegmentingIterator extends Iterator[Span]

  8. trait SentenceSegmenter extends StringAnalysisFunction[Any, Sentence] with (String) ⇒ Iterable[String] with Serializable

  9. class StreamSentenceSegmenter extends AnyRef

    TODO

  10. trait Tokenizer extends StringAnalysisFunction[Sentence, Token] with Serializable with (String) ⇒ IndexedSeq[String]

    Abstract trait for tokenizers, which annotate sentence-segmented text with tokens.

  11. class TreebankTokenizer extends Tokenizer with Serializable

    Annotations: @SerialVersionUID(1L)

  12. class WhitespaceTokenizer extends RegexSplitTokenizer

    Tokenizes by splitting on the regular expression \s+.
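
The difference between the split-based and search-based regex tokenizers listed above can be illustrated with plain Scala regexes. This is only a minimal sketch built on the standard library, not the epic implementation: the object `RegexTokenizeSketch` and the helpers `splitTokens` and `searchTokens` are hypothetical names introduced for illustration, and dropping empty tokens is an assumption.

```scala
object RegexTokenizeSketch {
  // Hypothetical helper: the pattern describes the separators,
  // so the tokens are whatever text lies between matches.
  def splitTokens(pattern: String, text: String): IndexedSeq[String] =
    text.split(pattern).filter(_.nonEmpty).toIndexedSeq

  // Hypothetical helper: the pattern describes the tokens themselves,
  // so every match of the pattern becomes one token.
  def searchTokens(pattern: String, text: String): IndexedSeq[String] =
    pattern.r.findAllIn(text).toIndexedSeq

  def main(args: Array[String]): Unit = {
    val text = "Hello, world!  How are you?"
    // Splitting on \s+ keeps punctuation attached to the neighbouring word.
    println(splitTokens("\\s+", text))
    // Searching for \w+ keeps only runs of word characters.
    println(searchTokens("\\w+", text))
  }
}
```

Splitting on `\s+` corresponds to what the WhitespaceTokenizer description says; the search variant is the natural choice when it is easier to describe what a token looks like than what separates tokens.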

Value Members

  1. object JavaSentenceSegmenter extends JavaSentenceSegmenter

  2. object JavaWordTokenizer extends JavaWordTokenizer

  3. object MLSentenceSegmenter extends Serializable

  4. object RegexSentenceSegmenter extends SentenceSegmenter

    A simple regex sentence segmenter.

  5. object SegmentSentences

  6. object StreamSentenceSegmenter

  7. object TextExtractor

    Just a simple thing for me to learn Tika

  8. object Textify

    TODO

  9. object TreebankTokenizer extends TreebankTokenizer

  10. object WhitespaceTokenizer extends Serializable

  11. def loadContent(url: URL): String

  12. def preprocess(file: File): IndexedSeq[IndexedSeq[String]]

  13. def preprocess(text: String): IndexedSeq[IndexedSeq[String]]

  14. def preprocess(url: URL): IndexedSeq[IndexedSeq[String]]

  15. def tokenize(sentence: String): IndexedSeq[String]
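
The `IndexedSeq[IndexedSeq[String]]` return type of the `preprocess` overloads suggests one inner sequence of tokens per sentence. Assuming that reading, the overall shape of such a pipeline can be sketched with `java.text.BreakIterator` alone. The object `PreprocessSketch` and its methods are hypothetical; which segmenter and tokenizer epic actually combines inside `preprocess` is not stated here, so this is an illustration of the idea rather than the library's behaviour.

```scala
import java.text.BreakIterator
import java.util.Locale
import scala.collection.mutable.ArrayBuffer

object PreprocessSketch {
  // Collect the substrings between consecutive boundaries reported by a BreakIterator.
  private def pieces(it: BreakIterator, text: String): IndexedSeq[String] = {
    it.setText(text)
    val out = ArrayBuffer.empty[String]
    var start = it.first()
    var end = it.next()
    while (end != BreakIterator.DONE) {
      out += text.substring(start, end)
      start = end
      end = it.next()
    }
    out.toIndexedSeq
  }

  // Sentence boundaries from the JDK; surrounding whitespace is trimmed (an assumption).
  def sentences(text: String): IndexedSeq[String] =
    pieces(BreakIterator.getSentenceInstance(Locale.US), text).map(_.trim).filter(_.nonEmpty)

  // Word boundaries from the JDK; whitespace-only pieces are dropped, punctuation is kept.
  def words(sentence: String): IndexedSeq[String] =
    pieces(BreakIterator.getWordInstance(Locale.US), sentence).map(_.trim).filter(_.nonEmpty)

  // Hypothetical stand-in for preprocess(text: String): segment first, then tokenize each sentence.
  def preprocessSketch(text: String): IndexedSeq[IndexedSeq[String]] =
    sentences(text).map(words)
}
```

Calling `PreprocessSketch.preprocessSketch("This is one. This is two.")` yields two inner sequences, one per sentence, each holding that sentence's words and punctuation marks as separate tokens.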
