All Classes and Interfaces (linguistics 8.68.18 API)

Class

Description

AbstractDetector

CharacterClasses

Determines the class of a given character.

CharacterUtils provides a unified interface to Character-related operations, used to implement backwards-compatible character operations.

CharacterUtils.CharacterBuffer

A simple IO buffer to use with CharacterUtils.fill(CharacterBuffer, Reader).

CharArrayMap<V>

A simple class that stores key Strings as char[] arrays in a hash table.

A simple class that stores Strings as char[] arrays in a hash table.

DefaultLanguageDetectorContextGenerator

Avoids using the unnecessarily slow NGramCharModel.

DetectionException

Exception that is thrown when detection fails.

Abstract superclass of all Detectors used for language and encoding detection.

An embedder converts a text string to a tensor.

Embedder.Context

Embedder.FailingEmbedder

A class which splits consecutive word character sequences into overlapping character n-grams.

GramSplitter.Gram

An immutable start index and length pair.

GramSplitter.GramSplitterIterator

A hint that can be given to a Detector.

A stemmer implementing the Kstem algorithm by Bob Krovetz.

LanguageDetectorFactory

Overrides the UrlCharSequenceNormalizer, which has a bad regex, until fixed: https://issues.apache.org/jira/browse/OPENNLP-1350

Factory of linguistic processors.

Linguistics.Component

LinguisticsCase

This class provides a case normalization operation to be used e.g.

This interface provides NFKC normalization of Strings through the underlying linguistics library.

OpenNlpLinguistics

Returns a linguistics implementation based on OpenNlp.

OpenNlpTokenizer

Tokenizer using OpenNlp.

OpenStringBuilder

A StringBuilder that allows one to access the array.

ProcessingException

Exception class indicating that a fatal error occurred during linguistic processing.

Interface providing segmentation, i.e.

Includes functionality for determining the langCode from a sample or from the encoding.

SimpleLinguistics

Factory of simple linguistic processor implementations.

SimpleNormalizer

SimpleTokenizer

A tokenizer which splits on whitespace, normalizes and transforms using the given implementations and stems using the kstem algorithm.

SimpleTokenType

SimpleTransformer

Converts all accented characters into their de-accented counterparts followed by their combining diacritics, then strips off the diacritics using a regex.

SpecialTokenRegistry

Immutable named lists of "special tokens" - strings which should override the normal tokenizer semantics and be tokenized into a single token.

An immutable list of special tokens - strings which should override the normal tokenizer semantics and be tokenized into a single token.

SpecialTokens.Token

An immutable special token.

A list of strings which does not allow for duplicate elements.

Interface providing stemming of single words.

An enum of the stemming modes which can be requested.

A single token produced by the tokenizer.

Language-sensitive tokenization of a text string.

List of token scripts (e.g.

An enumeration of token types.

Interface for providers of text transformations such as accent removal.

UrlCharSequenceNormalizer

Modifies UrlCharSequenceNormalizer to avoid the bad email regex.

VespaCharSequenceNormalizer

Simple normalizer.
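
The accent-removal approach described for SimpleTransformer above (decompose accented characters into a base character plus combining diacritics, then strip the diacritics with a regex) can be illustrated with the JDK alone. The following is a minimal sketch under stated assumptions, not the library's implementation: the class name AccentStripper and the method stripAccents are hypothetical, and the exact regex used by SimpleTransformer is not given in this index, so the pattern below is an assumption.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of the linguistics API.
public class AccentStripper {

    // Assumed pattern: matches Unicode combining marks (category M),
    // which is what NFD decomposition leaves behind for accents.
    private static final Pattern COMBINING_MARKS = Pattern.compile("\\p{M}+");

    // Decomposes each accented character into base character + combining
    // diacritics (NFD), then removes the diacritics with the regex above.
    public static String stripAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return COMBINING_MARKS.matcher(decomposed).replaceAll("");
    }

    public static void main(String[] args) {
        // "Cr\u00e8me br\u00fbl\u00e9e" is "Crème brûlée" written with escapes
        System.out.println(stripAccents("Cr\u00e8me br\u00fbl\u00e9e")); // prints "Creme brulee"
    }
}
```

Only characters that have a canonical decomposition are affected; letters such as "ø" or "ß", which do not decompose into a base character plus a combining mark, pass through this sketch unchanged.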