All Classes and Interfaces (linguistics 8.291.13 API)

Class

Description

AbstractDetector

CharacterClasses

Determines the class of a given character.

CharacterUtils provides a unified interface to Character-related operations to implement backwards compatible character operations.

CharacterUtils.CharacterBuffer

A simple IO buffer to use with CharacterUtils.fill(CharacterBuffer, Reader).

CharArrayMap<V>

A simple class that stores key Strings as char[]'s in a hash table.

A simple class that stores Strings as char[]'s in a hash table.

DetectionException

Exception that is thrown when detection fails.

Abstract superclass of all Detectors used for language and encoding detection.

An embedder converts a text string to a tensor.

Embedder.Context

Embedder.FailingEmbedder

Embedder.Runtime

Runtime that is injectable through Embedder constructor.

A class which splits consecutive word character sequences into overlapping character n-grams.

GramSplitter.Gram

An immutable start index and length pair.

GramSplitter.GramSplitterIterator

A hint that can be given to a Detector.

A stemmer implementing the Kstem algorithm by Bob Krovetz.

Factory of linguistic processors.

Linguistics.Component

LinguisticsCase

This class provides a case normalization operation to be used e.g.

This interface provides NFKC normalization of Strings through the underlying linguistics library.

OpenStringBuilder

A StringBuilder that allows one to access the array.

ProcessingException

Exception class indicating that a fatal error occurred during linguistic processing.

Interface providing segmentation, i.e.

Includes functionality for determining the langCode from a sample or from the encoding.

SimpleLinguistics

Factory of simple linguistic processor implementations.

SimpleNormalizer

SimpleTokenizer

A tokenizer which splits on whitespace, normalizes and transforms using the given implementations, and stems using the Kstem algorithm.

SimpleTokenType

SimpleTransformer

Converts all accented characters into their de-accented counterparts followed by their combining diacritics, then strips off the diacritics using a regex.

SpecialTokenRegistry

Immutable named lists of "special tokens" - strings which should override the normal tokenizer semantics and be tokenized into a single token.

An immutable list of special tokens - strings which should override the normal tokenizer semantics and be tokenized into a single token.

SpecialTokens.Token

An immutable special token.

A list of strings which does not allow for duplicate elements.

Interface providing stemming of single words.

An enum of the stemming modes which can be requested.

A single token produced by the tokenizer.

Language-sensitive tokenization of a text string.

List of token scripts (e.g.

An enumeration of token types.

Interface for providers of text transformations such as accent removal.
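
Three of the descriptions above name concrete techniques: overlapping character n-grams, accent removal by decomposition followed by a regex strip, and NFKC normalization. The sketch below illustrates those techniques with the JDK only. It does not use or reproduce the linguistics API: the class LinguisticsSketch and its method names are hypothetical, java.text.Normalizer stands in for the underlying linguistics library, and emitting short words whole is an assumption of this sketch.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of the linguistics API.
final class LinguisticsSketch {

    // Splits each consecutive run of letters/digits into overlapping n-grams.
    // Assumption of this sketch: runs no longer than n are emitted whole.
    // Works on UTF-16 chars, so supplementary characters are not handled.
    static List<String> overlappingGrams(String text, int n) {
        if (n < 1) throw new IllegalArgumentException("n must be positive");
        List<String> grams = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            // Skip separators until the next word character.
            while (i < text.length() && !Character.isLetterOrDigit(text.charAt(i))) i++;
            int start = i;
            // Advance to the end of the current word character run.
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) i++;
            int end = i;
            if (end == start) break; // only separators were left
            if (end - start <= n) {
                grams.add(text.substring(start, end));
            } else {
                for (int g = start; g + n <= end; g++) {
                    grams.add(text.substring(g, g + n));
                }
            }
        }
        return grams;
    }

    // Decomposes accented characters into base character + combining marks (NFD),
    // then strips the combining marks with a regex.
    static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    // NFKC normalization via the JDK, e.g. folding compatibility ligatures.
    static String nfkc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        System.out.println(overlappingGrams("hello to", 3)); // [hel, ell, llo, to]
        System.out.println(stripAccents("caf\u00e9"));       // cafe
        System.out.println(nfkc("\uFB01"));                  // fi
    }
}
```

The real classes may differ in word-character rules, handling of short words, and the set of diacritics removed; consult their individual Javadoc pages for the actual behavior.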