Index
A
- ABKHAZIAN - Enum constant in enum class com.yahoo.language.Language
-
Language tag "ab".
- AbstractDetector - Class in com.yahoo.language.detect
- AbstractDetector() - Constructor for class com.yahoo.language.detect.AbstractDetector
- accentDrop(String, Language) - Method in interface com.yahoo.language.process.Transformer
-
Remove accents from input text.
- accentDrop(String, Language) - Method in class com.yahoo.language.simple.SimpleTransformer
- add(int, String) - Method in class com.yahoo.language.process.StemList
- addComponent(Token) - Method in class com.yahoo.language.simple.SimpleToken
- addLanguage(String, DocumentFrequencyFile) - Method in class com.yahoo.language.significance.impl.SignificanceModelFile
- addModel(Path) - Method in class com.yahoo.language.significance.impl.DefaultSignificanceModelRegistry
- AFAR - Enum constant in enum class com.yahoo.language.Language
-
Language tag "aa".
- AFRIKAANS - Enum constant in enum class com.yahoo.language.Language
-
Language tag "af".
- ALBANIAN - Enum constant in enum class com.yahoo.language.Language
-
Language tag "sq".
- ALL - Enum constant in enum class com.yahoo.language.process.StemMode
- ALPHABETIC - Enum constant in enum class com.yahoo.language.process.TokenType
- AMHARIC - Enum constant in enum class com.yahoo.language.Language
-
Language tag "am".
- append(char) - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
- append(CharSequence) - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
- append(CharSequence, int, int) - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
- ARABIC - Enum constant in enum class com.yahoo.language.Language
-
Language tag "ar".
- ARABIC - Enum constant in enum class com.yahoo.language.process.TokenScript
- ARMENIAN - Enum constant in enum class com.yahoo.language.Language
-
Language tag "hy".
- ARMENIAN - Enum constant in enum class com.yahoo.language.process.TokenScript
- ASCII - Enum constant in enum class com.yahoo.language.process.TokenScript
- asMap() - Method in interface com.yahoo.language.process.Embedder
-
Returns this embedder instance as a map with the default embedder name
- asMap() - Method in class com.yahoo.language.process.SpecialTokens
-
Returns the tokens of this as an immutable map from token to replacement.
- asMap() - Method in interface com.yahoo.language.process.TextGenerator
- asMap(String) - Method in interface com.yahoo.language.process.Embedder
-
Returns this embedder instance as a map with the given name
- asMap(String) - Method in interface com.yahoo.language.process.TextGenerator
- ASSAMESE - Enum constant in enum class com.yahoo.language.Language
-
Language tag "as".
- AYMARA - Enum constant in enum class com.yahoo.language.Language
-
Language tag "ay".
- AZERBAIJANI - Enum constant in enum class com.yahoo.language.Language
-
Language tag "az".
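The accentDrop(String, Language) entry above says only that it removes accents from input text. As a rough illustration of what such an operation can look like, the sketch below uses the JDK's java.text.Normalizer instead of the Transformer interface itself. The names AccentDropSketch and dropAccents are hypothetical, and the real implementation may behave differently, for example by taking the language argument into account.

```java
import java.text.Normalizer;

// Hypothetical helper for illustration; not part of com.yahoo.language.
class AccentDropSketch {

    // Decomposes each character (NFD) and removes the combining marks,
    // so that an accented letter is reduced to its base letter.
    // Characters without a canonical decomposition are left unchanged.
    static String dropAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }
}
```

Calling AccentDropSketch.dropAccents on "café" yields "cafe" under this approach.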
B
- BASHKIR - Enum constant in enum class com.yahoo.language.Language
-
Language tag "ba".
- BASQUE - Enum constant in enum class com.yahoo.language.Language
-
Language tag "eu".
- BENGALI - Enum constant in enum class com.yahoo.language.Language
-
Language tag "bn".
- BENGALI - Enum constant in enum class com.yahoo.language.process.TokenScript
- BEST - Enum constant in enum class com.yahoo.language.process.StemMode
- BHUTANI - Enum constant in enum class com.yahoo.language.Language
-
Language tag "dz".
- BIHARI - Enum constant in enum class com.yahoo.language.Language
-
Language tag "bh".
- BISLAMA - Enum constant in enum class com.yahoo.language.Language
-
Language tag "bi".
- BRAILLE - Enum constant in enum class com.yahoo.language.process.TokenScript
- BRETON - Enum constant in enum class com.yahoo.language.Language
-
Language tag "br".
- buf - Variable in class com.yahoo.language.simple.kstem.OpenStringBuilder
- BUGINESE - Enum constant in enum class com.yahoo.language.Language
-
Language tag "bug".
- BUGINESE - Enum constant in enum class com.yahoo.language.process.TokenScript
- BUHID - Enum constant in enum class com.yahoo.language.process.TokenScript
- BULGARIAN - Enum constant in enum class com.yahoo.language.Language
-
Language tag "bg".
- BURMESE - Enum constant in enum class com.yahoo.language.Language
-
Language tag "my".
- BYELORUSSIAN - Enum constant in enum class com.yahoo.language.Language
-
Language tag "be".
C
- CAMBODIAN - Enum constant in enum class com.yahoo.language.Language
-
Language tag "km".
- CANADIAN - Enum constant in enum class com.yahoo.language.process.TokenScript
- capacity() - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
- CATALAN - Enum constant in enum class com.yahoo.language.Language
-
Language tag "ca".
- CHARACTER_CLASSES - Enum constant in enum class com.yahoo.language.Linguistics.Component
- CharacterClasses - Class in com.yahoo.language.process
-
Determines the class of a given character.
- CharacterClasses() - Constructor for class com.yahoo.language.process.CharacterClasses
- CharacterUtils - Class in com.yahoo.language.simple.kstem
-
CharacterUtils provides a unified interface to Character-related operations to implement backwards compatible character operations.
- CharacterUtils() - Constructor for class com.yahoo.language.simple.kstem.CharacterUtils
- CharacterUtils.CharacterBuffer - Class in com.yahoo.language.simple.kstem
-
A simple IO buffer to use with CharacterUtils.fill(CharacterBuffer, Reader).
- charAt(int) - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
- CHEROKEE - Enum constant in enum class com.yahoo.language.Language
-
Language tag "chr".
- CHEROKEE - Enum constant in enum class com.yahoo.language.process.TokenScript
- CHINESE - Enum constant in enum class com.yahoo.language.process.TokenScript
- CHINESE_SIMPLIFIED - Enum constant in enum class com.yahoo.language.Language
-
Language tag "zh-hans".
- CHINESE_TRADITIONAL - Enum constant in enum class com.yahoo.language.Language
-
Language tag "zh-hant".
- codePointAt(char[], int, int) - Method in class com.yahoo.language.simple.kstem.CharacterUtils
-
Returns the code point at the given index of the char array where only elements with index less than the limit are used.
- codePointAt(CharSequence, int) - Method in class com.yahoo.language.simple.kstem.CharacterUtils
-
Returns the code point at the given index of the CharSequence.
- codePointCount(CharSequence) - Method in class com.yahoo.language.simple.kstem.CharacterUtils
-
Return the number of characters in seq.
- com.yahoo.language - package com.yahoo.language
- com.yahoo.language.detect - package com.yahoo.language.detect
- com.yahoo.language.process - package com.yahoo.language.process
- com.yahoo.language.significance - package com.yahoo.language.significance
- com.yahoo.language.significance.impl - package com.yahoo.language.significance.impl
- com.yahoo.language.simple - package com.yahoo.language.simple
- com.yahoo.language.simple.kstem - package com.yahoo.language.simple.kstem
- COMMON - Enum constant in enum class com.yahoo.language.process.TokenScript
- compareTo(SpecialTokens.Token) - Method in class com.yahoo.language.process.SpecialTokens.Token
- computeCachedValueIfAbsent(Object, Supplier<? extends T>) - Method in class com.yahoo.language.process.Embedder.Context
-
Returns the cached value, or computes and caches it if not present.
- computeCachedValueIfAbsent(Object, Supplier<? extends T>) - Method in class com.yahoo.language.process.TextGenerator.Context
-
Returns the cached value, or computes and caches it if not present.
- Context(String) - Constructor for class com.yahoo.language.process.Embedder.Context
- Context(String) - Constructor for class com.yahoo.language.process.TextGenerator.Context
- Context(String, Map<Object, Object>) - Constructor for class com.yahoo.language.process.Embedder.Context
- Context(String, Map<Object, Object>) - Constructor for class com.yahoo.language.process.TextGenerator.Context
- COPTIC - Enum constant in enum class com.yahoo.language.Language
-
Language tag "cop".
- COPTIC - Enum constant in enum class com.yahoo.language.process.TokenScript
- copy() - Method in class com.yahoo.language.process.Embedder.Context
- copy() - Method in class com.yahoo.language.process.TextGenerator.Context
- corpusSize() - Method in record class com.yahoo.language.significance.DocumentFrequency
-
Returns the value of the corpusSize record component.
- CORSICAN - Enum constant in enum class com.yahoo.language.Language
-
Language tag "co".
- CROATIAN - Enum constant in enum class com.yahoo.language.Language
-
Language tag "hr".
- CYPRIOT - Enum constant in enum class com.yahoo.language.process.TokenScript
- CYRILLIC - Enum constant in enum class com.yahoo.language.process.TokenScript
- CZECH - Enum constant in enum class com.yahoo.language.Language
-
Language tag "cs".
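The codePointAt and codePointCount entries above describe working with code points, not chars, including a limit on how far into a char array a lookup may read. The sketch below shows the same ideas using only java.lang.Character; CodePointSketch is a hypothetical class and is not the CharacterUtils implementation.

```java
// Hypothetical demo class; it only delegates to java.lang.Character.
class CodePointSketch {

    // Code point at index, reading no further than limit (exclusive).
    // A high surrogate whose low surrogate lies at or beyond the limit
    // is returned on its own instead of being combined.
    static int codePointAt(char[] chars, int index, int limit) {
        return Character.codePointAt(chars, index, limit);
    }

    // Number of code points in seq; a surrogate pair counts as one.
    static int codePointCount(CharSequence seq) {
        return Character.codePointCount(seq, 0, seq.length());
    }
}
```

For a string made of the letter a followed by one supplementary character (two chars), codePointCount returns 2 while length() returns 3.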
D
- DANISH - Enum constant in enum class com.yahoo.language.Language
-
Language tag "da".
- decode(List<Integer>, Embedder.Context) - Method in interface com.yahoo.language.process.Embedder
-
Converts the list of token ids into a text.
- DEFAULT - Enum constant in enum class com.yahoo.language.process.StemMode
- defaultEmbedderId - Static variable in interface com.yahoo.language.process.Embedder
-
Name of embedder when none is explicitly given
- defaultGeneratorId - Static variable in interface com.yahoo.language.process.TextGenerator
- DefaultSignificanceModel - Class in com.yahoo.language.significance.impl
- DefaultSignificanceModel(DocumentFrequencyFile, String) - Constructor for class com.yahoo.language.significance.impl.DefaultSignificanceModel
- DefaultSignificanceModel(Path) - Constructor for class com.yahoo.language.significance.impl.DefaultSignificanceModel
- DefaultSignificanceModelRegistry - Class in com.yahoo.language.significance.impl
-
Default implementation of SignificanceModelRegistry.
- DefaultSignificanceModelRegistry(SignificanceConfig) - Constructor for class com.yahoo.language.significance.impl.DefaultSignificanceModelRegistry
- DefaultSignificanceModelRegistry(List<Path>) - Constructor for class com.yahoo.language.significance.impl.DefaultSignificanceModelRegistry
- description() - Method in class com.yahoo.language.significance.impl.DocumentFrequencyFile
- description() - Method in class com.yahoo.language.significance.impl.SignificanceModelFile
- DESERET - Enum constant in enum class com.yahoo.language.process.TokenScript
- detect(byte[], int, int, Hint) - Method in interface com.yahoo.language.detect.Detector
-
Detects language and encoding of the supplied byte array, possibly using a language/encoding hint.
- detect(byte[], int, int, Hint) - Method in class com.yahoo.language.simple.SimpleDetector
- detect(String, Hint) - Method in class com.yahoo.language.detect.AbstractDetector
- detect(String, Hint) - Method in interface com.yahoo.language.detect.Detector
-
Detects language of the supplied String, possibly using a language hint.
- detect(String, Hint) - Method in class com.yahoo.language.simple.SimpleDetector
- detect(ByteBuffer, Hint) - Method in class com.yahoo.language.detect.AbstractDetector
- detect(ByteBuffer, Hint) - Method in interface com.yahoo.language.detect.Detector
-
Detects language and encoding of the supplied ByteBuffer, possibly using a language/encoding hint.
- detect(ByteBuffer, Hint) - Method in class com.yahoo.language.simple.SimpleDetector
- Detection - Class in com.yahoo.language.detect
- Detection(Language, String, boolean) - Constructor for class com.yahoo.language.detect.Detection
- DetectionException - Exception in com.yahoo.language.detect
-
Exception that is thrown when detection fails.
- DetectionException(String) - Constructor for exception com.yahoo.language.detect.DetectionException
- Detector - Interface in com.yahoo.language.detect
-
Abstract superclass of all Detectors used for language and encoding detection.
- DETECTOR - Enum constant in enum class com.yahoo.language.Linguistics.Component
- DEVANAGARI - Enum constant in enum class com.yahoo.language.process.TokenScript
- DIVEHI - Enum constant in enum class com.yahoo.language.Language
-
Language tag "div".
- documentCount() - Method in class com.yahoo.language.significance.impl.DocumentFrequencyFile
- documentFrequency(String) - Method in class com.yahoo.language.significance.impl.DefaultSignificanceModel
- documentFrequency(String) - Method in interface com.yahoo.language.significance.SignificanceModel
- DocumentFrequency - Record Class in com.yahoo.language.significance
- DocumentFrequency(long, long) - Constructor for record class com.yahoo.language.significance.DocumentFrequency
-
Creates an instance of a DocumentFrequency record class.
- DocumentFrequencyFile - Class in com.yahoo.language.significance.impl
- DocumentFrequencyFile(String, long, Map<String, Long>) - Constructor for class com.yahoo.language.significance.impl.DocumentFrequencyFile
- DUTCH - Enum constant in enum class com.yahoo.language.Language
-
Language tag "nl".
E
- embed(String, Embedder.Context) - Method in interface com.yahoo.language.process.Embedder
-
Converts text into a list of token ids (a vector embedding)
- embed(String, Embedder.Context) - Method in class com.yahoo.language.process.Embedder.FailingEmbedder
- embed(String, Embedder.Context, TensorType) - Method in interface com.yahoo.language.process.Embedder
-
Converts text into tokens in a tensor.
- embed(String, Embedder.Context, TensorType) - Method in class com.yahoo.language.process.Embedder.FailingEmbedder
- Embedder - Interface in com.yahoo.language.process
-
An embedder converts a text string to a tensor
- Embedder.Context - Class in com.yahoo.language.process
- Embedder.FailingEmbedder - Class in com.yahoo.language.process
- Embedder.Runtime - Interface in com.yahoo.language.process
-
Runtime that is injectable through Embedder constructor.
- empty() - Static method in class com.yahoo.language.process.SpecialTokens
- ENGLISH - Enum constant in enum class com.yahoo.language.Language
-
Language tag "en".
- equals(Linguistics) - Method in interface com.yahoo.language.Linguistics
-
Check if another instance is equivalent to this one
- equals(Linguistics) - Method in class com.yahoo.language.simple.SimpleLinguistics
- equals(Object) - Method in class com.yahoo.language.process.GramSplitter.Gram
- equals(Object) - Method in class com.yahoo.language.process.SpecialTokens.Token
- equals(Object) - Method in record class com.yahoo.language.significance.DocumentFrequency
-
Indicates whether some other object is "equal to" this one.
- equals(Object) - Method in class com.yahoo.language.simple.SimpleToken
- ESPERANTO - Enum constant in enum class com.yahoo.language.Language
-
Language tag "eo".
- ESTONIAN - Enum constant in enum class com.yahoo.language.Language
-
Language tag "et".
- ETHIOPIC - Enum constant in enum class com.yahoo.language.process.TokenScript
- extractFrom(GramSplitter.UnicodeString) - Method in class com.yahoo.language.process.GramSplitter.Gram
-
Returns this gram as a string from the input string
- extractFrom(String) - Method in class com.yahoo.language.process.GramSplitter.Gram
-
Returns this gram as a string from the input string
F
- FailingEmbedder() - Constructor for class com.yahoo.language.process.Embedder.FailingEmbedder
- FailingEmbedder(String) - Constructor for class com.yahoo.language.process.Embedder.FailingEmbedder
- FailingTextGenerator() - Constructor for class com.yahoo.language.process.TextGenerator.FailingTextGenerator
- FailingTextGenerator(String) - Constructor for class com.yahoo.language.process.TextGenerator.FailingTextGenerator
- FAROESE - Enum constant in enum class com.yahoo.language.Language
-
Language tag "fo".
- FIJI - Enum constant in enum class com.yahoo.language.Language
-
Language tag "fj".
- fill(CharacterUtils.CharacterBuffer, Reader) - Method in class com.yahoo.language.simple.kstem.CharacterUtils
-
Convenience method which calls fill(buffer, reader, buffer.buffer.length).
- fill(CharacterUtils.CharacterBuffer, Reader, int) - Method in class com.yahoo.language.simple.kstem.CharacterUtils
-
Fills the CharacterUtils.CharacterBuffer with characters read from the given reader Reader.
- FINNISH - Enum constant in enum class com.yahoo.language.Language
-
Language tag "fi".
- flush() - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
- FRENCH - Enum constant in enum class com.yahoo.language.Language
-
Language tag "fr".
- frequencies() - Method in class com.yahoo.language.significance.impl.DocumentFrequencyFile
- frequency() - Method in record class com.yahoo.language.significance.DocumentFrequency
-
Returns the value of the frequency record component.
- FRISIAN - Enum constant in enum class com.yahoo.language.Language
-
Language tag "fy".
- from(String) - Static method in enum class com.yahoo.language.Language
-
Returns the Language from a language tag
- fromEncoding(String) - Static method in enum class com.yahoo.language.Language
-
Returns the language from an encoding, or Language.UNKNOWN if it cannot be determined.
- fromLanguageTag(String) - Static method in enum class com.yahoo.language.Language
-
Convenience method for calling fromLocale(LocaleFactory.fromLanguageTag(languageTag)).
- fromLanguageTag(String) - Static method in class com.yahoo.language.LocaleFactory
-
Implements a simple parser for RFC5646 language tags.
- fromLocale(Locale) - Static method in enum class com.yahoo.language.Language
-
Returns the Language whose Language.languageCode() is equal to locale.getLanguage(), with the following additions:
- fromStems(String, List<String>) - Static method in class com.yahoo.language.simple.SimpleToken
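The fromLanguageTag entries above refer to parsing RFC 5646 language tags into a Locale. For orientation, the sketch below shows how such a tag breaks down into subtags using the JDK's own Locale.forLanguageTag. It does not call LocaleFactory, whose parsing rules may differ, and LanguageTagSketch is a hypothetical name.

```java
import java.util.Locale;

// Hypothetical demo class; it uses the JDK parser, not LocaleFactory.
class LanguageTagSketch {

    // Returns {language, script, region} for a language tag.
    // Subtags that are absent from the tag come back as empty strings.
    static String[] parts(String languageTag) {
        Locale locale = Locale.forLanguageTag(languageTag);
        return new String[] {
            locale.getLanguage(), locale.getScript(), locale.getCountry()
        };
    }
}
```

With the tag "zh-Hant-TW" this gives the language "zh", the script "Hant" and the region "TW".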
G
- GALICIAN - Enum constant in enum class com.yahoo.language.Language
-
Language tag "gl".
- generate(Prompt, TextGenerator.Context) - Method in class com.yahoo.language.process.TextGenerator.FailingTextGenerator
- generate(Prompt, TextGenerator.Context) - Method in interface com.yahoo.language.process.TextGenerator
- GEORGIAN - Enum constant in enum class com.yahoo.language.Language
-
Language tag "ka".
- GEORGIAN - Enum constant in enum class com.yahoo.language.process.TokenScript
- GERMAN - Enum constant in enum class com.yahoo.language.Language
-
Language tag "de".
- get(int) - Method in class com.yahoo.language.process.StemList
- getArray() - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
- getBuffer() - Method in class com.yahoo.language.simple.kstem.CharacterUtils.CharacterBuffer
-
Returns the internal buffer
- getCachedValue(Object) - Method in class com.yahoo.language.process.Embedder.Context
-
Returns a cached value, or null if not present.
- getCachedValue(Object) - Method in class com.yahoo.language.process.TextGenerator.Context
-
Returns a cached value, or null if not present.
- getCharacterClasses() - Method in interface com.yahoo.language.Linguistics
-
Returns a thread-unsafe character classes instance.
- getCharacterClasses() - Method in class com.yahoo.language.simple.SimpleLinguistics
- getCodePointCount() - Method in class com.yahoo.language.process.GramSplitter.Gram
- getComponent(int) - Method in interface com.yahoo.language.process.Token
-
Returns a component token of this
- getComponent(int) - Method in class com.yahoo.language.simple.SimpleToken
- getCountry() - Method in class com.yahoo.language.detect.Hint
- getDestination() - Method in class com.yahoo.language.process.Embedder.Context
-
Returns the name of the recipient of this tensor.
- getDestination() - Method in class com.yahoo.language.process.TextGenerator.Context
-
Returns the name of the recipient of the generated text.
- getDetector() - Method in interface com.yahoo.language.Linguistics
-
Returns a thread-unsafe detector.
- getDetector() - Method in class com.yahoo.language.simple.SimpleLinguistics
- getEmbedderId() - Method in class com.yahoo.language.process.Embedder.Context
-
Return the embedder id or 'unknown' if not set
- getEncoding() - Method in class com.yahoo.language.detect.Detection
- getEncodingName() - Method in class com.yahoo.language.detect.Detection
- getGeneratorId() - Method in class com.yahoo.language.process.TextGenerator.Context
-
Return the generator id or 'unknown' if not set
- getGramSplitter() - Method in interface com.yahoo.language.Linguistics
-
Returns a thread-unsafe gram splitter.
- getGramSplitter() - Method in class com.yahoo.language.simple.SimpleLinguistics
- getId() - Method in class com.yahoo.language.significance.impl.DefaultSignificanceModel
- getId() - Method in interface com.yahoo.language.significance.SignificanceModel
- getInstance() - Static method in class com.yahoo.language.simple.kstem.CharacterUtils
-
Returns a CharacterUtils implementation.
- getLanguage() - Method in class com.yahoo.language.detect.Detection
- getLanguage() - Method in class com.yahoo.language.process.Embedder.Context
-
Returns the language of the text, or UNKNOWN (default) to use a language independent embedding
- getLanguage() - Method in class com.yahoo.language.process.TextGenerator.Context
-
Returns the language of the text, or UNKNOWN (default) to use a language independent generation
- getLength() - Method in class com.yahoo.language.simple.kstem.CharacterUtils.CharacterBuffer
-
Return the length of the data in the internal buffer starting at CharacterUtils.CharacterBuffer.getOffset()
- getMarket() - Method in class com.yahoo.language.detect.Hint
- getModel(Language) - Method in class com.yahoo.language.significance.impl.DefaultSignificanceModelRegistry
- getModel(Language) - Method in interface com.yahoo.language.significance.SignificanceModelRegistry
- getNormalizer() - Method in interface com.yahoo.language.Linguistics
-
Returns a thread-unsafe normalizer.
- getNormalizer() - Method in class com.yahoo.language.simple.SimpleLinguistics
- getNumComponents() - Method in interface com.yahoo.language.process.Token
-
Returns the number of components, if this token is a compound word (e.g. German "kommunikationsfehler").
- getNumComponents() - Method in class com.yahoo.language.simple.SimpleToken
- getNumStems() - Method in interface com.yahoo.language.process.Token
-
Returns the number of stem forms available for this token.
- getNumStems() - Method in class com.yahoo.language.simple.SimpleToken
- getOffset() - Method in interface com.yahoo.language.process.Token
-
Returns the offset position of this token
- getOffset() - Method in class com.yahoo.language.simple.kstem.CharacterUtils.CharacterBuffer
-
Returns the data offset in the internal buffer.
- getOffset() - Method in class com.yahoo.language.simple.SimpleToken
- getOrig() - Method in interface com.yahoo.language.process.Token
-
Returns the original form of this token
- getOrig() - Method in class com.yahoo.language.simple.SimpleToken
- getScript() - Method in interface com.yahoo.language.process.Token
-
Returns the script of this token
- getScript() - Method in class com.yahoo.language.simple.SimpleToken
- getSegmenter() - Method in interface com.yahoo.language.Linguistics
-
Returns a thread-unsafe segmenter.
- getSegmenter() - Method in class com.yahoo.language.simple.SimpleLinguistics
- getSpecialTokens(String) - Method in class com.yahoo.language.process.SpecialTokenRegistry
-
Returns the list of special tokens for a given name.
- getStart() - Method in class com.yahoo.language.process.GramSplitter.Gram
- getStem(int) - Method in interface com.yahoo.language.process.Token
-
Returns the stem at position i
- getStem(int) - Method in class com.yahoo.language.simple.SimpleToken
- getStemmer() - Method in interface com.yahoo.language.Linguistics
-
Returns a thread-unsafe stemmer or lemmatizer.
- getStemmer() - Method in class com.yahoo.language.simple.SimpleLinguistics
- getTokenizer() - Method in interface com.yahoo.language.Linguistics
-
Returns a thread-unsafe tokenizer.
- getTokenizer() - Method in class com.yahoo.language.simple.SimpleLinguistics
- getTokenString() - Method in interface com.yahoo.language.process.Token
-
Returns the token string in a form suitable for indexing: the most lowercased variant of the most processed token form available. If called on a compound token, this returns a lowercased form of the entire word.
- getTokenString() - Method in class com.yahoo.language.simple.SimpleToken
- getTransformer() - Method in interface com.yahoo.language.Linguistics
-
Returns a thread-unsafe transformer.
- getTransformer() - Method in class com.yahoo.language.simple.SimpleLinguistics
- getType() - Method in interface com.yahoo.language.process.Token
-
Returns the type of this token - word, space or punctuation etc.
- getType() - Method in class com.yahoo.language.simple.SimpleToken
- getValue() - Method in enum class com.yahoo.language.process.TokenType
-
Returns an int code for this type
- GLAGOLITIC - Enum constant in enum class com.yahoo.language.process.TokenScript
- GOTHIC - Enum constant in enum class com.yahoo.language.Language
-
Language tag "got".
- GOTHIC - Enum constant in enum class com.yahoo.language.process.TokenScript
- Gram(int, int) - Constructor for class com.yahoo.language.process.GramSplitter.Gram
- GRAM_SPLITTER - Enum constant in enum class com.yahoo.language.Linguistics.Component
- GramSplitter - Class in com.yahoo.language.process
-
A class which splits consecutive word character sequences into overlapping character n-grams.
- GramSplitter(CharacterClasses) - Constructor for class com.yahoo.language.process.GramSplitter
- GramSplitter.Gram - Class in com.yahoo.language.process
-
An immutable start index and length pair
- GramSplitter.GramSplitterIterator - Class in com.yahoo.language.process
- GramSplitterIterator(String, int, CharacterClasses) - Constructor for class com.yahoo.language.process.GramSplitter.GramSplitterIterator
- GREEK - Enum constant in enum class com.yahoo.language.Language
-
Language tag "el".
- GREEK - Enum constant in enum class com.yahoo.language.process.TokenScript
- GREENLANDIC - Enum constant in enum class com.yahoo.language.Language
-
Language tag "kl".
- GUARANI - Enum constant in enum class com.yahoo.language.Language
-
Language tag "gn".
- guessEncoding(byte[]) - Method in class com.yahoo.language.simple.SimpleDetector
- guessEncoding(byte[], int, int) - Method in class com.yahoo.language.simple.SimpleDetector
- guessLanguage(byte[], int, int) - Method in class com.yahoo.language.simple.SimpleDetector
- guessLanguage(String) - Method in class com.yahoo.language.simple.SimpleDetector
- GUJARATI - Enum constant in enum class com.yahoo.language.Language
-
Language tag "gu".
- GUJARATI - Enum constant in enum class com.yahoo.language.process.TokenScript
- GURMUKHI - Enum constant in enum class com.yahoo.language.process.TokenScript
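The GramSplitter entry above describes splitting consecutive word character sequences into overlapping character n-grams. A minimal sketch of that idea follows. It is not the GramSplitter implementation: GramSketch and grams are hypothetical names, Character.isLetterOrDigit stands in for the library's CharacterClasses, the code works on chars instead of code points, and emitting a short word whole is an assumption made here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of com.yahoo.language.
class GramSketch {

    // Splits each run of letters/digits in text into overlapping grams of
    // n chars. A run no longer than n is emitted whole (an assumption).
    static List<String> grams(String text, int n) {
        if (n < 1) throw new IllegalArgumentException("n must be positive");
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (!Character.isLetterOrDigit(text.charAt(i))) {
                i++; // skip separators between words
                continue;
            }
            int start = i;
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                i++; // advance to the end of the current word
            }
            if (i - start <= n) {
                out.add(text.substring(start, i));
            } else {
                // slide a window of n chars over the word, one char at a time
                for (int g = start; g + n <= i; g++) {
                    out.add(text.substring(g, g + n));
                }
            }
        }
        return out;
    }
}
```

For the input "hello world" with n = 3 this produces hel, ell, llo, wor, orl, rld. Grams never span the space, because each word is windowed separately.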
H
- HAN - Enum constant in enum class com.yahoo.language.process.TokenScript
- HANGUL - Enum constant in enum class com.yahoo.language.process.TokenScript
- HANUNOO - Enum constant in enum class com.yahoo.language.process.TokenScript
- hashCode() - Method in class com.yahoo.language.process.GramSplitter.Gram
- hashCode() - Method in class com.yahoo.language.process.SpecialTokens.Token
- hashCode() - Method in record class com.yahoo.language.significance.DocumentFrequency
-
Returns a hash code value for this object.
- hashCode() - Method in class com.yahoo.language.simple.SimpleToken
- hasNext() - Method in class com.yahoo.language.process.GramSplitter.GramSplitterIterator
- HAUSA - Enum constant in enum class com.yahoo.language.Language
-
Language tag "ha".
- HEBREW - Enum constant in enum class com.yahoo.language.Language
-
Language tag "he".
- HEBREW - Enum constant in enum class com.yahoo.language.process.TokenScript
- HINDI - Enum constant in enum class com.yahoo.language.Language
-
Language tag "hi".
- Hint - Class in com.yahoo.language.detect
-
A hint that can be given to a Detector.
- HIRAGANA - Enum constant in enum class com.yahoo.language.process.TokenScript
- HUNGARIAN - Enum constant in enum class com.yahoo.language.Language
-
Language tag "hu".
I
- ICELANDIC - Enum constant in enum class com.yahoo.language.Language
-
Language tag "is".
- id() - Method in class com.yahoo.language.significance.impl.SignificanceModelFile
- INDEXABLE_SYMBOL - Enum constant in enum class com.yahoo.language.process.TokenType
- INDONESIAN - Enum constant in enum class com.yahoo.language.Language
-
Language tag "id".
- INHERITED - Enum constant in enum class com.yahoo.language.process.TokenScript
- INTERLINGUA - Enum constant in enum class com.yahoo.language.Language
-
Language tag "ia".
- INTERLINGUE - Enum constant in enum class com.yahoo.language.Language
-
Language tag "ie".
- INUKTITUT - Enum constant in enum class com.yahoo.language.Language
-
Language tag "iu".
- INUPIAK - Enum constant in enum class com.yahoo.language.Language
-
Language tag "ik".
- IRISH - Enum constant in enum class com.yahoo.language.Language
-
Language tag "ga".
- isCjk() - Method in enum class com.yahoo.language.Language
-
Returns whether this is a "cjk" language.
- isDigit(int) - Method in class com.yahoo.language.process.CharacterClasses
-
Returns true for code points which should be considered digits - same as java.lang.Character.isDigit
- isIndexable() - Method in interface com.yahoo.language.process.Token
-
Whether this token should be indexed
- isIndexable() - Method in enum class com.yahoo.language.process.TokenType
-
Marker for whether this type of token can be indexed for search.
- isIndexable() - Method in class com.yahoo.language.simple.SimpleToken
- isLatin(int) - Method in class com.yahoo.language.process.CharacterClasses
-
Returns true if this is a latin character
- isLatinDigit(int) - Method in class com.yahoo.language.process.CharacterClasses
-
Returns true if this is a latin digit (other digits are not consistently parsed into numbers by Java)
- isLetter(int) - Method in class com.yahoo.language.process.CharacterClasses
-
Returns true for code points which are letters in unicode 3 or 4, plus some additional characters which are useful to view as letters even though not defined as such in unicode.
- isLetterOrDigit(int) - Method in class com.yahoo.language.process.CharacterClasses
-
Convenience, returns isLetter(c) || isDigit(c)
- isLocal() - Method in class com.yahoo.language.detect.Detection
- isSpecialToken() - Method in interface com.yahoo.language.process.Token
-
Returns whether this is an instance of a declared special token (e.g. c++)
- isSpecialToken() - Method in class com.yahoo.language.simple.SimpleToken
- isSymbol(int) - Method in class com.yahoo.language.process.CharacterClasses
-
Returns true if the character is in the class "other symbol" - emojis etc.
- ITALIAN - Enum constant in enum class com.yahoo.language.Language
-
Language tag "it".
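The isDigit(int) and isLatinDigit(int) entries above distinguish digits in general from Latin digits. The sketch below illustrates the difference with plain JDK calls. DigitSketch is a hypothetical class, and treating "Latin digit" as the ASCII range 0 to 9 is an assumption, not something the index states.

```java
// Hypothetical helper for illustration; not the CharacterClasses code.
class DigitSketch {

    // True only for the ASCII digits '0'..'9' (assumed meaning of "latin digit").
    static boolean isLatinDigit(int codePoint) {
        return codePoint >= '0' && codePoint <= '9';
    }

    // True for decimal digits of any script, as java.lang.Character defines them.
    static boolean isDigit(int codePoint) {
        return Character.isDigit(codePoint);
    }
}
```

The Arabic-Indic digit three (U+0663) is a digit for Character.isDigit but falls outside the ASCII range, so only the second method accepts it.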
J
- JAPANESE - Enum constant in enum class com.yahoo.language.Language
-
Language tag "ja".
- JAVANESE - Enum constant in enum class com.yahoo.language.Language
-
Language tag "jw".
K
- KANNADA - Enum constant in enum class com.yahoo.language.Language
-
Language tag "kn".
- KANNADA - Enum constant in enum class com.yahoo.language.process.TokenScript
- KASHMIRI - Enum constant in enum class com.yahoo.language.Language
-
Language tag "ks".
- KATAKANA - Enum constant in enum class com.yahoo.language.process.TokenScript
- KAZAKH - Enum constant in enum class com.yahoo.language.Language
-
Language tag "kk".
- KHAROSHTHI - Enum constant in enum class com.yahoo.language.process.TokenScript
- KHMER - Enum constant in enum class com.yahoo.language.process.TokenScript
- KINYARWANDA - Enum constant in enum class com.yahoo.language.Language
-
Language tag "rw".
- KIRGHIZ - Enum constant in enum class com.yahoo.language.Language
-
Language tag "ky".
- KIRUNDI - Enum constant in enum class com.yahoo.language.Language
-
Language tag "rn".
- KOREAN - Enum constant in enum class com.yahoo.language.Language
-
Language tag "ko".
- KStemmer - Class in com.yahoo.language.simple.kstem
-
A stemmer implementing the Kstem algorithm by Bob Krovetz.
- KStemmer() - Constructor for class com.yahoo.language.simple.kstem.KStemmer
- KURDISH - Enum constant in enum class com.yahoo.language.Language
-
Language tag "ku".
L
- Language - Enum Class in com.yahoo.language
- languageCode() - Method in enum class com.yahoo.language.Language
- languages() - Method in class com.yahoo.language.significance.impl.SignificanceModelFile
- LAO - Enum constant in enum class com.yahoo.language.process.TokenScript
- LAOTHIAN - Enum constant in enum class com.yahoo.language.Language
-
Language tag "lo".
- LATIN - Enum constant in enum class com.yahoo.language.Language
-
Language tag "la".
- LATIN - Enum constant in enum class com.yahoo.language.process.TokenScript
- LATVIAN - Enum constant in enum class com.yahoo.language.Language
-
Language tag "lv".
- len - Variable in class com.yahoo.language.simple.kstem.OpenStringBuilder
- length() - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
- LIMBU - Enum constant in enum class com.yahoo.language.process.TokenScript
- LINEARB - Enum constant in enum class com.yahoo.language.process.TokenScript
- LINGALA - Enum constant in enum class com.yahoo.language.Language
-
Language tag "ln".
- Linguistics - Interface in com.yahoo.language
-
Factory of linguistic processors.
- Linguistics.Component - Enum Class in com.yahoo.language
- LinguisticsCase - Class in com.yahoo.language
-
This class provides a case normalization operation to be used e.g. when document search should be case-insensitive.
- LinguisticsCase() - Constructor for class com.yahoo.language.LinguisticsCase
- LITHUANIAN - Enum constant in enum class com.yahoo.language.Language
-
Language tag "lt".
- LocaleFactory - Class in com.yahoo.language
M
- MACEDONIAN - Enum constant in enum class com.yahoo.language.Language
-
Language tag "mk".
- MALAGASY - Enum constant in enum class com.yahoo.language.Language
-
Language tag "mg".
- MALAY - Enum constant in enum class com.yahoo.language.Language
-
Language tag "ms".
- MALAYALAM - Enum constant in enum class com.yahoo.language.Language
-
Language tag "ml".
- MALAYALAM - Enum constant in enum class com.yahoo.language.process.TokenScript
- MALTESE - Enum constant in enum class com.yahoo.language.Language
-
Language tag "mt".
- MANIPURI - Enum constant in enum class com.yahoo.language.Language
-
Language tag "mni".
- MAORI - Enum constant in enum class com.yahoo.language.Language
-
Language tag "mi".
- MARATHI - Enum constant in enum class com.yahoo.language.Language
-
Language tag "mr".
- MARKER - Enum constant in enum class com.yahoo.language.process.TokenType
- MOLDAVIAN - Enum constant in enum class com.yahoo.language.Language
-
Language tag "mo".
- MONGOLIAN - Enum constant in enum class com.yahoo.language.Language
-
Language tag "mn".
- MONGOLIAN - Enum constant in enum class com.yahoo.language.process.TokenScript
- MUNDA - Enum constant in enum class com.yahoo.language.Language
-
Language tag "mun".
- MYANMAR - Enum constant in enum class com.yahoo.language.process.TokenScript
N
- name() - Method in class com.yahoo.language.process.SpecialTokens
-
Returns the name of this special tokens list
- NAURU - Enum constant in enum class com.yahoo.language.Language
-
Language tag "na".
- NEPALI - Enum constant in enum class com.yahoo.language.Language
-
Language tag "ne".
- newCharacterBuffer(int) - Static method in class com.yahoo.language.simple.kstem.CharacterUtils
-
Creates a new CharacterUtils.CharacterBuffer and allocates a char[] of the given bufferSize.
- newCountryHint(String) - Static method in class com.yahoo.language.detect.Hint
- newInstance(String, String) - Static method in class com.yahoo.language.detect.Hint
- newMarketHint(String) - Static method in class com.yahoo.language.detect.Hint
- next() - Method in class com.yahoo.language.process.GramSplitter.GramSplitterIterator
- NONE - Enum constant in enum class com.yahoo.language.process.StemMode
- normalize(String) - Method in interface com.yahoo.language.process.Normalizer
-
NFKC normalizes a String.
- normalize(String) - Method in class com.yahoo.language.simple.SimpleNormalizer
- Normalizer - Interface in com.yahoo.language.process
-
This interface provides NFKC normalization of Strings through the underlying linguistics library.
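As a rough illustration of what NFKC normalization does to a string, the sketch below uses the JDK's own java.text.Normalizer rather than the Vespa interface. The class name NfkcSketch is hypothetical, and nothing here is taken from the actual implementation behind com.yahoo.language.process.Normalizer.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of com.yahoo.language.
class NfkcSketch {

    // NFKC applies compatibility decomposition followed by canonical composition.
    static String nfkc(String input) {
        return Normalizer.normalize(input, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // U+FB01 is the single-character "fi" ligature; NFKC folds it to "f" + "i".
        System.out.println(nfkc("\uFB01nd"));
        // U+FF21 and U+FF22 are fullwidth Latin capitals; NFKC folds them to ASCII.
        System.out.println(nfkc("\uFF21\uFF22"));
    }
}
```

Compatibility folding of this kind lets ligatures and fullwidth forms match their plain counterparts at query time.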
- NORMALIZER - Enum constant in enum class com.yahoo.language.Linguistics.Component
- NORWEGIAN_BOKMAL - Enum constant in enum class com.yahoo.language.Language
-
Language tag "nb".
- NORWEGIAN_NYNORSK - Enum constant in enum class com.yahoo.language.Language
-
Language tag "nn".
- NUMERIC - Enum constant in enum class com.yahoo.language.process.TokenType
O
- OCCITAN - Enum constant in enum class com.yahoo.language.Language
-
Language tag "oc".
- offsetByCodePoints(char[], int, int, int, int) - Method in class com.yahoo.language.simple.kstem.CharacterUtils
-
Return the index within buf[start:start+count] which is by offset code points from index.
- OGHAM - Enum constant in enum class com.yahoo.language.process.TokenScript
- OLDITALIC - Enum constant in enum class com.yahoo.language.process.TokenScript
- OLDPERSIAN - Enum constant in enum class com.yahoo.language.process.TokenScript
- OpenStringBuilder - Class in com.yahoo.language.simple.kstem
-
A StringBuilder that allows one to access the array.
- OpenStringBuilder() - Constructor for class com.yahoo.language.simple.kstem.OpenStringBuilder
- OpenStringBuilder(int) - Constructor for class com.yahoo.language.simple.kstem.OpenStringBuilder
- ORIYA - Enum constant in enum class com.yahoo.language.Language
-
Language tag "or".
- ORIYA - Enum constant in enum class com.yahoo.language.process.TokenScript
- OROMO - Enum constant in enum class com.yahoo.language.Language
-
Language tag "om".
- OSMANYA - Enum constant in enum class com.yahoo.language.process.TokenScript
P
- PASHTO - Enum constant in enum class com.yahoo.language.Language
-
Language tag "ps".
- PERSIAN - Enum constant in enum class com.yahoo.language.Language
-
Language tag "fa".
- POLISH - Enum constant in enum class com.yahoo.language.Language
-
Language tag "pl".
- PORTUGUESE - Enum constant in enum class com.yahoo.language.Language
-
Language tag "pt".
- ProcessingException - Exception in com.yahoo.language.process
-
Exception class indicating that a fatal error occurred during linguistic processing.
- ProcessingException(String) - Constructor for exception com.yahoo.language.process.ProcessingException
- ProcessingException(String, Throwable) - Constructor for exception com.yahoo.language.process.ProcessingException
- PUNCTUATION - Enum constant in enum class com.yahoo.language.process.TokenType
- PUNJABI - Enum constant in enum class com.yahoo.language.Language
-
Language tag "pa".
- putCachedValue(Object, Object) - Method in class com.yahoo.language.process.Embedder.Context
- putCachedValue(Object, Object) - Method in class com.yahoo.language.process.TextGenerator.Context
Q
R
- remove() - Method in class com.yahoo.language.process.GramSplitter.GramSplitterIterator
- remove(int) - Method in class com.yahoo.language.process.StemList
- replacement() - Method in class com.yahoo.language.process.SpecialTokens.Token
-
Returns the token to replace occurrences of this by, which equals token() unless this has a replacement.
- reserve(int) - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
- reset() - Method in class com.yahoo.language.simple.kstem.CharacterUtils.CharacterBuffer
-
Resets the CharacterBuffer.
- reset() - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
- resize(int) - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
- RHAETO_ROMANCE - Enum constant in enum class com.yahoo.language.Language
-
Language tag "rm".
- ROMANIAN - Enum constant in enum class com.yahoo.language.Language
-
Language tag "ro".
- RUNIC - Enum constant in enum class com.yahoo.language.process.TokenScript
- RUSSIAN - Enum constant in enum class com.yahoo.language.Language
-
Language tag "ru".
S
- SAMOAN - Enum constant in enum class com.yahoo.language.Language
-
Language tag "sm".
- sampleEmbeddingLatency(double, Embedder.Context) - Method in interface com.yahoo.language.process.Embedder.Runtime
-
Add a sample embedding latency to this
- sampleSequenceLength(long, Embedder.Context) - Method in interface com.yahoo.language.process.Embedder.Runtime
-
Add a sample embedding length to this
- SANGHO - Enum constant in enum class com.yahoo.language.Language
-
Language tag "sg".
- SANSKRIT - Enum constant in enum class com.yahoo.language.Language
-
Language tag "sa".
- SCOTS_GAELIC - Enum constant in enum class com.yahoo.language.Language
-
Language tag "gd".
- segment(String, Language) - Method in interface com.yahoo.language.process.Segmenter
-
Returns a list of segments produced from a string.
- segment(String, Language) - Method in class com.yahoo.language.process.SegmenterImpl
- Segmenter - Interface in com.yahoo.language.process
-
A segmenter splits a string into separate segments (such as words) without applying any further processing (such as stemming) on each segment.
- SEGMENTER - Enum constant in enum class com.yahoo.language.Linguistics.Component
- SegmenterImpl - Class in com.yahoo.language.process
- SegmenterImpl(Tokenizer) - Constructor for class com.yahoo.language.process.SegmenterImpl
- SERBIAN - Enum constant in enum class com.yahoo.language.Language
-
Language tag "sr".
- SERBO_CROATIAN - Enum constant in enum class com.yahoo.language.Language
-
Language tag "sh".
- SESOTHO - Enum constant in enum class com.yahoo.language.Language
-
Language tag "st".
- set(char[], int) - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
- set(int, String) - Method in class com.yahoo.language.process.StemList
- setCharAt(int, char) - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
- setDestination(String) - Method in class com.yahoo.language.process.Embedder.Context
-
Sets the name of the recipient of this tensor.
- setDestination(String) - Method in class com.yahoo.language.process.TextGenerator.Context
-
Sets the name of the recipient of the generated text.
- setEmbedderId(String) - Method in class com.yahoo.language.process.Embedder.Context
-
Sets the embedder id
- setGeneratorId(String) - Method in class com.yahoo.language.process.TextGenerator.Context
-
Sets the generator id
- setLanguage(Language) - Method in class com.yahoo.language.process.Embedder.Context
-
Sets the language of the text, or UNKNOWN to use language independent embedding
- setLanguage(Language) - Method in class com.yahoo.language.process.TextGenerator.Context
-
Sets the language of the text, or UNKNOWN to use language independent generation
- setLength(int) - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
- setOffset(long) - Method in class com.yahoo.language.simple.SimpleToken
- setScript(TokenScript) - Method in class com.yahoo.language.simple.SimpleToken
- setSpecialToken(boolean) - Method in class com.yahoo.language.simple.SimpleToken
- SETSWANA - Enum constant in enum class com.yahoo.language.Language
-
Language tag "tn".
- setTokenString(String) - Method in class com.yahoo.language.simple.SimpleToken
- setType(TokenType) - Method in class com.yahoo.language.simple.SimpleToken
- SHAVIAN - Enum constant in enum class com.yahoo.language.process.TokenScript
- SHONA - Enum constant in enum class com.yahoo.language.Language
-
Language tag "sn".
- SHORTEST - Enum constant in enum class com.yahoo.language.process.StemMode
- SICHUAN_YI - Enum constant in enum class com.yahoo.language.Language
-
Language tag "ii".
- SignificanceModel - Interface in com.yahoo.language.significance
- SignificanceModelFile - Class in com.yahoo.language.significance.impl
- SignificanceModelFile(String, String, String, HashMap<String, DocumentFrequencyFile>) - Constructor for class com.yahoo.language.significance.impl.SignificanceModelFile
- SignificanceModelRegistry - Interface in com.yahoo.language.significance
- SimpleDetector - Class in com.yahoo.language.simple
-
Includes functionality for determining the langCode from a sample or from the encoding.
- SimpleDetector() - Constructor for class com.yahoo.language.simple.SimpleDetector
- SimpleLinguistics - Class in com.yahoo.language.simple
-
Factory of simple linguistic processor implementations.
- SimpleLinguistics() - Constructor for class com.yahoo.language.simple.SimpleLinguistics
- SimpleNormalizer - Class in com.yahoo.language.simple
- SimpleNormalizer() - Constructor for class com.yahoo.language.simple.SimpleNormalizer
- SimpleToken - Class in com.yahoo.language.simple
- SimpleToken(String) - Constructor for class com.yahoo.language.simple.SimpleToken
- SimpleToken(String, String) - Constructor for class com.yahoo.language.simple.SimpleToken
- SimpleTokenizer - Class in com.yahoo.language.simple
-
A tokenizer which splits on whitespace, normalizes and transforms using the given implementations and stems using the kstem algorithm.
- SimpleTokenizer() - Constructor for class com.yahoo.language.simple.SimpleTokenizer
- SimpleTokenizer(Normalizer) - Constructor for class com.yahoo.language.simple.SimpleTokenizer
- SimpleTokenizer(Normalizer, Transformer) - Constructor for class com.yahoo.language.simple.SimpleTokenizer
- SimpleTokenizer(Normalizer, Transformer, SpecialTokenRegistry) - Constructor for class com.yahoo.language.simple.SimpleTokenizer
- SimpleTokenType - Class in com.yahoo.language.simple
- SimpleTokenType() - Constructor for class com.yahoo.language.simple.SimpleTokenType
- SimpleTransformer - Class in com.yahoo.language.simple
-
Converts all accented characters into their de-accented counterparts followed by their combining diacritics, then strips off the diacritics using a regex.
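The two steps in this description (decompose, then strip the combining marks) can be sketched with JDK classes only. The class name AccentDropSketch and the exact regex are assumptions made for illustration; the index does not show the pattern SimpleTransformer actually uses.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical helper, not part of com.yahoo.language.
class AccentDropSketch {

    // Assumed pattern: matches runs of characters in the Combining Diacritical Marks block.
    private static final Pattern DIACRITICS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    static String accentDrop(String input) {
        // Step 1: NFD splits each accented character into base letter + combining marks.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Step 2: remove the combining marks, leaving only the base letters.
        return DIACRITICS.matcher(decomposed).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(accentDrop("caf\u00E9"));           // e with acute accent
        System.out.println(accentDrop("\u00C5ngstr\u00F6m"));  // A with ring, o with diaeresis
    }
}
```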
- SimpleTransformer() - Constructor for class com.yahoo.language.simple.SimpleTransformer
- SINDHI - Enum constant in enum class com.yahoo.language.Language
-
Language tag "sd".
- SINHALA - Enum constant in enum class com.yahoo.language.process.TokenScript
- SINHALESE - Enum constant in enum class com.yahoo.language.Language
-
Language tag "si".
- SISWATI - Enum constant in enum class com.yahoo.language.Language
-
Language tag "ss".
- size() - Method in class com.yahoo.language.process.StemList
- size() - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
- SLOVAK - Enum constant in enum class com.yahoo.language.Language
-
Language tag "sk".
- SLOVENIAN - Enum constant in enum class com.yahoo.language.Language
-
Language tag "sl".
- SOMALI - Enum constant in enum class com.yahoo.language.Language
-
Language tag "so".
- SPACE - Enum constant in enum class com.yahoo.language.process.TokenType
- SPANISH - Enum constant in enum class com.yahoo.language.Language
-
Language tag "es".
- SpecialTokenRegistry - Class in com.yahoo.language.process
-
Immutable named lists of "special tokens" - strings which should override the normal tokenizer semantics and be tokenized into a single token.
- SpecialTokenRegistry() - Constructor for class com.yahoo.language.process.SpecialTokenRegistry
-
Creates an empty special token registry
- SpecialTokenRegistry(SpecialtokensConfig) - Constructor for class com.yahoo.language.process.SpecialTokenRegistry
-
Create a special token registry from a configuration object.
- SpecialTokenRegistry(List<SpecialTokens>) - Constructor for class com.yahoo.language.process.SpecialTokenRegistry
- SpecialTokens - Class in com.yahoo.language.process
-
An immutable list of special tokens - strings which should override the normal tokenizer semantics and be tokenized into a single token.
- SpecialTokens(String, List<SpecialTokens.Token>) - Constructor for class com.yahoo.language.process.SpecialTokens
- SpecialTokens.Token - Class in com.yahoo.language.process
-
An immutable special token
- split(String, int) - Method in class com.yahoo.language.process.GramSplitter
-
Splits the input into grams of size n and returns an iterator over grams represented as [start index,length] pairs into the input string.
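To make the [start index, length] representation concrete, here is a minimal sliding-window sketch. It is not the GramSplitter implementation: the class name GramSketch is hypothetical, the window runs over the whole string without regard to word boundaries, and returning a single shorter gram for inputs shorter than n is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of com.yahoo.language.
class GramSketch {

    /** Returns one {start, length} pair per window of n chars in the input. */
    static List<int[]> grams(String input, int n) {
        if (n < 1) throw new IllegalArgumentException("n must be positive");
        List<int[]> result = new ArrayList<>();
        if (input.isEmpty()) return result;
        if (input.length() < n) {
            // Assumption: an input shorter than n yields one gram covering all of it.
            result.add(new int[] {0, input.length()});
            return result;
        }
        for (int start = 0; start + n <= input.length(); start++)
            result.add(new int[] {start, n});
        return result;
    }

    public static void main(String[] args) {
        String text = "hello";
        for (int[] gram : grams(text, 3)) {
            // Each pair indexes back into the original string, so no substrings are stored.
            System.out.println(gram[0] + "," + gram[1] + " -> "
                    + text.substring(gram[0], gram[0] + gram[1]));
        }
    }
}
```

Keeping grams as index pairs avoids allocating a substring for every gram until the caller actually needs the text.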
- stem(String) - Method in class com.yahoo.language.simple.kstem.KStemmer
- stem(String, Language, StemMode, boolean) - Method in interface com.yahoo.language.process.Stemmer
-
Stem input according to specified stemming mode.
- stem(String, Language, StemMode, boolean) - Method in class com.yahoo.language.process.StemmerImpl
- stem(String, StemMode, Language) - Method in interface com.yahoo.language.process.Stemmer
-
Stem input according to specified stemming mode.
- stem(String, StemMode, Language) - Method in class com.yahoo.language.process.StemmerImpl
- StemList - Class in com.yahoo.language.process
-
A list of strings which does not allow for duplicate elements.
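The idea of an ordered list that rejects duplicates can be sketched as below. UniqueStringList is a hypothetical class written for illustration; how StemList itself treats a duplicate (ignoring it, as here, or something else) is not stated in this index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of com.yahoo.language.
class UniqueStringList {

    private final List<String> items = new ArrayList<>();

    /** Appends the string unless it is already present; returns whether it was added. */
    boolean add(String s) {
        if (items.contains(s)) return false; // assumption: duplicates are silently ignored
        items.add(s);
        return true;
    }

    String get(int index) { return items.get(index); }

    int size() { return items.size(); }

    public static void main(String[] args) {
        UniqueStringList stems = new UniqueStringList();
        stems.add("run");
        stems.add("running");
        stems.add("run"); // ignored, already present
        System.out.println(stems.size());
    }
}
```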
- StemList() - Constructor for class com.yahoo.language.process.StemList
- StemList(String...) - Constructor for class com.yahoo.language.process.StemList
- Stemmer - Interface in com.yahoo.language.process
-
Interface providing stemming of single words.
- STEMMER - Enum constant in enum class com.yahoo.language.Linguistics.Component
- StemmerImpl - Class in com.yahoo.language.process
- StemmerImpl(Tokenizer) - Constructor for class com.yahoo.language.process.StemmerImpl
- StemMode - Enum Class in com.yahoo.language.process
-
An enum of the stemming modes which can be requested.
- subSequence(int, int) - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
- SUNDANESE - Enum constant in enum class com.yahoo.language.Language
-
Language tag "su".
- SWAHILI - Enum constant in enum class com.yahoo.language.Language
-
Language tag "sw".
- SWEDISH - Enum constant in enum class com.yahoo.language.Language
-
Language tag "sv".
- SYLOTINAGRI - Enum constant in enum class com.yahoo.language.process.TokenScript
- SYMBOL - Enum constant in enum class com.yahoo.language.process.TokenType
- SYRIAC - Enum constant in enum class com.yahoo.language.Language
-
Language tag "syr".
- SYRIAC - Enum constant in enum class com.yahoo.language.process.TokenScript
T
- TAGALOG - Enum constant in enum class com.yahoo.language.Language
-
Language tag "fil".
- TAGALOG - Enum constant in enum class com.yahoo.language.process.TokenScript
- TAGBANWA - Enum constant in enum class com.yahoo.language.process.TokenScript
- TAILE - Enum constant in enum class com.yahoo.language.process.TokenScript
- TAILUE - Enum constant in enum class com.yahoo.language.process.TokenScript
- TAJIK - Enum constant in enum class com.yahoo.language.Language
-
Language tag "tg".
- TAMIL - Enum constant in enum class com.yahoo.language.Language
-
Language tag "ta".
- TAMIL - Enum constant in enum class com.yahoo.language.process.TokenScript
- TATAR - Enum constant in enum class com.yahoo.language.Language
-
Language tag "tt".
- TELUGU - Enum constant in enum class com.yahoo.language.Language
-
Language tag "te".
- TELUGU - Enum constant in enum class com.yahoo.language.process.TokenScript
- testInstance() - Static method in interface com.yahoo.language.process.Embedder.Runtime
- TextGenerator - Interface in com.yahoo.language.process
-
Generates text given a prompt.
- TextGenerator.Context - Class in com.yahoo.language.process
- TextGenerator.FailingTextGenerator - Class in com.yahoo.language.process
- THAANA - Enum constant in enum class com.yahoo.language.process.TokenScript
- THAI - Enum constant in enum class com.yahoo.language.Language
-
Language tag "th".
- THAI - Enum constant in enum class com.yahoo.language.process.TokenScript
- throwsOnUse - Static variable in interface com.yahoo.language.process.Embedder
-
An instance of this which throws IllegalStateException if an attempt is made to use it
- throwsOnUse - Static variable in interface com.yahoo.language.process.TextGenerator
- TIBETAN - Enum constant in enum class com.yahoo.language.Language
-
Language tag "bo".
- TIBETAN - Enum constant in enum class com.yahoo.language.process.TokenScript
- TIFINAGH - Enum constant in enum class com.yahoo.language.process.TokenScript
- TIGRINYA - Enum constant in enum class com.yahoo.language.Language
-
Language tag "ti".
- toCharArray() - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
- toChars(int[], int, int, char[], int) - Method in class com.yahoo.language.simple.kstem.CharacterUtils
-
Converts a sequence of unicode code points to a sequence of Java characters.
- toCodePoints(char[], int, int, int[], int) - Method in class com.yahoo.language.simple.kstem.CharacterUtils
-
Converts a sequence of Java characters to a sequence of unicode code points.
- toDetailString() - Method in class com.yahoo.language.simple.SimpleToken
- toExtractedList() - Method in class com.yahoo.language.process.GramSplitter.GramSplitterIterator
-
Convenience list which splits the remaining items in this iterator into a list of gram strings
- token() - Method in class com.yahoo.language.process.SpecialTokens.Token
-
Returns the special token
- Token - Interface in com.yahoo.language.process
-
A single token produced by the tokenizer.
- Token(String) - Constructor for class com.yahoo.language.process.SpecialTokens.Token
-
Creates a special token
- Token(String, String) - Constructor for class com.yahoo.language.process.SpecialTokens.Token
-
Creates a special token which will be represented by the given replacement token
- tokenize(String, boolean) - Method in class com.yahoo.language.process.SpecialTokens
-
Returns the special token starting at the start of the given string, or null if no special token starts at this string
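The prefix lookup described here can be sketched as follows. This is an illustration only: SpecialTokenSketch is a hypothetical class, the boolean parameter of the real method is left out because its meaning is not given in this index, and preferring the longest matching token is an assumption.

```java
import java.util.List;

// Hypothetical helper, not part of com.yahoo.language.
class SpecialTokenSketch {

    /** Returns the longest special token that the input starts with, or null if none does. */
    static String tokenize(String input, List<String> specialTokens) {
        String best = null;
        for (String token : specialTokens) {
            if (token.isEmpty()) continue; // an empty token would match everything
            if (input.startsWith(token) && (best == null || token.length() > best.length()))
                best = token;
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("c", "c++", "c#");
        System.out.println(tokenize("c++ is fast", tokens));
        System.out.println(tokenize("java is fine", tokens));
    }
}
```

A tokenizer would typically call such a lookup at each position before falling back to its normal splitting rules, so that "c++" survives as one token.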
- tokenize(String, Language, StemMode, boolean) - Method in interface com.yahoo.language.process.Tokenizer
-
Returns the tokens produced from an input string under the rules of the given Language and additional options
- tokenize(String, Language, StemMode, boolean) - Method in class com.yahoo.language.simple.SimpleTokenizer
-
Tokenize the input, applying the transform of this to each token string.
- tokenize(String, Function<String, String>) - Method in class com.yahoo.language.simple.SimpleTokenizer
-
Tokenize the input, and apply the given transform to each token string.
- Tokenizer - Interface in com.yahoo.language.process
-
Language-sensitive tokenization of a text string.
- TOKENIZER - Enum constant in enum class com.yahoo.language.Linguistics.Component
- TokenScript - Enum Class in com.yahoo.language.process
-
List of token scripts (e.g. Latin, Japanese, Chinese) which may warrant different linguistics treatment.
- TokenType - Enum Class in com.yahoo.language.process
-
An enumeration of token types.
- toLowerCase(char[], int, int) - Method in class com.yahoo.language.simple.kstem.CharacterUtils
-
Converts each unicode codepoint to lowerCase via Character.toLowerCase(int) starting at the given offset.
- toLowerCase(String) - Static method in class com.yahoo.language.LinguisticsCase
-
The lower casing method to use in Vespa when doing language independent processing of natural language data.
- TONGA - Enum constant in enum class com.yahoo.language.Language
-
Language tag "to".
- toString() - Method in class com.yahoo.language.process.SpecialTokens.Token
- toString() - Method in record class com.yahoo.language.significance.DocumentFrequency
-
Returns a string representation of this record class.
- toString() - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
- toString() - Method in class com.yahoo.language.simple.SimpleToken
- toUpperCase(char[], int, int) - Method in class com.yahoo.language.simple.kstem.CharacterUtils
-
Converts each unicode codepoint to UpperCase via Character.toUpperCase(int) starting at the given offset.
- Transformer - Interface in com.yahoo.language.process
-
Interface for providers of text transformations such as accent removal.
- TRANSFORMER - Enum constant in enum class com.yahoo.language.Linguistics.Component
- TSONGA - Enum constant in enum class com.yahoo.language.Language
-
Language tag "ts".
- TURKISH - Enum constant in enum class com.yahoo.language.Language
-
Language tag "tr".
- TURKMEN - Enum constant in enum class com.yahoo.language.Language
-
Language tag "tk".
- TWI - Enum constant in enum class com.yahoo.language.Language
-
Language tag "tw".
U
- UGARITIC - Enum constant in enum class com.yahoo.language.Language
-
Language tag "uga".
- UGARITIC - Enum constant in enum class com.yahoo.language.process.TokenScript
- UIGHUR - Enum constant in enum class com.yahoo.language.Language
-
Language tag "ug".
- UKRAINIAN - Enum constant in enum class com.yahoo.language.Language
-
Language tag "uk".
- UNKNOWN - Enum constant in enum class com.yahoo.language.Language
-
Language tag "un".
- UNKNOWN - Enum constant in enum class com.yahoo.language.process.TokenScript
- UNKNOWN - Enum constant in enum class com.yahoo.language.process.TokenType
- unsafeWrite(char) - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
- unsafeWrite(char[], int, int) - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
- URDU - Enum constant in enum class com.yahoo.language.Language
-
Language tag "ur".
- UZBEK - Enum constant in enum class com.yahoo.language.Language
-
Language tag "uz".
V
- valueOf(int) - Static method in enum class com.yahoo.language.process.TokenType
-
Translates this from the int code representation returned from TokenType.getValue()
- valueOf(int) - Static method in class com.yahoo.language.simple.SimpleTokenType
- valueOf(String) - Static method in enum class com.yahoo.language.Language
-
Returns the enum constant of this class with the specified name.
- valueOf(String) - Static method in enum class com.yahoo.language.Linguistics.Component
-
Returns the enum constant of this class with the specified name.
- valueOf(String) - Static method in enum class com.yahoo.language.process.StemMode
-
Returns the enum constant of this class with the specified name.
- valueOf(String) - Static method in enum class com.yahoo.language.process.TokenScript
-
Returns the enum constant of this class with the specified name.
- valueOf(String) - Static method in enum class com.yahoo.language.process.TokenType
-
Returns the enum constant of this class with the specified name.
- values() - Static method in enum class com.yahoo.language.Language
-
Returns an array containing the constants of this enum class, in the order they are declared.
- values() - Static method in enum class com.yahoo.language.Linguistics.Component
-
Returns an array containing the constants of this enum class, in the order they are declared.
- values() - Static method in enum class com.yahoo.language.process.StemMode
-
Returns an array containing the constants of this enum class, in the order they are declared.
- values() - Static method in enum class com.yahoo.language.process.TokenScript
-
Returns an array containing the constants of this enum class, in the order they are declared.
- values() - Static method in enum class com.yahoo.language.process.TokenType
-
Returns an array containing the constants of this enum class, in the order they are declared.
- version() - Method in class com.yahoo.language.significance.impl.SignificanceModelFile
- VIETNAMESE - Enum constant in enum class com.yahoo.language.Language
-
Language tag "vi".
- VIETNAMESE - Enum constant in enum class com.yahoo.language.process.TokenScript
- VOLAPUK - Enum constant in enum class com.yahoo.language.Language
-
Language tag "vo".
W
- WELSH - Enum constant in enum class com.yahoo.language.Language
-
Language tag "cy".
- WOLOF - Enum constant in enum class com.yahoo.language.Language
-
Language tag "wo".
- write(char) - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
- write(char[]) - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
- write(char[], int, int) - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
- write(int) - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
- write(OpenStringBuilder) - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
- write(String) - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
X
Y
- YI - Enum constant in enum class com.yahoo.language.process.TokenScript
- YIDDISH - Enum constant in enum class com.yahoo.language.Language
-
Language tag "yi".
- YORUBA - Enum constant in enum class com.yahoo.language.Language
-
Language tag "yo".
Z
- ZHUANG - Enum constant in enum class com.yahoo.language.Language
-
Language tag "za".
- ZULU - Enum constant in enum class com.yahoo.language.Language
-
Language tag "zu".