
A

ABKHAZIAN - com.yahoo.language.Language
Language tag "ab".
AbstractDetector - Class in com.yahoo.language.detect
 
AbstractDetector() - Constructor for class com.yahoo.language.detect.AbstractDetector
 
accentDrop(String, Language) - Method in interface com.yahoo.language.process.Transformer
Remove accents from input text.
ACCEPT_LANGUAGE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
add(int, String) - Method in class com.yahoo.language.process.StemList
 
ADD_DUMMY_PREFIX_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.NormalizerSpec
 
addAcceptLanguage(String) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
List of the languages this model can accept.
addAcceptLanguageBytes(ByteString) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
List of the languages this model can accept.
addAllAcceptLanguage(Iterable<String>) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
List of the languages this model can accept.
addAllControlSymbols(Iterable<String>) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
addAllInput(Iterable<String>) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
General parameters. Input corpus files.
addAllPieces(Iterable<? extends SentencepieceModel.ModelProto.SentencePiece>) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Sentence pieces with scores.
addAllSamples(Iterable<? extends SentencepieceModel.SelfTestData.Sample>) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
repeated .sentencepiece.SelfTestData.Sample samples = 1;
addAllUserDefinedSymbols(Iterable<String>) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Defines user defined symbols.
addControlSymbols(String) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
addControlSymbolsBytes(ByteString) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
addDefaultModel(Path) - Method in class com.yahoo.language.sentencepiece.SentencePieceEncoder.Builder
Adds the model that will be used if the language is unknown, or if only one model is specified.
addExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.ModelProto, List<Type>>, Type) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
 
addExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.ModelProto.SentencePiece, List<Type>>, Type) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
 
addExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.NormalizerSpec, List<Type>>, Type) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
 
addExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.SelfTestData, List<Type>>, Type) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
 
addExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.TrainerSpec, List<Type>>, Type) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
 
addInput(String) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
General parameters. Input corpus files.
addInputBytes(ByteString) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
General parameters. Input corpus files.
addModel(Language, Path) - Method in class com.yahoo.language.sentencepiece.SentencePieceEncoder.Builder
 
addPieces(int, SentencepieceModel.ModelProto.SentencePiece) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Sentence pieces with scores.
addPieces(int, SentencepieceModel.ModelProto.SentencePiece.Builder) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Sentence pieces with scores.
addPieces(SentencepieceModel.ModelProto.SentencePiece) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Sentence pieces with scores.
addPieces(SentencepieceModel.ModelProto.SentencePiece.Builder) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Sentence pieces with scores.
addPiecesBuilder() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Sentence pieces with scores.
addPiecesBuilder(int) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Sentence pieces with scores.
addRepeatedField(Descriptors.FieldDescriptor, Object) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
 
addRepeatedField(Descriptors.FieldDescriptor, Object) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
 
addRepeatedField(Descriptors.FieldDescriptor, Object) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
 
addRepeatedField(Descriptors.FieldDescriptor, Object) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
 
addRepeatedField(Descriptors.FieldDescriptor, Object) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
 
addRepeatedField(Descriptors.FieldDescriptor, Object) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
 
addSamples(int, SentencepieceModel.SelfTestData.Sample) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
repeated .sentencepiece.SelfTestData.Sample samples = 1;
addSamples(int, SentencepieceModel.SelfTestData.Sample.Builder) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
repeated .sentencepiece.SelfTestData.Sample samples = 1;
addSamples(SentencepieceModel.SelfTestData.Sample) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
repeated .sentencepiece.SelfTestData.Sample samples = 1;
addSamples(SentencepieceModel.SelfTestData.Sample.Builder) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
repeated .sentencepiece.SelfTestData.Sample samples = 1;
addSamplesBuilder() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
repeated .sentencepiece.SelfTestData.Sample samples = 1;
addSamplesBuilder(int) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
repeated .sentencepiece.SelfTestData.Sample samples = 1;
addUserDefinedSymbols(String) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Defines user defined symbols.
addUserDefinedSymbolsBytes(ByteString) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Defines user defined symbols.
AFAR - com.yahoo.language.Language
Language tag "aa".
AFRIKAANS - com.yahoo.language.Language
Language tag "af".
ALBANIAN - com.yahoo.language.Language
Language tag "sq".
ALL - com.yahoo.language.process.StemMode
 
ALLOW_WHITESPACE_ONLY_PIECES_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
ALPHABETIC - com.yahoo.language.process.TokenType
 
AMHARIC - com.yahoo.language.Language
Language tag "am".
ARABIC - com.yahoo.language.Language
Language tag "ar".
ARABIC - com.yahoo.language.process.TokenScript
 
ARMENIAN - com.yahoo.language.Language
Language tag "hy".
ARMENIAN - com.yahoo.language.process.TokenScript
 
ASCII - com.yahoo.language.process.TokenScript
 
asMap() - Method in class com.yahoo.language.process.SpecialTokens
Returns the tokens of this as an immutable map from token to replacement.
ASSAMESE - com.yahoo.language.Language
Language tag "as".
AYMARA - com.yahoo.language.Language
Language tag "ay".
AZERBAIJANI - com.yahoo.language.Language
Language tag "az".
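
The `accentDrop(String, Language)` entry above is described only as removing accents from input text. As a rough illustration of what that can mean, here is a minimal sketch using only the JDK. It is an assumption about one common approach (Unicode decomposition followed by stripping combining marks), not the behavior of any `Transformer` implementation, and the class and method names are hypothetical.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of com.yahoo.language.
class AccentDropSketch {

    // Decompose to NFD so each accent becomes a separate combining mark,
    // then delete every combining mark (Unicode category M).
    static String dropAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "crème brûlée" written with escapes to avoid source-encoding issues.
        System.out.println(dropAccents("cr\u00e8me br\u00fbl\u00e9e"));
    }
}
```

Letters that have no canonical decomposition, such as "ø", pass through this sketch unchanged, so a real transformer may need language-specific rules on top.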

B

BASHKIR - com.yahoo.language.Language
Language tag "ba".
BASQUE - com.yahoo.language.Language
Language tag "eu".
BENGALI - com.yahoo.language.Language
Language tag "bn".
BENGALI - com.yahoo.language.process.TokenScript
 
BEST - com.yahoo.language.process.StemMode
 
BHUTANI - com.yahoo.language.Language
Language tag "dz".
BIHARI - com.yahoo.language.Language
Language tag "bh".
BISLAMA - com.yahoo.language.Language
Language tag "bi".
BOS_ID_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
BOS_PIECE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
BPE - sentencepiece.SentencepieceModel.TrainerSpec.ModelType
Byte Pair Encoding
BPE_VALUE - Static variable in enum sentencepiece.SentencepieceModel.TrainerSpec.ModelType
Byte Pair Encoding
BRAILLE - com.yahoo.language.process.TokenScript
 
BRETON - com.yahoo.language.Language
Language tag "br".
BUGINESE - com.yahoo.language.Language
Language tag "bug".
BUGINESE - com.yahoo.language.process.TokenScript
 
BUHID - com.yahoo.language.process.TokenScript
 
build() - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig.Builder
 
build() - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig.Model.Builder
 
build() - Method in class com.yahoo.language.sentencepiece.SentencePieceEncoder.Builder
 
build() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
 
build() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
 
build() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
 
build() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
 
build() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
 
build() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
 
Builder() - Constructor for class com.yahoo.language.sentencepiece.SentencePieceConfig.Builder
 
Builder() - Constructor for class com.yahoo.language.sentencepiece.SentencePieceConfig.Model.Builder
 
Builder() - Constructor for class com.yahoo.language.sentencepiece.SentencePieceEncoder.Builder
 
Builder(SentencePieceConfig) - Constructor for class com.yahoo.language.sentencepiece.SentencePieceConfig.Builder
 
Builder(SentencePieceConfig.Model) - Constructor for class com.yahoo.language.sentencepiece.SentencePieceConfig.Model.Builder
 
buildPartial() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
 
buildPartial() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
 
buildPartial() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
 
buildPartial() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
 
buildPartial() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
 
buildPartial() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
 
BULGARIAN - com.yahoo.language.Language
Language tag "bg".
BURMESE - com.yahoo.language.Language
Language tag "my".
BYELORUSSIAN - com.yahoo.language.Language
Language tag "be".
BYTE - sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
Typical usage of USER_DEFINED symbol is placeholder.
BYTE_FALLBACK_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
BYTE_VALUE - Static variable in enum sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
Typical usage of USER_DEFINED symbol is placeholder.
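
The `BYTE` piece type and `BYTE_FALLBACK_FIELD_NUMBER` entries above relate to the byte fallback option, which this index describes elsewhere as decomposing unknown pieces into UTF-8 bytes. The sketch below shows the idea in plain Java. The `<0xNN>` spelling of byte pieces is assumed here from the upstream SentencePiece convention, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the generated SentencepieceModel classes.
class ByteFallbackSketch {

    // Render each UTF-8 byte of an unknown piece as its own byte piece.
    // The "<0xNN>" format is an assumption, see the text above.
    static List<String> toBytePieces(String unknownPiece) {
        List<String> pieces = new ArrayList<>();
        for (byte b : unknownPiece.getBytes(StandardCharsets.UTF_8)) {
            pieces.add(String.format("<0x%02X>", b & 0xFF));
        }
        return pieces;
    }

    public static void main(String[] args) {
        // U+00E9 is encoded as the two UTF-8 bytes C3 A9.
        System.out.println(toBytePieces("\u00e9"));
    }
}
```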

C

CAMBODIAN - com.yahoo.language.Language
Language tag "km".
CANADIAN - com.yahoo.language.process.TokenScript
 
CATALAN - com.yahoo.language.Language
Language tag "ca".
CHAR - sentencepiece.SentencepieceModel.TrainerSpec.ModelType
Tokenizes into a character sequence.
CHAR_VALUE - Static variable in enum sentencepiece.SentencepieceModel.TrainerSpec.ModelType
Tokenizes into a character sequence.
CHARACTER_CLASSES - com.yahoo.language.Linguistics.Component
 
CHARACTER_COVERAGE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
CharacterClasses - Class in com.yahoo.language.process
Determines the class of a given character.
CharacterClasses() - Constructor for class com.yahoo.language.process.CharacterClasses
 
CHEROKEE - com.yahoo.language.Language
Language tag "chr".
CHEROKEE - com.yahoo.language.process.TokenScript
 
CHINESE - com.yahoo.language.process.TokenScript
 
CHINESE_SIMPLIFIED - com.yahoo.language.Language
Language tag "zh-hans".
CHINESE_TRADITIONAL - com.yahoo.language.Language
Language tag "zh-hant".
clear() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
 
clear() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
 
clear() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
 
clear() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
 
clear() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
 
clear() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
 
clearAcceptLanguage() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
List of the languages this model can accept.
clearAddDummyPrefix() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
Adds dummy whitespace at the beginning of text in order to treat "world" in "world" and "hello world" in the same way.
clearAllowWhitespaceOnlyPieces() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Allows pieces that only contain whitespaces instead of appearing only as prefix or suffix of other pieces.
clearBosId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
<s>
clearBosPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
optional string bos_piece = 46 [default = "<s>"];
clearByteFallback() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Decomposes unknown pieces into UTF-8 bytes.
clearCharacterCoverage() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Training parameters.
clearControlSymbols() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
clearDenormalizerSpec() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Spec for text de-normalization.
clearEosId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
</s>
clearEosPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
optional string eos_piece = 47 [default = "</s>"];
clearEscapeWhitespaces() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
Replaces whitespace with meta symbol.
clearExpected() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
optional string expected = 2;
clearExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.ModelProto, ?>) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
 
clearExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.ModelProto.SentencePiece, ?>) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
 
clearExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.NormalizerSpec, ?>) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
 
clearExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.SelfTestData, ?>) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
 
clearExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.TrainerSpec, ?>) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
 
clearField(Descriptors.FieldDescriptor) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
 
clearField(Descriptors.FieldDescriptor) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
 
clearField(Descriptors.FieldDescriptor) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
 
clearField(Descriptors.FieldDescriptor) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
 
clearField(Descriptors.FieldDescriptor) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
 
clearField(Descriptors.FieldDescriptor) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
 
clearHardVocabLimit() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
`vocab_size` is treated as hard limit.
clearInput() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
optional string input = 1;
clearInput() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
General parameters. Input corpus files.
clearInputFormat() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Input corpus format: "text": one-sentence-per-line text format (default) "tsv": sentence <tab> freq
clearInputSentenceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Maximum size of sentences the trainer loads from `input` parameter.
clearMaxSentenceLength() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
The maximum sentence length in bytes.
clearMaxSentencepieceLength() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
SentencePiece parameters that control the shape of sentence pieces.
clearMiningSentenceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Deprecated.
clearModelPrefix() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Output model file prefix.
clearModelType() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
optional .sentencepiece.TrainerSpec.ModelType model_type = 3 [default = UNIGRAM];
clearName() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
Name of the normalization rule.
clearNormalizationRuleTsv() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
Custom normalization rule file in TSV format.
clearNormalizerSpec() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Spec for text normalization.
clearNumSubIterations() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Number of EM sub iterations.
clearNumThreads() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Number of threads in the training.
clearOneof(Descriptors.OneofDescriptor) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
 
clearOneof(Descriptors.OneofDescriptor) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
 
clearOneof(Descriptors.OneofDescriptor) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
 
clearOneof(Descriptors.OneofDescriptor) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
 
clearOneof(Descriptors.OneofDescriptor) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
 
clearOneof(Descriptors.OneofDescriptor) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
 
clearPadId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
<pad> (padding)
clearPadPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
optional string pad_piece = 48 [default = "<pad>"];
clearPiece() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
piece must not be empty.
clearPieces() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Sentence pieces with scores.
clearPrecompiledCharsmap() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
Pre-compiled normalization rule created by Builder::GetPrecompiledCharsMap() or Builder::CompileCharsMap() method.
clearRemoveExtraWhitespaces() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
Removes leading, trailing, and duplicate internal whitespace.
clearRequiredChars() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Defines required characters.
clearSamples() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
repeated .sentencepiece.SelfTestData.Sample samples = 1;
clearScore() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
optional float score = 2;
clearSeedSentencepieceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
The size of seed sentencepieces.
clearSelfTestData() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Stores sample input and its expected segmentation to verify the model.
clearSelfTestSampleSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Size of self-test samples, which are encoded in the model file.
clearShrinkingFactor() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
In every EM sub-iteration, keeps the top `shrinking_factor` * `current sentencepieces size` with respect to the loss of the sentence piece.
clearShuffleInputSentence() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
optional bool shuffle_input_sentence = 19 [default = true];
clearSplitByNumber() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
When `split_by_number` is true, puts a boundary at each transition between number and non-number.
clearSplitByUnicodeScript() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Uses Unicode script to split sentence pieces.
clearSplitByWhitespace() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Uses whitespace to split sentence pieces.
clearSplitDigits() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Split all digits (0-9) into separate pieces.
clearTrainerSpec() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Spec used to generate this model file.
clearTrainExtremelyLargeCorpus() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Increase bit depth to allow unigram model training on large (>10M sentences) corpora.
clearTrainingSentenceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Deprecated.
clearTreatWhitespaceAsSuffix() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Adds whitespace symbol (_) as a suffix instead of prefix.
clearType() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
optional .sentencepiece.ModelProto.SentencePiece.Type type = 3 [default = NORMAL];
clearUnkId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Reserved special meta tokens.
clearUnkPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
optional string unk_piece = 45 [default = "<unk>"];
clearUnkSurface() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Encodes <unk> into U+2047 (DOUBLE QUESTION MARK), since this character can be useful both for user and developer.
clearUseAllVocab() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Uses all symbols for vocab extraction.
clearUserDefinedSymbols() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Defines user defined symbols.
clearVocabSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Vocabulary size.
clearVocabularyOutputPieceScore() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
When creating the vocabulary file, defines whether or not to additionally output the score for each piece.
clone() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
 
clone() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
 
clone() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
 
clone() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
 
clone() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
 
clone() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
 
collapseUnknowns() - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig
 
collapseUnknowns(boolean) - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig.Builder
 
com.yahoo.language - package com.yahoo.language
 
com.yahoo.language.detect - package com.yahoo.language.detect
 
com.yahoo.language.process - package com.yahoo.language.process
 
com.yahoo.language.sentencepiece - package com.yahoo.language.sentencepiece
 
COMMON - com.yahoo.language.process.TokenScript
 
compareTo(SpecialTokens.Token) - Method in class com.yahoo.language.process.SpecialTokens.Token
 
CONFIG_DEF_MD5 - Static variable in class com.yahoo.language.sentencepiece.SentencePieceConfig
 
CONFIG_DEF_NAME - Static variable in class com.yahoo.language.sentencepiece.SentencePieceConfig
 
CONFIG_DEF_NAMESPACE - Static variable in class com.yahoo.language.sentencepiece.SentencePieceConfig
 
CONFIG_DEF_SCHEMA - Static variable in class com.yahoo.language.sentencepiece.SentencePieceConfig
 
CONFIG_DEF_VERSION - Static variable in class com.yahoo.language.sentencepiece.SentencePieceConfig
 
CONTROL - sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
Control symbols.
CONTROL_SYMBOLS_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
CONTROL_VALUE - Static variable in enum sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
Control symbols.
COPTIC - com.yahoo.language.Language
Language tag "cop".
COPTIC - com.yahoo.language.process.TokenScript
 
CORSICAN - com.yahoo.language.Language
Language tag "co".
CROATIAN - com.yahoo.language.Language
Language tag "hr".
CYPRIOT - com.yahoo.language.process.TokenScript
 
CYRILLIC - com.yahoo.language.process.TokenScript
 
CZECH - com.yahoo.language.Language
Language tag "cs".
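
Several `NormalizerSpec.Builder` entries in this section describe whitespace handling: removing leading, trailing, and duplicate internal whitespace, adding a dummy whitespace prefix so that "world" is treated the same alone and in "hello world", and replacing whitespace with a meta symbol. The sketch below combines these three steps in plain Java. The order of the steps, the use of U+2581 as the meta symbol (taken from upstream SentencePiece), and all names are assumptions for illustration, not the library's implementation.

```java
// Hypothetical helper, not part of the generated SentencepieceModel classes.
class WhitespaceNormalizationSketch {

    // Assumed meta symbol: U+2581 (LOWER ONE EIGHTH BLOCK).
    static final char META = '\u2581';

    static String normalize(String text,
                            boolean removeExtraWhitespaces,
                            boolean addDummyPrefix,
                            boolean escapeWhitespaces) {
        String result = text;
        if (removeExtraWhitespaces) {
            // Drop leading/trailing whitespace and collapse runs of spaces.
            result = result.trim().replaceAll(" +", " ");
        }
        if (addDummyPrefix) {
            // A leading space makes a sentence-initial word look like any other word.
            result = " " + result;
        }
        if (escapeWhitespaces) {
            // Make whitespace visible as an ordinary symbol.
            result = result.replace(' ', META);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(normalize("world", true, true, true));
        System.out.println(normalize("  hello   world ", true, true, true));
    }
}
```

With all three options enabled, both inputs contain the same escaped form of "world", which is the effect the dummy prefix entry describes.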

D

DANISH - com.yahoo.language.Language
Language tag "da".
DEFAULT - com.yahoo.language.process.StemMode
 
DENORMALIZER_SPEC_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.ModelProto
 
DESERET - com.yahoo.language.process.TokenScript
 
detect(byte[], int, int, Hint) - Method in interface com.yahoo.language.detect.Detector
Detects language and encoding of the supplied byte array, possibly using a language/encoding hint.
detect(String, Hint) - Method in class com.yahoo.language.detect.AbstractDetector
 
detect(String, Hint) - Method in interface com.yahoo.language.detect.Detector
Detects language of the supplied String, possibly using a language hint.
detect(ByteBuffer, Hint) - Method in class com.yahoo.language.detect.AbstractDetector
 
detect(ByteBuffer, Hint) - Method in interface com.yahoo.language.detect.Detector
Detects language and encoding of the supplied ByteBuffer, possibly using a language/encoding hint.
Detection - Class in com.yahoo.language.detect
 
Detection(Language, String, boolean) - Constructor for class com.yahoo.language.detect.Detection
 
DetectionException - Exception in com.yahoo.language.detect
Exception that is thrown when detection fails.
DetectionException(String) - Constructor for exception com.yahoo.language.detect.DetectionException
 
Detector - Interface in com.yahoo.language.detect
Abstract superclass of all Detectors used for language and encoding detection.
DETECTOR - com.yahoo.language.Linguistics.Component
 
DEVANAGARI - com.yahoo.language.process.TokenScript
 
dispatchGetConfig(ConfigInstance.Producer) - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig.Builder
 
DIVEHI - com.yahoo.language.Language
Language tag "div".
doSetValue(String) - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig.Scoring
 
DUTCH - com.yahoo.language.Language
Language tag "nl".

E

empty() - Static method in class com.yahoo.language.process.SpecialTokens
 
encode(String, Language) - Method in interface com.yahoo.language.process.Encoder
Encodes text into tokens in a list of ids.
encode(String, Language) - Method in class com.yahoo.language.process.Encoder.FailingEncoder
 
encode(String, Language) - Method in class com.yahoo.language.sentencepiece.SentencePieceEncoder
Segments the given text into token segments using the SentencePiece algorithm and returns the segment ids.
encode(String, Language, TensorType) - Method in interface com.yahoo.language.process.Encoder
Encodes text into tokens in a tensor.
encode(String, Language, TensorType) - Method in class com.yahoo.language.process.Encoder.FailingEncoder
 
encode(String, Language, TensorType) - Method in class com.yahoo.language.sentencepiece.SentencePieceEncoder
Encodes directly to a tensor.
Encoder - Interface in com.yahoo.language.process
An encoder converts a text string to a tensor or list of tokens.
Encoder.FailingEncoder - Class in com.yahoo.language.process
 
ENGLISH - com.yahoo.language.Language
Language tag "en".
EOS_ID_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
EOS_PIECE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
equals(Linguistics) - Method in interface com.yahoo.language.Linguistics
Checks if another instance is equivalent to this one.
equals(Object) - Method in class com.yahoo.language.process.GramSplitter.Gram
 
equals(Object) - Method in class com.yahoo.language.process.SpecialTokens.Token
 
equals(Object) - Method in class sentencepiece.SentencepieceModel.ModelProto
 
equals(Object) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
 
equals(Object) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
 
equals(Object) - Method in class sentencepiece.SentencepieceModel.SelfTestData
 
equals(Object) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
 
equals(Object) - Method in class sentencepiece.SentencepieceModel.TrainerSpec
 
ESCAPE_WHITESPACES_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.NormalizerSpec
 
ESPERANTO - com.yahoo.language.Language
Language tag "eo".
ESTONIAN - com.yahoo.language.Language
Language tag "et".
ETHIOPIC - com.yahoo.language.process.TokenScript
 
EXPECTED_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.SelfTestData.Sample
 
extractFrom(GramSplitter.UnicodeString) - Method in class com.yahoo.language.process.GramSplitter.Gram
Returns this gram as a string from the input string.
extractFrom(String) - Method in class com.yahoo.language.process.GramSplitter.Gram
Returns this gram as a string from the input string.
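
The `encode` entries above describe segmenting text into token segments, and the index describes the `fewestSegments` scoring option as finding the segmentation with the fewest segments and resolving ties by score sum. A minimal dynamic-programming sketch of that rule follows. The vocabulary, the scores, the assumption that a higher score sum wins a tie, and all names are hypothetical. This is not the `SentencePieceEncoder` implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of com.yahoo.language.sentencepiece.
class FewestSegmentsSketch {

    // Returns the segmentation of text into vocabulary entries that uses the
    // fewest segments; ties go to the higher score sum (an assumption).
    // Returns an empty list if the vocabulary cannot cover the text.
    static List<String> segment(String text, Map<String, Double> vocab) {
        int n = text.length();
        int[] count = new int[n + 1];        // fewest segments covering text[0, i)
        double[] score = new double[n + 1];  // best score sum for that count
        int[] prev = new int[n + 1];         // start index of the last segment
        Arrays.fill(count, Integer.MAX_VALUE);
        count[0] = 0;

        for (int end = 1; end <= n; end++) {
            for (int start = 0; start < end; start++) {
                if (count[start] == Integer.MAX_VALUE) continue; // prefix unreachable
                Double s = vocab.get(text.substring(start, end));
                if (s == null) continue;                         // not a known piece
                int c = count[start] + 1;
                double total = score[start] + s;
                if (c < count[end] || (c == count[end] && total > score[end])) {
                    count[end] = c;
                    score[end] = total;
                    prev[end] = start;
                }
            }
        }
        if (count[n] == Integer.MAX_VALUE) return List.of();

        // Walk back through the recorded segment starts.
        List<String> segments = new ArrayList<>();
        for (int end = n; end > 0; end = prev[end]) {
            segments.add(text.substring(prev[end], end));
        }
        Collections.reverse(segments);
        return segments;
    }

    public static void main(String[] args) {
        Map<String, Double> vocab = Map.of(
                "a", -1.0, "ab", -1.0, "abc", -0.5,
                "bcd", -3.0, "cd", -2.0, "d", -0.5);
        // Three two-segment splits exist; "abc" + "d" has the highest score sum.
        System.out.println(segment("abcd", vocab));
    }
}
```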

F

FailingEncoder() - Constructor for class com.yahoo.language.process.Encoder.FailingEncoder
 
FAROESE - com.yahoo.language.Language
Language tag "fo".
fewestSegments - com.yahoo.language.sentencepiece.Scoring
Finds the segmentation that has the fewest segments, resolving ties by score sum.
fewestSegments - com.yahoo.language.sentencepiece.SentencePieceConfig.Scoring.Enum
 
fewestSegments - Static variable in class com.yahoo.language.sentencepiece.SentencePieceConfig.Scoring
 
FIJI - com.yahoo.language.Language
Language tag "fj".
FINNISH - com.yahoo.language.Language
Language tag "fi".
forNumber(int) - Static method in enum sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
 
forNumber(int) - Static method in enum sentencepiece.SentencepieceModel.TrainerSpec.ModelType
 
FRENCH - com.yahoo.language.Language
Language tag "fr".
FRISIAN - com.yahoo.language.Language
Language tag "fy".
fromEncoding(String) - Static method in enum com.yahoo.language.Language
Returns the language from an encoding, or Language.UNKNOWN if it cannot be determined.
fromLanguageTag(String) - Static method in enum com.yahoo.language.Language
Convenience method for calling fromLocale(LocaleFactory.fromLanguageTag(languageTag)).
fromLanguageTag(String) - Static method in class com.yahoo.language.LocaleFactory
Implements a simple parser for RFC5646 language tags.
fromLocale(Locale) - Static method in enum com.yahoo.language.Language
Returns the Language whose Language.languageCode() is equal to locale.getLanguage(), with the following additions:

G

GALICIAN - com.yahoo.language.Language
Language tag "gl".
GEORGIAN - com.yahoo.language.Language
Language tag "ka".
GEORGIAN - com.yahoo.language.process.TokenScript
 
GERMAN - com.yahoo.language.Language
Language tag "de".
get(int) - Method in class com.yahoo.language.process.StemList
 
getAcceptLanguage(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
List of the languages this model can accept.
getAcceptLanguage(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec
List of the languages this model can accept.
getAcceptLanguage(int) - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
List of the languages this model can accept.
getAcceptLanguageBytes(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
List of the languages this model can accept.
getAcceptLanguageBytes(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec
List of the languages this model can accept.
getAcceptLanguageBytes(int) - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
List of the languages this model can accept.
getAcceptLanguageCount() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
List of the languages this model can accept.
getAcceptLanguageCount() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
List of the languages this model can accept.
getAcceptLanguageCount() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
List of the languages this model can accept.
getAcceptLanguageList() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
List of the languages this model can accept.
getAcceptLanguageList() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
List of the languages this model can accept.
getAcceptLanguageList() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
List of the languages this model can accept.
getAddDummyPrefix() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
Adds dummy whitespace at the beginning of text in order to treat "world" in "world" and "hello world" in the same way.
getAddDummyPrefix() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
Adds dummy whitespace at the beginning of text in order to treat "world" in "world" and "hello world" in the same way.
getAddDummyPrefix() - Method in interface sentencepiece.SentencepieceModel.NormalizerSpecOrBuilder
Adds dummy whitespace at the beginning of text in order to treat "world" in "world" and "hello world" in the same way.
getAllowWhitespaceOnlyPieces() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Allows pieces that contain only whitespace, instead of whitespace appearing only as a prefix or suffix of other pieces.
getAllowWhitespaceOnlyPieces() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Allows pieces that contain only whitespace, instead of whitespace appearing only as a prefix or suffix of other pieces.
getAllowWhitespaceOnlyPieces() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Allows pieces that contain only whitespace, instead of whitespace appearing only as a prefix or suffix of other pieces.
getApplyOnRestart() - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig.Builder
 
getBosId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
<s>
getBosId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
<s>
getBosId() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
<s>
getBosPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
optional string bos_piece = 46 [default = "<s>"];
getBosPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
optional string bos_piece = 46 [default = "<s>"];
getBosPiece() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
optional string bos_piece = 46 [default = "<s>"];
getBosPieceBytes() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
optional string bos_piece = 46 [default = "<s>"];
getBosPieceBytes() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
optional string bos_piece = 46 [default = "<s>"];
getBosPieceBytes() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
optional string bos_piece = 46 [default = "<s>"];
getByteFallback() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Decomposes unknown pieces into UTF-8 bytes.
getByteFallback() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Decomposes unknown pieces into UTF-8 bytes.
getByteFallback() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Decomposes unknown pieces into UTF-8 bytes.
getCharacterClasses() - Method in interface com.yahoo.language.Linguistics
Returns a thread-unsafe character classes instance.
getCharacterCoverage() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Training parameters.
getCharacterCoverage() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Training parameters.
getCharacterCoverage() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Training parameters.
getCodePointCount() - Method in class com.yahoo.language.process.GramSplitter.Gram
 
getCollapseUnknowns() - Method in class com.yahoo.language.sentencepiece.SentencePieceEncoder.Builder
 
getComponent(int) - Method in interface com.yahoo.language.process.Token
Returns a component token of this token.
getConfig(SentencePieceConfig.Builder) - Method in interface com.yahoo.language.sentencepiece.SentencePieceConfig.Producer
 
getControlSymbols(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
getControlSymbols(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
getControlSymbols(int) - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
getControlSymbolsBytes(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
getControlSymbolsBytes(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
getControlSymbolsBytes(int) - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
getControlSymbolsCount() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
getControlSymbolsCount() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
getControlSymbolsCount() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
getControlSymbolsList() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
getControlSymbolsList() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
getControlSymbolsList() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
getCountry() - Method in class com.yahoo.language.detect.Hint
 
getDefaultInstance() - Static method in class sentencepiece.SentencepieceModel.ModelProto
 
getDefaultInstance() - Static method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
 
getDefaultInstance() - Static method in class sentencepiece.SentencepieceModel.NormalizerSpec
 
getDefaultInstance() - Static method in class sentencepiece.SentencepieceModel.SelfTestData
 
getDefaultInstance() - Static method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
 
getDefaultInstance() - Static method in class sentencepiece.SentencepieceModel.TrainerSpec
 
getDefaultInstanceForType() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
 
getDefaultInstanceForType() - Method in class sentencepiece.SentencepieceModel.ModelProto
 
getDefaultInstanceForType() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
 
getDefaultInstanceForType() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
 
getDefaultInstanceForType() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
 
getDefaultInstanceForType() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
 
getDefaultInstanceForType() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
 
getDefaultInstanceForType() - Method in class sentencepiece.SentencepieceModel.SelfTestData
 
getDefaultInstanceForType() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
 
getDefaultInstanceForType() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
 
getDefaultInstanceForType() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
 
getDefaultInstanceForType() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
 
getDefMd5() - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig.Builder
 
getDefMd5() - Static method in class com.yahoo.language.sentencepiece.SentencePieceConfig
 
getDefName() - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig.Builder
 
getDefName() - Static method in class com.yahoo.language.sentencepiece.SentencePieceConfig
 
getDefNamespace() - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig.Builder
 
getDefNamespace() - Static method in class com.yahoo.language.sentencepiece.SentencePieceConfig
 
getDefVersion() - Static method in class com.yahoo.language.sentencepiece.SentencePieceConfig
 
getDenormalizerSpec() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Spec for text de-normalization.
getDenormalizerSpec() - Method in class sentencepiece.SentencepieceModel.ModelProto
Spec for text de-normalization.
getDenormalizerSpec() - Method in interface sentencepiece.SentencepieceModel.ModelProtoOrBuilder
Spec for text de-normalization.
getDenormalizerSpecBuilder() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Spec for text de-normalization.
getDenormalizerSpecOrBuilder() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Spec for text de-normalization.
getDenormalizerSpecOrBuilder() - Method in class sentencepiece.SentencepieceModel.ModelProto
Spec for text de-normalization.
getDenormalizerSpecOrBuilder() - Method in interface sentencepiece.SentencepieceModel.ModelProtoOrBuilder
Spec for text de-normalization.
getDescriptor() - Static method in class sentencepiece.SentencepieceModel
 
getDescriptor() - Static method in class sentencepiece.SentencepieceModel.ModelProto.Builder
 
getDescriptor() - Static method in class sentencepiece.SentencepieceModel.ModelProto
 
getDescriptor() - Static method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
 
getDescriptor() - Static method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
 
getDescriptor() - Static method in enum sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
 
getDescriptor() - Static method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
 
getDescriptor() - Static method in class sentencepiece.SentencepieceModel.NormalizerSpec
 
getDescriptor() - Static method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
 
getDescriptor() - Static method in class sentencepiece.SentencepieceModel.SelfTestData
 
getDescriptor() - Static method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
 
getDescriptor() - Static method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
 
getDescriptor() - Static method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
 
getDescriptor() - Static method in class sentencepiece.SentencepieceModel.TrainerSpec
 
getDescriptor() - Static method in enum sentencepiece.SentencepieceModel.TrainerSpec.ModelType
 
getDescriptorForType() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
 
getDescriptorForType() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
 
getDescriptorForType() - Method in enum sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
 
getDescriptorForType() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
 
getDescriptorForType() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
 
getDescriptorForType() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
 
getDescriptorForType() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
 
getDescriptorForType() - Method in enum sentencepiece.SentencepieceModel.TrainerSpec.ModelType
 
getDetector() - Method in interface com.yahoo.language.Linguistics
Returns a thread-unsafe detector.
getEncoding() - Method in class com.yahoo.language.detect.Detection
 
getEncodingName() - Method in class com.yahoo.language.detect.Detection
 
getEosId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
</s>
getEosId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
</s>
getEosId() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
</s>
getEosPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
optional string eos_piece = 47 [default = "</s>"];
getEosPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
optional string eos_piece = 47 [default = "</s>"];
getEosPiece() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
optional string eos_piece = 47 [default = "</s>"];
getEosPieceBytes() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
optional string eos_piece = 47 [default = "</s>"];
getEosPieceBytes() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
optional string eos_piece = 47 [default = "</s>"];
getEosPieceBytes() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
optional string eos_piece = 47 [default = "</s>"];
getEscapeWhitespaces() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
Replaces whitespace with meta symbol.
getEscapeWhitespaces() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
Replaces whitespace with meta symbol.
getEscapeWhitespaces() - Method in interface sentencepiece.SentencepieceModel.NormalizerSpecOrBuilder
Replaces whitespace with meta symbol.
getExpected() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
optional string expected = 2;
getExpected() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
optional string expected = 2;
getExpected() - Method in interface sentencepiece.SentencepieceModel.SelfTestData.SampleOrBuilder
optional string expected = 2;
getExpectedBytes() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
optional string expected = 2;
getExpectedBytes() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
optional string expected = 2;
getExpectedBytes() - Method in interface sentencepiece.SentencepieceModel.SelfTestData.SampleOrBuilder
optional string expected = 2;
getGramSplitter() - Method in interface com.yahoo.language.Linguistics
Returns a thread-unsafe gram splitter.
getHardVocabLimit() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
`vocab_size` is treated as a hard limit.
getHardVocabLimit() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
`vocab_size` is treated as a hard limit.
getHardVocabLimit() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
`vocab_size` is treated as a hard limit.
getInput() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
optional string input = 1;
getInput() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
optional string input = 1;
getInput() - Method in interface sentencepiece.SentencepieceModel.SelfTestData.SampleOrBuilder
optional string input = 1;
getInput(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
General parameters. Input corpus files.
getInput(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec
General parameters. Input corpus files.
getInput(int) - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
General parameters. Input corpus files.
getInputBytes() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
optional string input = 1;
getInputBytes() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
optional string input = 1;
getInputBytes() - Method in interface sentencepiece.SentencepieceModel.SelfTestData.SampleOrBuilder
optional string input = 1;
getInputBytes(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
General parameters. Input corpus files.
getInputBytes(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec
General parameters. Input corpus files.
getInputBytes(int) - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
General parameters. Input corpus files.
getInputCount() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
General parameters. Input corpus files.
getInputCount() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
General parameters. Input corpus files.
getInputCount() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
General parameters. Input corpus files.
getInputFormat() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Input corpus format: "text" (default), one sentence per line; "tsv", sentence <tab> freq.
getInputFormat() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Input corpus format: "text" (default), one sentence per line; "tsv", sentence <tab> freq.
getInputFormat() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Input corpus format: "text" (default), one sentence per line; "tsv", sentence <tab> freq.
getInputFormatBytes() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Input corpus format: "text" (default), one sentence per line; "tsv", sentence <tab> freq.
getInputFormatBytes() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Input corpus format: "text" (default), one sentence per line; "tsv", sentence <tab> freq.
getInputFormatBytes() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Input corpus format: "text" (default), one sentence per line; "tsv", sentence <tab> freq.
getInputList() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
General parameters. Input corpus files.
getInputList() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
General parameters. Input corpus files.
getInputList() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
General parameters. Input corpus files.
getInputSentenceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Maximum size of sentences the trainer loads from `input` parameter.
getInputSentenceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Maximum size of sentences the trainer loads from `input` parameter.
getInputSentenceSize() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Maximum size of sentences the trainer loads from `input` parameter.
getLanguage() - Method in class com.yahoo.language.detect.Detection
 
getMarket() - Method in class com.yahoo.language.detect.Hint
 
getMaxSentenceLength() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
The maximum sentence length in bytes.
getMaxSentenceLength() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
The maximum sentence length in bytes.
getMaxSentenceLength() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
The maximum sentence length in bytes.
getMaxSentencepieceLength() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
SentencePiece parameters which control the shapes of sentence pieces.
getMaxSentencepieceLength() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
SentencePiece parameters which control the shapes of sentence pieces.
getMaxSentencepieceLength() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
SentencePiece parameters which control the shapes of sentence pieces.
getMiningSentenceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Deprecated.
getMiningSentenceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Deprecated.
getMiningSentenceSize() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Deprecated.
getModelPrefix() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Output model file prefix.
getModelPrefix() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Output model file prefix.
getModelPrefix() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Output model file prefix.
getModelPrefixBytes() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Output model file prefix.
getModelPrefixBytes() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Output model file prefix.
getModelPrefixBytes() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Output model file prefix.
getModels() - Method in class com.yahoo.language.sentencepiece.SentencePieceEncoder.Builder
 
getModelType() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
optional .sentencepiece.TrainerSpec.ModelType model_type = 3 [default = UNIGRAM];
getModelType() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
optional .sentencepiece.TrainerSpec.ModelType model_type = 3 [default = UNIGRAM];
getModelType() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
optional .sentencepiece.TrainerSpec.ModelType model_type = 3 [default = UNIGRAM];
getName() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
Name of the normalization rule.
getName() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
Name of the normalization rule.
getName() - Method in interface sentencepiece.SentencepieceModel.NormalizerSpecOrBuilder
Name of the normalization rule.
getNameBytes() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
Name of the normalization rule.
getNameBytes() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
Name of the normalization rule.
getNameBytes() - Method in interface sentencepiece.SentencepieceModel.NormalizerSpecOrBuilder
Name of the normalization rule.
getNormalizationRuleTsv() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
Custom normalization rule file in TSV format.
getNormalizationRuleTsv() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
Custom normalization rule file in TSV format.
getNormalizationRuleTsv() - Method in interface sentencepiece.SentencepieceModel.NormalizerSpecOrBuilder
Custom normalization rule file in TSV format.
getNormalizationRuleTsvBytes() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
Custom normalization rule file in TSV format.
getNormalizationRuleTsvBytes() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
Custom normalization rule file in TSV format.
getNormalizationRuleTsvBytes() - Method in interface sentencepiece.SentencepieceModel.NormalizerSpecOrBuilder
Custom normalization rule file in TSV format.
getNormalizer() - Method in interface com.yahoo.language.Linguistics
Returns a thread-unsafe normalizer.
getNormalizerSpec() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Spec for text normalization.
getNormalizerSpec() - Method in class sentencepiece.SentencepieceModel.ModelProto
Spec for text normalization.
getNormalizerSpec() - Method in interface sentencepiece.SentencepieceModel.ModelProtoOrBuilder
Spec for text normalization.
getNormalizerSpecBuilder() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Spec for text normalization.
getNormalizerSpecOrBuilder() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Spec for text normalization.
getNormalizerSpecOrBuilder() - Method in class sentencepiece.SentencepieceModel.ModelProto
Spec for text normalization.
getNormalizerSpecOrBuilder() - Method in interface sentencepiece.SentencepieceModel.ModelProtoOrBuilder
Spec for text normalization.
getNumber() - Method in enum sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
 
getNumber() - Method in enum sentencepiece.SentencepieceModel.TrainerSpec.ModelType
 
getNumComponents() - Method in interface com.yahoo.language.process.Token
Returns the number of components, if this token is a compound word (e.g.
getNumStems() - Method in interface com.yahoo.language.process.Token
Returns the number of stem forms available for this token.
getNumSubIterations() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Number of EM sub iterations.
getNumSubIterations() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Number of EM sub iterations.
getNumSubIterations() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Number of EM sub iterations.
getNumThreads() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Number of threads in the training.
getNumThreads() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Number of threads in the training.
getNumThreads() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Number of threads in the training.
getOffset() - Method in interface com.yahoo.language.process.Token
Returns the offset position of this token.
getOrig() - Method in interface com.yahoo.language.process.Token
Returns the original form of this token.
getPadId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
<pad> (padding)
getPadId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
<pad> (padding)
getPadId() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
<pad> (padding)
getPadPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
optional string pad_piece = 48 [default = "<pad>"];
getPadPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
optional string pad_piece = 48 [default = "<pad>"];
getPadPiece() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
optional string pad_piece = 48 [default = "<pad>"];
getPadPieceBytes() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
optional string pad_piece = 48 [default = "<pad>"];
getPadPieceBytes() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
optional string pad_piece = 48 [default = "<pad>"];
getPadPieceBytes() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
optional string pad_piece = 48 [default = "<pad>"];
getParserForType() - Method in class sentencepiece.SentencepieceModel.ModelProto
 
getParserForType() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
 
getParserForType() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
 
getParserForType() - Method in class sentencepiece.SentencepieceModel.SelfTestData
 
getParserForType() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
 
getParserForType() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
 
getPiece() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
piece must not be empty.
getPiece() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
piece must not be empty.
getPiece() - Method in interface sentencepiece.SentencepieceModel.ModelProto.SentencePieceOrBuilder
piece must not be empty.
getPieceBytes() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
piece must not be empty.
getPieceBytes() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
piece must not be empty.
getPieceBytes() - Method in interface sentencepiece.SentencepieceModel.ModelProto.SentencePieceOrBuilder
piece must not be empty.
getPieces(int) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Sentence pieces with scores.
getPieces(int) - Method in class sentencepiece.SentencepieceModel.ModelProto
Sentence pieces with scores.
getPieces(int) - Method in interface sentencepiece.SentencepieceModel.ModelProtoOrBuilder
Sentence pieces with scores.
getPiecesBuilder(int) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Sentence pieces with scores.
getPiecesBuilderList() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Sentence pieces with scores.
getPiecesCount() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Sentence pieces with scores.
getPiecesCount() - Method in class sentencepiece.SentencepieceModel.ModelProto
Sentence pieces with scores.
getPiecesCount() - Method in interface sentencepiece.SentencepieceModel.ModelProtoOrBuilder
Sentence pieces with scores.
getPiecesList() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Sentence pieces with scores.
getPiecesList() - Method in class sentencepiece.SentencepieceModel.ModelProto
Sentence pieces with scores.
getPiecesList() - Method in interface sentencepiece.SentencepieceModel.ModelProtoOrBuilder
Sentence pieces with scores.
getPiecesOrBuilder(int) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Sentence pieces with scores.
getPiecesOrBuilder(int) - Method in class sentencepiece.SentencepieceModel.ModelProto
Sentence pieces with scores.
getPiecesOrBuilder(int) - Method in interface sentencepiece.SentencepieceModel.ModelProtoOrBuilder
Sentence pieces with scores.
getPiecesOrBuilderList() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Sentence pieces with scores.
getPiecesOrBuilderList() - Method in class sentencepiece.SentencepieceModel.ModelProto
Sentence pieces with scores.
getPiecesOrBuilderList() - Method in interface sentencepiece.SentencepieceModel.ModelProtoOrBuilder
Sentence pieces with scores.
getPrecompiledCharsmap() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
Pre-compiled normalization rule created by Builder::GetPrecompiledCharsMap() or Builder::CompileCharsMap() method.
getPrecompiledCharsmap() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
Pre-compiled normalization rule created by Builder::GetPrecompiledCharsMap() or Builder::CompileCharsMap() method.
getPrecompiledCharsmap() - Method in interface sentencepiece.SentencepieceModel.NormalizerSpecOrBuilder
Pre-compiled normalization rule created by Builder::GetPrecompiledCharsMap() or Builder::CompileCharsMap() method.
getRemoveExtraWhitespaces() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
Removes leading, trailing, and duplicate internal whitespace.
getRemoveExtraWhitespaces() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
Removes leading, trailing, and duplicate internal whitespace.
getRemoveExtraWhitespaces() - Method in interface sentencepiece.SentencepieceModel.NormalizerSpecOrBuilder
Removes leading, trailing, and duplicate internal whitespace.
getReplacementTerm(String) - Method in interface com.yahoo.language.process.Tokenizer
Deprecated.
Replacements are already applied in the tokens returned by tokenize.
getRequiredChars() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Defines required characters.
getRequiredChars() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Defines required characters.
getRequiredChars() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Defines required characters.
getRequiredCharsBytes() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Defines required characters.
getRequiredCharsBytes() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Defines required characters.
getRequiredCharsBytes() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Defines required characters.
getSamples(int) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
repeated .sentencepiece.SelfTestData.Sample samples = 1;
getSamples(int) - Method in class sentencepiece.SentencepieceModel.SelfTestData
repeated .sentencepiece.SelfTestData.Sample samples = 1;
getSamples(int) - Method in interface sentencepiece.SentencepieceModel.SelfTestDataOrBuilder
repeated .sentencepiece.SelfTestData.Sample samples = 1;
getSamplesBuilder(int) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
repeated .sentencepiece.SelfTestData.Sample samples = 1;
getSamplesBuilderList() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
repeated .sentencepiece.SelfTestData.Sample samples = 1;
getSamplesCount() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
repeated .sentencepiece.SelfTestData.Sample samples = 1;
getSamplesCount() - Method in class sentencepiece.SentencepieceModel.SelfTestData
repeated .sentencepiece.SelfTestData.Sample samples = 1;
getSamplesCount() - Method in interface sentencepiece.SentencepieceModel.SelfTestDataOrBuilder
repeated .sentencepiece.SelfTestData.Sample samples = 1;
getSamplesList() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
repeated .sentencepiece.SelfTestData.Sample samples = 1;
getSamplesList() - Method in class sentencepiece.SentencepieceModel.SelfTestData
repeated .sentencepiece.SelfTestData.Sample samples = 1;
getSamplesList() - Method in interface sentencepiece.SentencepieceModel.SelfTestDataOrBuilder
repeated .sentencepiece.SelfTestData.Sample samples = 1;
getSamplesOrBuilder(int) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
repeated .sentencepiece.SelfTestData.Sample samples = 1;
getSamplesOrBuilder(int) - Method in class sentencepiece.SentencepieceModel.SelfTestData
repeated .sentencepiece.SelfTestData.Sample samples = 1;
getSamplesOrBuilder(int) - Method in interface sentencepiece.SentencepieceModel.SelfTestDataOrBuilder
repeated .sentencepiece.SelfTestData.Sample samples = 1;
getSamplesOrBuilderList() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
repeated .sentencepiece.SelfTestData.Sample samples = 1;
getSamplesOrBuilderList() - Method in class sentencepiece.SentencepieceModel.SelfTestData
repeated .sentencepiece.SelfTestData.Sample samples = 1;
getSamplesOrBuilderList() - Method in interface sentencepiece.SentencepieceModel.SelfTestDataOrBuilder
repeated .sentencepiece.SelfTestData.Sample samples = 1;
getScore() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
optional float score = 2;
getScore() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
optional float score = 2;
getScore() - Method in interface sentencepiece.SentencepieceModel.ModelProto.SentencePieceOrBuilder
optional float score = 2;
getScoring() - Method in class com.yahoo.language.sentencepiece.SentencePieceEncoder.Builder
 
getScript() - Method in interface com.yahoo.language.process.Token
Returns the script of this token.
getSeedSentencepieceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
The size of seed sentencepieces.
getSeedSentencepieceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
The size of seed sentencepieces.
getSeedSentencepieceSize() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
The size of seed sentencepieces.
getSegmenter() - Method in interface com.yahoo.language.Linguistics
Returns a thread-unsafe segmenter.
getSelfTestData() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Stores sample input and its expected segmentation to verify the model.
getSelfTestData() - Method in class sentencepiece.SentencepieceModel.ModelProto
Stores sample input and its expected segmentation to verify the model.
getSelfTestData() - Method in interface sentencepiece.SentencepieceModel.ModelProtoOrBuilder
Stores sample input and its expected segmentation to verify the model.
getSelfTestDataBuilder() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Stores sample input and its expected segmentation to verify the model.
getSelfTestDataOrBuilder() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Stores sample input and its expected segmentation to verify the model.
getSelfTestDataOrBuilder() - Method in class sentencepiece.SentencepieceModel.ModelProto
Stores sample input and its expected segmentation to verify the model.
getSelfTestDataOrBuilder() - Method in interface sentencepiece.SentencepieceModel.ModelProtoOrBuilder
Stores sample input and its expected segmentation to verify the model.
getSelfTestSampleSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Size of self-test samples, which are encoded in the model file.
getSelfTestSampleSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Size of self-test samples, which are encoded in the model file.
getSelfTestSampleSize() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Size of self-test samples, which are encoded in the model file.
getSerializedSize() - Method in class sentencepiece.SentencepieceModel.ModelProto
 
getSerializedSize() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
 
getSerializedSize() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
 
getSerializedSize() - Method in class sentencepiece.SentencepieceModel.SelfTestData
 
getSerializedSize() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
 
getSerializedSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
 
getShrinkingFactor() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
In every EM sub-iteration, keeps the top `shrinking_factor` * `current sentencepieces size` with respect to the loss of the sentence piece.
getShrinkingFactor() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
In every EM sub-iteration, keeps the top `shrinking_factor` * `current sentencepieces size` with respect to the loss of the sentence piece.
getShrinkingFactor() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
In every EM sub-iteration, keeps the top `shrinking_factor` * `current sentencepieces size` with respect to the loss of the sentence piece.
getShuffleInputSentence() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
optional bool shuffle_input_sentence = 19 [default = true];
getShuffleInputSentence() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
optional bool shuffle_input_sentence = 19 [default = true];
getShuffleInputSentence() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
optional bool shuffle_input_sentence = 19 [default = true];
getSpecialTokens(String) - Method in class com.yahoo.language.process.SpecialTokenRegistry
Returns the list of special tokens for a given name.
getSplitByNumber() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
When `split_by_number` is true, puts a boundary at each transition between number and non-number.
getSplitByNumber() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
When `split_by_number` is true, puts a boundary at each transition between number and non-number.
getSplitByNumber() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
When `split_by_number` is true, puts a boundary at each transition between number and non-number.
getSplitByUnicodeScript() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Uses Unicode script to split sentence pieces.
getSplitByUnicodeScript() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Uses Unicode script to split sentence pieces.
getSplitByUnicodeScript() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Uses Unicode script to split sentence pieces.
getSplitByWhitespace() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Uses whitespace to split sentence pieces.
getSplitByWhitespace() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Uses whitespace to split sentence pieces.
getSplitByWhitespace() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Uses whitespace to split sentence pieces.
getSplitDigits() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Split all digits (0-9) into separate pieces.
getSplitDigits() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Split all digits (0-9) into separate pieces.
getSplitDigits() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Split all digits (0-9) into separate pieces.
getStart() - Method in class com.yahoo.language.process.GramSplitter.Gram
 
getStem(int) - Method in interface com.yahoo.language.process.Token
Returns the stem at position i.
getStemmer() - Method in interface com.yahoo.language.Linguistics
Returns a thread-unsafe stemmer or lemmatizer.
getTokenizer() - Method in interface com.yahoo.language.Linguistics
Returns a thread-unsafe tokenizer.
getTokenString() - Method in interface com.yahoo.language.process.Token
Returns the token string in a form suitable for indexing: the most lowercased variant of the most processed token form available. If called on a compound token, this returns a lowercased form of the entire word.
getTrainerSpec() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Spec used to generate this model file.
getTrainerSpec() - Method in class sentencepiece.SentencepieceModel.ModelProto
Spec used to generate this model file.
getTrainerSpec() - Method in interface sentencepiece.SentencepieceModel.ModelProtoOrBuilder
Spec used to generate this model file.
getTrainerSpecBuilder() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Spec used to generate this model file.
getTrainerSpecOrBuilder() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Spec used to generate this model file.
getTrainerSpecOrBuilder() - Method in class sentencepiece.SentencepieceModel.ModelProto
Spec used to generate this model file.
getTrainerSpecOrBuilder() - Method in interface sentencepiece.SentencepieceModel.ModelProtoOrBuilder
Spec used to generate this model file.
getTrainExtremelyLargeCorpus() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Increase bit depth to allow unigram model training on large (>10M sentences) corpora.
getTrainExtremelyLargeCorpus() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Increase bit depth to allow unigram model training on large (>10M sentences) corpora.
getTrainExtremelyLargeCorpus() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Increase bit depth to allow unigram model training on large (>10M sentences) corpora.
getTrainingSentenceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Deprecated.
getTrainingSentenceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Deprecated.
getTrainingSentenceSize() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Deprecated.
getTransformer() - Method in interface com.yahoo.language.Linguistics
Returns a thread-unsafe transformer.
getTreatWhitespaceAsSuffix() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Adds whitespace symbol (_) as a suffix instead of prefix.
getTreatWhitespaceAsSuffix() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Adds whitespace symbol (_) as a suffix instead of prefix.
getTreatWhitespaceAsSuffix() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Adds whitespace symbol (_) as a suffix instead of prefix.
getType() - Method in interface com.yahoo.language.process.Token
Returns the type of this token - word, space or punctuation etc.
getType() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
optional .sentencepiece.ModelProto.SentencePiece.Type type = 3 [default = NORMAL];
getType() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
optional .sentencepiece.ModelProto.SentencePiece.Type type = 3 [default = NORMAL];
getType() - Method in interface sentencepiece.SentencepieceModel.ModelProto.SentencePieceOrBuilder
optional .sentencepiece.ModelProto.SentencePiece.Type type = 3 [default = NORMAL];
getUnkId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Reserved special meta tokens.
getUnkId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Reserved special meta tokens.
getUnkId() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Reserved special meta tokens.
getUnknownFields() - Method in class sentencepiece.SentencepieceModel.ModelProto
 
getUnknownFields() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
 
getUnknownFields() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
 
getUnknownFields() - Method in class sentencepiece.SentencepieceModel.SelfTestData
 
getUnknownFields() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
 
getUnknownFields() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
 
getUnkPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
optional string unk_piece = 45 [default = "<unk>"];
getUnkPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
optional string unk_piece = 45 [default = "<unk>"];
getUnkPiece() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
optional string unk_piece = 45 [default = "<unk>"];
getUnkPieceBytes() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
optional string unk_piece = 45 [default = "<unk>"];
getUnkPieceBytes() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
optional string unk_piece = 45 [default = "<unk>"];
getUnkPieceBytes() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
optional string unk_piece = 45 [default = "<unk>"];
getUnkSurface() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Encodes <unk> into U+2047 (DOUBLE QUESTION MARK), since this character can be useful for both users and developers.
getUnkSurface() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Encodes <unk> into U+2047 (DOUBLE QUESTION MARK), since this character can be useful for both users and developers.
getUnkSurface() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Encodes <unk> into U+2047 (DOUBLE QUESTION MARK), since this character can be useful for both users and developers.
getUnkSurfaceBytes() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Encodes <unk> into U+2047 (DOUBLE QUESTION MARK), since this character can be useful for both users and developers.
getUnkSurfaceBytes() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Encodes <unk> into U+2047 (DOUBLE QUESTION MARK), since this character can be useful for both users and developers.
getUnkSurfaceBytes() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Encodes <unk> into U+2047 (DOUBLE QUESTION MARK), since this character can be useful for both users and developers.
getUseAllVocab() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Uses all symbols for vocab extraction.
getUseAllVocab() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Uses all symbols for vocab extraction.
getUseAllVocab() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Uses all symbols for vocab extraction.
getUserDefinedSymbols(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Defines user-defined symbols.
getUserDefinedSymbols(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Defines user-defined symbols.
getUserDefinedSymbols(int) - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Defines user-defined symbols.
getUserDefinedSymbolsBytes(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Defines user-defined symbols.
getUserDefinedSymbolsBytes(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Defines user-defined symbols.
getUserDefinedSymbolsBytes(int) - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Defines user-defined symbols.
getUserDefinedSymbolsCount() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Defines user-defined symbols.
getUserDefinedSymbolsCount() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Defines user-defined symbols.
getUserDefinedSymbolsCount() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Defines user-defined symbols.
getUserDefinedSymbolsList() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Defines user-defined symbols.
getUserDefinedSymbolsList() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Defines user-defined symbols.
getUserDefinedSymbolsList() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Defines user-defined symbols.
getValue() - Method in enum com.yahoo.language.process.TokenType
Returns an int code for this type.
getValueDescriptor() - Method in enum sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
 
getValueDescriptor() - Method in enum sentencepiece.SentencepieceModel.TrainerSpec.ModelType
 
getVocabSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Vocabulary size.
getVocabSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Vocabulary size.
getVocabSize() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Vocabulary size.
getVocabularyOutputPieceScore() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
When creating the vocabulary file, defines whether or not to additionally output the score for each piece.
getVocabularyOutputPieceScore() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
When creating the vocabulary file, defines whether or not to additionally output the score for each piece.
getVocabularyOutputPieceScore() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
When creating the vocabulary file, defines whether or not to additionally output the score for each piece.
GLAGOLITIC - com.yahoo.language.process.TokenScript
 
GOTHIC - com.yahoo.language.Language
Language tag "got".
GOTHIC - com.yahoo.language.process.TokenScript
 
Gram(int, int) - Constructor for class com.yahoo.language.process.GramSplitter.Gram
 
GRAM_SPLITTER - com.yahoo.language.Linguistics.Component
 
GramSplitter - Class in com.yahoo.language.process
A class which splits consecutive word character sequences into overlapping character n-grams.
GramSplitter(CharacterClasses) - Constructor for class com.yahoo.language.process.GramSplitter
 
GramSplitter.Gram - Class in com.yahoo.language.process
An immutable start index and length pair.
GramSplitter.GramSplitterIterator - Class in com.yahoo.language.process
 
GramSplitterIterator(String, int, CharacterClasses) - Constructor for class com.yahoo.language.process.GramSplitter.GramSplitterIterator
 
GREEK - com.yahoo.language.Language
Language tag "el".
GREEK - com.yahoo.language.process.TokenScript
 
GREENLANDIC - com.yahoo.language.Language
Language tag "kl".
GUARANI - com.yahoo.language.Language
Language tag "gn".
GUJARATI - com.yahoo.language.Language
Language tag "gu".
GUJARATI - com.yahoo.language.process.TokenScript
 
GURMUKHI - com.yahoo.language.process.TokenScript
 
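The GramSplitter entry above describes splitting consecutive word character sequences into overlapping character n-grams. The following is a minimal, self-contained sketch of that idea only; it does not use or reproduce the com.yahoo.language.process.GramSplitter API. The class name NGramSketch, the method grams, the use of Character.isLetterOrDigit to decide what counts as a word character, and the choice to emit a short word whole are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not the library's implementation.
public class NGramSketch {

    // Returns the overlapping character n-grams of each run of word characters in text.
    public static List<String> grams(String text, int n) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            // Assumption: letters and digits are the "word characters"; everything else separates words.
            if (!Character.isLetterOrDigit(text.charAt(i))) {
                i++;
                continue;
            }
            int start = i;
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            String word = text.substring(start, i);
            if (word.length() <= n) {
                // Assumption: a word no longer than n is emitted as a single gram.
                out.add(word);
                continue;
            }
            // Slide a window of size n one character at a time so that grams overlap.
            for (int j = 0; j + n <= word.length(); j++) {
                out.add(word.substring(j, j + n));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [hel, ell, llo, wor, orl, rld]
        System.out.println(grams("hello, world", 3));
    }
}
```

The sketch returns substrings for readability; the index entry for GramSplitter.Gram suggests the library instead represents each gram as a start index and length pair.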

H

HAN - com.yahoo.language.process.TokenScript
 
HANGUL - com.yahoo.language.process.TokenScript
 
HANUNOO - com.yahoo.language.process.TokenScript
 
HARD_VOCAB_LIMIT_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
hasAddDummyPrefix() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
Adds dummy whitespace at the beginning of text in order to treat "world" in "world" and "hello world" in the same way.
hasAddDummyPrefix() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
Adds dummy whitespace at the beginning of text in order to treat "world" in "world" and "hello world" in the same way.
hasAddDummyPrefix() - Method in interface sentencepiece.SentencepieceModel.NormalizerSpecOrBuilder
Adds dummy whitespace at the beginning of text in order to treat "world" in "world" and "hello world" in the same way.
hasAllowWhitespaceOnlyPieces() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Allows pieces that contain only whitespace, instead of whitespace appearing only as a prefix or suffix of other pieces.
hasAllowWhitespaceOnlyPieces() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Allows pieces that contain only whitespace, instead of whitespace appearing only as a prefix or suffix of other pieces.
hasAllowWhitespaceOnlyPieces() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Allows pieces that contain only whitespace, instead of whitespace appearing only as a prefix or suffix of other pieces.
hasBosId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
<s>
hasBosId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
<s>
hasBosId() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
<s>
hasBosPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
optional string bos_piece = 46 [default = "<s>"];
hasBosPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
optional string bos_piece = 46 [default = "<s>"];
hasBosPiece() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
optional string bos_piece = 46 [default = "<s>"];
hasByteFallback() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Decomposes unknown pieces into UTF-8 bytes.
hasByteFallback() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Decomposes unknown pieces into UTF-8 bytes.
hasByteFallback() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Decomposes unknown pieces into UTF-8 bytes.
hasCharacterCoverage() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Training parameters.
hasCharacterCoverage() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Training parameters.
hasCharacterCoverage() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Training parameters.
hasDenormalizerSpec() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Spec for text de-normalization.
hasDenormalizerSpec() - Method in class sentencepiece.SentencepieceModel.ModelProto
Spec for text de-normalization.
hasDenormalizerSpec() - Method in interface sentencepiece.SentencepieceModel.ModelProtoOrBuilder
Spec for text de-normalization.
hasEosId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
</s>
hasEosId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
</s>
hasEosId() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
</s>
hasEosPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
optional string eos_piece = 47 [default = "</s>"];
hasEosPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
optional string eos_piece = 47 [default = "</s>"];
hasEosPiece() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
optional string eos_piece = 47 [default = "</s>"];
hasEscapeWhitespaces() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
Replaces whitespace with meta symbol.
hasEscapeWhitespaces() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
Replaces whitespace with meta symbol.
hasEscapeWhitespaces() - Method in interface sentencepiece.SentencepieceModel.NormalizerSpecOrBuilder
Replaces whitespace with meta symbol.
hasExpected() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
optional string expected = 2;
hasExpected() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
optional string expected = 2;
hasExpected() - Method in interface sentencepiece.SentencepieceModel.SelfTestData.SampleOrBuilder
optional string expected = 2;
hasHardVocabLimit() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
`vocab_size` is treated as hard limit.
hasHardVocabLimit() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
`vocab_size` is treated as hard limit.
hasHardVocabLimit() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
`vocab_size` is treated as hard limit.
hashCode() - Method in class com.yahoo.language.process.GramSplitter.Gram
 
hashCode() - Method in class com.yahoo.language.process.SpecialTokens.Token
 
hashCode() - Method in class sentencepiece.SentencepieceModel.ModelProto
 
hashCode() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
 
hashCode() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
 
hashCode() - Method in class sentencepiece.SentencepieceModel.SelfTestData
 
hashCode() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
 
hashCode() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
 
hasInput() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
optional string input = 1;
hasInput() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
optional string input = 1;
hasInput() - Method in interface sentencepiece.SentencepieceModel.SelfTestData.SampleOrBuilder
optional string input = 1;
hasInputFormat() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Input corpus format: "text" is the one-sentence-per-line text format (default); "tsv" is sentence <tab> freq.
hasInputFormat() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Input corpus format: "text" is the one-sentence-per-line text format (default); "tsv" is sentence <tab> freq.
hasInputFormat() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Input corpus format: "text" is the one-sentence-per-line text format (default); "tsv" is sentence <tab> freq.
hasInputSentenceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Maximum number of sentences the trainer loads from the `input` parameter.
hasInputSentenceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Maximum number of sentences the trainer loads from the `input` parameter.
hasInputSentenceSize() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Maximum number of sentences the trainer loads from the `input` parameter.
hasMaxSentenceLength() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
The maximum sentence length in bytes.
hasMaxSentenceLength() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
The maximum sentence length in bytes.
hasMaxSentenceLength() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
The maximum sentence length in bytes.
hasMaxSentencepieceLength() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
SentencePiece parameters that control the shape of sentence pieces.
hasMaxSentencepieceLength() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
SentencePiece parameters that control the shape of sentence pieces.
hasMaxSentencepieceLength() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
SentencePiece parameters that control the shape of sentence pieces.
hasMiningSentenceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Deprecated.
hasMiningSentenceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Deprecated.
hasMiningSentenceSize() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Deprecated.
hasModelPrefix() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Output model file prefix.
hasModelPrefix() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Output model file prefix.
hasModelPrefix() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Output model file prefix.
hasModelType() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
optional .sentencepiece.TrainerSpec.ModelType model_type = 3 [default = UNIGRAM];
hasModelType() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
optional .sentencepiece.TrainerSpec.ModelType model_type = 3 [default = UNIGRAM];
hasModelType() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
optional .sentencepiece.TrainerSpec.ModelType model_type = 3 [default = UNIGRAM];
hasName() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
Name of the normalization rule.
hasName() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
Name of the normalization rule.
hasName() - Method in interface sentencepiece.SentencepieceModel.NormalizerSpecOrBuilder
Name of the normalization rule.
hasNext() - Method in class com.yahoo.language.process.GramSplitter.GramSplitterIterator
 
hasNormalizationRuleTsv() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
Custom normalization rule file in TSV format.
hasNormalizationRuleTsv() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
Custom normalization rule file in TSV format.
hasNormalizationRuleTsv() - Method in interface sentencepiece.SentencepieceModel.NormalizerSpecOrBuilder
Custom normalization rule file in TSV format.
hasNormalizerSpec() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Spec for text normalization.
hasNormalizerSpec() - Method in class sentencepiece.SentencepieceModel.ModelProto
Spec for text normalization.
hasNormalizerSpec() - Method in interface sentencepiece.SentencepieceModel.ModelProtoOrBuilder
Spec for text normalization.
hasNumSubIterations() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Number of EM sub iterations.
hasNumSubIterations() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Number of EM sub iterations.
hasNumSubIterations() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Number of EM sub iterations.
hasNumThreads() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Number of threads in the training.
hasNumThreads() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Number of threads in the training.
hasNumThreads() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Number of threads in the training.
hasPadId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
<pad> (padding)
hasPadId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
<pad> (padding)
hasPadId() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
<pad> (padding)
hasPadPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
optional string pad_piece = 48 [default = "<pad>"];
hasPadPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
optional string pad_piece = 48 [default = "<pad>"];
hasPadPiece() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
optional string pad_piece = 48 [default = "<pad>"];
hasPiece() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
piece must not be empty.
hasPiece() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
piece must not be empty.
hasPiece() - Method in interface sentencepiece.SentencepieceModel.ModelProto.SentencePieceOrBuilder
piece must not be empty.
hasPrecompiledCharsmap() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
Pre-compiled normalization rule created by Builder::GetPrecompiledCharsMap() or Builder::CompileCharsMap() method.
hasPrecompiledCharsmap() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
Pre-compiled normalization rule created by Builder::GetPrecompiledCharsMap() or Builder::CompileCharsMap() method.
hasPrecompiledCharsmap() - Method in interface sentencepiece.SentencepieceModel.NormalizerSpecOrBuilder
Pre-compiled normalization rule created by Builder::GetPrecompiledCharsMap() or Builder::CompileCharsMap() method.
hasRemoveExtraWhitespaces() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
Removes leading, trailing, and duplicate internal whitespace.
hasRemoveExtraWhitespaces() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
Removes leading, trailing, and duplicate internal whitespace.
hasRemoveExtraWhitespaces() - Method in interface sentencepiece.SentencepieceModel.NormalizerSpecOrBuilder
Removes leading, trailing, and duplicate internal whitespace.
hasRequiredChars() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Defines required characters.
hasRequiredChars() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Defines required characters.
hasRequiredChars() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Defines required characters.
hasScore() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
optional float score = 2;
hasScore() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
optional float score = 2;
hasScore() - Method in interface sentencepiece.SentencepieceModel.ModelProto.SentencePieceOrBuilder
optional float score = 2;
hasSeedSentencepieceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
The size of seed sentencepieces.
hasSeedSentencepieceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
The size of seed sentencepieces.
hasSeedSentencepieceSize() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
The size of seed sentencepieces.
hasSelfTestData() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Stores sample input and its expected segmentation to verify the model.
hasSelfTestData() - Method in class sentencepiece.SentencepieceModel.ModelProto
Stores sample input and its expected segmentation to verify the model.
hasSelfTestData() - Method in interface sentencepiece.SentencepieceModel.ModelProtoOrBuilder
Stores sample input and its expected segmentation to verify the model.
hasSelfTestSampleSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Size of self-test samples, which are encoded in the model file.
hasSelfTestSampleSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Size of self-test samples, which are encoded in the model file.
hasSelfTestSampleSize() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Size of self-test samples, which are encoded in the model file.
hasShrinkingFactor() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
In every EM sub-iteration, keeps the top `shrinking_factor` * `current sentencepieces size` with respect to the loss of the sentence piece.
hasShrinkingFactor() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
In every EM sub-iteration, keeps the top `shrinking_factor` * `current sentencepieces size` with respect to the loss of the sentence piece.
hasShrinkingFactor() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
In every EM sub-iteration, keeps the top `shrinking_factor` * `current sentencepieces size` with respect to the loss of the sentence piece.
hasShuffleInputSentence() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
optional bool shuffle_input_sentence = 19 [default = true];
hasShuffleInputSentence() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
optional bool shuffle_input_sentence = 19 [default = true];
hasShuffleInputSentence() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
optional bool shuffle_input_sentence = 19 [default = true];
hasSplitByNumber() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
When `split_by_number` is true, puts a boundary at each transition between number and non-number.
hasSplitByNumber() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
When `split_by_number` is true, puts a boundary at each transition between number and non-number.
hasSplitByNumber() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
When `split_by_number` is true, puts a boundary at each transition between number and non-number.
hasSplitByUnicodeScript() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Uses Unicode script to split sentence pieces.
hasSplitByUnicodeScript() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Uses Unicode script to split sentence pieces.
hasSplitByUnicodeScript() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Uses Unicode script to split sentence pieces.
hasSplitByWhitespace() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Uses whitespace to split sentence pieces.
hasSplitByWhitespace() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Uses whitespace to split sentence pieces.
hasSplitByWhitespace() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Uses whitespace to split sentence pieces.
hasSplitDigits() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Split all digits (0-9) into separate pieces.
hasSplitDigits() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Split all digits (0-9) into separate pieces.
hasSplitDigits() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Split all digits (0-9) into separate pieces.
hasTrainerSpec() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Spec used to generate this model file.
hasTrainerSpec() - Method in class sentencepiece.SentencepieceModel.ModelProto
Spec used to generate this model file.
hasTrainerSpec() - Method in interface sentencepiece.SentencepieceModel.ModelProtoOrBuilder
Spec used to generate this model file.
hasTrainExtremelyLargeCorpus() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Increase bit depth to allow unigram model training on large (>10M sentences) corpora.
hasTrainExtremelyLargeCorpus() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Increase bit depth to allow unigram model training on large (>10M sentences) corpora.
hasTrainExtremelyLargeCorpus() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Increase bit depth to allow unigram model training on large (>10M sentences) corpora.
hasTrainingSentenceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Deprecated.
hasTrainingSentenceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Deprecated.
hasTrainingSentenceSize() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Deprecated.
hasTreatWhitespaceAsSuffix() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Adds whitespace symbol (_) as a suffix instead of prefix.
hasTreatWhitespaceAsSuffix() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Adds whitespace symbol (_) as a suffix instead of prefix.
hasTreatWhitespaceAsSuffix() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Adds whitespace symbol (_) as a suffix instead of prefix.
hasType() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
optional .sentencepiece.ModelProto.SentencePiece.Type type = 3 [default = NORMAL];
hasType() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
optional .sentencepiece.ModelProto.SentencePiece.Type type = 3 [default = NORMAL];
hasType() - Method in interface sentencepiece.SentencepieceModel.ModelProto.SentencePieceOrBuilder
optional .sentencepiece.ModelProto.SentencePiece.Type type = 3 [default = NORMAL];
hasUnkId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Reserved special meta tokens.
hasUnkId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Reserved special meta tokens.
hasUnkId() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Reserved special meta tokens.
hasUnkPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
optional string unk_piece = 45 [default = "<unk>"];
hasUnkPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
optional string unk_piece = 45 [default = "<unk>"];
hasUnkPiece() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
optional string unk_piece = 45 [default = "<unk>"];
hasUnkSurface() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Encodes <unk> into U+2047 (DOUBLE QUESTION MARK), since this character can be useful for both users and developers.
hasUnkSurface() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Encodes <unk> into U+2047 (DOUBLE QUESTION MARK), since this character can be useful for both users and developers.
hasUnkSurface() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Encodes <unk> into U+2047 (DOUBLE QUESTION MARK), since this character can be useful for both users and developers.
hasUseAllVocab() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Use all symbols for vocab extraction.
hasUseAllVocab() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Use all symbols for vocab extraction.
hasUseAllVocab() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Use all symbols for vocab extraction.
hasVocabSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Vocabulary size.
hasVocabSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
Vocabulary size.
hasVocabSize() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
Vocabulary size.
hasVocabularyOutputPieceScore() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
When creating the vocabulary file, defines whether or not to additionally output the score for each piece.
hasVocabularyOutputPieceScore() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
When creating the vocabulary file, defines whether or not to additionally output the score for each piece.
hasVocabularyOutputPieceScore() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
When creating the vocabulary file, defines whether or not to additionally output the score for each piece.
HAUSA - com.yahoo.language.Language
Language tag "ha".
HEBREW - com.yahoo.language.Language
Language tag "he".
HEBREW - com.yahoo.language.process.TokenScript
 
highestScore - com.yahoo.language.sentencepiece.Scoring
Find the segmentation that has the highest score.
highestScore - com.yahoo.language.sentencepiece.SentencePieceConfig.Scoring.Enum
 
highestScore - Static variable in class com.yahoo.language.sentencepiece.SentencePieceConfig.Scoring
 
HINDI - com.yahoo.language.Language
Language tag "hi".
Hint - Class in com.yahoo.language.detect
A hint that can be given to a Detector.
HIRAGANA - com.yahoo.language.process.TokenScript
 
HUNGARIAN - com.yahoo.language.Language
Language tag "hu".

I

ICELANDIC - com.yahoo.language.Language
Language tag "is".
INDONESIAN - com.yahoo.language.Language
Language tag "id".
INHERITED - com.yahoo.language.process.TokenScript
 
INPUT_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.SelfTestData.Sample
 
INPUT_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
INPUT_FORMAT_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
INPUT_SENTENCE_SIZE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
INTERLINGUA - com.yahoo.language.Language
Language tag "ia".
INTERLINGUE - com.yahoo.language.Language
Language tag "ie".
internalGetFieldAccessorTable() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
 
internalGetFieldAccessorTable() - Method in class sentencepiece.SentencepieceModel.ModelProto
 
internalGetFieldAccessorTable() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
 
internalGetFieldAccessorTable() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
 
internalGetFieldAccessorTable() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
 
internalGetFieldAccessorTable() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
 
internalGetFieldAccessorTable() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
 
internalGetFieldAccessorTable() - Method in class sentencepiece.SentencepieceModel.SelfTestData
 
internalGetFieldAccessorTable() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
 
internalGetFieldAccessorTable() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
 
internalGetFieldAccessorTable() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
 
internalGetFieldAccessorTable() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
 
internalGetValueMap() - Static method in enum sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
 
internalGetValueMap() - Static method in enum sentencepiece.SentencepieceModel.TrainerSpec.ModelType
 
INUKTITUT - com.yahoo.language.Language
Language tag "iu".
INUPIAK - com.yahoo.language.Language
Language tag "ik".
IRISH - com.yahoo.language.Language
Language tag "ga".
isCjk() - Method in enum com.yahoo.language.Language
Returns whether this is a "cjk" language.
isDigit(int) - Method in class com.yahoo.language.process.CharacterClasses
Returns true for code points which should be considered digits; same as java.lang.Character.isDigit.
isIndexable() - Method in interface com.yahoo.language.process.Token
Whether this token should be indexed.
isIndexable() - Method in enum com.yahoo.language.process.TokenType
Marker for whether this type of token can be indexed for search.
isInitialized() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
 
isInitialized() - Method in class sentencepiece.SentencepieceModel.ModelProto
 
isInitialized() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
 
isInitialized() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
 
isInitialized() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
 
isInitialized() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
 
isInitialized() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
 
isInitialized() - Method in class sentencepiece.SentencepieceModel.SelfTestData
 
isInitialized() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
 
isInitialized() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
 
isInitialized() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
 
isInitialized() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
 
isLatin(int) - Method in class com.yahoo.language.process.CharacterClasses
Returns true if this is a Latin character.
isLatinDigit(int) - Method in class com.yahoo.language.process.CharacterClasses
Returns true if this is a Latin digit (other digits are not consistently parsed into numbers by Java).
isLetter(int) - Method in class com.yahoo.language.process.CharacterClasses
Returns true for code points which are letters in Unicode 3 or 4, plus some additional characters which are useful to view as letters even though not defined as such in Unicode.
isLetterOrDigit(int) - Method in class com.yahoo.language.process.CharacterClasses
Convenience method; returns isLetter(c) || isDigit(c).
isLocal() - Method in class com.yahoo.language.detect.Detection
 
isSpecialToken() - Method in interface com.yahoo.language.process.Token
Returns whether this is an instance of a declared special token (e.g.
ITALIAN - com.yahoo.language.Language
Language tag "it".
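
The CharacterClasses entries above (isDigit, isLatinDigit, isLetter, isLetterOrDigit) can be illustrated with a minimal, JDK-only sketch. This is not the com.yahoo.language.process.CharacterClasses implementation: the class name CharacterClassSketch is hypothetical, isLetter here delegates to java.lang.Character.isLetter and so omits the "additional characters" the real method is documented to accept, and isLatinDigit is assumed to mean the ASCII digits 0-9.

```java
public class CharacterClassSketch {

    // Assumption: plain Unicode letter test; the documented method also
    // accepts some extra characters that are not covered here.
    public static boolean isLetter(int codePoint) {
        return Character.isLetter(codePoint);
    }

    // Documented as "same as java.lang.Character.isDigit".
    public static boolean isDigit(int codePoint) {
        return Character.isDigit(codePoint);
    }

    // Assumption: "Latin digit" is taken to mean ASCII '0'..'9'.
    public static boolean isLatinDigit(int codePoint) {
        return codePoint >= '0' && codePoint <= '9';
    }

    // Documented as the convenience isLetter(c) || isDigit(c).
    public static boolean isLetterOrDigit(int codePoint) {
        return isLetter(codePoint) || isDigit(codePoint);
    }

    public static void main(String[] args) {
        int arabicIndicThree = 0x0663; // ARABIC-INDIC DIGIT THREE
        System.out.println(isLetterOrDigit('a'));
        System.out.println(isDigit(arabicIndicThree));
        System.out.println(isLatinDigit(arabicIndicThree));
        System.out.println(isLetterOrDigit('-'));
    }
}
```

The distinction between isDigit and isLatinDigit matters when a token is later parsed as a number: a code point such as U+0663 is a digit in the Unicode sense but falls outside the ASCII range assumed here.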

J

JAPANESE - com.yahoo.language.Language
Language tag "ja".
JAVANESE - com.yahoo.language.Language
Language tag "jw".

K

KANNADA - com.yahoo.language.Language
Language tag "kn".
KANNADA - com.yahoo.language.process.TokenScript
 
KASHMIRI - com.yahoo.language.Language
Language tag "ks".
KATAKANA - com.yahoo.language.process.TokenScript
 
KAZAKH - com.yahoo.language.Language
Language tag "kk".
KHAROSHTHI - com.yahoo.language.process.TokenScript
 
KHMER - com.yahoo.language.process.TokenScript
 
KINYARWANDA - com.yahoo.language.Language
Language tag "rw".
KIRGHIZ - com.yahoo.language.Language
Language tag "ky".
KIRUNDI - com.yahoo.language.Language
Language tag "rn".
KOREAN - com.yahoo.language.Language
Language tag "ko".
KURDISH - com.yahoo.language.Language
Language tag "ku".

L

language() - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig.Model
 
language(String) - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig.Model.Builder
 
Language - Enum in com.yahoo.language
 
languageCode() - Method in enum com.yahoo.language.Language
 
LAO - com.yahoo.language.process.TokenScript
 
LAOTHIAN - com.yahoo.language.Language
Language tag "lo".
LATIN - com.yahoo.language.Language
Language tag "la".
LATIN - com.yahoo.language.process.TokenScript
 
LATVIAN - com.yahoo.language.Language
Language tag "lv".
LIMBU - com.yahoo.language.process.TokenScript
 
LINEARB - com.yahoo.language.process.TokenScript
 
LINGALA - com.yahoo.language.Language
Language tag "ln".
Linguistics - Interface in com.yahoo.language
Factory of linguistic processors.
Linguistics.Component - Enum in com.yahoo.language
 
LinguisticsCase - Class in com.yahoo.language
This class provides a case normalization operation to be used e.g.
LinguisticsCase() - Constructor for class com.yahoo.language.LinguisticsCase
 
LITHUANIAN - com.yahoo.language.Language
Language tag "lt".
LocaleFactory - Class in com.yahoo.language
 

M

MACEDONIAN - com.yahoo.language.Language
Language tag "mk".
MALAGASY - com.yahoo.language.Language
Language tag "mg".
MALAY - com.yahoo.language.Language
Language tag "ms".
MALAYALAM - com.yahoo.language.Language
Language tag "ml".
MALAYALAM - com.yahoo.language.process.TokenScript
 
MALTESE - com.yahoo.language.Language
Language tag "mt".
MANIPURI - com.yahoo.language.Language
Language tag "mni".
MAORI - com.yahoo.language.Language
Language tag "mi".
MARATHI - com.yahoo.language.Language
Language tag "mr".
MARKER - com.yahoo.language.process.TokenType
 
MAX_SENTENCE_LENGTH_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
MAX_SENTENCEPIECE_LENGTH_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
mergeDenormalizerSpec(SentencepieceModel.NormalizerSpec) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Spec for text de-normalization.
mergeFrom(CodedInputStream, ExtensionRegistryLite) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
 
mergeFrom(CodedInputStream, ExtensionRegistryLite) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
 
mergeFrom(CodedInputStream, ExtensionRegistryLite) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
 
mergeFrom(CodedInputStream, ExtensionRegistryLite) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
 
mergeFrom(CodedInputStream, ExtensionRegistryLite) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
 
mergeFrom(CodedInputStream, ExtensionRegistryLite) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
 
mergeFrom(Message) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
 
mergeFrom(Message) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
 
mergeFrom(Message) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
 
mergeFrom(Message) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
 
mergeFrom(Message) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
 
mergeFrom(Message) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
 
mergeFrom(SentencepieceModel.ModelProto) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
 
mergeFrom(SentencepieceModel.ModelProto.SentencePiece) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
 
mergeFrom(SentencepieceModel.NormalizerSpec) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
 
mergeFrom(SentencepieceModel.SelfTestData) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
 
mergeFrom(SentencepieceModel.SelfTestData.Sample) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
 
mergeFrom(SentencepieceModel.TrainerSpec) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
 
mergeNormalizerSpec(SentencepieceModel.NormalizerSpec) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Spec for text normalization.
mergeSelfTestData(SentencepieceModel.SelfTestData) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Stores sample input and its expected segmentation to verify the model.
mergeTrainerSpec(SentencepieceModel.TrainerSpec) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Spec used to generate this model file.
mergeUnknownFields(UnknownFieldSet) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
 
mergeUnknownFields(UnknownFieldSet) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
 
mergeUnknownFields(UnknownFieldSet) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
 
mergeUnknownFields(UnknownFieldSet) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
 
mergeUnknownFields(UnknownFieldSet) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
 
mergeUnknownFields(UnknownFieldSet) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
 
MINING_SENTENCE_SIZE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
model - Variable in class com.yahoo.language.sentencepiece.SentencePieceConfig.Builder
 
model() - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig
 
model(int) - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig
 
model(SentencePieceConfig.Model.Builder) - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig.Builder
Add the given builder to this builder's list of Model builders.
model(List<SentencePieceConfig.Model.Builder>) - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig.Builder
Set the given list as this builder's list of Model builders.
Model(SentencePieceConfig.Model.Builder) - Constructor for class com.yahoo.language.sentencepiece.SentencePieceConfig.Model
 
MODEL_PREFIX_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
MODEL_TYPE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
MOLDAVIAN - com.yahoo.language.Language
Language tag "mo".
MONGOLIAN - com.yahoo.language.Language
Language tag "mn".
MONGOLIAN - com.yahoo.language.process.TokenScript
 
MUNDA - com.yahoo.language.Language
Language tag "mun".
MYANMAR - com.yahoo.language.process.TokenScript
 

N

name() - Method in class com.yahoo.language.process.SpecialTokens
Returns the name of this special tokens list.
NAME_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.NormalizerSpec
 
NAURU - com.yahoo.language.Language
Language tag "na".
NEPALI - com.yahoo.language.Language
Language tag "ne".
newBuilder() - Static method in class sentencepiece.SentencepieceModel.ModelProto
 
newBuilder() - Static method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
 
newBuilder() - Static method in class sentencepiece.SentencepieceModel.NormalizerSpec
 
newBuilder() - Static method in class sentencepiece.SentencepieceModel.SelfTestData
 
newBuilder() - Static method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
 
newBuilder() - Static method in class sentencepiece.SentencepieceModel.TrainerSpec
 
newBuilder(SentencepieceModel.ModelProto) - Static method in class sentencepiece.SentencepieceModel.ModelProto
 
newBuilder(SentencepieceModel.ModelProto.SentencePiece) - Static method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
 
newBuilder(SentencepieceModel.NormalizerSpec) - Static method in class sentencepiece.SentencepieceModel.NormalizerSpec
 
newBuilder(SentencepieceModel.SelfTestData) - Static method in class sentencepiece.SentencepieceModel.SelfTestData
 
newBuilder(SentencepieceModel.SelfTestData.Sample) - Static method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
 
newBuilder(SentencepieceModel.TrainerSpec) - Static method in class sentencepiece.SentencepieceModel.TrainerSpec
 
newBuilderForType() - Method in class sentencepiece.SentencepieceModel.ModelProto
 
newBuilderForType() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
 
newBuilderForType() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
 
newBuilderForType() - Method in class sentencepiece.SentencepieceModel.SelfTestData
 
newBuilderForType() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
 
newBuilderForType() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
 
newBuilderForType(GeneratedMessageV3.BuilderParent) - Method in class sentencepiece.SentencepieceModel.ModelProto
 
newBuilderForType(GeneratedMessageV3.BuilderParent) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
 
newBuilderForType(GeneratedMessageV3.BuilderParent) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
 
newBuilderForType(GeneratedMessageV3.BuilderParent) - Method in class sentencepiece.SentencepieceModel.SelfTestData
 
newBuilderForType(GeneratedMessageV3.BuilderParent) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
 
newBuilderForType(GeneratedMessageV3.BuilderParent) - Method in class sentencepiece.SentencepieceModel.TrainerSpec
 
newCountryHint(String) - Static method in class com.yahoo.language.detect.Hint
 
newInstance(GeneratedMessageV3.UnusedPrivateParameter) - Method in class sentencepiece.SentencepieceModel.ModelProto
 
newInstance(GeneratedMessageV3.UnusedPrivateParameter) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
 
newInstance(GeneratedMessageV3.UnusedPrivateParameter) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
 
newInstance(GeneratedMessageV3.UnusedPrivateParameter) - Method in class sentencepiece.SentencepieceModel.SelfTestData
 
newInstance(GeneratedMessageV3.UnusedPrivateParameter) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
 
newInstance(GeneratedMessageV3.UnusedPrivateParameter) - Method in class sentencepiece.SentencepieceModel.TrainerSpec
 
newInstance(String, String) - Static method in class com.yahoo.language.detect.Hint
 
newMarketHint(String) - Static method in class com.yahoo.language.detect.Hint
 
next() - Method in class com.yahoo.language.process.GramSplitter.GramSplitterIterator
 
NONE - com.yahoo.language.process.StemMode
 
NORMAL - sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
normal symbol
NORMAL_VALUE - Static variable in enum sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
normal symbol
NORMALIZATION_RULE_TSV_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.NormalizerSpec
 
normalize(String) - Method in interface com.yahoo.language.process.Normalizer
NFKC normalizes a String.
normalize(String) - Method in class com.yahoo.language.sentencepiece.SentencePieceEncoder
 
Normalizer - Interface in com.yahoo.language.process
This interface provides NFKC normalization of Strings through the underlying linguistics library.
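As a rough illustration of what NFKC normalization does to a String, the sketch below uses the JDK's own java.text.Normalizer rather than the com.yahoo.language.process.Normalizer interface listed here. The class name NfkcSketch and the helper nfkc are hypothetical, and the underlying linguistics library may behave differently in edge cases.

```java
import java.text.Normalizer;

// Hypothetical helper class; it only demonstrates NFKC with the JDK,
// not the implementation behind com.yahoo.language.process.Normalizer.
final class NfkcSketch {

    // Applies Unicode NFKC (compatibility decomposition, then canonical composition).
    static String nfkc(String input) {
        return Normalizer.normalize(input, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // U+FB01 is the "fi" ligature; NFKC expands it to the two letters "fi".
        System.out.println(nfkc("\uFB01le"));
        // U+FF21..U+FF23 are fullwidth Latin letters; NFKC maps them to ASCII.
        System.out.println(nfkc("\uFF21\uFF22\uFF23"));
    }
}
```

Compatibility forms such as ligatures and fullwidth letters are folded into their plain equivalents, which is why NFKC is a common preprocessing step before tokenization.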
NORMALIZER - com.yahoo.language.Linguistics.Component
 
NORMALIZER_SPEC_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.ModelProto
 
NORWEGIAN_BOKMAL - com.yahoo.language.Language
Language tag "nb".
NORWEGIAN_NYNORSK - com.yahoo.language.Language
Language tag "nn".
NUM_SUB_ITERATIONS_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
NUM_THREADS_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
NUMERIC - com.yahoo.language.process.TokenType
 

O

OCCITAN - com.yahoo.language.Language
Language tag "oc".
OGHAM - com.yahoo.language.process.TokenScript
 
OLDITALIC - com.yahoo.language.process.TokenScript
 
OLDPERSIAN - com.yahoo.language.process.TokenScript
 
ORIYA - com.yahoo.language.Language
Language tag "or".
ORIYA - com.yahoo.language.process.TokenScript
 
OROMO - com.yahoo.language.Language
Language tag "om".
OSMANYA - com.yahoo.language.process.TokenScript
 

P

PAD_ID_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
PAD_PIECE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
parseDelimitedFrom(InputStream) - Static method in class sentencepiece.SentencepieceModel.ModelProto
 
parseDelimitedFrom(InputStream) - Static method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
 
parseDelimitedFrom(InputStream) - Static method in class sentencepiece.SentencepieceModel.NormalizerSpec
 
parseDelimitedFrom(InputStream) - Static method in class sentencepiece.SentencepieceModel.SelfTestData
 
parseDelimitedFrom(InputStream) - Static method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
 
parseDelimitedFrom(InputStream) - Static method in class sentencepiece.SentencepieceModel.TrainerSpec
 
parseDelimitedFrom(InputStream, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.ModelProto
 
parseDelimitedFrom(InputStream, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
 
parseDelimitedFrom(InputStream, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.NormalizerSpec
 
parseDelimitedFrom(InputStream, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.SelfTestData
 
parseDelimitedFrom(InputStream, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
 
parseDelimitedFrom(InputStream, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.TrainerSpec
 
parseFrom(byte[]) - Static method in class sentencepiece.SentencepieceModel.ModelProto
 
parseFrom(byte[]) - Static method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
 
parseFrom(byte[]) - Static method in class sentencepiece.SentencepieceModel.NormalizerSpec
 
parseFrom(byte[]) - Static method in class sentencepiece.SentencepieceModel.SelfTestData
 
parseFrom(byte[]) - Static method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
 
parseFrom(byte[]) - Static method in class sentencepiece.SentencepieceModel.TrainerSpec
 
parseFrom(byte[], ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.ModelProto
 
parseFrom(byte[], ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
 
parseFrom(byte[], ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.NormalizerSpec
 
parseFrom(byte[], ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.SelfTestData
 
parseFrom(byte[], ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
 
parseFrom(byte[], ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.TrainerSpec
 
parseFrom(ByteString) - Static method in class sentencepiece.SentencepieceModel.ModelProto
 
parseFrom(ByteString) - Static method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
 
parseFrom(ByteString) - Static method in class sentencepiece.SentencepieceModel.NormalizerSpec
 
parseFrom(ByteString) - Static method in class sentencepiece.SentencepieceModel.SelfTestData
 
parseFrom(ByteString) - Static method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
 
parseFrom(ByteString) - Static method in class sentencepiece.SentencepieceModel.TrainerSpec
 
parseFrom(ByteString, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.ModelProto
 
parseFrom(ByteString, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
 
parseFrom(ByteString, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.NormalizerSpec
 
parseFrom(ByteString, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.SelfTestData
 
parseFrom(ByteString, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
 
parseFrom(ByteString, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.TrainerSpec
 
parseFrom(CodedInputStream) - Static method in class sentencepiece.SentencepieceModel.ModelProto
 
parseFrom(CodedInputStream) - Static method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
 
parseFrom(CodedInputStream) - Static method in class sentencepiece.SentencepieceModel.NormalizerSpec
 
parseFrom(CodedInputStream) - Static method in class sentencepiece.SentencepieceModel.SelfTestData
 
parseFrom(CodedInputStream) - Static method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
 
parseFrom(CodedInputStream) - Static method in class sentencepiece.SentencepieceModel.TrainerSpec
 
parseFrom(CodedInputStream, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.ModelProto
 
parseFrom(CodedInputStream, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
 
parseFrom(CodedInputStream, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.NormalizerSpec
 
parseFrom(CodedInputStream, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.SelfTestData
 
parseFrom(CodedInputStream, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
 
parseFrom(CodedInputStream, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.TrainerSpec
 
parseFrom(InputStream) - Static method in class sentencepiece.SentencepieceModel.ModelProto
 
parseFrom(InputStream) - Static method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
 
parseFrom(InputStream) - Static method in class sentencepiece.SentencepieceModel.NormalizerSpec
 
parseFrom(InputStream) - Static method in class sentencepiece.SentencepieceModel.SelfTestData
 
parseFrom(InputStream) - Static method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
 
parseFrom(InputStream) - Static method in class sentencepiece.SentencepieceModel.TrainerSpec
 
parseFrom(InputStream, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.ModelProto
 
parseFrom(InputStream, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
 
parseFrom(InputStream, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.NormalizerSpec
 
parseFrom(InputStream, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.SelfTestData
 
parseFrom(InputStream, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
 
parseFrom(InputStream, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.TrainerSpec
 
parseFrom(ByteBuffer) - Static method in class sentencepiece.SentencepieceModel.ModelProto
 
parseFrom(ByteBuffer) - Static method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
 
parseFrom(ByteBuffer) - Static method in class sentencepiece.SentencepieceModel.NormalizerSpec
 
parseFrom(ByteBuffer) - Static method in class sentencepiece.SentencepieceModel.SelfTestData
 
parseFrom(ByteBuffer) - Static method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
 
parseFrom(ByteBuffer) - Static method in class sentencepiece.SentencepieceModel.TrainerSpec
 
parseFrom(ByteBuffer, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.ModelProto
 
parseFrom(ByteBuffer, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
 
parseFrom(ByteBuffer, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.NormalizerSpec
 
parseFrom(ByteBuffer, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.SelfTestData
 
parseFrom(ByteBuffer, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
 
parseFrom(ByteBuffer, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.TrainerSpec
 
parser() - Static method in class sentencepiece.SentencepieceModel.ModelProto
 
parser() - Static method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
 
parser() - Static method in class sentencepiece.SentencepieceModel.NormalizerSpec
 
parser() - Static method in class sentencepiece.SentencepieceModel.SelfTestData
 
parser() - Static method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
 
parser() - Static method in class sentencepiece.SentencepieceModel.TrainerSpec
 
PARSER - Static variable in class sentencepiece.SentencepieceModel.ModelProto
Deprecated.
PARSER - Static variable in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
Deprecated.
PARSER - Static variable in class sentencepiece.SentencepieceModel.NormalizerSpec
Deprecated.
PARSER - Static variable in class sentencepiece.SentencepieceModel.SelfTestData
Deprecated.
PARSER - Static variable in class sentencepiece.SentencepieceModel.SelfTestData.Sample
Deprecated.
PARSER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
Deprecated.
PASHTO - com.yahoo.language.Language
Language tag "ps".
path() - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig.Model
 
path(FileReference) - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig.Model.Builder
 
PERSIAN - com.yahoo.language.Language
Language tag "fa".
PIECE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
 
PIECES_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.ModelProto
 
POLISH - com.yahoo.language.Language
Language tag "pl".
PORTUGUESE - com.yahoo.language.Language
Language tag "pt".
PRECOMPILED_CHARSMAP_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.NormalizerSpec
 
ProcessingException - Exception in com.yahoo.language.process
Exception class indicating that a fatal error occurred during linguistic processing.
ProcessingException(String) - Constructor for exception com.yahoo.language.process.ProcessingException
 
ProcessingException(String, Throwable) - Constructor for exception com.yahoo.language.process.ProcessingException
 
PUNCTUATION - com.yahoo.language.process.TokenType
 
PUNJABI - com.yahoo.language.Language
Language tag "pa".

Q

QUECHUA - com.yahoo.language.Language
Language tag "qu".

R

registerAllExtensions(ExtensionRegistry) - Static method in class sentencepiece.SentencepieceModel
 
registerAllExtensions(ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel
 
remove() - Method in class com.yahoo.language.process.GramSplitter.GramSplitterIterator
 
remove(int) - Method in class com.yahoo.language.process.StemList
 
REMOVE_EXTRA_WHITESPACES_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.NormalizerSpec
 
removePieces(int) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Sentence pieces with scores.
removeSamples(int) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
repeated .sentencepiece.SelfTestData.Sample samples = 1;
replacement() - Method in class com.yahoo.language.process.SpecialTokens.Token
Returns the token that replaces occurrences of this token, which equals token() unless this has a replacement.
REQUIRED_CHARS_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
RHAETO_ROMANCE - com.yahoo.language.Language
Language tag "rm".
ROMANIAN - com.yahoo.language.Language
Language tag "ro".
RUNIC - com.yahoo.language.process.TokenScript
 
RUSSIAN - com.yahoo.language.Language
Language tag "ru".

S

SAMOAN - com.yahoo.language.Language
Language tag "sm".
SAMPLES_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.SelfTestData
 
SANGHO - com.yahoo.language.Language
Language tag "sg".
SANSKRIT - com.yahoo.language.Language
Language tag "sa".
SCORE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
 
scoring() - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig
 
scoring(SentencePieceConfig.Scoring.Enum) - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig.Builder
 
Scoring - Enum in com.yahoo.language.sentencepiece
The scoring strategy to use for picking segments.
Scoring() - Constructor for class com.yahoo.language.sentencepiece.SentencePieceConfig.Scoring
 
Scoring(SentencePieceConfig.Scoring.Enum) - Constructor for class com.yahoo.language.sentencepiece.SentencePieceConfig.Scoring
 
SCOTS_GAELIC - com.yahoo.language.Language
Language tag "gd".
SEED_SENTENCEPIECE_SIZE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
segment(String, Language) - Method in interface com.yahoo.language.process.Segmenter
Splits the input string into tokens and returns a list of tokens in unprocessed form (i.e.
segment(String, Language) - Method in class com.yahoo.language.process.SegmenterImpl
 
segment(String, Language) - Method in class com.yahoo.language.sentencepiece.SentencePieceEncoder
Segments the given text into token segments using the SentencePiece algorithm
Segmenter - Interface in com.yahoo.language.process
Interface providing segmentation, i.e.
SEGMENTER - com.yahoo.language.Linguistics.Component
 
SegmenterImpl - Class in com.yahoo.language.process
 
SegmenterImpl(Tokenizer) - Constructor for class com.yahoo.language.process.SegmenterImpl
 
SELF_TEST_DATA_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.ModelProto
 
SELF_TEST_SAMPLE_SIZE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
sentencepiece - package sentencepiece
 
SentencePieceConfig - Class in com.yahoo.language.sentencepiece
This class represents the root node of sentence-piece.
SentencePieceConfig(SentencePieceConfig.Builder) - Constructor for class com.yahoo.language.sentencepiece.SentencePieceConfig
 
SentencePieceConfig.Builder - Class in com.yahoo.language.sentencepiece
 
SentencePieceConfig.Model - Class in com.yahoo.language.sentencepiece
This class represents sentence-piece.model[]
SentencePieceConfig.Model.Builder - Class in com.yahoo.language.sentencepiece
 
SentencePieceConfig.Producer - Interface in com.yahoo.language.sentencepiece
 
SentencePieceConfig.Scoring - Class in com.yahoo.language.sentencepiece
This class represents sentence-piece.scoring. The scoring strategy to use when picking a segmentation.
SentencePieceConfig.Scoring.Enum - Enum in com.yahoo.language.sentencepiece
 
SentencePieceEncoder - Class in com.yahoo.language.sentencepiece
Integration with https://github.com/google/sentencepiece through http://docs.djl.ai/extensions/sentencepiece/index.html. SentencePiece is a language-agnostic tokenizer for neural nets.
SentencePieceEncoder(SentencePieceConfig) - Constructor for class com.yahoo.language.sentencepiece.SentencePieceEncoder
 
SentencePieceEncoder(SentencePieceEncoder.Builder) - Constructor for class com.yahoo.language.sentencepiece.SentencePieceEncoder
 
SentencePieceEncoder.Builder - Class in com.yahoo.language.sentencepiece
 
SentencepieceModel - Class in sentencepiece
 
SentencepieceModel.ModelProto - Class in sentencepiece
ModelProto stores model parameters.
SentencepieceModel.ModelProto.Builder - Class in sentencepiece
ModelProto stores model parameters.
SentencepieceModel.ModelProto.SentencePiece - Class in sentencepiece
Protobuf type sentencepiece.ModelProto.SentencePiece
SentencepieceModel.ModelProto.SentencePiece.Builder - Class in sentencepiece
Protobuf type sentencepiece.ModelProto.SentencePiece
SentencepieceModel.ModelProto.SentencePiece.Type - Enum in sentencepiece
Protobuf enum sentencepiece.ModelProto.SentencePiece.Type
SentencepieceModel.ModelProto.SentencePieceOrBuilder - Interface in sentencepiece
 
SentencepieceModel.ModelProtoOrBuilder - Interface in sentencepiece
 
SentencepieceModel.NormalizerSpec - Class in sentencepiece
NormalizerSpec encodes various parameters for string normalization.
SentencepieceModel.NormalizerSpec.Builder - Class in sentencepiece
NormalizerSpec encodes various parameters for string normalization.
SentencepieceModel.NormalizerSpecOrBuilder - Interface in sentencepiece
 
SentencepieceModel.SelfTestData - Class in sentencepiece
Proto to store samples for self-testing.
SentencepieceModel.SelfTestData.Builder - Class in sentencepiece
Proto to store samples for self-testing.
SentencepieceModel.SelfTestData.Sample - Class in sentencepiece
Protobuf type sentencepiece.SelfTestData.Sample
SentencepieceModel.SelfTestData.Sample.Builder - Class in sentencepiece
Protobuf type sentencepiece.SelfTestData.Sample
SentencepieceModel.SelfTestData.SampleOrBuilder - Interface in sentencepiece
 
SentencepieceModel.SelfTestDataOrBuilder - Interface in sentencepiece
 
SentencepieceModel.TrainerSpec - Class in sentencepiece
TrainerSpec encodes various parameters for SentencePiece training.
SentencepieceModel.TrainerSpec.Builder - Class in sentencepiece
TrainerSpec encodes various parameters for SentencePiece training.
SentencepieceModel.TrainerSpec.ModelType - Enum in sentencepiece
Model type.
SentencepieceModel.TrainerSpecOrBuilder - Interface in sentencepiece
 
SERBIAN - com.yahoo.language.Language
Language tag "sr".
SERBO_CROATIAN - com.yahoo.language.Language
Language tag "sh".
SESOTHO - com.yahoo.language.Language
Language tag "st".
set(int, String) - Method in class com.yahoo.language.process.StemList
 
setAcceptLanguage(int, String) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
List of the languages this model can accept.
setAddDummyPrefix(boolean) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
Adds dummy whitespace at the beginning of text in order to treat "world" in "world" and "hello world" in the same way.
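The effect of the dummy prefix can be sketched in plain Java. This is a simplified illustration, not the generated builder or the upstream normalizer: the class DummyPrefixSketch and the method prepare are hypothetical, and the sketch assumes whitespace is escaped to the U+2581 meta symbol that upstream SentencePiece uses.

```java
// Hypothetical illustration of add_dummy_prefix combined with whitespace escaping.
final class DummyPrefixSketch {

    // U+2581 (LOWER ONE EIGHTH BLOCK) is the whitespace meta symbol in SentencePiece.
    static final char META = '\u2581';

    // Optionally prepends one space, then replaces every space with the meta symbol.
    static String prepare(String text, boolean addDummyPrefix) {
        String withPrefix = addDummyPrefix ? " " + text : text;
        return withPrefix.replace(' ', META);
    }

    public static void main(String[] args) {
        System.out.println(prepare("world", true));
        System.out.println(prepare("hello world", true));
        System.out.println(prepare("world", false));
    }
}
```

With the prefix enabled, "world" is preceded by the meta symbol both on its own and inside "hello world", so the two occurrences can map to the same piece.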
setAllowWhitespaceOnlyPieces(boolean) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Allows pieces that only contain whitespaces instead of appearing only as prefix or suffix of other pieces.
setApplyOnRestart(boolean) - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig.Builder
 
setBosId(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
<s>
setBosPiece(String) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
optional string bos_piece = 46 [default = "<s>"];
setBosPieceBytes(ByteString) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
optional string bos_piece = 46 [default = "<s>"];
setByteFallback(boolean) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Decomposes unknown pieces into UTF-8 bytes.
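A minimal sketch of byte fallback, assuming byte pieces are written in the "<0xNN>" form used by upstream SentencePiece vocabularies. The class ByteFallbackSketch and the method toBytePieces are hypothetical names for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: decompose an unknown piece into its UTF-8 bytes.
final class ByteFallbackSketch {

    // Returns one "<0xNN>" piece per UTF-8 byte of the given text.
    static List<String> toBytePieces(String unknown) {
        List<String> pieces = new ArrayList<>();
        for (byte b : unknown.getBytes(StandardCharsets.UTF_8)) {
            // Mask to 0..255 so negative Java bytes format as two hex digits.
            pieces.add(String.format("<0x%02X>", b & 0xFF));
        }
        return pieces;
    }

    public static void main(String[] args) {
        // U+3042 (HIRAGANA LETTER A) is encoded as the bytes E3 81 82 in UTF-8.
        System.out.println(toBytePieces("\u3042"));
    }
}
```

Because any text can be expressed as bytes, a model trained with this option never has to emit an unknown token for unseen characters.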
setCharacterCoverage(float) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Training parameters.
setCollapseUnknowns(boolean) - Method in class com.yahoo.language.sentencepiece.SentencePieceEncoder.Builder
Sets whether consecutive unknown characters should be collapsed into one large unknown token (default) or be returned as single character tokens.
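The difference between the two settings can be sketched with a toy character-level tokenizer. This is not the SentencePieceEncoder implementation: the class CollapseUnknownsSketch, its tokenize method, and the set of known characters are all hypothetical, and real segmentation works on learned pieces rather than single characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical toy tokenizer showing collapsed versus per-character unknown tokens.
final class CollapseUnknownsSketch {

    // Known characters become single tokens; unknown characters are either
    // merged into one token per consecutive run (collapse) or emitted one by one.
    static List<String> tokenize(String text, Set<Character> known, boolean collapse) {
        List<String> tokens = new ArrayList<>();
        StringBuilder unknownRun = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (known.contains(c)) {
                flush(unknownRun, tokens);
                tokens.add(String.valueOf(c));
            } else if (collapse) {
                unknownRun.append(c);
            } else {
                tokens.add(String.valueOf(c));
            }
        }
        flush(unknownRun, tokens);
        return tokens;
    }

    // Emits the pending run of unknown characters, if any, as a single token.
    private static void flush(StringBuilder run, List<String> tokens) {
        if (run.length() > 0) {
            tokens.add(run.toString());
            run.setLength(0);
        }
    }

    public static void main(String[] args) {
        Set<Character> known = Set.of('a', 'b', 'c');
        System.out.println(tokenize("ab??c", known, true));
        System.out.println(tokenize("ab??c", known, false));
    }
}
```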
setControlSymbols(int, String) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
setDenormalizerSpec(SentencepieceModel.NormalizerSpec) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Spec for text de-normalization.
setDenormalizerSpec(SentencepieceModel.NormalizerSpec.Builder) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Spec for text de-normalization.
setEosId(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
</s>
setEosPiece(String) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
optional string eos_piece = 47 [default = "</s>"];
setEosPieceBytes(ByteString) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
optional string eos_piece = 47 [default = "</s>"];
setEscapeWhitespaces(boolean) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
Replaces whitespace with meta symbol.
setExpected(String) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
optional string expected = 2;
setExpectedBytes(ByteString) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
optional string expected = 2;
setExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.ModelProto, List<Type>>, int, Type) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
 
setExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.ModelProto, Type>, Type) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
 
setExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.ModelProto.SentencePiece, List<Type>>, int, Type) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
 
setExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.ModelProto.SentencePiece, Type>, Type) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
 
setExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.NormalizerSpec, List<Type>>, int, Type) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
 
setExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.NormalizerSpec, Type>, Type) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
 
setExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.SelfTestData, List<Type>>, int, Type) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
 
setExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.SelfTestData, Type>, Type) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
 
setExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.TrainerSpec, List<Type>>, int, Type) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
 
setExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.TrainerSpec, Type>, Type) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
 
setField(Descriptors.FieldDescriptor, Object) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
 
setField(Descriptors.FieldDescriptor, Object) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
 
setField(Descriptors.FieldDescriptor, Object) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
 
setField(Descriptors.FieldDescriptor, Object) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
 
setField(Descriptors.FieldDescriptor, Object) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
 
setField(Descriptors.FieldDescriptor, Object) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
 
setHardVocabLimit(boolean) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
`vocab_size` is treated as hard limit.
setInput(int, String) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
General parameters. Input corpus files.
setInput(String) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
optional string input = 1;
setInputBytes(ByteString) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
optional string input = 1;
setInputFormat(String) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Input corpus format: "text": one-sentence-per-line text format (default) "tsv": sentence <tab> freq
setInputFormatBytes(ByteString) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Input corpus format: "text": one-sentence-per-line text format (default) "tsv": sentence <tab> freq
setInputSentenceSize(long) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Maximum size of sentences the trainer loads from `input` parameter.
setMaxSentenceLength(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
The maximum sentence length in bytes.
setMaxSentencepieceLength(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
SentencePiece parameters which control the shapes of sentence pieces.
setMiningSentenceSize(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Deprecated.
setModelPrefix(String) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Output model file prefix.
setModelPrefixBytes(ByteString) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Output model file prefix.
setModelType(SentencepieceModel.TrainerSpec.ModelType) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
optional .sentencepiece.TrainerSpec.ModelType model_type = 3 [default = UNIGRAM];
setName(String) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
name of normalization rule.
setNameBytes(ByteString) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
name of normalization rule.
setNormalizationRuleTsv(String) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
Custom normalization rule file in TSV format.
setNormalizationRuleTsvBytes(ByteString) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
Custom normalization rule file in TSV format.
setNormalizerSpec(SentencepieceModel.NormalizerSpec) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Spec for text normalization.
setNormalizerSpec(SentencepieceModel.NormalizerSpec.Builder) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Spec for text normalization.
setNumSubIterations(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Number of EM sub iterations.
setNumThreads(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Number of threads in the training.
setPadId(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
<pad> (padding)
setPadPiece(String) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
optional string pad_piece = 48 [default = "<pad>"];
setPadPieceBytes(ByteString) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
optional string pad_piece = 48 [default = "<pad>"];
setPiece(String) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
piece must not be empty.
setPieceBytes(ByteString) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
piece must not be empty.
setPieces(int, SentencepieceModel.ModelProto.SentencePiece) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Sentence pieces with scores.
setPieces(int, SentencepieceModel.ModelProto.SentencePiece.Builder) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Sentence pieces with scores.
setPrecompiledCharsmap(ByteString) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
Pre-compiled normalization rule created by Builder::GetPrecompiledCharsMap() or Builder::CompileCharsMap() method.
setRemoveExtraWhitespaces(boolean) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
Removes leading, trailing, and duplicate internal whitespace.
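The whitespace cleanup described here can be approximated in a few lines of Java. The class WhitespaceCleanupSketch and its method are hypothetical, and the sketch treats every character matched by the regex class \s as whitespace, which may not match the upstream normalizer exactly.

```java
// Hypothetical approximation of remove_extra_whitespaces.
final class WhitespaceCleanupSketch {

    // Strips leading and trailing whitespace, then collapses each internal
    // run of whitespace into a single space.
    static String removeExtraWhitespace(String text) {
        return text.strip().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println("[" + removeExtraWhitespace("  hello   world ") + "]");
    }
}
```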
setRepeatedField(Descriptors.FieldDescriptor, int, Object) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
 
setRepeatedField(Descriptors.FieldDescriptor, int, Object) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
 
setRepeatedField(Descriptors.FieldDescriptor, int, Object) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
 
setRepeatedField(Descriptors.FieldDescriptor, int, Object) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
 
setRepeatedField(Descriptors.FieldDescriptor, int, Object) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
 
setRepeatedField(Descriptors.FieldDescriptor, int, Object) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
 
setRequiredChars(String) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Defines required characters.
setRequiredCharsBytes(ByteString) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Defines required characters.
setSamples(int, SentencepieceModel.SelfTestData.Sample) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
repeated .sentencepiece.SelfTestData.Sample samples = 1;
setSamples(int, SentencepieceModel.SelfTestData.Sample.Builder) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
repeated .sentencepiece.SelfTestData.Sample samples = 1;
setScore(float) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
optional float score = 2;
setScoring(Scoring) - Method in class com.yahoo.language.sentencepiece.SentencePieceEncoder.Builder
Sets the scoring strategy to use when picking a segmentation.
setSeedSentencepieceSize(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
The size of seed sentencepieces.
setSelfTestData(SentencepieceModel.SelfTestData) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Stores sample input and its expected segmentation to verify the model.
setSelfTestData(SentencepieceModel.SelfTestData.Builder) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Stores sample input and its expected segmentation to verify the model.
setSelfTestSampleSize(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Size of self-test samples, which are encoded in the model file.
setShrinkingFactor(float) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
In every EM sub-iteration, keeps the top `shrinking_factor` * `current sentencepieces size` with respect to the loss of the sentence piece.
setShuffleInputSentence(boolean) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
optional bool shuffle_input_sentence = 19 [default = true];
setSplitByNumber(boolean) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
When `split_by_number` is true, puts a boundary at each transition between number and non-number.
setSplitByUnicodeScript(boolean) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Uses Unicode script to split sentence pieces.
setSplitByWhitespace(boolean) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Uses whitespace to split sentence pieces.
setSplitDigits(boolean) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Split all digits (0-9) into separate pieces.
SETSWANA - com.yahoo.language.Language
Language tag "tn".
setTrainerSpec(SentencepieceModel.TrainerSpec) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Spec used to generate this model file.
setTrainerSpec(SentencepieceModel.TrainerSpec.Builder) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
Spec used to generate this model file.
setTrainExtremelyLargeCorpus(boolean) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Increase bit depth to allow unigram model training on large (>10M sentences) corpora.
setTrainingSentenceSize(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Deprecated.
setTreatWhitespaceAsSuffix(boolean) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Adds whitespace symbol (_) as a suffix instead of prefix.
setType(SentencepieceModel.ModelProto.SentencePiece.Type) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
optional .sentencepiece.ModelProto.SentencePiece.Type type = 3 [default = NORMAL];
setUnkId(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Reserved special meta tokens.
setUnknownFields(UnknownFieldSet) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
 
setUnknownFields(UnknownFieldSet) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
 
setUnknownFields(UnknownFieldSet) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
 
setUnknownFields(UnknownFieldSet) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
 
setUnknownFields(UnknownFieldSet) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
 
setUnknownFields(UnknownFieldSet) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
 
setUnkPiece(String) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
optional string unk_piece = 45 [default = "<unk>"];
setUnkPieceBytes(ByteString) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
optional string unk_piece = 45 [default = "<unk>"];
setUnkSurface(String) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Encodes <unk> into U+2047 (DOUBLE QUESTION MARK), since this character can be useful for both users and developers.
setUnkSurfaceBytes(ByteString) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Encodes <unk> into U+2047 (DOUBLE QUESTION MARK), since this character can be useful for both users and developers.
setUseAllVocab(boolean) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Uses all symbols for vocab extraction.
setUserDefinedSymbols(int, String) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Defines user-defined symbols.
setVocabSize(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
Vocabulary size.
setVocabularyOutputPieceScore(boolean) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
When creating the vocabulary file, defines whether or not to additionally output the score for each piece.
SHAVIAN - com.yahoo.language.process.TokenScript
 
SHONA - com.yahoo.language.Language
Language tag "sn".
SHORTEST - com.yahoo.language.process.StemMode
 
SHRINKING_FACTOR_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
SHUFFLE_INPUT_SENTENCE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
SICHUAN_YI - com.yahoo.language.Language
Language tag "ii".
SINDHI - com.yahoo.language.Language
Language tag "sd".
SINHALA - com.yahoo.language.process.TokenScript
 
SINHALESE - com.yahoo.language.Language
Language tag "si".
SISWATI - com.yahoo.language.Language
Language tag "ss".
size() - Method in class com.yahoo.language.process.StemList
 
SLOVAK - com.yahoo.language.Language
Language tag "sk".
SLOVENIAN - com.yahoo.language.Language
Language tag "sl".
SOMALI - com.yahoo.language.Language
Language tag "so".
SPACE - com.yahoo.language.process.TokenType
 
SPANISH - com.yahoo.language.Language
Language tag "es".
SpecialTokenRegistry - Class in com.yahoo.language.process
Immutable named lists of "special tokens" - strings which should override the normal tokenizer semantics and be tokenized into a single token.
SpecialTokenRegistry() - Constructor for class com.yahoo.language.process.SpecialTokenRegistry
Creates an empty special token registry
SpecialTokenRegistry(SpecialtokensConfig) - Constructor for class com.yahoo.language.process.SpecialTokenRegistry
Create a special token registry from a configuration object.
SpecialTokenRegistry(List<SpecialTokens>) - Constructor for class com.yahoo.language.process.SpecialTokenRegistry
 
SpecialTokens - Class in com.yahoo.language.process
An immutable list of special tokens - strings which should override the normal tokenizer semantics and be tokenized into a single token.
SpecialTokens(String, List<SpecialTokens.Token>) - Constructor for class com.yahoo.language.process.SpecialTokens
 
SpecialTokens.Token - Class in com.yahoo.language.process
An immutable special token
split(String, int) - Method in class com.yahoo.language.process.GramSplitter
Splits the input into grams of size n and returns an iterator over grams represented as [start index,length] pairs into the input string.
SPLIT_BY_NUMBER_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
SPLIT_BY_UNICODE_SCRIPT_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
SPLIT_BY_WHITESPACE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
SPLIT_DIGITS_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
stem(String, StemMode, Language) - Method in interface com.yahoo.language.process.Stemmer
Stem input according to specified stemming mode.
stem(String, StemMode, Language) - Method in class com.yahoo.language.process.StemmerImpl
 
StemList - Class in com.yahoo.language.process
A list of strings which does not allow for duplicate elements.
StemList() - Constructor for class com.yahoo.language.process.StemList
 
StemList(String...) - Constructor for class com.yahoo.language.process.StemList
 
Stemmer - Interface in com.yahoo.language.process
Interface providing stemming of single words.
STEMMER - com.yahoo.language.Linguistics.Component
 
StemmerImpl - Class in com.yahoo.language.process
 
StemmerImpl(Tokenizer) - Constructor for class com.yahoo.language.process.StemmerImpl
 
StemMode - Enum in com.yahoo.language.process
An enum of the stemming modes which can be requested.
SUNDANESE - com.yahoo.language.Language
Language tag "su".
SWAHILI - com.yahoo.language.Language
Language tag "sw".
SWEDISH - com.yahoo.language.Language
Language tag "sv".
SYLOTINAGRI - com.yahoo.language.process.TokenScript
 
SYMBOL - com.yahoo.language.process.TokenType
 
SYRIAC - com.yahoo.language.Language
Language tag "syr".
SYRIAC - com.yahoo.language.process.TokenScript
 

T

TAGALOG - com.yahoo.language.Language
Language tag "fil".
TAGALOG - com.yahoo.language.process.TokenScript
 
TAGBANWA - com.yahoo.language.process.TokenScript
 
TAILE - com.yahoo.language.process.TokenScript
 
TAILUE - com.yahoo.language.process.TokenScript
 
TAJIK - com.yahoo.language.Language
Language tag "tg".
TAMIL - com.yahoo.language.Language
Language tag "ta".
TAMIL - com.yahoo.language.process.TokenScript
 
TATAR - com.yahoo.language.Language
Language tag "tt".
TELUGU - com.yahoo.language.Language
Language tag "te".
TELUGU - com.yahoo.language.process.TokenScript
 
THAANA - com.yahoo.language.process.TokenScript
 
THAI - com.yahoo.language.Language
Language tag "th".
THAI - com.yahoo.language.process.TokenScript
 
throwsOnUse - Static variable in interface com.yahoo.language.process.Encoder
An instance of this which throws IllegalStateException if use is attempted
TIBETAN - com.yahoo.language.Language
Language tag "bo".
TIBETAN - com.yahoo.language.process.TokenScript
 
TIFINAGH - com.yahoo.language.process.TokenScript
 
TIGRINYA - com.yahoo.language.Language
Language tag "ti".
toBuilder() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
 
toBuilder() - Method in class sentencepiece.SentencepieceModel.ModelProto
 
toBuilder() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
 
toBuilder() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
 
toBuilder() - Method in class sentencepiece.SentencepieceModel.SelfTestData
 
toBuilder() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
 
toExtractedList() - Method in class com.yahoo.language.process.GramSplitter.GramSplitterIterator
Convenience method which splits the remaining items in this iterator into a list of gram strings
token() - Method in class com.yahoo.language.process.SpecialTokens.Token
Returns the special token
Token - Interface in com.yahoo.language.process
A single token produced by the tokenizer.
Token(String) - Constructor for class com.yahoo.language.process.SpecialTokens.Token
Creates a special token
Token(String, String) - Constructor for class com.yahoo.language.process.SpecialTokens.Token
Creates a special token which will be represented by the given replacement token
tokenize(String, boolean) - Method in class com.yahoo.language.process.SpecialTokens
Returns the special token at the start of the given string, or null if no special token starts there
tokenize(String, Language, StemMode, boolean) - Method in interface com.yahoo.language.process.Tokenizer
Returns the tokens produced from an input string under the rules of the given Language and additional options
Tokenizer - Interface in com.yahoo.language.process
Language-sensitive tokenization of a text string.
TOKENIZER - com.yahoo.language.Linguistics.Component
 
TokenScript - Enum in com.yahoo.language.process
List of token scripts.
TokenType - Enum in com.yahoo.language.process
An enumeration of token types.
toLowerCase(String) - Static method in class com.yahoo.language.LinguisticsCase
The lower casing method to use in Vespa when doing language independent processing of natural language data.
TONGA - com.yahoo.language.Language
Language tag "to".
toString() - Method in class com.yahoo.language.process.SpecialTokens.Token
 
TRAIN_EXTREMELY_LARGE_CORPUS_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
TRAINER_SPEC_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.ModelProto
 
TRAINING_SENTENCE_SIZE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
Transformer - Interface in com.yahoo.language.process
Interface for providers of text transformations such as accent removal.
TRANSFORMER - com.yahoo.language.Linguistics.Component
 
TREAT_WHITESPACE_AS_SUFFIX_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
TSONGA - com.yahoo.language.Language
Language tag "ts".
TURKISH - com.yahoo.language.Language
Language tag "tr".
TURKMEN - com.yahoo.language.Language
Language tag "tk".
TWI - com.yahoo.language.Language
Language tag "tw".
TYPE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
 

U

UGARITIC - com.yahoo.language.Language
Language tag "uga".
UGARITIC - com.yahoo.language.process.TokenScript
 
UIGHUR - com.yahoo.language.Language
Language tag "ug".
UKRAINIAN - com.yahoo.language.Language
Language tag "uk".
UNIGRAM - sentencepiece.SentencepieceModel.TrainerSpec.ModelType
Unigram language model with dynamic algorithm
UNIGRAM_VALUE - Static variable in enum sentencepiece.SentencepieceModel.TrainerSpec.ModelType
Unigram language model with dynamic algorithm
UNK_ID_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
UNK_PIECE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
UNK_SURFACE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
UNKNOWN - com.yahoo.language.Language
Language tag "un".
UNKNOWN - com.yahoo.language.process.TokenScript
 
UNKNOWN - com.yahoo.language.process.TokenType
 
UNKNOWN - sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
Unknown symbol.
UNKNOWN_VALUE - Static variable in enum sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
Unknown symbol.
UNUSED - sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
This piece is not used.
UNUSED_VALUE - Static variable in enum sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
This piece is not used.
URDU - com.yahoo.language.Language
Language tag "ur".
USE_ALL_VOCAB_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
USER_DEFINED - sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
User-defined symbols.
USER_DEFINED_SYMBOLS_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
USER_DEFINED_VALUE - Static variable in enum sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
User-defined symbols.
UZBEK - com.yahoo.language.Language
Language tag "uz".

V

valueOf(int) - Static method in enum com.yahoo.language.process.TokenType
Translates this from the int code representation returned from TokenType.getValue()
valueOf(int) - Static method in enum sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
Deprecated.
valueOf(int) - Static method in enum sentencepiece.SentencepieceModel.TrainerSpec.ModelType
Deprecated.
valueOf(Descriptors.EnumValueDescriptor) - Static method in enum sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
Returns the enum constant of this type with the specified name.
valueOf(Descriptors.EnumValueDescriptor) - Static method in enum sentencepiece.SentencepieceModel.TrainerSpec.ModelType
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum com.yahoo.language.Language
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum com.yahoo.language.Linguistics.Component
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum com.yahoo.language.process.StemMode
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum com.yahoo.language.process.TokenScript
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum com.yahoo.language.process.TokenType
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum com.yahoo.language.sentencepiece.Scoring
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum com.yahoo.language.sentencepiece.SentencePieceConfig.Scoring.Enum
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum sentencepiece.SentencepieceModel.TrainerSpec.ModelType
Returns the enum constant of this type with the specified name.
values() - Static method in enum com.yahoo.language.Language
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum com.yahoo.language.Linguistics.Component
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum com.yahoo.language.process.StemMode
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum com.yahoo.language.process.TokenScript
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum com.yahoo.language.process.TokenType
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum com.yahoo.language.sentencepiece.Scoring
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum com.yahoo.language.sentencepiece.SentencePieceConfig.Scoring.Enum
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum sentencepiece.SentencepieceModel.TrainerSpec.ModelType
Returns an array containing the constants of this enum type, in the order they are declared.
VIETNAMESE - com.yahoo.language.Language
Language tag "vi".
VIETNAMESE - com.yahoo.language.process.TokenScript
 
VOCAB_SIZE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
VOCABULARY_OUTPUT_PIECE_SCORE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
 
VOLAPUK - com.yahoo.language.Language
Language tag "vo".

W

WELSH - com.yahoo.language.Language
Language tag "cy".
WOLOF - com.yahoo.language.Language
Language tag "wo".
WORD - sentencepiece.SentencepieceModel.TrainerSpec.ModelType
Delimited by whitespace.
WORD_VALUE - Static variable in enum sentencepiece.SentencepieceModel.TrainerSpec.ModelType
Delimited by whitespace.
writeTo(CodedOutputStream) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
 
writeTo(CodedOutputStream) - Method in class sentencepiece.SentencepieceModel.ModelProto
 
writeTo(CodedOutputStream) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
 
writeTo(CodedOutputStream) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
 
writeTo(CodedOutputStream) - Method in class sentencepiece.SentencepieceModel.SelfTestData
 
writeTo(CodedOutputStream) - Method in class sentencepiece.SentencepieceModel.TrainerSpec
 

X

XHOSA - com.yahoo.language.Language
Language tag "xh".

Y

YI - com.yahoo.language.process.TokenScript
 
YIDDISH - com.yahoo.language.Language
Language tag "yi".
YORUBA - com.yahoo.language.Language
Language tag "yo".

Z

ZHUANG - com.yahoo.language.Language
Language tag "za".
ZULU - com.yahoo.language.Language
Language tag "zu".