A
- ABKHAZIAN - com.yahoo.language.Language
-
Language tag "ab".
- AbstractDetector - Class in com.yahoo.language.detect
- AbstractDetector() - Constructor for class com.yahoo.language.detect.AbstractDetector
- accentDrop(String, Language) - Method in interface com.yahoo.language.process.Transformer
-
Remove accents from input text.
- ACCEPT_LANGUAGE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- add(int, String) - Method in class com.yahoo.language.process.StemList
- ADD_DUMMY_PREFIX_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.NormalizerSpec
- addAcceptLanguage(String) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
List of the languages this model can accept.
- addAcceptLanguageBytes(ByteString) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
List of the languages this model can accept.
- addAllAcceptLanguage(Iterable<String>) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
List of the languages this model can accept.
- addAllControlSymbols(Iterable<String>) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
- addAllInput(Iterable<String>) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
General parameters. Input corpus files.
- addAllPieces(Iterable<? extends SentencepieceModel.ModelProto.SentencePiece>) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Sentence pieces with scores.
- addAllSamples(Iterable<? extends SentencepieceModel.SelfTestData.Sample>) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
-
repeated .sentencepiece.SelfTestData.Sample samples = 1;
- addAllUserDefinedSymbols(Iterable<String>) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Defines user defined symbols.
- addControlSymbols(String) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
- addControlSymbolsBytes(ByteString) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
- addDefaultModel(Path) - Method in class com.yahoo.language.sentencepiece.SentencePieceEncoder.Builder
-
Adds the model that will be used if the language is unknown, OR only one model is specified.
- addExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.ModelProto, List<Type>>, Type) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
- addExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.ModelProto.SentencePiece, List<Type>>, Type) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
- addExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.NormalizerSpec, List<Type>>, Type) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
- addExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.SelfTestData, List<Type>>, Type) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
- addExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.TrainerSpec, List<Type>>, Type) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
- addInput(String) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
General parameters. Input corpus files.
- addInputBytes(ByteString) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
General parameters. Input corpus files.
- addModel(Language, Path) - Method in class com.yahoo.language.sentencepiece.SentencePieceEncoder.Builder
- addPieces(int, SentencepieceModel.ModelProto.SentencePiece) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Sentence pieces with scores.
- addPieces(int, SentencepieceModel.ModelProto.SentencePiece.Builder) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Sentence pieces with scores.
- addPieces(SentencepieceModel.ModelProto.SentencePiece) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Sentence pieces with scores.
- addPieces(SentencepieceModel.ModelProto.SentencePiece.Builder) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Sentence pieces with scores.
- addPiecesBuilder() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Sentence pieces with scores.
- addPiecesBuilder(int) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Sentence pieces with scores.
- addRepeatedField(Descriptors.FieldDescriptor, Object) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
- addRepeatedField(Descriptors.FieldDescriptor, Object) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
- addRepeatedField(Descriptors.FieldDescriptor, Object) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
- addRepeatedField(Descriptors.FieldDescriptor, Object) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
- addRepeatedField(Descriptors.FieldDescriptor, Object) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
- addRepeatedField(Descriptors.FieldDescriptor, Object) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
- addSamples(int, SentencepieceModel.SelfTestData.Sample) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
-
repeated .sentencepiece.SelfTestData.Sample samples = 1;
- addSamples(int, SentencepieceModel.SelfTestData.Sample.Builder) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
-
repeated .sentencepiece.SelfTestData.Sample samples = 1;
- addSamples(SentencepieceModel.SelfTestData.Sample) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
-
repeated .sentencepiece.SelfTestData.Sample samples = 1;
- addSamples(SentencepieceModel.SelfTestData.Sample.Builder) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
-
repeated .sentencepiece.SelfTestData.Sample samples = 1;
- addSamplesBuilder() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
-
repeated .sentencepiece.SelfTestData.Sample samples = 1;
- addSamplesBuilder(int) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
-
repeated .sentencepiece.SelfTestData.Sample samples = 1;
- addUserDefinedSymbols(String) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Defines user defined symbols.
- addUserDefinedSymbolsBytes(ByteString) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Defines user defined symbols.
- AFAR - com.yahoo.language.Language
-
Language tag "aa".
- AFRIKAANS - com.yahoo.language.Language
-
Language tag "af".
- ALBANIAN - com.yahoo.language.Language
-
Language tag "sq".
- ALL - com.yahoo.language.process.StemMode
- ALLOW_WHITESPACE_ONLY_PIECES_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- ALPHABETIC - com.yahoo.language.process.TokenType
- AMHARIC - com.yahoo.language.Language
-
Language tag "am".
- ARABIC - com.yahoo.language.Language
-
Language tag "ar".
- ARABIC - com.yahoo.language.process.TokenScript
- ARMENIAN - com.yahoo.language.Language
-
Language tag "hy".
- ARMENIAN - com.yahoo.language.process.TokenScript
- ASCII - com.yahoo.language.process.TokenScript
- asMap() - Method in class com.yahoo.language.process.SpecialTokens
-
Returns the tokens of this as an immutable map from token to replacement.
- ASSAMESE - com.yahoo.language.Language
-
Language tag "as".
- AYMARA - com.yahoo.language.Language
-
Language tag "ay".
- AZERBAIJANI - com.yahoo.language.Language
-
Language tag "az".
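The accentDrop(String, Language) entry above says only that it removes accents from input text. As a rough illustration of what accent removal can look like, and not the Transformer implementation itself, the sketch below uses the JDK's java.text.Normalizer. The class name AccentDropSketch and the method dropAccents are hypothetical names chosen for this example.

```java
import java.text.Normalizer;

// Hypothetical stand-in for illustration; not part of com.yahoo.language.
class AccentDropSketch {

    /** Decomposes characters (NFD) and strips the combining marks that remain. */
    static String dropAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // \p{M} matches the Unicode combining marks produced by the decomposition
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // The escapes spell "creme brulee" with its three accented letters
        System.out.println(dropAccents("cr\u00e8me br\u00fbl\u00e9e")); // prints "creme brulee"
    }
}
```

This approach only handles characters that decompose into a base letter plus combining marks; letters such as "ø" have no such decomposition and pass through unchanged.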
B
- BASHKIR - com.yahoo.language.Language
-
Language tag "ba".
- BASQUE - com.yahoo.language.Language
-
Language tag "eu".
- BENGALI - com.yahoo.language.Language
-
Language tag "bn".
- BENGALI - com.yahoo.language.process.TokenScript
- BEST - com.yahoo.language.process.StemMode
- BHUTANI - com.yahoo.language.Language
-
Language tag "dz".
- BIHARI - com.yahoo.language.Language
-
Language tag "bh".
- BISLAMA - com.yahoo.language.Language
-
Language tag "bi".
- BOS_ID_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- BOS_PIECE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- BPE - sentencepiece.SentencepieceModel.TrainerSpec.ModelType
-
Byte Pair Encoding
- BPE_VALUE - Static variable in enum sentencepiece.SentencepieceModel.TrainerSpec.ModelType
-
Byte Pair Encoding
- BRAILLE - com.yahoo.language.process.TokenScript
- BRETON - com.yahoo.language.Language
-
Language tag "br".
- BUGINESE - com.yahoo.language.Language
-
Language tag "bug".
- BUGINESE - com.yahoo.language.process.TokenScript
- BUHID - com.yahoo.language.process.TokenScript
- build() - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig.Builder
- build() - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig.Model.Builder
- build() - Method in class com.yahoo.language.sentencepiece.SentencePieceEncoder.Builder
- build() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
- build() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
- build() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
- build() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
- build() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
- build() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
- Builder() - Constructor for class com.yahoo.language.sentencepiece.SentencePieceConfig.Builder
- Builder() - Constructor for class com.yahoo.language.sentencepiece.SentencePieceConfig.Model.Builder
- Builder() - Constructor for class com.yahoo.language.sentencepiece.SentencePieceEncoder.Builder
- Builder(SentencePieceConfig) - Constructor for class com.yahoo.language.sentencepiece.SentencePieceConfig.Builder
- Builder(SentencePieceConfig.Model) - Constructor for class com.yahoo.language.sentencepiece.SentencePieceConfig.Model.Builder
- buildPartial() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
- buildPartial() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
- buildPartial() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
- buildPartial() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
- buildPartial() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
- buildPartial() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
- BULGARIAN - com.yahoo.language.Language
-
Language tag "bg".
- BURMESE - com.yahoo.language.Language
-
Language tag "my".
- BYELORUSSIAN - com.yahoo.language.Language
-
Language tag "be".
- BYTE - sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
-
Typical usage of USER_DEFINED symbol is placeholder.
- BYTE_FALLBACK_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- BYTE_VALUE - Static variable in enum sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
-
Typical usage of USER_DEFINED symbol is placeholder.
C
- CAMBODIAN - com.yahoo.language.Language
-
Language tag "km".
- CANADIAN - com.yahoo.language.process.TokenScript
- CATALAN - com.yahoo.language.Language
-
Language tag "ca".
- CHAR - sentencepiece.SentencepieceModel.TrainerSpec.ModelType
-
Tokenizes into a character sequence.
- CHAR_VALUE - Static variable in enum sentencepiece.SentencepieceModel.TrainerSpec.ModelType
-
Tokenizes into a character sequence.
- CHARACTER_CLASSES - com.yahoo.language.Linguistics.Component
- CHARACTER_COVERAGE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- CharacterClasses - Class in com.yahoo.language.process
-
Determines the class of a given character.
- CharacterClasses() - Constructor for class com.yahoo.language.process.CharacterClasses
- CHEROKEE - com.yahoo.language.Language
-
Language tag "chr".
- CHEROKEE - com.yahoo.language.process.TokenScript
- CHINESE - com.yahoo.language.process.TokenScript
- CHINESE_SIMPLIFIED - com.yahoo.language.Language
-
Language tag "zh-hans".
- CHINESE_TRADITIONAL - com.yahoo.language.Language
-
Language tag "zh-hant".
- clear() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
- clear() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
- clear() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
- clear() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
- clear() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
- clear() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
- clearAcceptLanguage() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
List of the languages this model can accept.
- clearAddDummyPrefix() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
-
Adds dummy whitespace at the beginning of text in order to treat "world" in "world" and "hello world" in the same way.
- clearAllowWhitespaceOnlyPieces() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Allows pieces that only contain whitespaces instead of appearing only as prefix or suffix of other pieces.
- clearBosId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
<s>
- clearBosPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
optional string bos_piece = 46 [default = "<s>"];
- clearByteFallback() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Decomposes unknown pieces into UTF-8 bytes.
- clearCharacterCoverage() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Training parameters.
- clearControlSymbols() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
- clearDenormalizerSpec() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Spec for text de-normalization.
- clearEosId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
</s>
- clearEosPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
optional string eos_piece = 47 [default = "</s>"];
- clearEscapeWhitespaces() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
-
Replaces whitespace with meta symbol.
- clearExpected() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
-
optional string expected = 2;
- clearExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.ModelProto, ?>) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
- clearExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.ModelProto.SentencePiece, ?>) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
- clearExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.NormalizerSpec, ?>) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
- clearExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.SelfTestData, ?>) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
- clearExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.TrainerSpec, ?>) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
- clearField(Descriptors.FieldDescriptor) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
- clearField(Descriptors.FieldDescriptor) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
- clearField(Descriptors.FieldDescriptor) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
- clearField(Descriptors.FieldDescriptor) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
- clearField(Descriptors.FieldDescriptor) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
- clearField(Descriptors.FieldDescriptor) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
- clearHardVocabLimit() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
`vocab_size` is treated as hard limit.
- clearInput() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
-
optional string input = 1;
- clearInput() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
General parameters. Input corpus files.
- clearInputFormat() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Input corpus format: "text": one-sentence-per-line text format (default) "tsv": sentence <tab> freq
- clearInputSentenceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Maximum size of sentences the trainer loads from `input` parameter.
- clearMaxSentenceLength() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
The maximum sentence length in bytes.
- clearMaxSentencepieceLength() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
SentencePiece parameters which control the shapes of sentence piece.
- clearMiningSentenceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Deprecated.
- clearModelPrefix() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Output model file prefix.
- clearModelType() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
optional .sentencepiece.TrainerSpec.ModelType model_type = 3 [default = UNIGRAM];
- clearName() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
-
Name of the normalization rule.
- clearNormalizationRuleTsv() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
-
Custom normalization rule file in TSV format.
- clearNormalizerSpec() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Spec for text normalization.
- clearNumSubIterations() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Number of EM sub iterations.
- clearNumThreads() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Number of threads in the training.
- clearOneof(Descriptors.OneofDescriptor) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
- clearOneof(Descriptors.OneofDescriptor) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
- clearOneof(Descriptors.OneofDescriptor) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
- clearOneof(Descriptors.OneofDescriptor) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
- clearOneof(Descriptors.OneofDescriptor) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
- clearOneof(Descriptors.OneofDescriptor) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
- clearPadId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
<pad> (padding)
- clearPadPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
optional string pad_piece = 48 [default = "<pad>"];
- clearPiece() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
-
piece must not be empty.
- clearPieces() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Sentence pieces with scores.
- clearPrecompiledCharsmap() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
-
Pre-compiled normalization rule created by Builder::GetPrecompiledCharsMap() or Builder::CompileCharsMap() method.
- clearRemoveExtraWhitespaces() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
-
Removes leading, trailing, and duplicate internal whitespace.
- clearRequiredChars() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Defines required characters.
- clearSamples() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
-
repeated .sentencepiece.SelfTestData.Sample samples = 1;
- clearScore() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
-
optional float score = 2;
- clearSeedSentencepieceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
The size of seed sentencepieces.
- clearSelfTestData() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Stores sample input and its expected segmentation to verify the model.
- clearSelfTestSampleSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Size of self-test samples, which are encoded in the model file.
- clearShrinkingFactor() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
In every EM sub-iteration, keeps the top `shrinking_factor` * `current sentencepieces size` with respect to the loss of the sentence piece.
- clearShuffleInputSentence() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
optional bool shuffle_input_sentence = 19 [default = true];
- clearSplitByNumber() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
When `split_by_number` is true, puts a boundary at each transition between number and non-number.
- clearSplitByUnicodeScript() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Uses Unicode script to split sentence pieces.
- clearSplitByWhitespace() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Use a white space to split sentence pieces.
- clearSplitDigits() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Split all digits (0-9) into separate pieces.
- clearTrainerSpec() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Spec used to generate this model file.
- clearTrainExtremelyLargeCorpus() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Increase bit depth to allow unigram model training on large (>10M sentences) corpora.
- clearTrainingSentenceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Deprecated.
- clearTreatWhitespaceAsSuffix() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Adds whitespace symbol (_) as a suffix instead of prefix.
- clearType() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
-
optional .sentencepiece.ModelProto.SentencePiece.Type type = 3 [default = NORMAL];
- clearUnkId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Reserved special meta tokens.
- clearUnkPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
optional string unk_piece = 45 [default = "<unk>"];
- clearUnkSurface() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Encodes <unk> into U+2047 (DOUBLE QUESTION MARK), since this character can be useful both for user and developer.
- clearUseAllVocab() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Uses all symbols for vocab extraction.
- clearUserDefinedSymbols() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Defines user defined symbols.
- clearVocabSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Vocabulary size.
- clearVocabularyOutputPieceScore() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
When creating the vocabulary file, defines whether or not to additionally output the score for each piece.
- clone() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
- clone() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
- clone() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
- clone() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
- clone() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
- clone() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
- collapseUnknowns() - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig
- collapseUnknowns(boolean) - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig.Builder
- com.yahoo.language - package com.yahoo.language
- com.yahoo.language.detect - package com.yahoo.language.detect
- com.yahoo.language.process - package com.yahoo.language.process
- com.yahoo.language.sentencepiece - package com.yahoo.language.sentencepiece
- COMMON - com.yahoo.language.process.TokenScript
- compareTo(SpecialTokens.Token) - Method in class com.yahoo.language.process.SpecialTokens.Token
- CONFIG_DEF_MD5 - Static variable in class com.yahoo.language.sentencepiece.SentencePieceConfig
- CONFIG_DEF_NAME - Static variable in class com.yahoo.language.sentencepiece.SentencePieceConfig
- CONFIG_DEF_NAMESPACE - Static variable in class com.yahoo.language.sentencepiece.SentencePieceConfig
- CONFIG_DEF_SCHEMA - Static variable in class com.yahoo.language.sentencepiece.SentencePieceConfig
- CONFIG_DEF_VERSION - Static variable in class com.yahoo.language.sentencepiece.SentencePieceConfig
- CONTROL - sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
-
control symbols.
- CONTROL_SYMBOLS_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- CONTROL_VALUE - Static variable in enum sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
-
control symbols.
- COPTIC - com.yahoo.language.Language
-
Language tag "cop".
- COPTIC - com.yahoo.language.process.TokenScript
- CORSICAN - com.yahoo.language.Language
-
Language tag "co".
- CROATIAN - com.yahoo.language.Language
-
Language tag "hr".
- CYPRIOT - com.yahoo.language.process.TokenScript
- CYRILLIC - com.yahoo.language.process.TokenScript
- CZECH - com.yahoo.language.Language
-
Language tag "cs".
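Several NormalizerSpec.Builder entries in this section describe whitespace handling: removing leading, trailing, and duplicate internal whitespace, adding a dummy whitespace prefix, and replacing whitespace with a meta symbol. The sketch below shows how those three steps might combine. It is a simplified illustration, not the SentencePiece normalizer; the class NormalizerSketch and its normalize method are hypothetical, and the meta symbol is assumed to be U+2581, the character SentencePiece uses for this purpose.

```java
// Hypothetical stand-in for illustration; not the SentencePiece normalizer.
class NormalizerSketch {

    // U+2581 (LOWER ONE EIGHTH BLOCK), assumed here as the whitespace meta symbol
    static final char META = '\u2581';

    static String normalize(String text, boolean removeExtraWhitespaces,
                            boolean addDummyPrefix, boolean escapeWhitespaces) {
        String result = text;
        if (removeExtraWhitespaces) {
            // Drop leading/trailing whitespace and collapse internal runs to one space
            result = result.trim().replaceAll("\\s+", " ");
        }
        if (addDummyPrefix && !result.isEmpty()) {
            // A leading space makes "world" alone look like the "world" in "hello world"
            result = " " + result;
        }
        if (escapeWhitespaces) {
            // Replace each space character with the meta symbol
            result = result.replace(' ', META);
        }
        return result;
    }

    public static void main(String[] args) {
        // With all three options on, both words end up with a leading meta symbol
        System.out.println(normalize("  hello   world ", true, true, true));
    }
}
```

With the dummy prefix enabled, "world" on its own and the "world" inside "hello world" both start with the meta symbol, which is the effect the clearAddDummyPrefix() description refers to.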
D
- DANISH - com.yahoo.language.Language
-
Language tag "da".
- DEFAULT - com.yahoo.language.process.StemMode
- DENORMALIZER_SPEC_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.ModelProto
- DESERET - com.yahoo.language.process.TokenScript
- detect(byte[], int, int, Hint) - Method in interface com.yahoo.language.detect.Detector
-
Detects language and encoding of the supplied byte array, possibly using a language/encoding hint.
- detect(String, Hint) - Method in class com.yahoo.language.detect.AbstractDetector
- detect(String, Hint) - Method in interface com.yahoo.language.detect.Detector
-
Detects language of the supplied String, possibly using a language hint.
- detect(ByteBuffer, Hint) - Method in class com.yahoo.language.detect.AbstractDetector
- detect(ByteBuffer, Hint) - Method in interface com.yahoo.language.detect.Detector
-
Detects language and encoding of the supplied ByteBuffer, possibly using a language/encoding hint.
- Detection - Class in com.yahoo.language.detect
- Detection(Language, String, boolean) - Constructor for class com.yahoo.language.detect.Detection
- DetectionException - Exception in com.yahoo.language.detect
-
Exception that is thrown when detection fails.
- DetectionException(String) - Constructor for exception com.yahoo.language.detect.DetectionException
- Detector - Interface in com.yahoo.language.detect
-
Abstract superclass of all Detectors used for language and encoding detection.
- DETECTOR - com.yahoo.language.Linguistics.Component
- DEVANAGARI - com.yahoo.language.process.TokenScript
- dispatchGetConfig(ConfigInstance.Producer) - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig.Builder
- DIVEHI - com.yahoo.language.Language
-
Language tag "div".
- doSetValue(String) - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig.Scoring
- DUTCH - com.yahoo.language.Language
-
Language tag "nl".
E
- empty() - Static method in class com.yahoo.language.process.SpecialTokens
- encode(String, Language) - Method in interface com.yahoo.language.process.Encoder
-
Encodes text into tokens in a list of ids.
- encode(String, Language) - Method in class com.yahoo.language.process.Encoder.FailingEncoder
- encode(String, Language) - Method in class com.yahoo.language.sentencepiece.SentencePieceEncoder
-
Segments the given text into token segments using the SentencePiece algorithm and returns the segment ids.
- encode(String, Language, TensorType) - Method in interface com.yahoo.language.process.Encoder
-
Encodes text into tokens in a tensor.
- encode(String, Language, TensorType) - Method in class com.yahoo.language.process.Encoder.FailingEncoder
- encode(String, Language, TensorType) - Method in class com.yahoo.language.sentencepiece.SentencePieceEncoder
-
Encodes directly to a tensor.
- Encoder - Interface in com.yahoo.language.process
-
An encoder converts a text string to a tensor or list of tokens.
- Encoder.FailingEncoder - Class in com.yahoo.language.process
- ENGLISH - com.yahoo.language.Language
-
Language tag "en".
- EOS_ID_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- EOS_PIECE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- equals(Linguistics) - Method in interface com.yahoo.language.Linguistics
-
Checks if another instance is equivalent to this one.
- equals(Object) - Method in class com.yahoo.language.process.GramSplitter.Gram
- equals(Object) - Method in class com.yahoo.language.process.SpecialTokens.Token
- equals(Object) - Method in class sentencepiece.SentencepieceModel.ModelProto
- equals(Object) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
- equals(Object) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
- equals(Object) - Method in class sentencepiece.SentencepieceModel.SelfTestData
- equals(Object) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
- equals(Object) - Method in class sentencepiece.SentencepieceModel.TrainerSpec
- ESCAPE_WHITESPACES_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.NormalizerSpec
- ESPERANTO - com.yahoo.language.Language
-
Language tag "eo".
- ESTONIAN - com.yahoo.language.Language
-
Language tag "et".
- ETHIOPIC - com.yahoo.language.process.TokenScript
- EXPECTED_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.SelfTestData.Sample
- extractFrom(GramSplitter.UnicodeString) - Method in class com.yahoo.language.process.GramSplitter.Gram
-
Returns this gram as a string from the input string.
- extractFrom(String) - Method in class com.yahoo.language.process.GramSplitter.Gram
-
Returns this gram as a string from the input string.
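The SentencePieceEncoder.encode(String, Language) entry in this section describes segmenting text into token segments. As a toy illustration of how such a segmentation can be chosen, the sketch below runs a dynamic program over a small vocabulary and picks the segmentation with the fewest segments, breaking ties by score sum, which is the rule this index gives for the fewestSegments scoring. The class FewestSegmentsSketch, its segment method, and the vocabulary are hypothetical, and the sketch assumes that a higher score sum wins a tie; it does not use or reproduce the Vespa implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration; not the SentencePieceEncoder algorithm.
class FewestSegmentsSketch {

    /** Returns the chosen segments, or an empty list if the text cannot be segmented. */
    static List<String> segment(String text, Map<String, Double> vocabulary) {
        int n = text.length();
        int[] count = new int[n + 1];        // fewest segments covering text[0, i)
        double[] score = new double[n + 1];  // best score sum among those segmentations
        int[] previous = new int[n + 1];     // start index of the last segment ending at i
        Arrays.fill(count, Integer.MAX_VALUE);
        count[0] = 0;
        for (int end = 1; end <= n; end++) {
            for (int start = 0; start < end; start++) {
                if (count[start] == Integer.MAX_VALUE) continue; // prefix not reachable
                Double pieceScore = vocabulary.get(text.substring(start, end));
                if (pieceScore == null) continue; // not a known piece
                int candidateCount = count[start] + 1;
                double candidateScore = score[start] + pieceScore;
                boolean fewer = candidateCount < count[end];
                // Assumption: among equally short segmentations, the higher score sum wins
                boolean betterTie = candidateCount == count[end] && candidateScore > score[end];
                if (fewer || betterTie) {
                    count[end] = candidateCount;
                    score[end] = candidateScore;
                    previous[end] = start;
                }
            }
        }
        List<String> segments = new ArrayList<>();
        if (count[n] == Integer.MAX_VALUE) return segments;
        // Walk the back-pointers from the end of the text, then restore reading order
        for (int end = n; end > 0; end = previous[end])
            segments.add(text.substring(previous[end], end));
        Collections.reverse(segments);
        return segments;
    }

    public static void main(String[] args) {
        Map<String, Double> vocabulary = new HashMap<>();
        vocabulary.put("a", -1.0);
        vocabulary.put("d", -1.0);
        vocabulary.put("ab", -2.5);
        vocabulary.put("cd", -2.5);
        vocabulary.put("abc", -1.0);
        vocabulary.put("bcd", -3.0);
        // Three two-segment options exist; "abc" + "d" has the highest score sum (-2.0)
        System.out.println(segment("abcd", vocabulary)); // prints [abc, d]
    }
}
```

A real encoder would then map each segment to its id in the model; that lookup is omitted here.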
F
- FailingEncoder() - Constructor for class com.yahoo.language.process.Encoder.FailingEncoder
- FAROESE - com.yahoo.language.Language
-
Language tag "fo".
- fewestSegments - com.yahoo.language.sentencepiece.Scoring
-
Find the segmentation that has the fewest segments, resolve ties by score sum
- fewestSegments - com.yahoo.language.sentencepiece.SentencePieceConfig.Scoring.Enum
- fewestSegments - Static variable in class com.yahoo.language.sentencepiece.SentencePieceConfig.Scoring
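The fewestSegments strategy described above (fewest pieces first, ties resolved by score sum) can be sketched as a dynamic program over prefix lengths. This is only an illustration under assumptions: the class name FewestSegmentsSketch, the method segment, and the toy vocabulary map are hypothetical and are not taken from com.yahoo.language.sentencepiece.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the com.yahoo.language.sentencepiece implementation.
final class FewestSegmentsSketch {

    /**
     * Splits text into vocabulary pieces using as few pieces as possible.
     * Among segmentations with the same piece count, the higher score sum wins.
     * Returns an empty list when the text is empty or cannot be segmented.
     */
    static List<String> segment(String text, Map<String, Double> vocab) {
        int n = text.length();
        int[] count = new int[n + 1];        // fewest pieces covering text[0, i)
        double[] score = new double[n + 1];  // best score sum at that piece count
        int[] prev = new int[n + 1];         // start index of the last piece
        Arrays.fill(count, Integer.MAX_VALUE);
        count[0] = 0;
        for (int end = 1; end <= n; end++) {
            for (int start = 0; start < end; start++) {
                if (count[start] == Integer.MAX_VALUE) continue;  // prefix unreachable
                Double pieceScore = vocab.get(text.substring(start, end));
                if (pieceScore == null) continue;                 // not a known piece
                int c = count[start] + 1;
                double total = score[start] + pieceScore;
                if (c < count[end] || (c == count[end] && total > score[end])) {
                    count[end] = c;
                    score[end] = total;
                    prev[end] = start;
                }
            }
        }
        if (count[n] == Integer.MAX_VALUE) return List.of();
        List<String> pieces = new ArrayList<>();
        for (int i = n; i > 0; i = prev[i]) pieces.add(text.substring(prev[i], i));
        Collections.reverse(pieces);
        return pieces;
    }

    public static void main(String[] args) {
        Map<String, Double> vocab = Map.of("ab", -1.0, "c", -1.0, "a", -0.5, "bc", -0.2);
        System.out.println(segment("abc", vocab));
    }
}
```

In the usage above, both "ab" + "c" and "a" + "bc" use two pieces, so the score sum decides between them.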
- FIJI - com.yahoo.language.Language
-
Language tag "fj".
- FINNISH - com.yahoo.language.Language
-
Language tag "fi".
- forNumber(int) - Static method in enum sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
- forNumber(int) - Static method in enum sentencepiece.SentencepieceModel.TrainerSpec.ModelType
- FRENCH - com.yahoo.language.Language
-
Language tag "fr".
- FRISIAN - com.yahoo.language.Language
-
Language tag "fy".
- fromEncoding(String) - Static method in enum com.yahoo.language.Language
-
Returns the language from an encoding, or Language.UNKNOWN if it cannot be determined.
- fromLanguageTag(String) - Static method in enum com.yahoo.language.Language
-
Convenience method for calling fromLocale(LocaleFactory.fromLanguageTag(languageTag)).
- fromLanguageTag(String) - Static method in class com.yahoo.language.LocaleFactory
-
Implements a simple parser for RFC 5646 language tags.
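An RFC 5646 language tag is a hyphen-separated sequence of subtags, typically language, then an optional script, then an optional region. The sketch below shows that decomposition using the JDK's own java.util.Locale.forLanguageTag. It does not call LocaleFactory, whose exact parsing rules are not given in this index, so the two may differ on edge cases. The class name LanguageTagSketch and the describe helper are hypothetical.

```java
import java.util.Locale;

// Hypothetical sketch using only the JDK; LocaleFactory may behave differently.
final class LanguageTagSketch {

    /** Returns "language|script|region" for a tag; absent subtags come back empty. */
    static String describe(String tag) {
        Locale locale = Locale.forLanguageTag(tag);
        return locale.getLanguage() + "|" + locale.getScript() + "|" + locale.getCountry();
    }

    public static void main(String[] args) {
        System.out.println(describe("zh-Hant-TW")); // prints "zh|Hant|TW"
        System.out.println(describe("en-US"));      // prints "en||US"
    }
}
```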
- fromLocale(Locale) - Static method in enum com.yahoo.language.Language
-
Returns the Language whose Language.languageCode() is equal to locale.getLanguage(), with the following additions:
G
- GALICIAN - com.yahoo.language.Language
-
Language tag "gl".
- GEORGIAN - com.yahoo.language.Language
-
Language tag "ka".
- GEORGIAN - com.yahoo.language.process.TokenScript
- GERMAN - com.yahoo.language.Language
-
Language tag "de".
- get(int) - Method in class com.yahoo.language.process.StemList
- getAcceptLanguage(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
List of the languages this model can accept.
- getAcceptLanguage(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
List of the languages this model can accept.
- getAcceptLanguage(int) - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
List of the languages this model can accept.
- getAcceptLanguageBytes(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
List of the languages this model can accept.
- getAcceptLanguageBytes(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
List of the languages this model can accept.
- getAcceptLanguageBytes(int) - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
List of the languages this model can accept.
- getAcceptLanguageCount() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
List of the languages this model can accept.
- getAcceptLanguageCount() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
List of the languages this model can accept.
- getAcceptLanguageCount() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
List of the languages this model can accept.
- getAcceptLanguageList() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
List of the languages this model can accept.
- getAcceptLanguageList() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
List of the languages this model can accept.
- getAcceptLanguageList() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
List of the languages this model can accept.
- getAddDummyPrefix() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
-
Adds dummy whitespace at the beginning of text in order to treat "world" in "world" and "hello world" in the same way.
- getAddDummyPrefix() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
-
Adds dummy whitespace at the beginning of text in order to treat "world" in "world" and "hello world" in the same way.
- getAddDummyPrefix() - Method in interface sentencepiece.SentencepieceModel.NormalizerSpecOrBuilder
-
Adds dummy whitespace at the beginning of text in order to treat "world" in "world" and "hello world" in the same way.
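The effect of the dummy prefix can be seen by looking at the last word together with its preceding boundary. Without the prefix, "world" on its own has no leading space while "world" inside "hello world" does; with the prefix, both see the same boundary. The following is a minimal sketch of that idea, not the SentencePiece normalizer; the class and method names are hypothetical.

```java
// Hypothetical sketch of the dummy-prefix idea; not the SentencePiece normalizer.
final class DummyPrefixSketch {

    /** Prepends a single space so the first word has the same left boundary as later words. */
    static String addDummyPrefix(String text) {
        return " " + text;
    }

    /** Returns the last word including the space before it, if there is one. */
    static String lastWordWithBoundary(String text) {
        int i = text.lastIndexOf(' ');
        return i < 0 ? text : text.substring(i);
    }

    public static void main(String[] args) {
        // Without the prefix the two forms differ: "world" versus " world".
        System.out.println("[" + lastWordWithBoundary("world") + "]");
        System.out.println("[" + lastWordWithBoundary("hello world") + "]");
        // With the prefix both are " world".
        System.out.println("[" + lastWordWithBoundary(addDummyPrefix("world")) + "]");
        System.out.println("[" + lastWordWithBoundary(addDummyPrefix("hello world")) + "]");
    }
}
```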
- getAllowWhitespaceOnlyPieces() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Allows pieces that only contain whitespaces instead of appearing only as prefix or suffix of other pieces.
- getAllowWhitespaceOnlyPieces() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Allows pieces that only contain whitespaces instead of appearing only as prefix or suffix of other pieces.
- getAllowWhitespaceOnlyPieces() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Allows pieces that only contain whitespaces instead of appearing only as prefix or suffix of other pieces.
- getApplyOnRestart() - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig.Builder
- getBosId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
<s>
- getBosId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
<s>
- getBosId() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
<s>
- getBosPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
optional string bos_piece = 46 [default = "<s>"];
- getBosPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
optional string bos_piece = 46 [default = "<s>"];
- getBosPiece() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
optional string bos_piece = 46 [default = "<s>"];
- getBosPieceBytes() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
optional string bos_piece = 46 [default = "<s>"];
- getBosPieceBytes() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
optional string bos_piece = 46 [default = "<s>"];
- getBosPieceBytes() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
optional string bos_piece = 46 [default = "<s>"];
- getByteFallback() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Decomposes unknown pieces into UTF-8 bytes.
- getByteFallback() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Decomposes unknown pieces into UTF-8 bytes.
- getByteFallback() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Decomposes unknown pieces into UTF-8 bytes.
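As a rough illustration of byte fallback, a piece that is missing from the vocabulary can be replaced by one token per UTF-8 byte. The sketch below assumes a "<0xNN>" spelling for byte tokens; that notation, the class name, and the method name are assumptions made for this example rather than something stated in this index.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the "<0xNN>" token spelling is an assumption of this example.
final class ByteFallbackSketch {

    /** Decomposes a piece into one token per UTF-8 byte. */
    static List<String> toBytePieces(String piece) {
        List<String> out = new ArrayList<>();
        for (byte b : piece.getBytes(StandardCharsets.UTF_8)) {
            out.add(String.format("<0x%02X>", b & 0xFF)); // mask to get the unsigned byte value
        }
        return out;
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is encoded as the two bytes C3 A9 in UTF-8.
        System.out.println(toBytePieces("\u00e9")); // prints "[<0xC3>, <0xA9>]"
    }
}
```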
- getCharacterClasses() - Method in interface com.yahoo.language.Linguistics
-
Returns a thread-unsafe character classes instance.
- getCharacterCoverage() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Training parameters.
- getCharacterCoverage() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Training parameters.
- getCharacterCoverage() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Training parameters.
- getCodePointCount() - Method in class com.yahoo.language.process.GramSplitter.Gram
- getCollapseUnknowns() - Method in class com.yahoo.language.sentencepiece.SentencePieceEncoder.Builder
- getComponent(int) - Method in interface com.yahoo.language.process.Token
-
Returns a component token of this
- getConfig(SentencePieceConfig.Builder) - Method in interface com.yahoo.language.sentencepiece.SentencePieceConfig.Producer
- getControlSymbols(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
- getControlSymbols(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
- getControlSymbols(int) - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
- getControlSymbolsBytes(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
- getControlSymbolsBytes(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
- getControlSymbolsBytes(int) - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
- getControlSymbolsCount() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
- getControlSymbolsCount() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
- getControlSymbolsCount() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
- getControlSymbolsList() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
- getControlSymbolsList() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
- getControlSymbolsList() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
- getCountry() - Method in class com.yahoo.language.detect.Hint
- getDefaultInstance() - Static method in class sentencepiece.SentencepieceModel.ModelProto
- getDefaultInstance() - Static method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
- getDefaultInstance() - Static method in class sentencepiece.SentencepieceModel.NormalizerSpec
- getDefaultInstance() - Static method in class sentencepiece.SentencepieceModel.SelfTestData
- getDefaultInstance() - Static method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
- getDefaultInstance() - Static method in class sentencepiece.SentencepieceModel.TrainerSpec
- getDefaultInstanceForType() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
- getDefaultInstanceForType() - Method in class sentencepiece.SentencepieceModel.ModelProto
- getDefaultInstanceForType() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
- getDefaultInstanceForType() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
- getDefaultInstanceForType() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
- getDefaultInstanceForType() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
- getDefaultInstanceForType() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
- getDefaultInstanceForType() - Method in class sentencepiece.SentencepieceModel.SelfTestData
- getDefaultInstanceForType() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
- getDefaultInstanceForType() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
- getDefaultInstanceForType() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
- getDefaultInstanceForType() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
- getDefMd5() - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig.Builder
- getDefMd5() - Static method in class com.yahoo.language.sentencepiece.SentencePieceConfig
- getDefName() - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig.Builder
- getDefName() - Static method in class com.yahoo.language.sentencepiece.SentencePieceConfig
- getDefNamespace() - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig.Builder
- getDefNamespace() - Static method in class com.yahoo.language.sentencepiece.SentencePieceConfig
- getDefVersion() - Static method in class com.yahoo.language.sentencepiece.SentencePieceConfig
- getDenormalizerSpec() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Spec for text de-normalization.
- getDenormalizerSpec() - Method in class sentencepiece.SentencepieceModel.ModelProto
-
Spec for text de-normalization.
- getDenormalizerSpec() - Method in interface sentencepiece.SentencepieceModel.ModelProtoOrBuilder
-
Spec for text de-normalization.
- getDenormalizerSpecBuilder() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Spec for text de-normalization.
- getDenormalizerSpecOrBuilder() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Spec for text de-normalization.
- getDenormalizerSpecOrBuilder() - Method in class sentencepiece.SentencepieceModel.ModelProto
-
Spec for text de-normalization.
- getDenormalizerSpecOrBuilder() - Method in interface sentencepiece.SentencepieceModel.ModelProtoOrBuilder
-
Spec for text de-normalization.
- getDescriptor() - Static method in class sentencepiece.SentencepieceModel
- getDescriptor() - Static method in class sentencepiece.SentencepieceModel.ModelProto.Builder
- getDescriptor() - Static method in class sentencepiece.SentencepieceModel.ModelProto
- getDescriptor() - Static method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
- getDescriptor() - Static method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
- getDescriptor() - Static method in enum sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
- getDescriptor() - Static method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
- getDescriptor() - Static method in class sentencepiece.SentencepieceModel.NormalizerSpec
- getDescriptor() - Static method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
- getDescriptor() - Static method in class sentencepiece.SentencepieceModel.SelfTestData
- getDescriptor() - Static method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
- getDescriptor() - Static method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
- getDescriptor() - Static method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
- getDescriptor() - Static method in class sentencepiece.SentencepieceModel.TrainerSpec
- getDescriptor() - Static method in enum sentencepiece.SentencepieceModel.TrainerSpec.ModelType
- getDescriptorForType() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
- getDescriptorForType() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
- getDescriptorForType() - Method in enum sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
- getDescriptorForType() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
- getDescriptorForType() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
- getDescriptorForType() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
- getDescriptorForType() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
- getDescriptorForType() - Method in enum sentencepiece.SentencepieceModel.TrainerSpec.ModelType
- getDetector() - Method in interface com.yahoo.language.Linguistics
-
Returns a thread-unsafe detector.
- getEncoding() - Method in class com.yahoo.language.detect.Detection
- getEncodingName() - Method in class com.yahoo.language.detect.Detection
- getEosId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
</s>
- getEosId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
</s>
- getEosId() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
</s>
- getEosPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
optional string eos_piece = 47 [default = "</s>"];
- getEosPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
optional string eos_piece = 47 [default = "</s>"];
- getEosPiece() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
optional string eos_piece = 47 [default = "</s>"];
- getEosPieceBytes() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
optional string eos_piece = 47 [default = "</s>"];
- getEosPieceBytes() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
optional string eos_piece = 47 [default = "</s>"];
- getEosPieceBytes() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
optional string eos_piece = 47 [default = "</s>"];
- getEscapeWhitespaces() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
-
Replaces whitespace with meta symbol.
- getEscapeWhitespaces() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
-
Replaces whitespace with meta symbol.
- getEscapeWhitespaces() - Method in interface sentencepiece.SentencepieceModel.NormalizerSpecOrBuilder
-
Replaces whitespace with meta symbol.
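Escaping whitespace makes word boundaries an explicit, visible character so that they survive tokenization and can be restored afterwards. The sketch below assumes the meta symbol is U+2581 (lower one eighth block), which is the character SentencePiece conventionally uses; the index itself only says "meta symbol". It handles the ASCII space only, and the class and method names are hypothetical.

```java
// Hypothetical sketch; assumes U+2581 as the meta symbol and handles ASCII space only.
final class EscapeWhitespaceSketch {

    static final char META = '\u2581';

    /** Replaces every space with the meta symbol. */
    static String escape(String text) {
        return text.replace(' ', META);
    }

    /** Restores spaces from the meta symbol. */
    static String unescape(String escaped) {
        return escaped.replace(META, ' ');
    }

    public static void main(String[] args) {
        String escaped = escape("hello world");
        System.out.println(escaped.indexOf(META));  // prints 5
        System.out.println(unescape(escaped));      // prints "hello world"
    }
}
```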
- getExpected() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
-
optional string expected = 2;
- getExpected() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
-
optional string expected = 2;
- getExpected() - Method in interface sentencepiece.SentencepieceModel.SelfTestData.SampleOrBuilder
-
optional string expected = 2;
- getExpectedBytes() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
-
optional string expected = 2;
- getExpectedBytes() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
-
optional string expected = 2;
- getExpectedBytes() - Method in interface sentencepiece.SentencepieceModel.SelfTestData.SampleOrBuilder
-
optional string expected = 2;
- getGramSplitter() - Method in interface com.yahoo.language.Linguistics
-
Returns a thread-unsafe gram splitter.
- getHardVocabLimit() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
`vocab_size` is treated as hard limit.
- getHardVocabLimit() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
`vocab_size` is treated as hard limit.
- getHardVocabLimit() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
`vocab_size` is treated as hard limit.
- getInput() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
-
optional string input = 1;
- getInput() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
-
optional string input = 1;
- getInput() - Method in interface sentencepiece.SentencepieceModel.SelfTestData.SampleOrBuilder
-
optional string input = 1;
- getInput(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
General parameters. Input corpus files.
- getInput(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
General parameters. Input corpus files.
- getInput(int) - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
General parameters. Input corpus files.
- getInputBytes() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
-
optional string input = 1;
- getInputBytes() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
-
optional string input = 1;
- getInputBytes() - Method in interface sentencepiece.SentencepieceModel.SelfTestData.SampleOrBuilder
-
optional string input = 1;
- getInputBytes(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
General parameters. Input corpus files.
- getInputBytes(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
General parameters. Input corpus files.
- getInputBytes(int) - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
General parameters. Input corpus files.
- getInputCount() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
General parameters. Input corpus files.
- getInputCount() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
General parameters. Input corpus files.
- getInputCount() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
General parameters. Input corpus files.
- getInputFormat() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Input corpus format: "text": one-sentence-per-line text format (default); "tsv": sentence <tab> freq.
- getInputFormat() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Input corpus format: "text": one-sentence-per-line text format (default); "tsv": sentence <tab> freq.
- getInputFormat() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Input corpus format: "text": one-sentence-per-line text format (default); "tsv": sentence <tab> freq.
- getInputFormatBytes() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Input corpus format: "text": one-sentence-per-line text format (default); "tsv": sentence <tab> freq.
- getInputFormatBytes() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Input corpus format: "text": one-sentence-per-line text format (default); "tsv": sentence <tab> freq.
- getInputFormatBytes() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Input corpus format: "text": one-sentence-per-line text format (default); "tsv": sentence <tab> freq.
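The "tsv" layout described above pairs each sentence with a frequency, separated by a tab. A reader for such lines might look like the sketch below. How the real trainer treats malformed or repeated lines is not stated here, so skipping lines without a tab and summing repeated sentences are assumptions of this example, as are the class and method names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of reading "sentence <tab> freq" lines; not the SentencePiece trainer.
final class TsvCorpusSketch {

    /** Parses lines into sentence -> frequency, keeping first-seen order. */
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> freq = new LinkedHashMap<>();
        for (String line : lines) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) continue;  // assumption: skip lines without a tab
            String sentence = line.substring(0, tab);
            long count = Long.parseLong(line.substring(tab + 1).trim());
            freq.merge(sentence, count, Long::sum);  // assumption: repeated sentences add up
        }
        return freq;
    }

    public static void main(String[] args) {
        System.out.println(parse(List.of("hello world\t3", "foo\t2", "hello world\t1")));
    }
}
```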
- getInputList() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
General parameters. Input corpus files.
- getInputList() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
General parameters. Input corpus files.
- getInputList() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
General parameters. Input corpus files.
- getInputSentenceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Maximum size of sentences the trainer loads from `input` parameter.
- getInputSentenceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Maximum size of sentences the trainer loads from `input` parameter.
- getInputSentenceSize() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Maximum size of sentences the trainer loads from `input` parameter.
- getLanguage() - Method in class com.yahoo.language.detect.Detection
- getMarket() - Method in class com.yahoo.language.detect.Hint
- getMaxSentenceLength() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
The maximum sentence length in bytes.
- getMaxSentenceLength() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
The maximum sentence length in bytes.
- getMaxSentenceLength() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
The maximum sentence length in bytes.
- getMaxSentencepieceLength() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
SentencePiece parameters that control the shape of sentence pieces.
- getMaxSentencepieceLength() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
SentencePiece parameters that control the shape of sentence pieces.
- getMaxSentencepieceLength() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
SentencePiece parameters that control the shape of sentence pieces.
- getMiningSentenceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Deprecated.
- getMiningSentenceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Deprecated.
- getMiningSentenceSize() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Deprecated.
- getModelPrefix() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Output model file prefix.
- getModelPrefix() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Output model file prefix.
- getModelPrefix() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Output model file prefix.
- getModelPrefixBytes() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Output model file prefix.
- getModelPrefixBytes() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Output model file prefix.
- getModelPrefixBytes() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Output model file prefix.
- getModels() - Method in class com.yahoo.language.sentencepiece.SentencePieceEncoder.Builder
- getModelType() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
optional .sentencepiece.TrainerSpec.ModelType model_type = 3 [default = UNIGRAM];
- getModelType() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
optional .sentencepiece.TrainerSpec.ModelType model_type = 3 [default = UNIGRAM];
- getModelType() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
optional .sentencepiece.TrainerSpec.ModelType model_type = 3 [default = UNIGRAM];
- getName() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
-
name of normalization rule.
- getName() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
-
name of normalization rule.
- getName() - Method in interface sentencepiece.SentencepieceModel.NormalizerSpecOrBuilder
-
name of normalization rule.
- getNameBytes() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
-
name of normalization rule.
- getNameBytes() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
-
name of normalization rule.
- getNameBytes() - Method in interface sentencepiece.SentencepieceModel.NormalizerSpecOrBuilder
-
name of normalization rule.
- getNormalizationRuleTsv() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
-
Custom normalization rule file in TSV format.
- getNormalizationRuleTsv() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
-
Custom normalization rule file in TSV format.
- getNormalizationRuleTsv() - Method in interface sentencepiece.SentencepieceModel.NormalizerSpecOrBuilder
-
Custom normalization rule file in TSV format.
- getNormalizationRuleTsvBytes() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
-
Custom normalization rule file in TSV format.
- getNormalizationRuleTsvBytes() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
-
Custom normalization rule file in TSV format.
- getNormalizationRuleTsvBytes() - Method in interface sentencepiece.SentencepieceModel.NormalizerSpecOrBuilder
-
Custom normalization rule file in TSV format.
- getNormalizer() - Method in interface com.yahoo.language.Linguistics
-
Returns a thread-unsafe normalizer.
- getNormalizerSpec() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Spec for text normalization.
- getNormalizerSpec() - Method in class sentencepiece.SentencepieceModel.ModelProto
-
Spec for text normalization.
- getNormalizerSpec() - Method in interface sentencepiece.SentencepieceModel.ModelProtoOrBuilder
-
Spec for text normalization.
- getNormalizerSpecBuilder() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Spec for text normalization.
- getNormalizerSpecOrBuilder() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Spec for text normalization.
- getNormalizerSpecOrBuilder() - Method in class sentencepiece.SentencepieceModel.ModelProto
-
Spec for text normalization.
- getNormalizerSpecOrBuilder() - Method in interface sentencepiece.SentencepieceModel.ModelProtoOrBuilder
-
Spec for text normalization.
- getNumber() - Method in enum sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
- getNumber() - Method in enum sentencepiece.SentencepieceModel.TrainerSpec.ModelType
- getNumComponents() - Method in interface com.yahoo.language.process.Token
-
Returns the number of components, if this token is a compound word (e.g.
- getNumStems() - Method in interface com.yahoo.language.process.Token
-
Returns the number of stem forms available for this token.
- getNumSubIterations() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Number of EM sub iterations.
- getNumSubIterations() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Number of EM sub iterations.
- getNumSubIterations() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Number of EM sub iterations.
- getNumThreads() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Number of threads in the training.
- getNumThreads() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Number of threads in the training.
- getNumThreads() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Number of threads in the training.
- getOffset() - Method in interface com.yahoo.language.process.Token
-
Returns the offset position of this token
- getOrig() - Method in interface com.yahoo.language.process.Token
-
Returns the original form of this token
- getPadId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
<pad> (padding)
- getPadId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
<pad> (padding)
- getPadId() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
<pad> (padding)
- getPadPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
optional string pad_piece = 48 [default = "<pad>"];
- getPadPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
optional string pad_piece = 48 [default = "<pad>"];
- getPadPiece() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
optional string pad_piece = 48 [default = "<pad>"];
- getPadPieceBytes() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
optional string pad_piece = 48 [default = "<pad>"];
- getPadPieceBytes() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
optional string pad_piece = 48 [default = "<pad>"];
- getPadPieceBytes() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
optional string pad_piece = 48 [default = "<pad>"];
- getParserForType() - Method in class sentencepiece.SentencepieceModel.ModelProto
- getParserForType() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
- getParserForType() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
- getParserForType() - Method in class sentencepiece.SentencepieceModel.SelfTestData
- getParserForType() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
- getParserForType() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
- getPiece() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
-
piece must not be empty.
- getPiece() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
-
piece must not be empty.
- getPiece() - Method in interface sentencepiece.SentencepieceModel.ModelProto.SentencePieceOrBuilder
-
piece must not be empty.
- getPieceBytes() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
-
piece must not be empty.
- getPieceBytes() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
-
piece must not be empty.
- getPieceBytes() - Method in interface sentencepiece.SentencepieceModel.ModelProto.SentencePieceOrBuilder
-
piece must not be empty.
- getPieces(int) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Sentence pieces with scores.
- getPieces(int) - Method in class sentencepiece.SentencepieceModel.ModelProto
-
Sentence pieces with scores.
- getPieces(int) - Method in interface sentencepiece.SentencepieceModel.ModelProtoOrBuilder
-
Sentence pieces with scores.
- getPiecesBuilder(int) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Sentence pieces with scores.
- getPiecesBuilderList() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Sentence pieces with scores.
- getPiecesCount() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Sentence pieces with scores.
- getPiecesCount() - Method in class sentencepiece.SentencepieceModel.ModelProto
-
Sentence pieces with scores.
- getPiecesCount() - Method in interface sentencepiece.SentencepieceModel.ModelProtoOrBuilder
-
Sentence pieces with scores.
- getPiecesList() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Sentence pieces with scores.
- getPiecesList() - Method in class sentencepiece.SentencepieceModel.ModelProto
-
Sentence pieces with scores.
- getPiecesList() - Method in interface sentencepiece.SentencepieceModel.ModelProtoOrBuilder
-
Sentence pieces with scores.
- getPiecesOrBuilder(int) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Sentence pieces with scores.
- getPiecesOrBuilder(int) - Method in class sentencepiece.SentencepieceModel.ModelProto
-
Sentence pieces with scores.
- getPiecesOrBuilder(int) - Method in interface sentencepiece.SentencepieceModel.ModelProtoOrBuilder
-
Sentence pieces with scores.
- getPiecesOrBuilderList() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Sentence pieces with scores.
- getPiecesOrBuilderList() - Method in class sentencepiece.SentencepieceModel.ModelProto
-
Sentence pieces with scores.
- getPiecesOrBuilderList() - Method in interface sentencepiece.SentencepieceModel.ModelProtoOrBuilder
-
Sentence pieces with scores.
- getPrecompiledCharsmap() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
-
Pre-compiled normalization rule created by Builder::GetPrecompiledCharsMap() or Builder::CompileCharsMap() method.
- getPrecompiledCharsmap() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
-
Pre-compiled normalization rule created by Builder::GetPrecompiledCharsMap() or Builder::CompileCharsMap() method.
- getPrecompiledCharsmap() - Method in interface sentencepiece.SentencepieceModel.NormalizerSpecOrBuilder
-
Pre-compiled normalization rule created by Builder::GetPrecompiledCharsMap() or Builder::CompileCharsMap() method.
- getRemoveExtraWhitespaces() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
-
Removes leading, trailing, and duplicate internal whitespace.
- getRemoveExtraWhitespaces() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
-
Removes leading, trailing, and duplicate internal whitespace.
- getRemoveExtraWhitespaces() - Method in interface sentencepiece.SentencepieceModel.NormalizerSpecOrBuilder
-
Removes leading, trailing, and duplicate internal whitespace.
- getReplacementTerm(String) - Method in interface com.yahoo.language.process.Tokenizer
-
Deprecated. Replacements are already applied in tokens returned by tokenize.
- getRequiredChars() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Defines required characters.
- getRequiredChars() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Defines required characters.
- getRequiredChars() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Defines required characters.
- getRequiredCharsBytes() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Defines required characters.
- getRequiredCharsBytes() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Defines required characters.
- getRequiredCharsBytes() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Defines required characters.
- getSamples(int) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
-
repeated .sentencepiece.SelfTestData.Sample samples = 1;
- getSamples(int) - Method in class sentencepiece.SentencepieceModel.SelfTestData
-
repeated .sentencepiece.SelfTestData.Sample samples = 1;
- getSamples(int) - Method in interface sentencepiece.SentencepieceModel.SelfTestDataOrBuilder
-
repeated .sentencepiece.SelfTestData.Sample samples = 1;
- getSamplesBuilder(int) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
-
repeated .sentencepiece.SelfTestData.Sample samples = 1;
- getSamplesBuilderList() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
-
repeated .sentencepiece.SelfTestData.Sample samples = 1;
- getSamplesCount() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
-
repeated .sentencepiece.SelfTestData.Sample samples = 1;
- getSamplesCount() - Method in class sentencepiece.SentencepieceModel.SelfTestData
-
repeated .sentencepiece.SelfTestData.Sample samples = 1;
- getSamplesCount() - Method in interface sentencepiece.SentencepieceModel.SelfTestDataOrBuilder
-
repeated .sentencepiece.SelfTestData.Sample samples = 1;
- getSamplesList() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
-
repeated .sentencepiece.SelfTestData.Sample samples = 1;
- getSamplesList() - Method in class sentencepiece.SentencepieceModel.SelfTestData
-
repeated .sentencepiece.SelfTestData.Sample samples = 1;
- getSamplesList() - Method in interface sentencepiece.SentencepieceModel.SelfTestDataOrBuilder
-
repeated .sentencepiece.SelfTestData.Sample samples = 1;
- getSamplesOrBuilder(int) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
-
repeated .sentencepiece.SelfTestData.Sample samples = 1;
- getSamplesOrBuilder(int) - Method in class sentencepiece.SentencepieceModel.SelfTestData
-
repeated .sentencepiece.SelfTestData.Sample samples = 1;
- getSamplesOrBuilder(int) - Method in interface sentencepiece.SentencepieceModel.SelfTestDataOrBuilder
-
repeated .sentencepiece.SelfTestData.Sample samples = 1;
- getSamplesOrBuilderList() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
-
repeated .sentencepiece.SelfTestData.Sample samples = 1;
- getSamplesOrBuilderList() - Method in class sentencepiece.SentencepieceModel.SelfTestData
-
repeated .sentencepiece.SelfTestData.Sample samples = 1;
- getSamplesOrBuilderList() - Method in interface sentencepiece.SentencepieceModel.SelfTestDataOrBuilder
-
repeated .sentencepiece.SelfTestData.Sample samples = 1;
- getScore() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
-
optional float score = 2;
- getScore() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
-
optional float score = 2;
- getScore() - Method in interface sentencepiece.SentencepieceModel.ModelProto.SentencePieceOrBuilder
-
optional float score = 2;
- getScoring() - Method in class com.yahoo.language.sentencepiece.SentencePieceEncoder.Builder
- getScript() - Method in interface com.yahoo.language.process.Token
-
Returns the script of this token
- getSeedSentencepieceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
The size of seed sentencepieces.
- getSeedSentencepieceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
The size of seed sentencepieces.
- getSeedSentencepieceSize() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
The size of seed sentencepieces.
- getSegmenter() - Method in interface com.yahoo.language.Linguistics
-
Returns a thread-unsafe segmenter.
- getSelfTestData() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Stores sample input and its expected segmentation to verify the model.
- getSelfTestData() - Method in class sentencepiece.SentencepieceModel.ModelProto
-
Stores sample input and its expected segmentation to verify the model.
- getSelfTestData() - Method in interface sentencepiece.SentencepieceModel.ModelProtoOrBuilder
-
Stores sample input and its expected segmentation to verify the model.
- getSelfTestDataBuilder() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Stores sample input and its expected segmentation to verify the model.
- getSelfTestDataOrBuilder() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Stores sample input and its expected segmentation to verify the model.
- getSelfTestDataOrBuilder() - Method in class sentencepiece.SentencepieceModel.ModelProto
-
Stores sample input and its expected segmentation to verify the model.
- getSelfTestDataOrBuilder() - Method in interface sentencepiece.SentencepieceModel.ModelProtoOrBuilder
-
Stores sample input and its expected segmentation to verify the model.
- getSelfTestSampleSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Size of self-test samples, which are encoded in the model file.
- getSelfTestSampleSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Size of self-test samples, which are encoded in the model file.
- getSelfTestSampleSize() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Size of self-test samples, which are encoded in the model file.
- getSerializedSize() - Method in class sentencepiece.SentencepieceModel.ModelProto
- getSerializedSize() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
- getSerializedSize() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
- getSerializedSize() - Method in class sentencepiece.SentencepieceModel.SelfTestData
- getSerializedSize() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
- getSerializedSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
- getShrinkingFactor() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
In every EM sub-iteration, keeps the top `shrinking_factor` * `current sentencepieces size` with respect to the loss of the sentence piece.
- getShrinkingFactor() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
In every EM sub-iteration, keeps the top `shrinking_factor` * `current sentencepieces size` with respect to the loss of the sentence piece.
- getShrinkingFactor() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
In every EM sub-iteration, keeps the top `shrinking_factor` * `current sentencepieces size` with respect to the loss of the sentence piece.
- getShuffleInputSentence() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
optional bool shuffle_input_sentence = 19 [default = true];
- getShuffleInputSentence() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
optional bool shuffle_input_sentence = 19 [default = true];
- getShuffleInputSentence() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
optional bool shuffle_input_sentence = 19 [default = true];
- getSpecialTokens(String) - Method in class com.yahoo.language.process.SpecialTokenRegistry
-
Returns the list of special tokens for a given name.
- getSplitByNumber() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
When `split_by_number` is true, puts a boundary at each transition between number and non-number.
- getSplitByNumber() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
When `split_by_number` is true, puts a boundary at each transition between number and non-number.
- getSplitByNumber() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
When `split_by_number` is true, puts a boundary at each transition between number and non-number.
- getSplitByUnicodeScript() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Uses Unicode script to split sentence pieces.
- getSplitByUnicodeScript() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Uses Unicode script to split sentence pieces.
- getSplitByUnicodeScript() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Uses Unicode script to split sentence pieces.
- getSplitByWhitespace() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Uses whitespace to split sentence pieces.
- getSplitByWhitespace() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Uses whitespace to split sentence pieces.
- getSplitByWhitespace() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Uses whitespace to split sentence pieces.
- getSplitDigits() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Split all digits (0-9) into separate pieces.
- getSplitDigits() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Split all digits (0-9) into separate pieces.
- getSplitDigits() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Split all digits (0-9) into separate pieces.
- getStart() - Method in class com.yahoo.language.process.GramSplitter.Gram
- getStem(int) - Method in interface com.yahoo.language.process.Token
-
Returns the stem at position i
- getStemmer() - Method in interface com.yahoo.language.Linguistics
-
Returns a thread-unsafe stemmer or lemmatizer.
- getTokenizer() - Method in interface com.yahoo.language.Linguistics
-
Returns a thread-unsafe tokenizer.
- getTokenString() - Method in interface com.yahoo.language.process.Token
-
Returns the token string in a form suitable for indexing: the most lowercased variant of the most processed token form available. If called on a compound token, this returns a lowercased form of the entire word.
- getTrainerSpec() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Spec used to generate this model file.
- getTrainerSpec() - Method in class sentencepiece.SentencepieceModel.ModelProto
-
Spec used to generate this model file.
- getTrainerSpec() - Method in interface sentencepiece.SentencepieceModel.ModelProtoOrBuilder
-
Spec used to generate this model file.
- getTrainerSpecBuilder() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Spec used to generate this model file.
- getTrainerSpecOrBuilder() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Spec used to generate this model file.
- getTrainerSpecOrBuilder() - Method in class sentencepiece.SentencepieceModel.ModelProto
-
Spec used to generate this model file.
- getTrainerSpecOrBuilder() - Method in interface sentencepiece.SentencepieceModel.ModelProtoOrBuilder
-
Spec used to generate this model file.
- getTrainExtremelyLargeCorpus() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Increase bit depth to allow unigram model training on large (>10M sentences) corpora.
- getTrainExtremelyLargeCorpus() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Increase bit depth to allow unigram model training on large (>10M sentences) corpora.
- getTrainExtremelyLargeCorpus() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Increase bit depth to allow unigram model training on large (>10M sentences) corpora.
- getTrainingSentenceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Deprecated.
- getTrainingSentenceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Deprecated.
- getTrainingSentenceSize() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Deprecated.
- getTransformer() - Method in interface com.yahoo.language.Linguistics
-
Returns a thread-unsafe transformer.
- getTreatWhitespaceAsSuffix() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Adds whitespace symbol (_) as a suffix instead of prefix.
- getTreatWhitespaceAsSuffix() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Adds whitespace symbol (_) as a suffix instead of prefix.
- getTreatWhitespaceAsSuffix() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Adds whitespace symbol (_) as a suffix instead of prefix.
- getType() - Method in interface com.yahoo.language.process.Token
-
Returns the type of this token - word, space or punctuation etc.
- getType() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
-
optional .sentencepiece.ModelProto.SentencePiece.Type type = 3 [default = NORMAL];
- getType() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
-
optional .sentencepiece.ModelProto.SentencePiece.Type type = 3 [default = NORMAL];
- getType() - Method in interface sentencepiece.SentencepieceModel.ModelProto.SentencePieceOrBuilder
-
optional .sentencepiece.ModelProto.SentencePiece.Type type = 3 [default = NORMAL];
- getUnkId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Reserved special meta tokens.
- getUnkId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Reserved special meta tokens.
- getUnkId() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Reserved special meta tokens.
- getUnknownFields() - Method in class sentencepiece.SentencepieceModel.ModelProto
- getUnknownFields() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
- getUnknownFields() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
- getUnknownFields() - Method in class sentencepiece.SentencepieceModel.SelfTestData
- getUnknownFields() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
- getUnknownFields() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
- getUnkPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
optional string unk_piece = 45 [default = "<unk>"];
- getUnkPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
optional string unk_piece = 45 [default = "<unk>"];
- getUnkPiece() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
optional string unk_piece = 45 [default = "<unk>"];
- getUnkPieceBytes() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
optional string unk_piece = 45 [default = "<unk>"];
- getUnkPieceBytes() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
optional string unk_piece = 45 [default = "<unk>"];
- getUnkPieceBytes() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
optional string unk_piece = 45 [default = "<unk>"];
- getUnkSurface() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Encodes <unk> into U+2047 (DOUBLE QUESTION MARK), since this character can be useful for both users and developers.
- getUnkSurface() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Encodes <unk> into U+2047 (DOUBLE QUESTION MARK), since this character can be useful for both users and developers.
- getUnkSurface() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Encodes <unk> into U+2047 (DOUBLE QUESTION MARK), since this character can be useful for both users and developers.
- getUnkSurfaceBytes() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Encodes <unk> into U+2047 (DOUBLE QUESTION MARK), since this character can be useful for both users and developers.
- getUnkSurfaceBytes() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Encodes <unk> into U+2047 (DOUBLE QUESTION MARK), since this character can be useful for both users and developers.
- getUnkSurfaceBytes() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Encodes <unk> into U+2047 (DOUBLE QUESTION MARK), since this character can be useful for both users and developers.
- getUseAllVocab() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Uses all symbols for vocab extraction.
- getUseAllVocab() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Uses all symbols for vocab extraction.
- getUseAllVocab() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Uses all symbols for vocab extraction.
- getUserDefinedSymbols(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Defines user-defined symbols.
- getUserDefinedSymbols(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Defines user-defined symbols.
- getUserDefinedSymbols(int) - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Defines user-defined symbols.
- getUserDefinedSymbolsBytes(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Defines user-defined symbols.
- getUserDefinedSymbolsBytes(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Defines user-defined symbols.
- getUserDefinedSymbolsBytes(int) - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Defines user-defined symbols.
- getUserDefinedSymbolsCount() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Defines user-defined symbols.
- getUserDefinedSymbolsCount() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Defines user-defined symbols.
- getUserDefinedSymbolsCount() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Defines user-defined symbols.
- getUserDefinedSymbolsList() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Defines user-defined symbols.
- getUserDefinedSymbolsList() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Defines user-defined symbols.
- getUserDefinedSymbolsList() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Defines user-defined symbols.
- getValue() - Method in enum com.yahoo.language.process.TokenType
-
Returns an int code for this type
- getValueDescriptor() - Method in enum sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
- getValueDescriptor() - Method in enum sentencepiece.SentencepieceModel.TrainerSpec.ModelType
- getVocabSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Vocabulary size.
- getVocabSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Vocabulary size.
- getVocabSize() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Vocabulary size.
- getVocabularyOutputPieceScore() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
When creating the vocabulary file, defines whether or not to additionally output the score for each piece.
- getVocabularyOutputPieceScore() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
When creating the vocabulary file, defines whether or not to additionally output the score for each piece.
- getVocabularyOutputPieceScore() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
When creating the vocabulary file, defines whether or not to additionally output the score for each piece.
- GLAGOLITIC - com.yahoo.language.process.TokenScript
- GOTHIC - com.yahoo.language.Language
-
Language tag "got".
- GOTHIC - com.yahoo.language.process.TokenScript
- Gram(int, int) - Constructor for class com.yahoo.language.process.GramSplitter.Gram
- GRAM_SPLITTER - com.yahoo.language.Linguistics.Component
- GramSplitter - Class in com.yahoo.language.process
-
A class which splits consecutive word character sequences into overlapping character n-grams.
- GramSplitter(CharacterClasses) - Constructor for class com.yahoo.language.process.GramSplitter
- GramSplitter.Gram - Class in com.yahoo.language.process
-
An immutable start index and length pair
- GramSplitter.GramSplitterIterator - Class in com.yahoo.language.process
- GramSplitterIterator(String, int, CharacterClasses) - Constructor for class com.yahoo.language.process.GramSplitter.GramSplitterIterator
- GREEK - com.yahoo.language.Language
-
Language tag "el".
- GREEK - com.yahoo.language.process.TokenScript
- GREENLANDIC - com.yahoo.language.Language
-
Language tag "kl".
- GUARANI - com.yahoo.language.Language
-
Language tag "gn".
- GUJARATI - com.yahoo.language.Language
-
Language tag "gu".
- GUJARATI - com.yahoo.language.process.TokenScript
- GURMUKHI - com.yahoo.language.process.TokenScript
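The GramSplitter entry above describes splitting consecutive word-character sequences into overlapping character n-grams. The following is a minimal standalone sketch of that idea, not the library's implementation: the class `NGramSketch` and method `grams` are hypothetical names, `Character.isLetterOrDigit` stands in for the library's `CharacterClasses`, and emitting words shorter than `n` whole is an assumption made for illustration. It also works on Java `char` units rather than code points.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this does not use com.yahoo.language.process.GramSplitter.
class NGramSketch {

    /** Returns overlapping character n-grams for each run of letters or digits in text. */
    static List<String> grams(String text, int n) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            // Skip anything that is not a word character.
            if (!Character.isLetterOrDigit(text.charAt(i))) {
                i++;
                continue;
            }
            // Find the end of the current run of word characters.
            int start = i;
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            String word = text.substring(start, i);
            if (word.length() <= n) {
                // Assumption: runs no longer than n are emitted as a single gram.
                out.add(word);
            } else {
                // Slide a window of size n one character at a time.
                for (int j = 0; j + n <= word.length(); j++) {
                    out.add(word.substring(j, j + n));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(grams("hello world", 3)); // prints [hel, ell, llo, wor, orl, rld]
    }
}
```

Grams never span a word boundary in this sketch, which matches the "consecutive word character sequences" wording of the entry; how the real class handles short words and non-BMP characters should be checked against its own Javadoc.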
H
- HAN - com.yahoo.language.process.TokenScript
- HANGUL - com.yahoo.language.process.TokenScript
- HANUNOO - com.yahoo.language.process.TokenScript
- HARD_VOCAB_LIMIT_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- hasAddDummyPrefix() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
-
Adds dummy whitespace at the beginning of text in order to treat "world" in "world" and "hello world" in the same way.
- hasAddDummyPrefix() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
-
Adds dummy whitespace at the beginning of text in order to treat "world" in "world" and "hello world" in the same way.
- hasAddDummyPrefix() - Method in interface sentencepiece.SentencepieceModel.NormalizerSpecOrBuilder
-
Adds dummy whitespace at the beginning of text in order to treat "world" in "world" and "hello world" in the same way.
- hasAllowWhitespaceOnlyPieces() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Allows pieces that only contain whitespaces instead of appearing only as prefix or suffix of other pieces.
- hasAllowWhitespaceOnlyPieces() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Allows pieces that only contain whitespaces instead of appearing only as prefix or suffix of other pieces.
- hasAllowWhitespaceOnlyPieces() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Allows pieces that only contain whitespaces instead of appearing only as prefix or suffix of other pieces.
- hasBosId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
<s>
- hasBosId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
<s>
- hasBosId() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
<s>
- hasBosPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
optional string bos_piece = 46 [default = "<s>"];
- hasBosPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
optional string bos_piece = 46 [default = "<s>"];
- hasBosPiece() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
optional string bos_piece = 46 [default = "<s>"];
- hasByteFallback() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Decomposes unknown pieces into UTF-8 bytes.
- hasByteFallback() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Decomposes unknown pieces into UTF-8 bytes.
- hasByteFallback() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Decomposes unknown pieces into UTF-8 bytes.
- hasCharacterCoverage() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Training parameters.
- hasCharacterCoverage() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Training parameters.
- hasCharacterCoverage() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Training parameters.
- hasDenormalizerSpec() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Spec for text de-normalization.
- hasDenormalizerSpec() - Method in class sentencepiece.SentencepieceModel.ModelProto
-
Spec for text de-normalization.
- hasDenormalizerSpec() - Method in interface sentencepiece.SentencepieceModel.ModelProtoOrBuilder
-
Spec for text de-normalization.
- hasEosId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
</s>
- hasEosId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
</s>
- hasEosId() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
</s>
- hasEosPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
optional string eos_piece = 47 [default = "</s>"];
- hasEosPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
optional string eos_piece = 47 [default = "</s>"];
- hasEosPiece() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
optional string eos_piece = 47 [default = "</s>"];
- hasEscapeWhitespaces() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
-
Replaces whitespace with meta symbol.
- hasEscapeWhitespaces() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
-
Replaces whitespace with meta symbol.
- hasEscapeWhitespaces() - Method in interface sentencepiece.SentencepieceModel.NormalizerSpecOrBuilder
-
Replaces whitespace with meta symbol.
- hasExpected() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
-
optional string expected = 2;
- hasExpected() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
-
optional string expected = 2;
- hasExpected() - Method in interface sentencepiece.SentencepieceModel.SelfTestData.SampleOrBuilder
-
optional string expected = 2;
- hasHardVocabLimit() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
`vocab_size` is treated as hard limit.
- hasHardVocabLimit() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
`vocab_size` is treated as hard limit.
- hasHardVocabLimit() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
`vocab_size` is treated as hard limit.
- hashCode() - Method in class com.yahoo.language.process.GramSplitter.Gram
- hashCode() - Method in class com.yahoo.language.process.SpecialTokens.Token
- hashCode() - Method in class sentencepiece.SentencepieceModel.ModelProto
- hashCode() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
- hashCode() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
- hashCode() - Method in class sentencepiece.SentencepieceModel.SelfTestData
- hashCode() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
- hashCode() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
- hasInput() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
-
optional string input = 1;
- hasInput() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
-
optional string input = 1;
- hasInput() - Method in interface sentencepiece.SentencepieceModel.SelfTestData.SampleOrBuilder
-
optional string input = 1;
- hasInputFormat() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Input corpus format: "text": one-sentence-per-line text format (default) "tsv": sentence <tab> freq
- hasInputFormat() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Input corpus format: "text": one-sentence-per-line text format (default) "tsv": sentence <tab> freq
- hasInputFormat() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Input corpus format: "text": one-sentence-per-line text format (default) "tsv": sentence <tab> freq
- hasInputSentenceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Maximum size of sentences the trainer loads from `input` parameter.
- hasInputSentenceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Maximum size of sentences the trainer loads from `input` parameter.
- hasInputSentenceSize() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Maximum size of sentences the trainer loads from `input` parameter.
- hasMaxSentenceLength() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
The maximum sentence length in bytes.
- hasMaxSentenceLength() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
The maximum sentence length in bytes.
- hasMaxSentenceLength() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
The maximum sentence length in bytes.
- hasMaxSentencepieceLength() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
SentencePiece parameters which control the shapes of sentence pieces.
- hasMaxSentencepieceLength() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
SentencePiece parameters which control the shapes of sentence pieces.
- hasMaxSentencepieceLength() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
SentencePiece parameters which control the shapes of sentence pieces.
- hasMiningSentenceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Deprecated.
- hasMiningSentenceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Deprecated.
- hasMiningSentenceSize() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Deprecated.
- hasModelPrefix() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Output model file prefix.
- hasModelPrefix() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Output model file prefix.
- hasModelPrefix() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Output model file prefix.
- hasModelType() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
optional .sentencepiece.TrainerSpec.ModelType model_type = 3 [default = UNIGRAM];
- hasModelType() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
optional .sentencepiece.TrainerSpec.ModelType model_type = 3 [default = UNIGRAM];
- hasModelType() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
optional .sentencepiece.TrainerSpec.ModelType model_type = 3 [default = UNIGRAM];
- hasName() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
-
Name of the normalization rule.
- hasName() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
-
Name of the normalization rule.
- hasName() - Method in interface sentencepiece.SentencepieceModel.NormalizerSpecOrBuilder
-
Name of the normalization rule.
- hasNext() - Method in class com.yahoo.language.process.GramSplitter.GramSplitterIterator
- hasNormalizationRuleTsv() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
-
Custom normalization rule file in TSV format.
- hasNormalizationRuleTsv() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
-
Custom normalization rule file in TSV format.
- hasNormalizationRuleTsv() - Method in interface sentencepiece.SentencepieceModel.NormalizerSpecOrBuilder
-
Custom normalization rule file in TSV format.
- hasNormalizerSpec() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Spec for text normalization.
- hasNormalizerSpec() - Method in class sentencepiece.SentencepieceModel.ModelProto
-
Spec for text normalization.
- hasNormalizerSpec() - Method in interface sentencepiece.SentencepieceModel.ModelProtoOrBuilder
-
Spec for text normalization.
- hasNumSubIterations() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Number of EM sub iterations.
- hasNumSubIterations() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Number of EM sub iterations.
- hasNumSubIterations() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Number of EM sub iterations.
- hasNumThreads() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Number of threads in the training.
- hasNumThreads() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Number of threads in the training.
- hasNumThreads() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Number of threads in the training.
- hasPadId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
<pad> (padding)
- hasPadId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
<pad> (padding)
- hasPadId() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
<pad> (padding)
- hasPadPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
optional string pad_piece = 48 [default = "<pad>"];
- hasPadPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
optional string pad_piece = 48 [default = "<pad>"];
- hasPadPiece() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
optional string pad_piece = 48 [default = "<pad>"];
- hasPiece() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
-
piece must not be empty.
- hasPiece() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
-
piece must not be empty.
- hasPiece() - Method in interface sentencepiece.SentencepieceModel.ModelProto.SentencePieceOrBuilder
-
piece must not be empty.
- hasPrecompiledCharsmap() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
-
Pre-compiled normalization rule created by Builder::GetPrecompiledCharsMap() or Builder::CompileCharsMap() method.
- hasPrecompiledCharsmap() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
-
Pre-compiled normalization rule created by Builder::GetPrecompiledCharsMap() or Builder::CompileCharsMap() method.
- hasPrecompiledCharsmap() - Method in interface sentencepiece.SentencepieceModel.NormalizerSpecOrBuilder
-
Pre-compiled normalization rule created by Builder::GetPrecompiledCharsMap() or Builder::CompileCharsMap() method.
- hasRemoveExtraWhitespaces() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
-
Removes leading, trailing, and duplicate internal whitespace.
- hasRemoveExtraWhitespaces() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
-
Removes leading, trailing, and duplicate internal whitespace.
- hasRemoveExtraWhitespaces() - Method in interface sentencepiece.SentencepieceModel.NormalizerSpecOrBuilder
-
Removes leading, trailing, and duplicate internal whitespace.
- hasRequiredChars() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Defines required characters.
- hasRequiredChars() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Defines required characters.
- hasRequiredChars() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Defines required characters.
- hasScore() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
-
optional float score = 2;
- hasScore() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
-
optional float score = 2;
- hasScore() - Method in interface sentencepiece.SentencepieceModel.ModelProto.SentencePieceOrBuilder
-
optional float score = 2;
- hasSeedSentencepieceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
The size of seed sentencepieces.
- hasSeedSentencepieceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
The size of seed sentencepieces.
- hasSeedSentencepieceSize() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
The size of seed sentencepieces.
- hasSelfTestData() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Stores sample input and its expected segmentation to verify the model.
- hasSelfTestData() - Method in class sentencepiece.SentencepieceModel.ModelProto
-
Stores sample input and its expected segmentation to verify the model.
- hasSelfTestData() - Method in interface sentencepiece.SentencepieceModel.ModelProtoOrBuilder
-
Stores sample input and its expected segmentation to verify the model.
- hasSelfTestSampleSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Size of self-test samples, which are encoded in the model file.
- hasSelfTestSampleSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Size of self-test samples, which are encoded in the model file.
- hasSelfTestSampleSize() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Size of self-test samples, which are encoded in the model file.
- hasShrinkingFactor() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
In every EM sub-iteration, keeps the top `shrinking_factor` * `current sentencepieces size` with respect to the loss of the sentence piece.
- hasShrinkingFactor() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
In every EM sub-iteration, keeps the top `shrinking_factor` * `current sentencepieces size` with respect to the loss of the sentence piece.
- hasShrinkingFactor() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
In every EM sub-iteration, keeps the top `shrinking_factor` * `current sentencepieces size` with respect to the loss of the sentence piece.
- hasShuffleInputSentence() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
optional bool shuffle_input_sentence = 19 [default = true];
- hasShuffleInputSentence() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
optional bool shuffle_input_sentence = 19 [default = true];
- hasShuffleInputSentence() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
optional bool shuffle_input_sentence = 19 [default = true];
- hasSplitByNumber() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
When `split_by_number` is true, puts a boundary at each transition between number and non-number.
- hasSplitByNumber() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
When `split_by_number` is true, puts a boundary at each transition between number and non-number.
- hasSplitByNumber() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
When `split_by_number` is true, puts a boundary at each transition between number and non-number.
- hasSplitByUnicodeScript() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Uses Unicode script to split sentence pieces.
- hasSplitByUnicodeScript() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Uses Unicode script to split sentence pieces.
- hasSplitByUnicodeScript() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Uses Unicode script to split sentence pieces.
- hasSplitByWhitespace() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Uses whitespace to split sentence pieces.
- hasSplitByWhitespace() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Uses whitespace to split sentence pieces.
- hasSplitByWhitespace() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Uses whitespace to split sentence pieces.
- hasSplitDigits() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Split all digits (0-9) into separate pieces.
- hasSplitDigits() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Split all digits (0-9) into separate pieces.
- hasSplitDigits() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Split all digits (0-9) into separate pieces.
- hasTrainerSpec() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Spec used to generate this model file.
- hasTrainerSpec() - Method in class sentencepiece.SentencepieceModel.ModelProto
-
Spec used to generate this model file.
- hasTrainerSpec() - Method in interface sentencepiece.SentencepieceModel.ModelProtoOrBuilder
-
Spec used to generate this model file.
- hasTrainExtremelyLargeCorpus() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Increase bit depth to allow unigram model training on large (>10M sentences) corpora.
- hasTrainExtremelyLargeCorpus() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Increase bit depth to allow unigram model training on large (>10M sentences) corpora.
- hasTrainExtremelyLargeCorpus() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Increase bit depth to allow unigram model training on large (>10M sentences) corpora.
- hasTrainingSentenceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Deprecated.
- hasTrainingSentenceSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Deprecated.
- hasTrainingSentenceSize() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Deprecated.
- hasTreatWhitespaceAsSuffix() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Adds whitespace symbol (_) as a suffix instead of prefix.
- hasTreatWhitespaceAsSuffix() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Adds whitespace symbol (_) as a suffix instead of prefix.
- hasTreatWhitespaceAsSuffix() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Adds whitespace symbol (_) as a suffix instead of prefix.
- hasType() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
-
optional .sentencepiece.ModelProto.SentencePiece.Type type = 3 [default = NORMAL];
- hasType() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
-
optional .sentencepiece.ModelProto.SentencePiece.Type type = 3 [default = NORMAL];
- hasType() - Method in interface sentencepiece.SentencepieceModel.ModelProto.SentencePieceOrBuilder
-
optional .sentencepiece.ModelProto.SentencePiece.Type type = 3 [default = NORMAL];
- hasUnkId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Reserved special meta tokens.
- hasUnkId() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Reserved special meta tokens.
- hasUnkId() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Reserved special meta tokens.
- hasUnkPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
optional string unk_piece = 45 [default = "<unk>"];
- hasUnkPiece() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
optional string unk_piece = 45 [default = "<unk>"];
- hasUnkPiece() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
optional string unk_piece = 45 [default = "<unk>"];
- hasUnkSurface() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Encodes <unk> into U+2047 (DOUBLE QUESTION MARK), since this character can be useful for both users and developers.
- hasUnkSurface() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Encodes <unk> into U+2047 (DOUBLE QUESTION MARK), since this character can be useful for both users and developers.
- hasUnkSurface() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Encodes <unk> into U+2047 (DOUBLE QUESTION MARK), since this character can be useful for both users and developers.
- hasUseAllVocab() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Uses all symbols for vocab extraction.
- hasUseAllVocab() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Uses all symbols for vocab extraction.
- hasUseAllVocab() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Uses all symbols for vocab extraction.
- hasVocabSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Vocabulary size.
- hasVocabSize() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
Vocabulary size.
- hasVocabSize() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
Vocabulary size.
- hasVocabularyOutputPieceScore() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
When creating the vocabulary file, defines whether or not to additionally output the score for each piece.
- hasVocabularyOutputPieceScore() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
-
When creating the vocabulary file, defines whether or not to additionally output the score for each piece.
- hasVocabularyOutputPieceScore() - Method in interface sentencepiece.SentencepieceModel.TrainerSpecOrBuilder
-
When creating the vocabulary file, defines whether or not to additionally output the score for each piece.
- HAUSA - com.yahoo.language.Language
-
Language tag "ha".
- HEBREW - com.yahoo.language.Language
-
Language tag "he".
- HEBREW - com.yahoo.language.process.TokenScript
- highestScore - com.yahoo.language.sentencepiece.Scoring
-
Find the segmentation that has the highest score
- highestScore - com.yahoo.language.sentencepiece.SentencePieceConfig.Scoring.Enum
- highestScore - Static variable in class com.yahoo.language.sentencepiece.SentencePieceConfig.Scoring
- HINDI - com.yahoo.language.Language
-
Language tag "hi".
- Hint - Class in com.yahoo.language.detect
-
A hint that can be given to a Detector.
- HIRAGANA - com.yahoo.language.process.TokenScript
- HUNGARIAN - com.yahoo.language.Language
-
Language tag "hu".
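Many of the has...() entries in this section describe protobuf optional fields that declare a default, such as `optional string eos_piece = 47 [default = "</s>"]`. In protobuf-generated Java code for such fields, the has...() method reports whether a value was explicitly set, while the matching getter falls back to the declared default. The sketch below imitates that behavior with a small hand-written class. `OptionalFieldSketch`, `unset()` and `withEosPiece()` are hypothetical names used only for illustration, and `getEosPiece()` is assumed to mirror the generated getter; none of this is the generated sentencepiece.SentencepieceModel.TrainerSpec code.

```java
import java.util.Objects;

// Hand-written stand-in for illustration only; NOT the generated TrainerSpec class.
final class OptionalFieldSketch {

    // Default taken from the index entry: optional string eos_piece = 47 [default = "</s>"]
    private static final String DEFAULT_EOS_PIECE = "</s>";

    // null means the field was never explicitly set
    private final String eosPiece;

    private OptionalFieldSketch(String eosPiece) {
        this.eosPiece = eosPiece;
    }

    /** Hypothetical factory: a message where eos_piece was never set. */
    static OptionalFieldSketch unset() {
        return new OptionalFieldSketch(null);
    }

    /** Hypothetical factory: a message where eos_piece was explicitly set. */
    static OptionalFieldSketch withEosPiece(String value) {
        return new OptionalFieldSketch(Objects.requireNonNull(value));
    }

    /** True only if a value was explicitly set. */
    boolean hasEosPiece() {
        return eosPiece != null;
    }

    /** Returns the explicit value, or the declared default when unset. */
    String getEosPiece() {
        return eosPiece != null ? eosPiece : DEFAULT_EOS_PIECE;
    }

    public static void main(String[] args) {
        OptionalFieldSketch a = unset();
        OptionalFieldSketch b = withEosPiece("<eos>");
        System.out.println(a.hasEosPiece() + " " + a.getEosPiece());
        System.out.println(b.hasEosPiece() + " " + b.getEosPiece());
    }
}
```

The practical consequence is that a caller who needs to distinguish "left at the default" from "explicitly set to the default value" should check has...() rather than compare the getter result against the default.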
I
- ICELANDIC - com.yahoo.language.Language
-
Language tag "is".
- INDONESIAN - com.yahoo.language.Language
-
Language tag "id".
- INHERITED - com.yahoo.language.process.TokenScript
- INPUT_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.SelfTestData.Sample
- INPUT_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- INPUT_FORMAT_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- INPUT_SENTENCE_SIZE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- INTERLINGUA - com.yahoo.language.Language
-
Language tag "ia".
- INTERLINGUE - com.yahoo.language.Language
-
Language tag "ie".
- internalGetFieldAccessorTable() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
- internalGetFieldAccessorTable() - Method in class sentencepiece.SentencepieceModel.ModelProto
- internalGetFieldAccessorTable() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
- internalGetFieldAccessorTable() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
- internalGetFieldAccessorTable() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
- internalGetFieldAccessorTable() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
- internalGetFieldAccessorTable() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
- internalGetFieldAccessorTable() - Method in class sentencepiece.SentencepieceModel.SelfTestData
- internalGetFieldAccessorTable() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
- internalGetFieldAccessorTable() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
- internalGetFieldAccessorTable() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
- internalGetFieldAccessorTable() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
- internalGetValueMap() - Static method in enum sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
- internalGetValueMap() - Static method in enum sentencepiece.SentencepieceModel.TrainerSpec.ModelType
- INUKTITUT - com.yahoo.language.Language
-
Language tag "iu".
- INUPIAK - com.yahoo.language.Language
-
Language tag "ik".
- IRISH - com.yahoo.language.Language
-
Language tag "ga".
- isCjk() - Method in enum com.yahoo.language.Language
-
Returns whether this is a "cjk" language.
- isDigit(int) - Method in class com.yahoo.language.process.CharacterClasses
-
Returns true for code points which should be considered digits - same as java.lang.Character.isDigit
- isIndexable() - Method in interface com.yahoo.language.process.Token
-
Whether this token should be indexed
- isIndexable() - Method in enum com.yahoo.language.process.TokenType
-
Marker for whether this type of token can be indexed for search.
- isInitialized() - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
- isInitialized() - Method in class sentencepiece.SentencepieceModel.ModelProto
- isInitialized() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
- isInitialized() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
- isInitialized() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
- isInitialized() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
- isInitialized() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
- isInitialized() - Method in class sentencepiece.SentencepieceModel.SelfTestData
- isInitialized() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
- isInitialized() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
- isInitialized() - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
- isInitialized() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
- isLatin(int) - Method in class com.yahoo.language.process.CharacterClasses
-
Returns true if this is a latin character
- isLatinDigit(int) - Method in class com.yahoo.language.process.CharacterClasses
-
Returns true if this is a latin digit (other digits are not consistently parsed into numbers by Java)
- isLetter(int) - Method in class com.yahoo.language.process.CharacterClasses
-
Returns true for code points which are letters in unicode 3 or 4, plus some additional characters which are useful to view as letters even though not defined as such in unicode.
- isLetterOrDigit(int) - Method in class com.yahoo.language.process.CharacterClasses
-
Convenience, returns isLetter(c) || isDigit(c)
- isLocal() - Method in class com.yahoo.language.detect.Detection
- isSpecialToken() - Method in interface com.yahoo.language.process.Token
-
Returns whether this is an instance of a declared special token (e.g.
- ITALIAN - com.yahoo.language.Language
-
Language tag "it".
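The CharacterClasses entries above distinguish isDigit(int), described as the same as java.lang.Character.isDigit, from isLatinDigit(int), and describe isLetterOrDigit(int) as isLetter(c) || isDigit(c). The following is a minimal sketch of those relationships using only the JDK. `DigitClassesSketch` is a hypothetical stand-in, not the source of com.yahoo.language.process.CharacterClasses; it assumes "latin digit" means the ASCII range '0' to '9', and it substitutes Character.isLetter for the library's isLetter, which the index says accepts some additional characters.

```java
// Illustrative stand-in only; NOT com.yahoo.language.process.CharacterClasses.
final class DigitClassesSketch {

    /** Any Unicode decimal digit, delegating to java.lang.Character.isDigit. */
    static boolean isDigit(int codePoint) {
        return Character.isDigit(codePoint);
    }

    /** Assumption: "latin digit" is read here as ASCII '0' through '9' only. */
    static boolean isLatinDigit(int codePoint) {
        return codePoint >= '0' && codePoint <= '9';
    }

    /** Simplification: uses Character.isLetter in place of the library's isLetter. */
    static boolean isLetterOrDigit(int codePoint) {
        return Character.isLetter(codePoint) || isDigit(codePoint);
    }

    public static void main(String[] args) {
        int arabicIndicThree = 0x0663; // ARABIC-INDIC DIGIT THREE
        System.out.println(isDigit('7') + " " + isLatinDigit('7'));
        System.out.println(isDigit(arabicIndicThree) + " " + isLatinDigit(arabicIndicThree));
        System.out.println(isLetterOrDigit('_'));
    }
}
```

Under these assumptions, a non-ASCII digit such as U+0663 counts as a digit but not as a latin digit, which is the distinction the two index entries draw.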
J
- JAPANESE - com.yahoo.language.Language
-
Language tag "ja".
- JAVANESE - com.yahoo.language.Language
-
Language tag "jw".
K
- KANNADA - com.yahoo.language.Language
-
Language tag "kn".
- KANNADA - com.yahoo.language.process.TokenScript
- KASHMIRI - com.yahoo.language.Language
-
Language tag "ks".
- KATAKANA - com.yahoo.language.process.TokenScript
- KAZAKH - com.yahoo.language.Language
-
Language tag "kk".
- KHAROSHTHI - com.yahoo.language.process.TokenScript
- KHMER - com.yahoo.language.process.TokenScript
- KINYARWANDA - com.yahoo.language.Language
-
Language tag "rw".
- KIRGHIZ - com.yahoo.language.Language
-
Language tag "ky".
- KIRUNDI - com.yahoo.language.Language
-
Language tag "rn".
- KOREAN - com.yahoo.language.Language
-
Language tag "ko".
- KURDISH - com.yahoo.language.Language
-
Language tag "ku".
L
- language() - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig.Model
- language(String) - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig.Model.Builder
- Language - Enum in com.yahoo.language
- languageCode() - Method in enum com.yahoo.language.Language
- LAO - com.yahoo.language.process.TokenScript
- LAOTHIAN - com.yahoo.language.Language
-
Language tag "lo".
- LATIN - com.yahoo.language.Language
-
Language tag "la".
- LATIN - com.yahoo.language.process.TokenScript
- LATVIAN - com.yahoo.language.Language
-
Language tag "lv".
- LIMBU - com.yahoo.language.process.TokenScript
- LINEARB - com.yahoo.language.process.TokenScript
- LINGALA - com.yahoo.language.Language
-
Language tag "ln".
- Linguistics - Interface in com.yahoo.language
-
Factory of linguistic processors.
- Linguistics.Component - Enum in com.yahoo.language
- LinguisticsCase - Class in com.yahoo.language
-
This class provides a case normalization operation to be used e.g.
- LinguisticsCase() - Constructor for class com.yahoo.language.LinguisticsCase
- LITHUANIAN - com.yahoo.language.Language
-
Language tag "lt".
- LocaleFactory - Class in com.yahoo.language
M
- MACEDONIAN - com.yahoo.language.Language
-
Language tag "mk".
- MALAGASY - com.yahoo.language.Language
-
Language tag "mg".
- MALAY - com.yahoo.language.Language
-
Language tag "ms".
- MALAYALAM - com.yahoo.language.Language
-
Language tag "ml".
- MALAYALAM - com.yahoo.language.process.TokenScript
- MALTESE - com.yahoo.language.Language
-
Language tag "mt".
- MANIPURI - com.yahoo.language.Language
-
Language tag "mni".
- MAORI - com.yahoo.language.Language
-
Language tag "mi".
- MARATHI - com.yahoo.language.Language
-
Language tag "mr".
- MARKER - com.yahoo.language.process.TokenType
- MAX_SENTENCE_LENGTH_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- MAX_SENTENCEPIECE_LENGTH_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- mergeDenormalizerSpec(SentencepieceModel.NormalizerSpec) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Spec for text de-normalization.
- mergeFrom(CodedInputStream, ExtensionRegistryLite) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
- mergeFrom(CodedInputStream, ExtensionRegistryLite) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
- mergeFrom(CodedInputStream, ExtensionRegistryLite) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
- mergeFrom(CodedInputStream, ExtensionRegistryLite) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
- mergeFrom(CodedInputStream, ExtensionRegistryLite) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
- mergeFrom(CodedInputStream, ExtensionRegistryLite) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
- mergeFrom(Message) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
- mergeFrom(Message) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
- mergeFrom(Message) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
- mergeFrom(Message) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
- mergeFrom(Message) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
- mergeFrom(Message) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
- mergeFrom(SentencepieceModel.ModelProto) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
- mergeFrom(SentencepieceModel.ModelProto.SentencePiece) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
- mergeFrom(SentencepieceModel.NormalizerSpec) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
- mergeFrom(SentencepieceModel.SelfTestData) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
- mergeFrom(SentencepieceModel.SelfTestData.Sample) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
- mergeFrom(SentencepieceModel.TrainerSpec) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
- mergeNormalizerSpec(SentencepieceModel.NormalizerSpec) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Spec for text normalization.
- mergeSelfTestData(SentencepieceModel.SelfTestData) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Stores sample input and its expected segmentation to verify the model.
- mergeTrainerSpec(SentencepieceModel.TrainerSpec) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Spec used to generate this model file.
- mergeUnknownFields(UnknownFieldSet) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
- mergeUnknownFields(UnknownFieldSet) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
- mergeUnknownFields(UnknownFieldSet) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
- mergeUnknownFields(UnknownFieldSet) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
- mergeUnknownFields(UnknownFieldSet) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
- mergeUnknownFields(UnknownFieldSet) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
- MINING_SENTENCE_SIZE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- model - Variable in class com.yahoo.language.sentencepiece.SentencePieceConfig.Builder
- model() - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig
- model(int) - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig
- model(SentencePieceConfig.Model.Builder) - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig.Builder
-
Add the given builder to this builder's list of Model builders
- model(List<SentencePieceConfig.Model.Builder>) - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig.Builder
-
Set the given list as this builder's list of Model builders
- Model(SentencePieceConfig.Model.Builder) - Constructor for class com.yahoo.language.sentencepiece.SentencePieceConfig.Model
- MODEL_PREFIX_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- MODEL_TYPE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- MOLDAVIAN - com.yahoo.language.Language
-
Language tag "mo".
- MONGOLIAN - com.yahoo.language.Language
-
Language tag "mn".
- MONGOLIAN - com.yahoo.language.process.TokenScript
- MUNDA - com.yahoo.language.Language
-
Language tag "mun".
- MYANMAR - com.yahoo.language.process.TokenScript
N
- name() - Method in class com.yahoo.language.process.SpecialTokens
-
Returns the name of this special tokens list
- NAME_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.NormalizerSpec
- NAURU - com.yahoo.language.Language
-
Language tag "na".
- NEPALI - com.yahoo.language.Language
-
Language tag "ne".
- newBuilder() - Static method in class sentencepiece.SentencepieceModel.ModelProto
- newBuilder() - Static method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
- newBuilder() - Static method in class sentencepiece.SentencepieceModel.NormalizerSpec
- newBuilder() - Static method in class sentencepiece.SentencepieceModel.SelfTestData
- newBuilder() - Static method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
- newBuilder() - Static method in class sentencepiece.SentencepieceModel.TrainerSpec
- newBuilder(SentencepieceModel.ModelProto) - Static method in class sentencepiece.SentencepieceModel.ModelProto
- newBuilder(SentencepieceModel.ModelProto.SentencePiece) - Static method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
- newBuilder(SentencepieceModel.NormalizerSpec) - Static method in class sentencepiece.SentencepieceModel.NormalizerSpec
- newBuilder(SentencepieceModel.SelfTestData) - Static method in class sentencepiece.SentencepieceModel.SelfTestData
- newBuilder(SentencepieceModel.SelfTestData.Sample) - Static method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
- newBuilder(SentencepieceModel.TrainerSpec) - Static method in class sentencepiece.SentencepieceModel.TrainerSpec
- newBuilderForType() - Method in class sentencepiece.SentencepieceModel.ModelProto
- newBuilderForType() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
- newBuilderForType() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
- newBuilderForType() - Method in class sentencepiece.SentencepieceModel.SelfTestData
- newBuilderForType() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
- newBuilderForType() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
- newBuilderForType(GeneratedMessageV3.BuilderParent) - Method in class sentencepiece.SentencepieceModel.ModelProto
- newBuilderForType(GeneratedMessageV3.BuilderParent) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
- newBuilderForType(GeneratedMessageV3.BuilderParent) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
- newBuilderForType(GeneratedMessageV3.BuilderParent) - Method in class sentencepiece.SentencepieceModel.SelfTestData
- newBuilderForType(GeneratedMessageV3.BuilderParent) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
- newBuilderForType(GeneratedMessageV3.BuilderParent) - Method in class sentencepiece.SentencepieceModel.TrainerSpec
- newCountryHint(String) - Static method in class com.yahoo.language.detect.Hint
- newInstance(GeneratedMessageV3.UnusedPrivateParameter) - Method in class sentencepiece.SentencepieceModel.ModelProto
- newInstance(GeneratedMessageV3.UnusedPrivateParameter) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
- newInstance(GeneratedMessageV3.UnusedPrivateParameter) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
- newInstance(GeneratedMessageV3.UnusedPrivateParameter) - Method in class sentencepiece.SentencepieceModel.SelfTestData
- newInstance(GeneratedMessageV3.UnusedPrivateParameter) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
- newInstance(GeneratedMessageV3.UnusedPrivateParameter) - Method in class sentencepiece.SentencepieceModel.TrainerSpec
- newInstance(String, String) - Static method in class com.yahoo.language.detect.Hint
- newMarketHint(String) - Static method in class com.yahoo.language.detect.Hint
- next() - Method in class com.yahoo.language.process.GramSplitter.GramSplitterIterator
- NONE - com.yahoo.language.process.StemMode
- NORMAL - sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
-
normal symbol
- NORMAL_VALUE - Static variable in enum sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
-
normal symbol
- NORMALIZATION_RULE_TSV_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.NormalizerSpec
- normalize(String) - Method in interface com.yahoo.language.process.Normalizer
-
NFKC normalizes a String.
- normalize(String) - Method in class com.yahoo.language.sentencepiece.SentencePieceEncoder
- Normalizer - Interface in com.yahoo.language.process
-
This interface provides NFKC normalization of Strings through the underlying linguistics library.
- NORMALIZER - com.yahoo.language.Linguistics.Component
- NORMALIZER_SPEC_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.ModelProto
- NORWEGIAN_BOKMAL - com.yahoo.language.Language
-
Language tag "nb".
- NORWEGIAN_NYNORSK - com.yahoo.language.Language
-
Language tag "nn".
- NUM_SUB_ITERATIONS_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- NUM_THREADS_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- NUMERIC - com.yahoo.language.process.TokenType
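The `Normalizer` entries above describe NFKC normalization of strings. As a rough illustration of what NFKC does to text, the sketch below uses the JDK's own `java.text.Normalizer` rather than the `com.yahoo.language.process.Normalizer` interface. The class name `NfkcSketch` and the helper `nfkc` are hypothetical, and the linguistics library may differ from the JDK in details.

```java
import java.text.Normalizer;

/** Illustrative only: uses the JDK normalizer, not com.yahoo.language.process.Normalizer. */
final class NfkcSketch {

    /** Applies Unicode NFKC: compatibility decomposition followed by canonical composition. */
    static String nfkc(String input) {
        return Normalizer.normalize(input, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // The "fi" ligature (U+FB01) is folded into the two letters "fi".
        System.out.println(nfkc("\uFB01le"));
        // Full-width digits (U+FF11..U+FF13) are folded into ASCII digits.
        System.out.println(nfkc("\uFF11\uFF12\uFF13"));
        // "e" followed by a combining acute accent (U+0301) composes to U+00E9.
        System.out.println(nfkc("e\u0301").equals("\u00E9"));
    }
}
```

Compatibility folding of this kind is lossy by design: distinct code points that render alike are mapped to one form, so normalized text should not be expected to round-trip back to the input.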
O
- OCCITAN - com.yahoo.language.Language
-
Language tag "oc".
- OGHAM - com.yahoo.language.process.TokenScript
- OLDITALIC - com.yahoo.language.process.TokenScript
- OLDPERSIAN - com.yahoo.language.process.TokenScript
- ORIYA - com.yahoo.language.Language
-
Language tag "or".
- ORIYA - com.yahoo.language.process.TokenScript
- OROMO - com.yahoo.language.Language
-
Language tag "om".
- OSMANYA - com.yahoo.language.process.TokenScript
P
- PAD_ID_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- PAD_PIECE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- parseDelimitedFrom(InputStream) - Static method in class sentencepiece.SentencepieceModel.ModelProto
- parseDelimitedFrom(InputStream) - Static method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
- parseDelimitedFrom(InputStream) - Static method in class sentencepiece.SentencepieceModel.NormalizerSpec
- parseDelimitedFrom(InputStream) - Static method in class sentencepiece.SentencepieceModel.SelfTestData
- parseDelimitedFrom(InputStream) - Static method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
- parseDelimitedFrom(InputStream) - Static method in class sentencepiece.SentencepieceModel.TrainerSpec
- parseDelimitedFrom(InputStream, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.ModelProto
- parseDelimitedFrom(InputStream, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
- parseDelimitedFrom(InputStream, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.NormalizerSpec
- parseDelimitedFrom(InputStream, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.SelfTestData
- parseDelimitedFrom(InputStream, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
- parseDelimitedFrom(InputStream, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.TrainerSpec
- parseFrom(byte[]) - Static method in class sentencepiece.SentencepieceModel.ModelProto
- parseFrom(byte[]) - Static method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
- parseFrom(byte[]) - Static method in class sentencepiece.SentencepieceModel.NormalizerSpec
- parseFrom(byte[]) - Static method in class sentencepiece.SentencepieceModel.SelfTestData
- parseFrom(byte[]) - Static method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
- parseFrom(byte[]) - Static method in class sentencepiece.SentencepieceModel.TrainerSpec
- parseFrom(byte[], ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.ModelProto
- parseFrom(byte[], ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
- parseFrom(byte[], ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.NormalizerSpec
- parseFrom(byte[], ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.SelfTestData
- parseFrom(byte[], ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
- parseFrom(byte[], ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.TrainerSpec
- parseFrom(ByteString) - Static method in class sentencepiece.SentencepieceModel.ModelProto
- parseFrom(ByteString) - Static method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
- parseFrom(ByteString) - Static method in class sentencepiece.SentencepieceModel.NormalizerSpec
- parseFrom(ByteString) - Static method in class sentencepiece.SentencepieceModel.SelfTestData
- parseFrom(ByteString) - Static method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
- parseFrom(ByteString) - Static method in class sentencepiece.SentencepieceModel.TrainerSpec
- parseFrom(ByteString, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.ModelProto
- parseFrom(ByteString, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
- parseFrom(ByteString, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.NormalizerSpec
- parseFrom(ByteString, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.SelfTestData
- parseFrom(ByteString, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
- parseFrom(ByteString, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.TrainerSpec
- parseFrom(CodedInputStream) - Static method in class sentencepiece.SentencepieceModel.ModelProto
- parseFrom(CodedInputStream) - Static method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
- parseFrom(CodedInputStream) - Static method in class sentencepiece.SentencepieceModel.NormalizerSpec
- parseFrom(CodedInputStream) - Static method in class sentencepiece.SentencepieceModel.SelfTestData
- parseFrom(CodedInputStream) - Static method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
- parseFrom(CodedInputStream) - Static method in class sentencepiece.SentencepieceModel.TrainerSpec
- parseFrom(CodedInputStream, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.ModelProto
- parseFrom(CodedInputStream, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
- parseFrom(CodedInputStream, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.NormalizerSpec
- parseFrom(CodedInputStream, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.SelfTestData
- parseFrom(CodedInputStream, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
- parseFrom(CodedInputStream, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.TrainerSpec
- parseFrom(InputStream) - Static method in class sentencepiece.SentencepieceModel.ModelProto
- parseFrom(InputStream) - Static method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
- parseFrom(InputStream) - Static method in class sentencepiece.SentencepieceModel.NormalizerSpec
- parseFrom(InputStream) - Static method in class sentencepiece.SentencepieceModel.SelfTestData
- parseFrom(InputStream) - Static method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
- parseFrom(InputStream) - Static method in class sentencepiece.SentencepieceModel.TrainerSpec
- parseFrom(InputStream, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.ModelProto
- parseFrom(InputStream, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
- parseFrom(InputStream, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.NormalizerSpec
- parseFrom(InputStream, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.SelfTestData
- parseFrom(InputStream, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
- parseFrom(InputStream, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.TrainerSpec
- parseFrom(ByteBuffer) - Static method in class sentencepiece.SentencepieceModel.ModelProto
- parseFrom(ByteBuffer) - Static method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
- parseFrom(ByteBuffer) - Static method in class sentencepiece.SentencepieceModel.NormalizerSpec
- parseFrom(ByteBuffer) - Static method in class sentencepiece.SentencepieceModel.SelfTestData
- parseFrom(ByteBuffer) - Static method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
- parseFrom(ByteBuffer) - Static method in class sentencepiece.SentencepieceModel.TrainerSpec
- parseFrom(ByteBuffer, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.ModelProto
- parseFrom(ByteBuffer, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
- parseFrom(ByteBuffer, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.NormalizerSpec
- parseFrom(ByteBuffer, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.SelfTestData
- parseFrom(ByteBuffer, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
- parseFrom(ByteBuffer, ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel.TrainerSpec
- parser() - Static method in class sentencepiece.SentencepieceModel.ModelProto
- parser() - Static method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
- parser() - Static method in class sentencepiece.SentencepieceModel.NormalizerSpec
- parser() - Static method in class sentencepiece.SentencepieceModel.SelfTestData
- parser() - Static method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
- parser() - Static method in class sentencepiece.SentencepieceModel.TrainerSpec
- PARSER - Static variable in class sentencepiece.SentencepieceModel.ModelProto
-
Deprecated.
- PARSER - Static variable in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
-
Deprecated.
- PARSER - Static variable in class sentencepiece.SentencepieceModel.NormalizerSpec
-
Deprecated.
- PARSER - Static variable in class sentencepiece.SentencepieceModel.SelfTestData
-
Deprecated.
- PARSER - Static variable in class sentencepiece.SentencepieceModel.SelfTestData.Sample
-
Deprecated.
- PARSER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
-
Deprecated.
- PASHTO - com.yahoo.language.Language
-
Language tag "ps".
- path() - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig.Model
- path(FileReference) - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig.Model.Builder
- PERSIAN - com.yahoo.language.Language
-
Language tag "fa".
- PIECE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
- PIECES_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.ModelProto
- POLISH - com.yahoo.language.Language
-
Language tag "pl".
- PORTUGUESE - com.yahoo.language.Language
-
Language tag "pt".
- PRECOMPILED_CHARSMAP_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.NormalizerSpec
- ProcessingException - Exception in com.yahoo.language.process
-
Exception class indicating that a fatal error occurred during linguistic processing.
- ProcessingException(String) - Constructor for exception com.yahoo.language.process.ProcessingException
- ProcessingException(String, Throwable) - Constructor for exception com.yahoo.language.process.ProcessingException
- PUNCTUATION - com.yahoo.language.process.TokenType
- PUNJABI - com.yahoo.language.Language
-
Language tag "pa".
Q
R
- registerAllExtensions(ExtensionRegistry) - Static method in class sentencepiece.SentencepieceModel
- registerAllExtensions(ExtensionRegistryLite) - Static method in class sentencepiece.SentencepieceModel
- remove() - Method in class com.yahoo.language.process.GramSplitter.GramSplitterIterator
- remove(int) - Method in class com.yahoo.language.process.StemList
- REMOVE_EXTRA_WHITESPACES_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.NormalizerSpec
- removePieces(int) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Sentence pieces with scores.
- removeSamples(int) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
-
repeated .sentencepiece.SelfTestData.Sample samples = 1;
- replacement() - Method in class com.yahoo.language.process.SpecialTokens.Token
-
Returns the token to replace occurrences of this by, which equals token() unless this has a replacement.
- REQUIRED_CHARS_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- RHAETO_ROMANCE - com.yahoo.language.Language
-
Language tag "rm".
- ROMANIAN - com.yahoo.language.Language
-
Language tag "ro".
- RUNIC - com.yahoo.language.process.TokenScript
- RUSSIAN - com.yahoo.language.Language
-
Language tag "ru".
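The `replacement()` entry above states that the replacement equals `token()` unless a replacement has been set. A minimal sketch of that fallback rule follows, using a hypothetical stand-in class `TokenSketch`. It is not the `SpecialTokens.Token` implementation, and its constructors are assumptions made for illustration.

```java
import java.util.Objects;

/** Hypothetical stand-in for a special token; not the SpecialTokens.Token class. */
final class TokenSketch {

    private final String token;
    private final String replacement; // null when no replacement is configured

    TokenSketch(String token) {
        this(token, null);
    }

    TokenSketch(String token, String replacement) {
        this.token = Objects.requireNonNull(token);
        this.replacement = replacement;
    }

    String token() {
        return token;
    }

    /** Returns the configured replacement, or the token itself when none is set. */
    String replacement() {
        return replacement != null ? replacement : token;
    }

    public static void main(String[] args) {
        System.out.println(new TokenSketch("c++").replacement());
        System.out.println(new TokenSketch("c++", "cpp").replacement());
    }
}
```

Falling back to the token itself means callers can always substitute `replacement()` without first checking whether one was configured.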
S
- SAMOAN - com.yahoo.language.Language
-
Language tag "sm".
- SAMPLES_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.SelfTestData
- SANGHO - com.yahoo.language.Language
-
Language tag "sg".
- SANSKRIT - com.yahoo.language.Language
-
Language tag "sa".
- SCORE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
- scoring() - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig
- scoring(SentencePieceConfig.Scoring.Enum) - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig.Builder
- Scoring - Enum in com.yahoo.language.sentencepiece
-
The scoring strategy to use for picking segments
- Scoring() - Constructor for class com.yahoo.language.sentencepiece.SentencePieceConfig.Scoring
- Scoring(SentencePieceConfig.Scoring.Enum) - Constructor for class com.yahoo.language.sentencepiece.SentencePieceConfig.Scoring
- SCOTS_GAELIC - com.yahoo.language.Language
-
Language tag "gd".
- SEED_SENTENCEPIECE_SIZE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- segment(String, Language) - Method in interface com.yahoo.language.process.Segmenter
-
Splits the input string into tokens and returns a list of tokens in unprocessed form (i.e.
- segment(String, Language) - Method in class com.yahoo.language.process.SegmenterImpl
- segment(String, Language) - Method in class com.yahoo.language.sentencepiece.SentencePieceEncoder
-
Segments the given text into token segments using the SentencePiece algorithm
- Segmenter - Interface in com.yahoo.language.process
-
Interface providing segmentation, i.e.
- SEGMENTER - com.yahoo.language.Linguistics.Component
- SegmenterImpl - Class in com.yahoo.language.process
- SegmenterImpl(Tokenizer) - Constructor for class com.yahoo.language.process.SegmenterImpl
- SELF_TEST_DATA_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.ModelProto
- SELF_TEST_SAMPLE_SIZE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- sentencepiece - package sentencepiece
- SentencePieceConfig - Class in com.yahoo.language.sentencepiece
-
This class represents the root node of sentence-piece.
- SentencePieceConfig(SentencePieceConfig.Builder) - Constructor for class com.yahoo.language.sentencepiece.SentencePieceConfig
- SentencePieceConfig.Builder - Class in com.yahoo.language.sentencepiece
- SentencePieceConfig.Model - Class in com.yahoo.language.sentencepiece
-
This class represents sentence-piece.model[]
- SentencePieceConfig.Model.Builder - Class in com.yahoo.language.sentencepiece
- SentencePieceConfig.Producer - Interface in com.yahoo.language.sentencepiece
- SentencePieceConfig.Scoring - Class in com.yahoo.language.sentencepiece
-
This class represents sentence-piece.scoring. The scoring strategy to use when picking a segmentation.
- SentencePieceConfig.Scoring.Enum - Enum in com.yahoo.language.sentencepiece
- SentencePieceEncoder - Class in com.yahoo.language.sentencepiece
-
Integration with https://github.com/google/sentencepiece through http://docs.djl.ai/extensions/sentencepiece/index.html. SentencePiece is a language-agnostic tokenizer for neural nets.
- SentencePieceEncoder(SentencePieceConfig) - Constructor for class com.yahoo.language.sentencepiece.SentencePieceEncoder
- SentencePieceEncoder(SentencePieceEncoder.Builder) - Constructor for class com.yahoo.language.sentencepiece.SentencePieceEncoder
- SentencePieceEncoder.Builder - Class in com.yahoo.language.sentencepiece
- SentencepieceModel - Class in sentencepiece
- SentencepieceModel.ModelProto - Class in sentencepiece
-
ModelProto stores model parameters.
- SentencepieceModel.ModelProto.Builder - Class in sentencepiece
-
ModelProto stores model parameters.
- SentencepieceModel.ModelProto.SentencePiece - Class in sentencepiece
-
Protobuf type sentencepiece.ModelProto.SentencePiece
- SentencepieceModel.ModelProto.SentencePiece.Builder - Class in sentencepiece
-
Protobuf type sentencepiece.ModelProto.SentencePiece
- SentencepieceModel.ModelProto.SentencePiece.Type - Enum in sentencepiece
-
Protobuf enum sentencepiece.ModelProto.SentencePiece.Type
- SentencepieceModel.ModelProto.SentencePieceOrBuilder - Interface in sentencepiece
- SentencepieceModel.ModelProtoOrBuilder - Interface in sentencepiece
- SentencepieceModel.NormalizerSpec - Class in sentencepiece
-
NormalizerSpec encodes various parameters for string normalization.
- SentencepieceModel.NormalizerSpec.Builder - Class in sentencepiece
-
NormalizerSpec encodes various parameters for string normalization.
- SentencepieceModel.NormalizerSpecOrBuilder - Interface in sentencepiece
- SentencepieceModel.SelfTestData - Class in sentencepiece
-
Proto to store samples for self-testing.
- SentencepieceModel.SelfTestData.Builder - Class in sentencepiece
-
Proto to store samples for self-testing.
- SentencepieceModel.SelfTestData.Sample - Class in sentencepiece
-
Protobuf type sentencepiece.SelfTestData.Sample
- SentencepieceModel.SelfTestData.Sample.Builder - Class in sentencepiece
-
Protobuf type sentencepiece.SelfTestData.Sample
- SentencepieceModel.SelfTestData.SampleOrBuilder - Interface in sentencepiece
- SentencepieceModel.SelfTestDataOrBuilder - Interface in sentencepiece
- SentencepieceModel.TrainerSpec - Class in sentencepiece
-
TrainerSpec encodes various parameters for SentencePiece training.
- SentencepieceModel.TrainerSpec.Builder - Class in sentencepiece
-
TrainerSpec encodes various parameters for SentencePiece training.
- SentencepieceModel.TrainerSpec.ModelType - Enum in sentencepiece
-
Model type.
- SentencepieceModel.TrainerSpecOrBuilder - Interface in sentencepiece
- SERBIAN - com.yahoo.language.Language
-
Language tag "sr".
- SERBO_CROATIAN - com.yahoo.language.Language
-
Language tag "sh".
- SESOTHO - com.yahoo.language.Language
-
Language tag "st".
- set(int, String) - Method in class com.yahoo.language.process.StemList
- setAcceptLanguage(int, String) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
List of the languages this model can accept.
- setAddDummyPrefix(boolean) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
-
Adds dummy whitespace at the beginning of text in order to treat "world" in "world" and "hello world" in the same way.
- setAllowWhitespaceOnlyPieces(boolean) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Allows pieces that only contain whitespaces instead of appearing only as prefix or suffix of other pieces.
- setApplyOnRestart(boolean) - Method in class com.yahoo.language.sentencepiece.SentencePieceConfig.Builder
- setBosId(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
<s>
- setBosPiece(String) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
optional string bos_piece = 46 [default = "<s>"];
- setBosPieceBytes(ByteString) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
optional string bos_piece = 46 [default = "<s>"];
- setByteFallback(boolean) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Decomposes unknown pieces into UTF-8 bytes.
- setCharacterCoverage(float) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Training parameters.
- setCollapseUnknowns(boolean) - Method in class com.yahoo.language.sentencepiece.SentencePieceEncoder.Builder
-
Sets whether consecutive unknown characters should be collapsed into one large unknown token (default) or be returned as single character tokens.
- setControlSymbols(int, String) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
- setDenormalizerSpec(SentencepieceModel.NormalizerSpec) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Spec for text de-normalization.
- setDenormalizerSpec(SentencepieceModel.NormalizerSpec.Builder) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Spec for text de-normalization.
- setEosId(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
</s>
- setEosPiece(String) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
optional string eos_piece = 47 [default = "</s>"];
- setEosPieceBytes(ByteString) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
optional string eos_piece = 47 [default = "</s>"];
- setEscapeWhitespaces(boolean) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
-
Replaces whitespace with meta symbol.
- setExpected(String) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
-
optional string expected = 2;
- setExpectedBytes(ByteString) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
-
optional string expected = 2;
- setExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.ModelProto, List<Type>>, int, Type) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
- setExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.ModelProto, Type>, Type) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
- setExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.ModelProto.SentencePiece, List<Type>>, int, Type) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
- setExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.ModelProto.SentencePiece, Type>, Type) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
- setExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.NormalizerSpec, List<Type>>, int, Type) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
- setExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.NormalizerSpec, Type>, Type) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
- setExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.SelfTestData, List<Type>>, int, Type) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
- setExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.SelfTestData, Type>, Type) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
- setExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.TrainerSpec, List<Type>>, int, Type) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
- setExtension(GeneratedMessage.GeneratedExtension<SentencepieceModel.TrainerSpec, Type>, Type) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
- setField(Descriptors.FieldDescriptor, Object) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
- setField(Descriptors.FieldDescriptor, Object) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
- setField(Descriptors.FieldDescriptor, Object) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
- setField(Descriptors.FieldDescriptor, Object) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
- setField(Descriptors.FieldDescriptor, Object) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
- setField(Descriptors.FieldDescriptor, Object) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
- setHardVocabLimit(boolean) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
`vocab_size` is treated as hard limit.
- setInput(int, String) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
General parameters. Input corpus files.
- setInput(String) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
-
optional string input = 1;
- setInputBytes(ByteString) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
-
optional string input = 1;
- setInputFormat(String) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Input corpus format: "text" (default), one sentence per line; "tsv", sentence <tab> freq.
- setInputFormatBytes(ByteString) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Input corpus format: "text" (default), one sentence per line; "tsv", sentence <tab> freq.
- setInputSentenceSize(long) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Maximum size of sentences the trainer loads from `input` parameter.
- setMaxSentenceLength(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
The maximum sentence length in bytes.
- setMaxSentencepieceLength(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
SentencePiece parameters that control the shape of sentence pieces.
- setMiningSentenceSize(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Deprecated.
- setModelPrefix(String) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Output model file prefix.
- setModelPrefixBytes(ByteString) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Output model file prefix.
- setModelType(SentencepieceModel.TrainerSpec.ModelType) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
optional .sentencepiece.TrainerSpec.ModelType model_type = 3 [default = UNIGRAM];
- setName(String) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
-
Name of the normalization rule.
- setNameBytes(ByteString) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
-
Name of the normalization rule.
- setNormalizationRuleTsv(String) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
-
Custom normalization rule file in TSV format.
- setNormalizationRuleTsvBytes(ByteString) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
-
Custom normalization rule file in TSV format.
- setNormalizerSpec(SentencepieceModel.NormalizerSpec) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Spec for text normalization.
- setNormalizerSpec(SentencepieceModel.NormalizerSpec.Builder) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Spec for text normalization.
- setNumSubIterations(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Number of EM sub iterations.
- setNumThreads(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Number of threads in the training.
- setPadId(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
<pad> (padding)
- setPadPiece(String) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
optional string pad_piece = 48 [default = "<pad>"];
- setPadPieceBytes(ByteString) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
optional string pad_piece = 48 [default = "<pad>"];
- setPiece(String) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
-
piece must not be empty.
- setPieceBytes(ByteString) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
-
piece must not be empty.
- setPieces(int, SentencepieceModel.ModelProto.SentencePiece) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Sentence pieces with scores.
- setPieces(int, SentencepieceModel.ModelProto.SentencePiece.Builder) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Sentence pieces with scores.
- setPrecompiledCharsmap(ByteString) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
-
Pre-compiled normalization rule created by Builder::GetPrecompiledCharsMap() or Builder::CompileCharsMap() method.
- setRemoveExtraWhitespaces(boolean) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
-
Removes leading, trailing, and duplicate internal whitespace.
- setRepeatedField(Descriptors.FieldDescriptor, int, Object) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
- setRepeatedField(Descriptors.FieldDescriptor, int, Object) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
- setRepeatedField(Descriptors.FieldDescriptor, int, Object) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
- setRepeatedField(Descriptors.FieldDescriptor, int, Object) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
- setRepeatedField(Descriptors.FieldDescriptor, int, Object) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
- setRepeatedField(Descriptors.FieldDescriptor, int, Object) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
- setRequiredChars(String) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Defines required characters.
- setRequiredCharsBytes(ByteString) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Defines required characters.
- setSamples(int, SentencepieceModel.SelfTestData.Sample) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
-
repeated .sentencepiece.SelfTestData.Sample samples = 1;
- setSamples(int, SentencepieceModel.SelfTestData.Sample.Builder) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
-
repeated .sentencepiece.SelfTestData.Sample samples = 1;
- setScore(float) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
-
optional float score = 2;
- setScoring(Scoring) - Method in class com.yahoo.language.sentencepiece.SentencePieceEncoder.Builder
-
Sets the scoring strategy to use when picking a segmentation.
- setSeedSentencepieceSize(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
The size of seed sentencepieces.
- setSelfTestData(SentencepieceModel.SelfTestData) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Stores sample input and its expected segmentation to verify the model.
- setSelfTestData(SentencepieceModel.SelfTestData.Builder) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Stores sample input and its expected segmentation to verify the model.
- setSelfTestSampleSize(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Size of self-test samples, which are encoded in the model file.
- setShrinkingFactor(float) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
In every EM sub-iteration, keeps the top `shrinking_factor` * `current sentencepieces size` pieces with respect to the loss of the sentence piece.
- setShuffleInputSentence(boolean) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
optional bool shuffle_input_sentence = 19 [default = true];
- setSplitByNumber(boolean) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
When `split_by_number` is true, put a boundary between number and non-number transition.
- setSplitByUnicodeScript(boolean) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Uses Unicode script to split sentence pieces.
- setSplitByWhitespace(boolean) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Use a white space to split sentence pieces.
- setSplitDigits(boolean) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Split all digits (0-9) into separate pieces.
- SETSWANA - com.yahoo.language.Language
-
Language tag "tn".
- setTrainerSpec(SentencepieceModel.TrainerSpec) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Spec used to generate this model file.
- setTrainerSpec(SentencepieceModel.TrainerSpec.Builder) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
-
Spec used to generate this model file.
- setTrainExtremelyLargeCorpus(boolean) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Increase bit depth to allow unigram model training on large (>10M sentences) corpora.
- setTrainingSentenceSize(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Deprecated.
- setTreatWhitespaceAsSuffix(boolean) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Adds whitespace symbol (_) as a suffix instead of prefix.
- setType(SentencepieceModel.ModelProto.SentencePiece.Type) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
-
optional .sentencepiece.ModelProto.SentencePiece.Type type = 3 [default = NORMAL];
- setUnkId(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Reserved special meta tokens.
- setUnknownFields(UnknownFieldSet) - Method in class sentencepiece.SentencepieceModel.ModelProto.Builder
- setUnknownFields(UnknownFieldSet) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Builder
- setUnknownFields(UnknownFieldSet) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec.Builder
- setUnknownFields(UnknownFieldSet) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Builder
- setUnknownFields(UnknownFieldSet) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample.Builder
- setUnknownFields(UnknownFieldSet) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
- setUnkPiece(String) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
optional string unk_piece = 45 [default = "<unk>"];
- setUnkPieceBytes(ByteString) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
optional string unk_piece = 45 [default = "<unk>"];
- setUnkSurface(String) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Encodes <unk> into U+2047 (DOUBLE QUESTION MARK), since this character can be useful both for user and developer.
- setUnkSurfaceBytes(ByteString) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Encodes <unk> into U+2047 (DOUBLE QUESTION MARK), since this character can be useful both for user and developer.
- setUseAllVocab(boolean) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Use all symbols for vocab extraction.
- setUserDefinedSymbols(int, String) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Defines user defined symbols.
- setVocabSize(int) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
Vocabulary size.
- setVocabularyOutputPieceScore(boolean) - Method in class sentencepiece.SentencepieceModel.TrainerSpec.Builder
-
When creating the vocabulary file, defines whether or not to additionally output the score for each piece.
- SHAVIAN - com.yahoo.language.process.TokenScript
- SHONA - com.yahoo.language.Language
-
Language tag "sn".
- SHORTEST - com.yahoo.language.process.StemMode
- SHRINKING_FACTOR_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- SHUFFLE_INPUT_SENTENCE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- SICHUAN_YI - com.yahoo.language.Language
-
Language tag "ii".
- SINDHI - com.yahoo.language.Language
-
Language tag "sd".
- SINHALA - com.yahoo.language.process.TokenScript
- SINHALESE - com.yahoo.language.Language
-
Language tag "si".
- SISWATI - com.yahoo.language.Language
-
Language tag "ss".
- size() - Method in class com.yahoo.language.process.StemList
- SLOVAK - com.yahoo.language.Language
-
Language tag "sk".
- SLOVENIAN - com.yahoo.language.Language
-
Language tag "sl".
- SOMALI - com.yahoo.language.Language
-
Language tag "so".
- SPACE - com.yahoo.language.process.TokenType
- SPANISH - com.yahoo.language.Language
-
Language tag "es".
- SpecialTokenRegistry - Class in com.yahoo.language.process
-
Immutable named lists of "special tokens" - strings which should override the normal tokenizer semantics and be tokenized into a single token.
- SpecialTokenRegistry() - Constructor for class com.yahoo.language.process.SpecialTokenRegistry
-
Creates an empty special token registry
- SpecialTokenRegistry(SpecialtokensConfig) - Constructor for class com.yahoo.language.process.SpecialTokenRegistry
-
Create a special token registry from a configuration object.
- SpecialTokenRegistry(List<SpecialTokens>) - Constructor for class com.yahoo.language.process.SpecialTokenRegistry
- SpecialTokens - Class in com.yahoo.language.process
-
An immutable list of special tokens - strings which should override the normal tokenizer semantics and be tokenized into a single token.
- SpecialTokens(String, List<SpecialTokens.Token>) - Constructor for class com.yahoo.language.process.SpecialTokens
- SpecialTokens.Token - Class in com.yahoo.language.process
-
An immutable special token
- split(String, int) - Method in class com.yahoo.language.process.GramSplitter
-
Splits the input into grams of size n and returns an iterator over grams represented as [start index,length] pairs into the input string.
- SPLIT_BY_NUMBER_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- SPLIT_BY_UNICODE_SCRIPT_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- SPLIT_BY_WHITESPACE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- SPLIT_DIGITS_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- stem(String, StemMode, Language) - Method in interface com.yahoo.language.process.Stemmer
-
Stem input according to specified stemming mode.
- stem(String, StemMode, Language) - Method in class com.yahoo.language.process.StemmerImpl
- StemList - Class in com.yahoo.language.process
-
A list of strings which does not allow for duplicate elements.
- StemList() - Constructor for class com.yahoo.language.process.StemList
- StemList(String...) - Constructor for class com.yahoo.language.process.StemList
- Stemmer - Interface in com.yahoo.language.process
-
Interface providing stemming of single words.
- STEMMER - com.yahoo.language.Linguistics.Component
- StemmerImpl - Class in com.yahoo.language.process
- StemmerImpl(Tokenizer) - Constructor for class com.yahoo.language.process.StemmerImpl
- StemMode - Enum in com.yahoo.language.process
-
An enum of the stemming modes which can be requested.
- SUNDANESE - com.yahoo.language.Language
-
Language tag "su".
- SWAHILI - com.yahoo.language.Language
-
Language tag "sw".
- SWEDISH - com.yahoo.language.Language
-
Language tag "sv".
- SYLOTINAGRI - com.yahoo.language.process.TokenScript
- SYMBOL - com.yahoo.language.process.TokenType
- SYRIAC - com.yahoo.language.Language
-
Language tag "syr".
- SYRIAC - com.yahoo.language.process.TokenScript
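The `split(String, int)` entry in this section describes grams as [start index, length] pairs into the input string. The following is a minimal, self-contained sketch of that representation only. It is not the `GramSplitter` implementation: the class `GramSketch` and its methods are hypothetical, it treats the input as one unbroken run of UTF-16 chars, and it ignores any word-boundary or short-input handling the real class may apply.

```java
import java.util.ArrayList;
import java.util.List;

public class GramSketch {

    /** Returns every n-gram of the input as a {start, length} pair (hypothetical helper). */
    public static List<int[]> grams(String input, int n) {
        if (n < 1) throw new IllegalArgumentException("n must be at least 1");
        List<int[]> result = new ArrayList<>();
        // Slide a window of n chars over the input; stop when the window no longer fits.
        for (int start = 0; start + n <= input.length(); start++) {
            result.add(new int[] { start, n });
        }
        return result;
    }

    /** Resolves a {start, length} pair back to the substring it points at. */
    public static String extract(String input, int[] gram) {
        return input.substring(gram[0], gram[0] + gram[1]);
    }

    public static void main(String[] args) {
        String input = "vespa";
        for (int[] gram : grams(input, 2)) {
            System.out.println(gram[0] + "," + gram[1] + " -> " + extract(input, gram));
        }
    }
}
```

Returning index pairs rather than substrings defers string allocation until a caller actually needs the gram text, which is presumably why the indexed method exposes grams this way.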
T
- TAGALOG - com.yahoo.language.Language
-
Language tag "fil".
- TAGALOG - com.yahoo.language.process.TokenScript
- TAGBANWA - com.yahoo.language.process.TokenScript
- TAILE - com.yahoo.language.process.TokenScript
- TAILUE - com.yahoo.language.process.TokenScript
- TAJIK - com.yahoo.language.Language
-
Language tag "tg".
- TAMIL - com.yahoo.language.Language
-
Language tag "ta".
- TAMIL - com.yahoo.language.process.TokenScript
- TATAR - com.yahoo.language.Language
-
Language tag "tt".
- TELUGU - com.yahoo.language.Language
-
Language tag "te".
- TELUGU - com.yahoo.language.process.TokenScript
- THAANA - com.yahoo.language.process.TokenScript
- THAI - com.yahoo.language.Language
-
Language tag "th".
- THAI - com.yahoo.language.process.TokenScript
- throwsOnUse - Static variable in interface com.yahoo.language.process.Encoder
-
An instance of this which throws IllegalStateException if an attempt is made to use it.
- TIBETAN - com.yahoo.language.Language
-
Language tag "bo".
- TIBETAN - com.yahoo.language.process.TokenScript
- TIFINAGH - com.yahoo.language.process.TokenScript
- TIGRINYA - com.yahoo.language.Language
-
Language tag "ti".
- toBuilder() - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
- toBuilder() - Method in class sentencepiece.SentencepieceModel.ModelProto
- toBuilder() - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
- toBuilder() - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
- toBuilder() - Method in class sentencepiece.SentencepieceModel.SelfTestData
- toBuilder() - Method in class sentencepiece.SentencepieceModel.TrainerSpec
- toExtractedList() - Method in class com.yahoo.language.process.GramSplitter.GramSplitterIterator
-
Convenience list which splits the remaining items in this iterator into a list of gram strings
- token() - Method in class com.yahoo.language.process.SpecialTokens.Token
-
Returns the special token
- Token - Interface in com.yahoo.language.process
-
A single token produced by the tokenizer.
- Token(String) - Constructor for class com.yahoo.language.process.SpecialTokens.Token
-
Creates a special token
- Token(String, String) - Constructor for class com.yahoo.language.process.SpecialTokens.Token
-
Creates a special token which will be represented by the given replacement token
- tokenize(String, boolean) - Method in class com.yahoo.language.process.SpecialTokens
-
Returns the special token starting at the start of the given string, or null if no special token starts at this string
- tokenize(String, Language, StemMode, boolean) - Method in interface com.yahoo.language.process.Tokenizer
-
Returns the tokens produced from an input string under the rules of the given Language and additional options
- Tokenizer - Interface in com.yahoo.language.process
-
Language-sensitive tokenization of a text string.
- TOKENIZER - com.yahoo.language.Linguistics.Component
- TokenScript - Enum in com.yahoo.language.process
-
List of token scripts (e.g.
- TokenType - Enum in com.yahoo.language.process
-
An enumeration of token types.
- toLowerCase(String) - Static method in class com.yahoo.language.LinguisticsCase
-
The lower casing method to use in Vespa when doing language independent processing of natural language data.
- TONGA - com.yahoo.language.Language
-
Language tag "to".
- toString() - Method in class com.yahoo.language.process.SpecialTokens.Token
- TRAIN_EXTREMELY_LARGE_CORPUS_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- TRAINER_SPEC_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.ModelProto
- TRAINING_SENTENCE_SIZE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- Transformer - Interface in com.yahoo.language.process
-
Interface for providers of text transformations such as accent removal.
- TRANSFORMER - com.yahoo.language.Linguistics.Component
- TREAT_WHITESPACE_AS_SUFFIX_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- TSONGA - com.yahoo.language.Language
-
Language tag "ts".
- TURKISH - com.yahoo.language.Language
-
Language tag "tr".
- TURKMEN - com.yahoo.language.Language
-
Language tag "tk".
- TWI - com.yahoo.language.Language
-
Language tag "tw".
- TYPE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
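The `tokenize(String, boolean)` entry for `SpecialTokens` in this section returns the special token starting at the start of the given string, or null if none does. A rough sketch of that contract is shown below. It is not the library's code: `SpecialTokenSketch` and `tokenAtStart` are hypothetical names, the boolean parameter of the indexed method is omitted because its meaning is not stated here, and preferring the longest matching token is an assumption.

```java
import java.util.List;

public class SpecialTokenSketch {

    /**
     * Returns the longest special token that the input starts with,
     * or null if no special token starts at the beginning of the input.
     * Longest-match is an assumption made for this sketch.
     */
    public static String tokenAtStart(List<String> specialTokens, String input) {
        String best = null;
        for (String token : specialTokens) {
            if (token.isEmpty()) continue; // an empty token would match everything
            if (input.startsWith(token) && (best == null || token.length() > best.length())) {
                best = token;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("c", "c++", ".net");
        System.out.println(tokenAtStart(tokens, "c++ and java"));
        System.out.println(tokenAtStart(tokens, "java and c++"));
    }
}
```

A linear scan is enough for illustration; a production tokenizer with many special tokens would more likely use a prefix tree.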
U
- UGARITIC - com.yahoo.language.Language
-
Language tag "uga".
- UGARITIC - com.yahoo.language.process.TokenScript
- UIGHUR - com.yahoo.language.Language
-
Language tag "ug".
- UKRAINIAN - com.yahoo.language.Language
-
Language tag "uk".
- UNIGRAM - sentencepiece.SentencepieceModel.TrainerSpec.ModelType
-
Unigram language model with dynamic algorithm
- UNIGRAM_VALUE - Static variable in enum sentencepiece.SentencepieceModel.TrainerSpec.ModelType
-
Unigram language model with dynamic algorithm
- UNK_ID_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- UNK_PIECE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- UNK_SURFACE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- UNKNOWN - com.yahoo.language.Language
-
Language tag "un".
- UNKNOWN - com.yahoo.language.process.TokenScript
- UNKNOWN - com.yahoo.language.process.TokenType
- UNKNOWN - sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
-
unknown symbol.
- UNKNOWN_VALUE - Static variable in enum sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
-
unknown symbol.
- UNUSED - sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
-
this piece is not used.
- UNUSED_VALUE - Static variable in enum sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
-
this piece is not used.
- URDU - com.yahoo.language.Language
-
Language tag "ur".
- USE_ALL_VOCAB_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- USER_DEFINED - sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
-
user defined symbols.
- USER_DEFINED_SYMBOLS_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- USER_DEFINED_VALUE - Static variable in enum sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
-
user defined symbols.
- UZBEK - com.yahoo.language.Language
-
Language tag "uz".
V
- valueOf(int) - Static method in enum com.yahoo.language.process.TokenType
-
Translates this from the int code representation returned from TokenType.getValue()
- valueOf(int) - Static method in enum sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
-
Deprecated.
- valueOf(int) - Static method in enum sentencepiece.SentencepieceModel.TrainerSpec.ModelType
-
Deprecated.
- valueOf(Descriptors.EnumValueDescriptor) - Static method in enum sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
-
Returns the enum constant of this type with the specified name.
- valueOf(Descriptors.EnumValueDescriptor) - Static method in enum sentencepiece.SentencepieceModel.TrainerSpec.ModelType
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum com.yahoo.language.Language
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum com.yahoo.language.Linguistics.Component
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum com.yahoo.language.process.StemMode
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum com.yahoo.language.process.TokenScript
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum com.yahoo.language.process.TokenType
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum com.yahoo.language.sentencepiece.Scoring
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum com.yahoo.language.sentencepiece.SentencePieceConfig.Scoring.Enum
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum sentencepiece.SentencepieceModel.TrainerSpec.ModelType
-
Returns the enum constant of this type with the specified name.
- values() - Static method in enum com.yahoo.language.Language
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum com.yahoo.language.Linguistics.Component
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum com.yahoo.language.process.StemMode
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum com.yahoo.language.process.TokenScript
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum com.yahoo.language.process.TokenType
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum com.yahoo.language.sentencepiece.Scoring
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum com.yahoo.language.sentencepiece.SentencePieceConfig.Scoring.Enum
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum sentencepiece.SentencepieceModel.ModelProto.SentencePiece.Type
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum sentencepiece.SentencepieceModel.TrainerSpec.ModelType
-
Returns an array containing the constants of this enum type, in the order they are declared.
- VIETNAMESE - com.yahoo.language.Language
-
Language tag "vi".
- VIETNAMESE - com.yahoo.language.process.TokenScript
- VOCAB_SIZE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- VOCABULARY_OUTPUT_PIECE_SCORE_FIELD_NUMBER - Static variable in class sentencepiece.SentencepieceModel.TrainerSpec
- VOLAPUK - com.yahoo.language.Language
-
Language tag "vo".
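The `valueOf(String)` and `values()` entries in this section are the standard methods the Java compiler generates for every enum. As a small illustration of their behavior: `valueOf(String)` matches the constant name exactly (including case) and throws `IllegalArgumentException` otherwise, and `values()` returns the constants in declaration order. The enum `Mode` and the helper `parseOrDefault` below are hypothetical and are not part of the indexed packages.

```java
import java.util.Arrays;

public class EnumLookupSketch {

    /** Hypothetical enum used only for illustration. */
    public enum Mode { NONE, SHORTEST, ALL }

    /** Looks up a constant by exact name, falling back when the (non-null) name is unknown. */
    public static Mode parseOrDefault(String name, Mode fallback) {
        try {
            return Mode.valueOf(name);
        } catch (IllegalArgumentException e) {
            // Thrown when no constant has exactly this name; the match is case-sensitive.
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(Mode.values()));
        System.out.println(parseOrDefault("SHORTEST", Mode.NONE));
        System.out.println(parseOrDefault("shortest", Mode.NONE));
    }
}
```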
W
- WELSH - com.yahoo.language.Language
-
Language tag "cy".
- WOLOF - com.yahoo.language.Language
-
Language tag "wo".
- WORD - sentencepiece.SentencepieceModel.TrainerSpec.ModelType
-
Delimited by whitespace.
- WORD_VALUE - Static variable in enum sentencepiece.SentencepieceModel.TrainerSpec.ModelType
-
Delimited by whitespace.
- writeTo(CodedOutputStream) - Method in class sentencepiece.SentencepieceModel.ModelProto.SentencePiece
- writeTo(CodedOutputStream) - Method in class sentencepiece.SentencepieceModel.ModelProto
- writeTo(CodedOutputStream) - Method in class sentencepiece.SentencepieceModel.NormalizerSpec
- writeTo(CodedOutputStream) - Method in class sentencepiece.SentencepieceModel.SelfTestData.Sample
- writeTo(CodedOutputStream) - Method in class sentencepiece.SentencepieceModel.SelfTestData
- writeTo(CodedOutputStream) - Method in class sentencepiece.SentencepieceModel.TrainerSpec
X
Y
- YI - com.yahoo.language.process.TokenScript
- YIDDISH - com.yahoo.language.Language
-
Language tag "yi".
- YORUBA - com.yahoo.language.Language
-
Language tag "yo".
Z
- ZHUANG - com.yahoo.language.Language
-
Language tag "za".
- ZULU - com.yahoo.language.Language
-
Language tag "zu".