java.lang.Object
- com.google.protobuf.AbstractMessageLite.Builder
  - com.google.protobuf.AbstractMessage.Builder<BuilderType>
    - com.google.protobuf.GeneratedMessageV3.Builder<BuilderType>
      - com.google.protobuf.GeneratedMessageV3.ExtendableBuilder<SentencepieceModel.TrainerSpec,SentencepieceModel.TrainerSpec.Builder>
        - sentencepiece.SentencepieceModel.TrainerSpec.Builder

All Implemented Interfaces:

com.google.protobuf.GeneratedMessageV3.ExtendableMessageOrBuilder<SentencepieceModel.TrainerSpec>, com.google.protobuf.Message.Builder, com.google.protobuf.MessageLite.Builder, com.google.protobuf.MessageLiteOrBuilder, com.google.protobuf.MessageOrBuilder, java.lang.Cloneable, SentencepieceModel.TrainerSpecOrBuilder

Enclosing class:

SentencepieceModel.TrainerSpec
```
public static final class SentencepieceModel.TrainerSpec.Builder
extends com.google.protobuf.GeneratedMessageV3.ExtendableBuilder<SentencepieceModel.TrainerSpec,SentencepieceModel.TrainerSpec.Builder>
implements SentencepieceModel.TrainerSpecOrBuilder
```
```
 TrainerSpec encodes various parameters for SentencePiece training.
 
```
Protobuf type sentencepiece.TrainerSpec

Method Summary

Modifier and Type	Method	Description
`SentencepieceModel.TrainerSpec.Builder`	`addAcceptLanguage(java.lang.String value)`	List of the languages this model can accept.
`SentencepieceModel.TrainerSpec.Builder`	`addAcceptLanguageBytes(com.google.protobuf.ByteString value)`	List of the languages this model can accept.
`SentencepieceModel.TrainerSpec.Builder`	`addAllAcceptLanguage(java.lang.Iterable<java.lang.String> values)`	List of the languages this model can accept.
`SentencepieceModel.TrainerSpec.Builder`	`addAllControlSymbols(java.lang.Iterable<java.lang.String> values)`	Defines control symbols used as an indicator to change the behavior of the decoder.
`SentencepieceModel.TrainerSpec.Builder`	`addAllInput(java.lang.Iterable<java.lang.String> values)`	Input corpus files.
`SentencepieceModel.TrainerSpec.Builder`	`addAllUserDefinedSymbols(java.lang.Iterable<java.lang.String> values)`	Defines user defined symbols.
`SentencepieceModel.TrainerSpec.Builder`	`addControlSymbols(java.lang.String value)`	Defines control symbols used as an indicator to change the behavior of the decoder.
`SentencepieceModel.TrainerSpec.Builder`	`addControlSymbolsBytes(com.google.protobuf.ByteString value)`	Defines control symbols used as an indicator to change the behavior of the decoder.
`<Type> SentencepieceModel.TrainerSpec.Builder`	`addExtension(com.google.protobuf.GeneratedMessage.GeneratedExtension<SentencepieceModel.TrainerSpec,java.util.List<Type>> extension, Type value)`
`SentencepieceModel.TrainerSpec.Builder`	`addInput(java.lang.String value)`	Input corpus files.
`SentencepieceModel.TrainerSpec.Builder`	`addInputBytes(com.google.protobuf.ByteString value)`	Input corpus files.
`SentencepieceModel.TrainerSpec.Builder`	`addRepeatedField(com.google.protobuf.Descriptors.FieldDescriptor field, java.lang.Object value)`
`SentencepieceModel.TrainerSpec.Builder`	`addUserDefinedSymbols(java.lang.String value)`	Defines user defined symbols.
`SentencepieceModel.TrainerSpec.Builder`	`addUserDefinedSymbolsBytes(com.google.protobuf.ByteString value)`	Defines user defined symbols.
`SentencepieceModel.TrainerSpec`	`build()`
`SentencepieceModel.TrainerSpec`	`buildPartial()`
`SentencepieceModel.TrainerSpec.Builder`	`clear()`
`SentencepieceModel.TrainerSpec.Builder`	`clearAcceptLanguage()`	List of the languages this model can accept.
`SentencepieceModel.TrainerSpec.Builder`	`clearAllowWhitespaceOnlyPieces()`	Allows pieces that only contain whitespaces instead of appearing only as prefix or suffix of other pieces.
`SentencepieceModel.TrainerSpec.Builder`	`clearBosId()`	<s>
`SentencepieceModel.TrainerSpec.Builder`	`clearBosPiece()`	`optional string bos_piece = 46 [default = "<s>"];`
`SentencepieceModel.TrainerSpec.Builder`	`clearByteFallback()`	Decomposes unknown pieces into UTF-8 bytes.
`SentencepieceModel.TrainerSpec.Builder`	`clearCharacterCoverage()`	Training parameters.
`SentencepieceModel.TrainerSpec.Builder`	`clearControlSymbols()`	Defines control symbols used as an indicator to change the behavior of the decoder.
`SentencepieceModel.TrainerSpec.Builder`	`clearEosId()`	</s>
`SentencepieceModel.TrainerSpec.Builder`	`clearEosPiece()`	`optional string eos_piece = 47 [default = "</s>"];`
`<Type> SentencepieceModel.TrainerSpec.Builder`	`clearExtension(com.google.protobuf.GeneratedMessage.GeneratedExtension<SentencepieceModel.TrainerSpec,?> extension)`
`SentencepieceModel.TrainerSpec.Builder`	`clearField(com.google.protobuf.Descriptors.FieldDescriptor field)`
`SentencepieceModel.TrainerSpec.Builder`	`clearHardVocabLimit()`	`vocab_size` is treated as a hard limit.
`SentencepieceModel.TrainerSpec.Builder`	`clearInput()`	Input corpus files.
`SentencepieceModel.TrainerSpec.Builder`	`clearInputFormat()`	Input corpus format: "text": one-sentence-per-line text format (default) "tsv": sentence <tab> freq
`SentencepieceModel.TrainerSpec.Builder`	`clearInputSentenceSize()`	Maximum size of sentences the trainer loads from `input` parameter.
`SentencepieceModel.TrainerSpec.Builder`	`clearMaxSentenceLength()`	The maximum sentence length in bytes.
`SentencepieceModel.TrainerSpec.Builder`	`clearMaxSentencepieceLength()`	SentencePiece parameters that control the shape of sentence pieces.
`SentencepieceModel.TrainerSpec.Builder`	`clearMiningSentenceSize()`	Deprecated.
`SentencepieceModel.TrainerSpec.Builder`	`clearModelPrefix()`	Output model file prefix.
`SentencepieceModel.TrainerSpec.Builder`	`clearModelType()`	`optional .sentencepiece.TrainerSpec.ModelType model_type = 3 [default = UNIGRAM];`
`SentencepieceModel.TrainerSpec.Builder`	`clearNumSubIterations()`	Number of EM sub iterations.
`SentencepieceModel.TrainerSpec.Builder`	`clearNumThreads()`	Number of threads in the training.
`SentencepieceModel.TrainerSpec.Builder`	`clearOneof(com.google.protobuf.Descriptors.OneofDescriptor oneof)`
`SentencepieceModel.TrainerSpec.Builder`	`clearPadId()`	<pad> (padding)
`SentencepieceModel.TrainerSpec.Builder`	`clearPadPiece()`	`optional string pad_piece = 48 [default = "<pad>"];`
`SentencepieceModel.TrainerSpec.Builder`	`clearRequiredChars()`	Defines required characters.
`SentencepieceModel.TrainerSpec.Builder`	`clearSeedSentencepieceSize()`	The size of seed sentencepieces.
`SentencepieceModel.TrainerSpec.Builder`	`clearSelfTestSampleSize()`	Size of self-test samples, which are encoded in the model file.
`SentencepieceModel.TrainerSpec.Builder`	`clearShrinkingFactor()`	In every EM sub-iteration, keeps top `shrinking_factor` * `current sentencepieces size` with respect to the loss of the sentence piece.
`SentencepieceModel.TrainerSpec.Builder`	`clearShuffleInputSentence()`	`optional bool shuffle_input_sentence = 19 [default = true];`
`SentencepieceModel.TrainerSpec.Builder`	`clearSplitByNumber()`	When `split_by_number` is true, put a boundary between number and non-number transition.
`SentencepieceModel.TrainerSpec.Builder`	`clearSplitByUnicodeScript()`	Uses Unicode script to split sentence pieces.
`SentencepieceModel.TrainerSpec.Builder`	`clearSplitByWhitespace()`	Use a white space to split sentence pieces.
`SentencepieceModel.TrainerSpec.Builder`	`clearSplitDigits()`	Split all digits (0-9) into separate pieces.
`SentencepieceModel.TrainerSpec.Builder`	`clearTrainExtremelyLargeCorpus()`	Increase bit depth to allow unigram model training on large (>10M sentences) corpora.
`SentencepieceModel.TrainerSpec.Builder`	`clearTrainingSentenceSize()`	Deprecated.
`SentencepieceModel.TrainerSpec.Builder`	`clearTreatWhitespaceAsSuffix()`	Adds whitespace symbol (_) as a suffix instead of prefix.
`SentencepieceModel.TrainerSpec.Builder`	`clearUnkId()`	Reserved special meta tokens.
`SentencepieceModel.TrainerSpec.Builder`	`clearUnkPiece()`	`optional string unk_piece = 45 [default = "<unk>"];`
`SentencepieceModel.TrainerSpec.Builder`	`clearUnkSurface()`	Encodes <unk> into U+2047 (DOUBLE QUESTION MARK), since this character can be useful both for user and developer.
`SentencepieceModel.TrainerSpec.Builder`	`clearUseAllVocab()`	Use all symbols for vocab extraction.
`SentencepieceModel.TrainerSpec.Builder`	`clearUserDefinedSymbols()`	Defines user defined symbols.
`SentencepieceModel.TrainerSpec.Builder`	`clearVocabSize()`	Vocabulary size.
`SentencepieceModel.TrainerSpec.Builder`	`clearVocabularyOutputPieceScore()`	When creating the vocabulary file, defines whether or not to additionally output the score for each piece.
`SentencepieceModel.TrainerSpec.Builder`	`clone()`
`java.lang.String`	`getAcceptLanguage(int index)`	List of the languages this model can accept.
`com.google.protobuf.ByteString`	`getAcceptLanguageBytes(int index)`	List of the languages this model can accept.
`int`	`getAcceptLanguageCount()`	List of the languages this model can accept.
`com.google.protobuf.ProtocolStringList`	`getAcceptLanguageList()`	List of the languages this model can accept.
`boolean`	`getAllowWhitespaceOnlyPieces()`	Allows pieces that only contain whitespaces instead of appearing only as prefix or suffix of other pieces.
`int`	`getBosId()`	<s>
`java.lang.String`	`getBosPiece()`	`optional string bos_piece = 46 [default = "<s>"];`
`com.google.protobuf.ByteString`	`getBosPieceBytes()`	`optional string bos_piece = 46 [default = "<s>"];`
`boolean`	`getByteFallback()`	Decomposes unknown pieces into UTF-8 bytes.
`float`	`getCharacterCoverage()`	Training parameters.
`java.lang.String`	`getControlSymbols(int index)`	Defines control symbols used as an indicator to change the behavior of the decoder.
`com.google.protobuf.ByteString`	`getControlSymbolsBytes(int index)`	Defines control symbols used as an indicator to change the behavior of the decoder.
`int`	`getControlSymbolsCount()`	Defines control symbols used as an indicator to change the behavior of the decoder.
`com.google.protobuf.ProtocolStringList`	`getControlSymbolsList()`	Defines control symbols used as an indicator to change the behavior of the decoder.
`SentencepieceModel.TrainerSpec`	`getDefaultInstanceForType()`
`static com.google.protobuf.Descriptors.Descriptor`	`getDescriptor()`
`com.google.protobuf.Descriptors.Descriptor`	`getDescriptorForType()`
`int`	`getEosId()`	</s>
`java.lang.String`	`getEosPiece()`	`optional string eos_piece = 47 [default = "</s>"];`
`com.google.protobuf.ByteString`	`getEosPieceBytes()`	`optional string eos_piece = 47 [default = "</s>"];`
`boolean`	`getHardVocabLimit()`	`vocab_size` is treated as a hard limit.
`java.lang.String`	`getInput(int index)`	Input corpus files.
`com.google.protobuf.ByteString`	`getInputBytes(int index)`	Input corpus files.
`int`	`getInputCount()`	Input corpus files.
`java.lang.String`	`getInputFormat()`	Input corpus format: "text": one-sentence-per-line text format (default) "tsv": sentence <tab> freq
`com.google.protobuf.ByteString`	`getInputFormatBytes()`	Input corpus format: "text": one-sentence-per-line text format (default) "tsv": sentence <tab> freq
`com.google.protobuf.ProtocolStringList`	`getInputList()`	Input corpus files.
`long`	`getInputSentenceSize()`	Maximum size of sentences the trainer loads from `input` parameter.
`int`	`getMaxSentenceLength()`	The maximum sentence length in bytes.
`int`	`getMaxSentencepieceLength()`	SentencePiece parameters that control the shape of sentence pieces.
`int`	`getMiningSentenceSize()`	Deprecated.
`java.lang.String`	`getModelPrefix()`	Output model file prefix.
`com.google.protobuf.ByteString`	`getModelPrefixBytes()`	Output model file prefix.
`SentencepieceModel.TrainerSpec.ModelType`	`getModelType()`	`optional .sentencepiece.TrainerSpec.ModelType model_type = 3 [default = UNIGRAM];`
`int`	`getNumSubIterations()`	Number of EM sub iterations.
`int`	`getNumThreads()`	Number of threads in the training.
`int`	`getPadId()`	<pad> (padding)
`java.lang.String`	`getPadPiece()`	`optional string pad_piece = 48 [default = "<pad>"];`
`com.google.protobuf.ByteString`	`getPadPieceBytes()`	`optional string pad_piece = 48 [default = "<pad>"];`
`java.lang.String`	`getRequiredChars()`	Defines required characters.
`com.google.protobuf.ByteString`	`getRequiredCharsBytes()`	Defines required characters.
`int`	`getSeedSentencepieceSize()`	The size of seed sentencepieces.
`int`	`getSelfTestSampleSize()`	Size of self-test samples, which are encoded in the model file.
`float`	`getShrinkingFactor()`	In every EM sub-iteration, keeps top `shrinking_factor` * `current sentencepieces size` with respect to the loss of the sentence piece.
`boolean`	`getShuffleInputSentence()`	`optional bool shuffle_input_sentence = 19 [default = true];`
`boolean`	`getSplitByNumber()`	When `split_by_number` is true, put a boundary between number and non-number transition.
`boolean`	`getSplitByUnicodeScript()`	Uses Unicode script to split sentence pieces.
`boolean`	`getSplitByWhitespace()`	Use a white space to split sentence pieces.
`boolean`	`getSplitDigits()`	Split all digits (0-9) into separate pieces.
`boolean`	`getTrainExtremelyLargeCorpus()`	Increase bit depth to allow unigram model training on large (>10M sentences) corpora.
`int`	`getTrainingSentenceSize()`	Deprecated.
`boolean`	`getTreatWhitespaceAsSuffix()`	Adds whitespace symbol (_) as a suffix instead of prefix.
`int`	`getUnkId()`	Reserved special meta tokens.
`java.lang.String`	`getUnkPiece()`	`optional string unk_piece = 45 [default = "<unk>"];`
`com.google.protobuf.ByteString`	`getUnkPieceBytes()`	`optional string unk_piece = 45 [default = "<unk>"];`
`java.lang.String`	`getUnkSurface()`	Encodes <unk> into U+2047 (DOUBLE QUESTION MARK), since this character can be useful both for user and developer.
`com.google.protobuf.ByteString`	`getUnkSurfaceBytes()`	Encodes <unk> into U+2047 (DOUBLE QUESTION MARK), since this character can be useful both for user and developer.
`boolean`	`getUseAllVocab()`	Use all symbols for vocab extraction.
`java.lang.String`	`getUserDefinedSymbols(int index)`	Defines user defined symbols.
`com.google.protobuf.ByteString`	`getUserDefinedSymbolsBytes(int index)`	Defines user defined symbols.
`int`	`getUserDefinedSymbolsCount()`	Defines user defined symbols.
`com.google.protobuf.ProtocolStringList`	`getUserDefinedSymbolsList()`	Defines user defined symbols.
`int`	`getVocabSize()`	Vocabulary size.
`boolean`	`getVocabularyOutputPieceScore()`	When creating the vocabulary file, defines whether or not to additionally output the score for each piece.
`boolean`	`hasAllowWhitespaceOnlyPieces()`	Allows pieces that only contain whitespaces instead of appearing only as prefix or suffix of other pieces.
`boolean`	`hasBosId()`	<s>
`boolean`	`hasBosPiece()`	`optional string bos_piece = 46 [default = "<s>"];`
`boolean`	`hasByteFallback()`	Decomposes unknown pieces into UTF-8 bytes.
`boolean`	`hasCharacterCoverage()`	Training parameters.
`boolean`	`hasEosId()`	</s>
`boolean`	`hasEosPiece()`	`optional string eos_piece = 47 [default = "</s>"];`
`boolean`	`hasHardVocabLimit()`	`vocab_size` is treated as a hard limit.
`boolean`	`hasInputFormat()`	Input corpus format: "text": one-sentence-per-line text format (default) "tsv": sentence <tab> freq
`boolean`	`hasInputSentenceSize()`	Maximum size of sentences the trainer loads from `input` parameter.
`boolean`	`hasMaxSentenceLength()`	The maximum sentence length in bytes.
`boolean`	`hasMaxSentencepieceLength()`	SentencePiece parameters that control the shape of sentence pieces.
`boolean`	`hasMiningSentenceSize()`	Deprecated.
`boolean`	`hasModelPrefix()`	Output model file prefix.
`boolean`	`hasModelType()`	`optional .sentencepiece.TrainerSpec.ModelType model_type = 3 [default = UNIGRAM];`
`boolean`	`hasNumSubIterations()`	Number of EM sub iterations.
`boolean`	`hasNumThreads()`	Number of threads in the training.
`boolean`	`hasPadId()`	<pad> (padding)
`boolean`	`hasPadPiece()`	`optional string pad_piece = 48 [default = "<pad>"];`
`boolean`	`hasRequiredChars()`	Defines required characters.
`boolean`	`hasSeedSentencepieceSize()`	The size of seed sentencepieces.
`boolean`	`hasSelfTestSampleSize()`	Size of self-test samples, which are encoded in the model file.
`boolean`	`hasShrinkingFactor()`	In every EM sub-iteration, keeps top `shrinking_factor` * `current sentencepieces size` with respect to the loss of the sentence piece.
`boolean`	`hasShuffleInputSentence()`	`optional bool shuffle_input_sentence = 19 [default = true];`
`boolean`	`hasSplitByNumber()`	When `split_by_number` is true, put a boundary between number and non-number transition.
`boolean`	`hasSplitByUnicodeScript()`	Uses Unicode script to split sentence pieces.
`boolean`	`hasSplitByWhitespace()`	Use a white space to split sentence pieces.
`boolean`	`hasSplitDigits()`	Split all digits (0-9) into separate pieces.
`boolean`	`hasTrainExtremelyLargeCorpus()`	Increase bit depth to allow unigram model training on large (>10M sentences) corpora.
`boolean`	`hasTrainingSentenceSize()`	Deprecated.
`boolean`	`hasTreatWhitespaceAsSuffix()`	Adds whitespace symbol (_) as a suffix instead of prefix.
`boolean`	`hasUnkId()`	Reserved special meta tokens.
`boolean`	`hasUnkPiece()`	`optional string unk_piece = 45 [default = "<unk>"];`
`boolean`	`hasUnkSurface()`	Encodes <unk> into U+2047 (DOUBLE QUESTION MARK), since this character can be useful both for user and developer.
`boolean`	`hasUseAllVocab()`	Use all symbols for vocab extraction.
`boolean`	`hasVocabSize()`	Vocabulary size.
`boolean`	`hasVocabularyOutputPieceScore()`	When creating the vocabulary file, defines whether or not to additionally output the score for each piece.
`protected com.google.protobuf.GeneratedMessageV3.FieldAccessorTable`	`internalGetFieldAccessorTable()`
`boolean`	`isInitialized()`
`SentencepieceModel.TrainerSpec.Builder`	`mergeFrom(com.google.protobuf.CodedInputStream input, com.google.protobuf.ExtensionRegistryLite extensionRegistry)`
`SentencepieceModel.TrainerSpec.Builder`	`mergeFrom(com.google.protobuf.Message other)`
`SentencepieceModel.TrainerSpec.Builder`	`mergeFrom(SentencepieceModel.TrainerSpec other)`
`SentencepieceModel.TrainerSpec.Builder`	`mergeUnknownFields(com.google.protobuf.UnknownFieldSet unknownFields)`
`SentencepieceModel.TrainerSpec.Builder`	`setAcceptLanguage(int index, java.lang.String value)`	List of the languages this model can accept.
`SentencepieceModel.TrainerSpec.Builder`	`setAllowWhitespaceOnlyPieces(boolean value)`	Allows pieces that only contain whitespaces instead of appearing only as prefix or suffix of other pieces.
`SentencepieceModel.TrainerSpec.Builder`	`setBosId(int value)`	<s>
`SentencepieceModel.TrainerSpec.Builder`	`setBosPiece(java.lang.String value)`	`optional string bos_piece = 46 [default = "<s>"];`
`SentencepieceModel.TrainerSpec.Builder`	`setBosPieceBytes(com.google.protobuf.ByteString value)`	`optional string bos_piece = 46 [default = "<s>"];`
`SentencepieceModel.TrainerSpec.Builder`	`setByteFallback(boolean value)`	Decomposes unknown pieces into UTF-8 bytes.
`SentencepieceModel.TrainerSpec.Builder`	`setCharacterCoverage(float value)`	Training parameters.
`SentencepieceModel.TrainerSpec.Builder`	`setControlSymbols(int index, java.lang.String value)`	Defines control symbols used as an indicator to change the behavior of the decoder.
`SentencepieceModel.TrainerSpec.Builder`	`setEosId(int value)`	</s>
`SentencepieceModel.TrainerSpec.Builder`	`setEosPiece(java.lang.String value)`	`optional string eos_piece = 47 [default = "</s>"];`
`SentencepieceModel.TrainerSpec.Builder`	`setEosPieceBytes(com.google.protobuf.ByteString value)`	`optional string eos_piece = 47 [default = "</s>"];`
`<Type> SentencepieceModel.TrainerSpec.Builder`	`setExtension(com.google.protobuf.GeneratedMessage.GeneratedExtension<SentencepieceModel.TrainerSpec,java.util.List<Type>> extension, int index, Type value)`
`<Type> SentencepieceModel.TrainerSpec.Builder`	`setExtension(com.google.protobuf.GeneratedMessage.GeneratedExtension<SentencepieceModel.TrainerSpec,Type> extension, Type value)`
`SentencepieceModel.TrainerSpec.Builder`	`setField(com.google.protobuf.Descriptors.FieldDescriptor field, java.lang.Object value)`
`SentencepieceModel.TrainerSpec.Builder`	`setHardVocabLimit(boolean value)`	`vocab_size` is treated as a hard limit.
`SentencepieceModel.TrainerSpec.Builder`	`setInput(int index, java.lang.String value)`	Input corpus files.
`SentencepieceModel.TrainerSpec.Builder`	`setInputFormat(java.lang.String value)`	Input corpus format: "text": one-sentence-per-line text format (default) "tsv": sentence <tab> freq
`SentencepieceModel.TrainerSpec.Builder`	`setInputFormatBytes(com.google.protobuf.ByteString value)`	Input corpus format: "text": one-sentence-per-line text format (default) "tsv": sentence <tab> freq
`SentencepieceModel.TrainerSpec.Builder`	`setInputSentenceSize(long value)`	Maximum size of sentences the trainer loads from `input` parameter.
`SentencepieceModel.TrainerSpec.Builder`	`setMaxSentenceLength(int value)`	The maximum sentence length in bytes.
`SentencepieceModel.TrainerSpec.Builder`	`setMaxSentencepieceLength(int value)`	SentencePiece parameters that control the shape of sentence pieces.
`SentencepieceModel.TrainerSpec.Builder`	`setMiningSentenceSize(int value)`	Deprecated.
`SentencepieceModel.TrainerSpec.Builder`	`setModelPrefix(java.lang.String value)`	Output model file prefix.
`SentencepieceModel.TrainerSpec.Builder`	`setModelPrefixBytes(com.google.protobuf.ByteString value)`	Output model file prefix.
`SentencepieceModel.TrainerSpec.Builder`	`setModelType(SentencepieceModel.TrainerSpec.ModelType value)`	`optional .sentencepiece.TrainerSpec.ModelType model_type = 3 [default = UNIGRAM];`
`SentencepieceModel.TrainerSpec.Builder`	`setNumSubIterations(int value)`	Number of EM sub iterations.
`SentencepieceModel.TrainerSpec.Builder`	`setNumThreads(int value)`	Number of threads in the training.
`SentencepieceModel.TrainerSpec.Builder`	`setPadId(int value)`	<pad> (padding)
`SentencepieceModel.TrainerSpec.Builder`	`setPadPiece(java.lang.String value)`	`optional string pad_piece = 48 [default = "<pad>"];`
`SentencepieceModel.TrainerSpec.Builder`	`setPadPieceBytes(com.google.protobuf.ByteString value)`	`optional string pad_piece = 48 [default = "<pad>"];`
`SentencepieceModel.TrainerSpec.Builder`	`setRepeatedField(com.google.protobuf.Descriptors.FieldDescriptor field, int index, java.lang.Object value)`
`SentencepieceModel.TrainerSpec.Builder`	`setRequiredChars(java.lang.String value)`	Defines required characters.
`SentencepieceModel.TrainerSpec.Builder`	`setRequiredCharsBytes(com.google.protobuf.ByteString value)`	Defines required characters.
`SentencepieceModel.TrainerSpec.Builder`	`setSeedSentencepieceSize(int value)`	The size of seed sentencepieces.
`SentencepieceModel.TrainerSpec.Builder`	`setSelfTestSampleSize(int value)`	Size of self-test samples, which are encoded in the model file.
`SentencepieceModel.TrainerSpec.Builder`	`setShrinkingFactor(float value)`	In every EM sub-iteration, keeps top `shrinking_factor` * `current sentencepieces size` with respect to the loss of the sentence piece.
`SentencepieceModel.TrainerSpec.Builder`	`setShuffleInputSentence(boolean value)`	`optional bool shuffle_input_sentence = 19 [default = true];`
`SentencepieceModel.TrainerSpec.Builder`	`setSplitByNumber(boolean value)`	When `split_by_number` is true, put a boundary between number and non-number transition.
`SentencepieceModel.TrainerSpec.Builder`	`setSplitByUnicodeScript(boolean value)`	Uses Unicode script to split sentence pieces.
`SentencepieceModel.TrainerSpec.Builder`	`setSplitByWhitespace(boolean value)`	Use a white space to split sentence pieces.
`SentencepieceModel.TrainerSpec.Builder`	`setSplitDigits(boolean value)`	Split all digits (0-9) into separate pieces.
`SentencepieceModel.TrainerSpec.Builder`	`setTrainExtremelyLargeCorpus(boolean value)`	Increase bit depth to allow unigram model training on large (>10M sentences) corpora.
`SentencepieceModel.TrainerSpec.Builder`	`setTrainingSentenceSize(int value)`	Deprecated.
`SentencepieceModel.TrainerSpec.Builder`	`setTreatWhitespaceAsSuffix(boolean value)`	Adds whitespace symbol (_) as a suffix instead of prefix.
`SentencepieceModel.TrainerSpec.Builder`	`setUnkId(int value)`	Reserved special meta tokens.
`SentencepieceModel.TrainerSpec.Builder`	`setUnknownFields(com.google.protobuf.UnknownFieldSet unknownFields)`
`SentencepieceModel.TrainerSpec.Builder`	`setUnkPiece(java.lang.String value)`	`optional string unk_piece = 45 [default = "<unk>"];`
`SentencepieceModel.TrainerSpec.Builder`	`setUnkPieceBytes(com.google.protobuf.ByteString value)`	`optional string unk_piece = 45 [default = "<unk>"];`
`SentencepieceModel.TrainerSpec.Builder`	`setUnkSurface(java.lang.String value)`	Encodes <unk> into U+2047 (DOUBLE QUESTION MARK), since this character can be useful both for user and developer.
`SentencepieceModel.TrainerSpec.Builder`	`setUnkSurfaceBytes(com.google.protobuf.ByteString value)`	Encodes <unk> into U+2047 (DOUBLE QUESTION MARK), since this character can be useful both for user and developer.
`SentencepieceModel.TrainerSpec.Builder`	`setUseAllVocab(boolean value)`	Use all symbols for vocab extraction.
`SentencepieceModel.TrainerSpec.Builder`	`setUserDefinedSymbols(int index, java.lang.String value)`	Defines user defined symbols.
`SentencepieceModel.TrainerSpec.Builder`	`setVocabSize(int value)`	Vocabulary size.
`SentencepieceModel.TrainerSpec.Builder`	`setVocabularyOutputPieceScore(boolean value)`	When creating the vocabulary file, defines whether or not to additionally output the score for each piece.
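
The accessors in the table fall into two families: optional fields such as `bos_piece` get `has`/`get`/`set`/`clear`, while repeated fields such as `input` get `add`/`addAll`/`get(int)`/`getCount`/`getList`/`set(int, ...)`/`clear`. The sketch below illustrates how these calls are expected to relate to each other, assuming standard proto2 presence semantics. `MiniTrainerSpecBuilder` is a hypothetical stand-in written for this illustration and is not the generated `SentencepieceModel.TrainerSpec.Builder`; it models only `bos_piece` (default `"<s>"`, as listed in the table) and `input`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in (NOT the generated class): mimics how a proto2-style
// builder tracks presence for one optional field and stores one repeated field.
final class MiniTrainerSpecBuilder {
    // Default taken from the table: optional string bos_piece = 46 [default = "<s>"]
    private static final String BOS_PIECE_DEFAULT = "<s>";

    private boolean bosPieceSet = false;
    private String bosPiece = BOS_PIECE_DEFAULT;
    private final List<String> input = new ArrayList<>();

    // --- optional field: has / get / set / clear ---
    boolean hasBosPiece() { return bosPieceSet; }

    // Returns the default when the field was never set.
    String getBosPiece() { return bosPiece; }

    MiniTrainerSpecBuilder setBosPiece(String value) {
        bosPiece = value;
        bosPieceSet = true;
        return this; // setters return the builder so calls can be chained
    }

    MiniTrainerSpecBuilder clearBosPiece() {
        bosPiece = BOS_PIECE_DEFAULT;
        bosPieceSet = false;
        return this;
    }

    // --- repeated field: add / addAll / set(index) / get(index) / count / list / clear ---
    MiniTrainerSpecBuilder addInput(String value) { input.add(value); return this; }

    MiniTrainerSpecBuilder addAllInput(Iterable<String> values) {
        for (String v : values) input.add(v);
        return this;
    }

    MiniTrainerSpecBuilder setInput(int index, String value) { input.set(index, value); return this; }

    MiniTrainerSpecBuilder clearInput() { input.clear(); return this; }

    int getInputCount() { return input.size(); }

    String getInput(int index) { return input.get(index); }

    // Read-only view, so callers cannot bypass the builder's own mutators.
    List<String> getInputList() { return Collections.unmodifiableList(input); }

    public static void main(String[] args) {
        MiniTrainerSpecBuilder b = new MiniTrainerSpecBuilder();
        System.out.println(b.hasBosPiece() + " " + b.getBosPiece()); // false <s>

        b.setBosPiece("[BOS]").addInput("a.txt").addInput("b.txt");
        System.out.println(b.hasBosPiece() + " " + b.getBosPiece()); // true [BOS]
        System.out.println(b.getInputCount() + " " + b.getInputList()); // 2 [a.txt, b.txt]

        b.clearBosPiece();
        System.out.println(b.hasBosPiece() + " " + b.getBosPiece()); // false <s>
    }
}
```

The point of the sketch is that `get` never signals absence by itself: an unset optional field reads as its declared default, so `has` is the call to use when the difference between "unset" and "set to the default" matters.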

Methods inherited from class com.google.protobuf.GeneratedMessageV3.ExtendableBuilder
addExtension, addExtension, clearExtension, clearExtension, extensionsAreInitialized, getAllFields, getExtension, getExtension, getExtension, getExtension, getExtension, getExtension, getExtensionCount, getExtensionCount, getExtensionCount, getField, getFieldBuilder, getRepeatedField, getRepeatedFieldBuilder, getRepeatedFieldCount, hasExtension, hasExtension, hasExtension, hasField, mergeExtensionFields, newBuilderForField, setExtension, setExtension, setExtension, setExtension

Methods inherited from class com.google.protobuf.GeneratedMessageV3.Builder
getOneofFieldDescriptor, getParentForChildren, getUnknownFields, hasOneof, internalGetMapField, internalGetMutableMapField, isClean, markClean, onBuilt, onChanged, setUnknownFieldsProto3

Methods inherited from class com.google.protobuf.AbstractMessage.Builder
findInitializationErrors, getInitializationErrorString, internalMergeFrom, mergeDelimitedFrom, mergeDelimitedFrom, mergeFrom, mergeFrom, mergeFrom, mergeFrom, mergeFrom, mergeFrom, mergeFrom, mergeFrom, mergeFrom, newUninitializedMessageException, toString

Methods inherited from class com.google.protobuf.AbstractMessageLite.Builder
addAll, addAll, mergeFrom, newUninitializedMessageException

Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Methods inherited from interface com.google.protobuf.GeneratedMessageV3.ExtendableMessageOrBuilder
getExtension, getExtension, getExtension, getExtension, getExtension, getExtension, getExtensionCount, getExtensionCount, getExtensionCount, hasExtension, hasExtension, hasExtension

Methods inherited from interface com.google.protobuf.MessageLite.Builder
mergeFrom

Methods inherited from interface com.google.protobuf.MessageOrBuilder
findInitializationErrors, getAllFields, getField, getInitializationErrorString, getOneofFieldDescriptor, getRepeatedField, getRepeatedFieldCount, getUnknownFields, hasField, hasOneof

Method Detail

getDescriptor

```
public static final com.google.protobuf.Descriptors.Descriptor getDescriptor()
```

internalGetFieldAccessorTable
```
protected com.google.protobuf.GeneratedMessageV3.FieldAccessorTable internalGetFieldAccessorTable()
```
Specified by:

internalGetFieldAccessorTable in class com.google.protobuf.GeneratedMessageV3.Builder<SentencepieceModel.TrainerSpec.Builder>

clear
```
public SentencepieceModel.TrainerSpec.Builder clear()
```
Specified by:

clear in interface com.google.protobuf.Message.Builder

Specified by:

clear in interface com.google.protobuf.MessageLite.Builder

Overrides:

clear in class com.google.protobuf.GeneratedMessageV3.ExtendableBuilder<SentencepieceModel.TrainerSpec,SentencepieceModel.TrainerSpec.Builder>

getDescriptorForType
```
public com.google.protobuf.Descriptors.Descriptor getDescriptorForType()
```
Specified by:

getDescriptorForType in interface com.google.protobuf.Message.Builder

Specified by:

getDescriptorForType in interface com.google.protobuf.MessageOrBuilder

Overrides:

getDescriptorForType in class com.google.protobuf.GeneratedMessageV3.Builder<SentencepieceModel.TrainerSpec.Builder>

getDefaultInstanceForType
```
public SentencepieceModel.TrainerSpec getDefaultInstanceForType()
```
Specified by:

getDefaultInstanceForType in interface com.google.protobuf.GeneratedMessageV3.ExtendableMessageOrBuilder<SentencepieceModel.TrainerSpec>

Specified by:

getDefaultInstanceForType in interface com.google.protobuf.MessageLiteOrBuilder

Specified by:

getDefaultInstanceForType in interface com.google.protobuf.MessageOrBuilder

build
```
public SentencepieceModel.TrainerSpec build()
```
Specified by:

build in interface com.google.protobuf.Message.Builder

Specified by:

build in interface com.google.protobuf.MessageLite.Builder

buildPartial
```
public SentencepieceModel.TrainerSpec buildPartial()
```
Specified by:

buildPartial in interface com.google.protobuf.Message.Builder

Specified by:

buildPartial in interface com.google.protobuf.MessageLite.Builder

clone
```
public SentencepieceModel.TrainerSpec.Builder clone()
```
Specified by:

clone in interface com.google.protobuf.Message.Builder

Specified by:

clone in interface com.google.protobuf.MessageLite.Builder

Overrides:

clone in class com.google.protobuf.GeneratedMessageV3.Builder<SentencepieceModel.TrainerSpec.Builder>

setField
```
public SentencepieceModel.TrainerSpec.Builder setField(com.google.protobuf.Descriptors.FieldDescriptor field,
                                                       java.lang.Object value)
```
Specified by:

setField in interface com.google.protobuf.Message.Builder

Overrides:

setField in class com.google.protobuf.GeneratedMessageV3.ExtendableBuilder<SentencepieceModel.TrainerSpec,SentencepieceModel.TrainerSpec.Builder>

clearField
```
public SentencepieceModel.TrainerSpec.Builder clearField(com.google.protobuf.Descriptors.FieldDescriptor field)
```
Specified by:

clearField in interface com.google.protobuf.Message.Builder

Overrides:

clearField in class com.google.protobuf.GeneratedMessageV3.ExtendableBuilder<SentencepieceModel.TrainerSpec,SentencepieceModel.TrainerSpec.Builder>

clearOneof
```
public SentencepieceModel.TrainerSpec.Builder clearOneof(com.google.protobuf.Descriptors.OneofDescriptor oneof)
```
Specified by:

clearOneof in interface com.google.protobuf.Message.Builder

Overrides:

clearOneof in class com.google.protobuf.GeneratedMessageV3.Builder<SentencepieceModel.TrainerSpec.Builder>

setRepeatedField

public SentencepieceModel.TrainerSpec.Builder setRepeatedField(com.google.protobuf.Descriptors.FieldDescriptor field,
                                                               int index,
                                                               java.lang.Object value)

Specified by:

setRepeatedField in interface com.google.protobuf.Message.Builder

Overrides:

setRepeatedField in class com.google.protobuf.GeneratedMessageV3.ExtendableBuilder<SentencepieceModel.TrainerSpec,SentencepieceModel.TrainerSpec.Builder>

addRepeatedField
```
public SentencepieceModel.TrainerSpec.Builder addRepeatedField(com.google.protobuf.Descriptors.FieldDescriptor field,
                                                               java.lang.Object value)
```
Specified by:

addRepeatedField in interface com.google.protobuf.Message.Builder

Overrides:

addRepeatedField in class com.google.protobuf.GeneratedMessageV3.ExtendableBuilder<SentencepieceModel.TrainerSpec,SentencepieceModel.TrainerSpec.Builder>

setExtension

public <Type> SentencepieceModel.TrainerSpec.Builder setExtension(com.google.protobuf.GeneratedMessage.GeneratedExtension<SentencepieceModel.TrainerSpec,Type> extension,
                                                                  Type value)

Overrides:

setExtension in class com.google.protobuf.GeneratedMessageV3.ExtendableBuilder<SentencepieceModel.TrainerSpec,SentencepieceModel.TrainerSpec.Builder>

setExtension

public <Type> SentencepieceModel.TrainerSpec.Builder setExtension(com.google.protobuf.GeneratedMessage.GeneratedExtension<SentencepieceModel.TrainerSpec,java.util.List<Type>> extension,
                                                                  int index,
                                                                  Type value)

Overrides:

setExtension in class com.google.protobuf.GeneratedMessageV3.ExtendableBuilder<SentencepieceModel.TrainerSpec,SentencepieceModel.TrainerSpec.Builder>

addExtension

public <Type> SentencepieceModel.TrainerSpec.Builder addExtension(com.google.protobuf.GeneratedMessage.GeneratedExtension<SentencepieceModel.TrainerSpec,java.util.List<Type>> extension,
                                                                  Type value)

Overrides:

addExtension in class com.google.protobuf.GeneratedMessageV3.ExtendableBuilder<SentencepieceModel.TrainerSpec,SentencepieceModel.TrainerSpec.Builder>

clearExtension

public <Type> SentencepieceModel.TrainerSpec.Builder clearExtension(com.google.protobuf.GeneratedMessage.GeneratedExtension<SentencepieceModel.TrainerSpec,?> extension)

Overrides:

clearExtension in class com.google.protobuf.GeneratedMessageV3.ExtendableBuilder<SentencepieceModel.TrainerSpec,SentencepieceModel.TrainerSpec.Builder>

mergeFrom
```
public SentencepieceModel.TrainerSpec.Builder mergeFrom(com.google.protobuf.Message other)
```
Specified by:

mergeFrom in interface com.google.protobuf.Message.Builder

Overrides:

mergeFrom in class com.google.protobuf.AbstractMessage.Builder<SentencepieceModel.TrainerSpec.Builder>

mergeFrom

public SentencepieceModel.TrainerSpec.Builder mergeFrom(SentencepieceModel.TrainerSpec other)

isInitialized
```
public final boolean isInitialized()
```
Specified by:

isInitialized in interface com.google.protobuf.MessageLiteOrBuilder

Overrides:

isInitialized in class com.google.protobuf.GeneratedMessageV3.ExtendableBuilder<SentencepieceModel.TrainerSpec,SentencepieceModel.TrainerSpec.Builder>

mergeFrom

public SentencepieceModel.TrainerSpec.Builder mergeFrom(com.google.protobuf.CodedInputStream input,
                                                        com.google.protobuf.ExtensionRegistryLite extensionRegistry)
                                                 throws java.io.IOException

Specified by:

mergeFrom in interface com.google.protobuf.Message.Builder

Specified by:

mergeFrom in interface com.google.protobuf.MessageLite.Builder

Overrides:

mergeFrom in class com.google.protobuf.AbstractMessage.Builder<SentencepieceModel.TrainerSpec.Builder>

Throws:

java.io.IOException

getInputList

public com.google.protobuf.ProtocolStringList getInputList()

 General parameters.
 Input corpus files.
  The trainer accepts the following two formats:
  A) Monolingual: plain text, one sentence per line.
  B) Bilingual:   TSV, source sentence <tab> target sentence
  When bilingual data is passed, a shared vocabulary model is built.
  Note that the input file must be a raw corpus, not a preprocessed corpus.
  The trainer loads only the first `input_sentence_size` sentences of the
  files specified with this parameter.

repeated string input = 1;

Specified by:

getInputList in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

A list containing the input.

getInputCount

public int getInputCount()

 General parameters.
 Input corpus files.
  The trainer accepts the following two formats:
  A) Monolingual: plain text, one sentence per line.
  B) Bilingual:   TSV, source sentence <tab> target sentence
  When bilingual data is passed, a shared vocabulary model is built.
  Note that the input file must be a raw corpus, not a preprocessed corpus.
  The trainer loads only the first `input_sentence_size` sentences of the
  files specified with this parameter.

repeated string input = 1;

Specified by:

getInputCount in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

The count of input.

getInput

public java.lang.String getInput(int index)

 General parameters.
 Input corpus files.
  The trainer accepts the following two formats:
  A) Monolingual: plain text, one sentence per line.
  B) Bilingual:   TSV, source sentence <tab> target sentence
  When bilingual data is passed, a shared vocabulary model is built.
  Note that the input file must be a raw corpus, not a preprocessed corpus.
  The trainer loads only the first `input_sentence_size` sentences of the
  files specified with this parameter.

repeated string input = 1;

Specified by:

getInput in interface SentencepieceModel.TrainerSpecOrBuilder

Parameters:

index - The index of the element to return.

Returns:

The input at the given index.

getInputBytes

public com.google.protobuf.ByteString getInputBytes(int index)

 General parameters.
 Input corpus files.
  The trainer accepts the following two formats:
  A) Monolingual: plain text, one sentence per line.
  B) Bilingual:   TSV, source sentence <tab> target sentence
  When bilingual data is passed, a shared vocabulary model is built.
  Note that the input file must be a raw corpus, not a preprocessed corpus.
  The trainer loads only the first `input_sentence_size` sentences of the
  files specified with this parameter.

repeated string input = 1;

Specified by:

getInputBytes in interface SentencepieceModel.TrainerSpecOrBuilder

Parameters:

index - The index of the value to return.

Returns:

The bytes of the input at the given index.

setInput

public SentencepieceModel.TrainerSpec.Builder setInput(int index,
                                                       java.lang.String value)

 General parameters.
 Input corpus files.
  The trainer accepts the following two formats:
  A) Monolingual: plain text, one sentence per line.
  B) Bilingual:   TSV, source sentence <tab> target sentence
  When bilingual data is passed, a shared vocabulary model is built.
  Note that the input file must be a raw corpus, not a preprocessed corpus.
  The trainer loads only the first `input_sentence_size` sentences of the
  files specified with this parameter.

repeated string input = 1;

Parameters:

index - The index to set the value at.

value - The input to set.

Returns:

This builder for chaining.

addInput

public SentencepieceModel.TrainerSpec.Builder addInput(java.lang.String value)

 General parameters.
 Input corpus files.
  The trainer accepts the following two formats:
  A) Monolingual: plain text, one sentence per line.
  B) Bilingual:   TSV, source sentence <tab> target sentence
  When bilingual data is passed, a shared vocabulary model is built.
  Note that the input file must be a raw corpus, not a preprocessed corpus.
  The trainer loads only the first `input_sentence_size` sentences of the
  files specified with this parameter.

repeated string input = 1;

Parameters:

value - The input to add.

Returns:

This builder for chaining.

addAllInput

public SentencepieceModel.TrainerSpec.Builder addAllInput(java.lang.Iterable<java.lang.String> values)

 General parameters.
 Input corpus files.
  The trainer accepts the following two formats:
  A) Monolingual: plain text, one sentence per line.
  B) Bilingual:   TSV, source sentence <tab> target sentence
  When bilingual data is passed, a shared vocabulary model is built.
  Note that the input file must be a raw corpus, not a preprocessed corpus.
  The trainer loads only the first `input_sentence_size` sentences of the
  files specified with this parameter.

repeated string input = 1;

Parameters:

values - The input to add.

Returns:

This builder for chaining.

clearInput

public SentencepieceModel.TrainerSpec.Builder clearInput()

 General parameters.
 Input corpus files.
  The trainer accepts the following two formats:
  A) Monolingual: plain text, one sentence per line.
  B) Bilingual:   TSV, source sentence <tab> target sentence
  When bilingual data is passed, a shared vocabulary model is built.
  Note that the input file must be a raw corpus, not a preprocessed corpus.
  The trainer loads only the first `input_sentence_size` sentences of the
  files specified with this parameter.

repeated string input = 1;

Returns:

This builder for chaining.

addInputBytes

public SentencepieceModel.TrainerSpec.Builder addInputBytes(com.google.protobuf.ByteString value)

 General parameters.
 Input corpus files.
  The trainer accepts the following two formats:
  A) Monolingual: plain text, one sentence per line.
  B) Bilingual:   TSV, source sentence <tab> target sentence
  When bilingual data is passed, a shared vocabulary model is built.
  Note that the input file must be a raw corpus, not a preprocessed corpus.
  The trainer loads only the first `input_sentence_size` sentences of the
  files specified with this parameter.

repeated string input = 1;

Parameters:

value - The bytes of the input to add.

Returns:

This builder for chaining.

hasInputFormat
```
public boolean hasInputFormat()
```
```
 Input corpus format:
 "text": one-sentence-per-line text format (default)
 "tsv":  sentence <tab> freq
 
```
optional string input_format = 7;
Specified by:

hasInputFormat in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

Whether the inputFormat field is set.
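
The "tsv" layout described above (sentence, a tab, then a frequency) can be sketched as a small parser. This is only an illustration: the class name `TsvCorpusSketch`, the choice to split on the last tab, and the summing of duplicate sentences are assumptions made for the example and are not taken from the SentencePiece trainer.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; not part of SentencePiece or the generated API. */
final class TsvCorpusSketch {

    /** Parses "sentence<TAB>freq" lines into a sentence-to-frequency map. */
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> freqs = new LinkedHashMap<>();
        for (String line : lines) {
            // Assumption: the frequency is the text after the last tab.
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                throw new IllegalArgumentException("expected sentence<TAB>freq: " + line);
            }
            String sentence = line.substring(0, tab);
            long freq = Long.parseLong(line.substring(tab + 1).trim());
            // Assumption: repeated sentences accumulate their frequencies.
            freqs.merge(sentence, freq, Long::sum);
        }
        return freqs;
    }

    public static void main(String[] args) {
        System.out.println(parse(Arrays.asList("hello world\t3", "bye\t1")));
    }
}
```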

getInputFormat

public java.lang.String getInputFormat()

 Input corpus format:
 "text": one-sentence-per-line text format (default)
 "tsv":  sentence <tab> freq

optional string input_format = 7;

Specified by:

getInputFormat in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

The inputFormat.

getInputFormatBytes

public com.google.protobuf.ByteString getInputFormatBytes()

 Input corpus format:
 "text": one-sentence-per-line text format (default)
 "tsv":  sentence <tab> freq

optional string input_format = 7;

Specified by:

getInputFormatBytes in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

The bytes for inputFormat.

setInputFormat

public SentencepieceModel.TrainerSpec.Builder setInputFormat(java.lang.String value)

 Input corpus format:
 "text": one-sentence-per-line text format (default)
 "tsv":  sentence <tab> freq

optional string input_format = 7;

Parameters:

value - The inputFormat to set.

Returns:

This builder for chaining.

clearInputFormat

public SentencepieceModel.TrainerSpec.Builder clearInputFormat()

 Input corpus format:
 "text": one-sentence-per-line text format (default)
 "tsv":  sentence <tab> freq

optional string input_format = 7;

Returns:

This builder for chaining.

setInputFormatBytes

public SentencepieceModel.TrainerSpec.Builder setInputFormatBytes(com.google.protobuf.ByteString value)

 Input corpus format:
 "text": one-sentence-per-line text format (default)
 "tsv":  sentence <tab> freq

optional string input_format = 7;

Parameters:

value - The bytes for inputFormat to set.

Returns:

This builder for chaining.

hasModelPrefix
```
public boolean hasModelPrefix()
```
```
 Output model file prefix.
 <model_prefix>.model and <model_prefix>.vocab are generated.
 
```
optional string model_prefix = 2;
Specified by:

hasModelPrefix in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

Whether the modelPrefix field is set.
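
As a small illustration of the naming rule above, the two output file names can be derived from the prefix by appending the documented suffixes. The class `ModelOutputPaths` is hypothetical and exists only for this sketch.

```java
/** Hypothetical helper; not part of SentencePiece or the generated API. */
final class ModelOutputPaths {
    final String modelFile;
    final String vocabFile;

    ModelOutputPaths(String modelPrefix) {
        // <model_prefix>.model and <model_prefix>.vocab, as described above.
        this.modelFile = modelPrefix + ".model";
        this.vocabFile = modelPrefix + ".vocab";
    }

    public static void main(String[] args) {
        ModelOutputPaths paths = new ModelOutputPaths("out/spm");
        System.out.println(paths.modelFile + " " + paths.vocabFile);
    }
}
```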

getModelPrefix
```
public java.lang.String getModelPrefix()
```
```
 Output model file prefix.
 <model_prefix>.model and <model_prefix>.vocab are generated.
 
```
optional string model_prefix = 2;
Specified by:

getModelPrefix in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

The modelPrefix.

getModelPrefixBytes
```
public com.google.protobuf.ByteString getModelPrefixBytes()
```
```
 Output model file prefix.
 <model_prefix>.model and <model_prefix>.vocab are generated.
 
```
optional string model_prefix = 2;
Specified by:

getModelPrefixBytes in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

The bytes for modelPrefix.

setModelPrefix

public SentencepieceModel.TrainerSpec.Builder setModelPrefix(java.lang.String value)

 Output model file prefix.
 <model_prefix>.model and <model_prefix>.vocab are generated.

optional string model_prefix = 2;

Parameters:

value - The modelPrefix to set.

Returns:

This builder for chaining.

clearModelPrefix

public SentencepieceModel.TrainerSpec.Builder clearModelPrefix()

 Output model file prefix.
 <model_prefix>.model and <model_prefix>.vocab are generated.

optional string model_prefix = 2;

Returns:

This builder for chaining.

setModelPrefixBytes

public SentencepieceModel.TrainerSpec.Builder setModelPrefixBytes(com.google.protobuf.ByteString value)

 Output model file prefix.
 <model_prefix>.model and <model_prefix>.vocab are generated.

optional string model_prefix = 2;

Parameters:

value - The bytes for modelPrefix to set.

Returns:

This builder for chaining.

hasModelType
```
public boolean hasModelType()
```
optional .sentencepiece.TrainerSpec.ModelType model_type = 3 [default = UNIGRAM];

Specified by:

hasModelType in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

Whether the modelType field is set.

getModelType
```
public SentencepieceModel.TrainerSpec.ModelType getModelType()
```
optional .sentencepiece.TrainerSpec.ModelType model_type = 3 [default = UNIGRAM];

Specified by:

getModelType in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

The modelType.

setModelType
```
public SentencepieceModel.TrainerSpec.Builder setModelType(SentencepieceModel.TrainerSpec.ModelType value)
```
optional .sentencepiece.TrainerSpec.ModelType model_type = 3 [default = UNIGRAM];

Parameters:

value - The modelType to set.

Returns:

This builder for chaining.

clearModelType
```
public SentencepieceModel.TrainerSpec.Builder clearModelType()
```
optional .sentencepiece.TrainerSpec.ModelType model_type = 3 [default = UNIGRAM];

Returns:

This builder for chaining.

hasVocabSize
```
public boolean hasVocabSize()
```
```
 Vocabulary size. 8k is the default size.
 
```
optional int32 vocab_size = 4 [default = 8000];
Specified by:

hasVocabSize in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

Whether the vocabSize field is set.

getVocabSize
```
public int getVocabSize()
```
```
 Vocabulary size. 8k is the default size.
 
```
optional int32 vocab_size = 4 [default = 8000];
Specified by:

getVocabSize in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

The vocabSize.

setVocabSize
```
public SentencepieceModel.TrainerSpec.Builder setVocabSize(int value)
```
```
 Vocabulary size. 8k is the default size.
 
```
optional int32 vocab_size = 4 [default = 8000];
Parameters:

value - The vocabSize to set.

Returns:

This builder for chaining.

clearVocabSize

public SentencepieceModel.TrainerSpec.Builder clearVocabSize()

 Vocabulary size. 8k is the default size.

optional int32 vocab_size = 4 [default = 8000];

Returns:

This builder for chaining.
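
The four vocab_size accessors follow the usual proto2 pattern for an optional field with a default: the getter returns the default (8000) until a value is set, and the has-method reports whether a value was set explicitly. The hand-written class below is a hypothetical stand-in that mimics that behavior without depending on the protobuf runtime; it is not the generated code.

```java
/** Hypothetical stand-in for the vocab_size accessors; not the generated class. */
final class VocabSizeFieldSketch {
    private static final int DEFAULT_VOCAB_SIZE = 8000; // [default = 8000]

    private boolean isSet;
    private int value = DEFAULT_VOCAB_SIZE;

    boolean hasVocabSize() { return isSet; }

    /** Returns the explicit value if set, otherwise the default. */
    int getVocabSize() { return value; }

    VocabSizeFieldSketch setVocabSize(int newValue) {
        isSet = true;
        value = newValue;
        return this; // returned for chaining, like the real builder
    }

    VocabSizeFieldSketch clearVocabSize() {
        isSet = false;
        value = DEFAULT_VOCAB_SIZE;
        return this;
    }

    public static void main(String[] args) {
        VocabSizeFieldSketch field = new VocabSizeFieldSketch();
        System.out.println(field.hasVocabSize() + " " + field.getVocabSize());
        field.setVocabSize(32000);
        System.out.println(field.hasVocabSize() + " " + field.getVocabSize());
    }
}
```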

getAcceptLanguageList

public com.google.protobuf.ProtocolStringList getAcceptLanguageList()

 List of the languages this model can accept.
 Since the model is language-agnostic, this field is used as a reference.

repeated string accept_language = 5;

Specified by:

getAcceptLanguageList in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

A list containing the acceptLanguage.

getAcceptLanguageCount
```
public int getAcceptLanguageCount()
```
```
 List of the languages this model can accept.
 Since the model is language-agnostic, this field is used as a reference.
 
```
repeated string accept_language = 5;
Specified by:

getAcceptLanguageCount in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

The count of acceptLanguage.

getAcceptLanguage
```
public java.lang.String getAcceptLanguage(int index)
```
```
 List of the languages this model can accept.
 Since the model is language-agnostic, this field is used as a reference.
 
```
repeated string accept_language = 5;
Specified by:

getAcceptLanguage in interface SentencepieceModel.TrainerSpecOrBuilder

Parameters:

index - The index of the element to return.

Returns:

The acceptLanguage at the given index.

getAcceptLanguageBytes
```
public com.google.protobuf.ByteString getAcceptLanguageBytes(int index)
```
```
 List of the languages this model can accept.
 Since the model is language-agnostic, this field is used as a reference.
 
```
repeated string accept_language = 5;
Specified by:

getAcceptLanguageBytes in interface SentencepieceModel.TrainerSpecOrBuilder

Parameters:

index - The index of the value to return.

Returns:

The bytes of the acceptLanguage at the given index.

setAcceptLanguage

public SentencepieceModel.TrainerSpec.Builder setAcceptLanguage(int index,
                                                                java.lang.String value)

 List of the languages this model can accept.
 Since the model is language-agnostic, this field is used as a reference.

repeated string accept_language = 5;

Parameters:

index - The index to set the value at.

value - The acceptLanguage to set.

Returns:

This builder for chaining.

addAcceptLanguage

public SentencepieceModel.TrainerSpec.Builder addAcceptLanguage(java.lang.String value)

 List of the languages this model can accept.
 Since the model is language-agnostic, this field is used as a reference.

repeated string accept_language = 5;

Parameters:

value - The acceptLanguage to add.

Returns:

This builder for chaining.

addAllAcceptLanguage

public SentencepieceModel.TrainerSpec.Builder addAllAcceptLanguage(java.lang.Iterable<java.lang.String> values)

 List of the languages this model can accept.
 Since the model is language-agnostic, this field is used as a reference.

repeated string accept_language = 5;

Parameters:

values - The acceptLanguage to add.

Returns:

This builder for chaining.

clearAcceptLanguage

public SentencepieceModel.TrainerSpec.Builder clearAcceptLanguage()

 List of the languages this model can accept.
 Since the model is language-agnostic, this field is used as a reference.

repeated string accept_language = 5;

Returns:

This builder for chaining.

addAcceptLanguageBytes

public SentencepieceModel.TrainerSpec.Builder addAcceptLanguageBytes(com.google.protobuf.ByteString value)

 List of the languages this model can accept.
 Since the model is language-agnostic, this field is used as a reference.

repeated string accept_language = 5;

Parameters:

value - The bytes of the acceptLanguage to add.

Returns:

This builder for chaining.
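
The repeated-field accessors above (add, addAll, indexed set, clear, count, indexed get, list) can be pictured as thin wrappers around a list. The class below is a hypothetical stand-in written for illustration only; the null check and the read-only list view are assumptions about typical generated builders, not statements about this class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

/** Hypothetical stand-in for the accept_language accessors; not the generated class. */
final class AcceptLanguageSketch {
    private final List<String> languages = new ArrayList<>();

    AcceptLanguageSketch addAcceptLanguage(String value) {
        languages.add(Objects.requireNonNull(value)); // assumption: null is rejected
        return this;
    }

    AcceptLanguageSketch addAllAcceptLanguage(Iterable<String> values) {
        for (String value : values) {
            addAcceptLanguage(value);
        }
        return this;
    }

    AcceptLanguageSketch setAcceptLanguage(int index, String value) {
        languages.set(index, Objects.requireNonNull(value));
        return this;
    }

    AcceptLanguageSketch clearAcceptLanguage() {
        languages.clear();
        return this;
    }

    int getAcceptLanguageCount() { return languages.size(); }

    String getAcceptLanguage(int index) { return languages.get(index); }

    /** Assumption: callers get a read-only view of the current values. */
    List<String> getAcceptLanguageList() { return Collections.unmodifiableList(languages); }

    public static void main(String[] args) {
        AcceptLanguageSketch sketch = new AcceptLanguageSketch()
                .addAcceptLanguage("en")
                .addAllAcceptLanguage(Arrays.asList("ja", "de"));
        System.out.println(sketch.getAcceptLanguageList());
    }
}
```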

hasSelfTestSampleSize
```
public boolean hasSelfTestSampleSize()
```
```
 Size of self-test samples, which are encoded in the model file.
 
```
optional int32 self_test_sample_size = 6 [default = 0];
Specified by:

hasSelfTestSampleSize in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

Whether the selfTestSampleSize field is set.

getSelfTestSampleSize
```
public int getSelfTestSampleSize()
```
```
 Size of self-test samples, which are encoded in the model file.
 
```
optional int32 self_test_sample_size = 6 [default = 0];
Specified by:

getSelfTestSampleSize in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

The selfTestSampleSize.

setSelfTestSampleSize
```
public SentencepieceModel.TrainerSpec.Builder setSelfTestSampleSize(int value)
```
```
 Size of self-test samples, which are encoded in the model file.
 
```
optional int32 self_test_sample_size = 6 [default = 0];
Parameters:

value - The selfTestSampleSize to set.

Returns:

This builder for chaining.

clearSelfTestSampleSize

public SentencepieceModel.TrainerSpec.Builder clearSelfTestSampleSize()

 Size of self-test samples, which are encoded in the model file.

optional int32 self_test_sample_size = 6 [default = 0];

Returns:

This builder for chaining.

hasCharacterCoverage

public boolean hasCharacterCoverage()

 Training parameters.
 Uses the characters that cover the corpus at the ratio `chars_coverage`.
 This parameter determines the basic alphabet of sentence pieces.
 The remaining (1.0 - `chars_coverage`) fraction of characters is treated as UNK.
 See also the required_chars field.

optional float character_coverage = 10 [default = 0.9995];

Specified by:

hasCharacterCoverage in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

Whether the characterCoverage field is set.
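
One way to picture character coverage is a frequency cut-off: count every character in the corpus, take characters from most to least frequent until the kept ones account for the requested ratio of all occurrences, and treat the rest as UNK. The sketch below assumes exactly that procedure; the class name and the tie-breaking by code point are illustrative choices, and the actual selection inside SentencePiece may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical illustration of a coverage cut-off; not the SentencePiece implementation. */
final class CharacterCoverageSketch {

    /** Returns the code points kept in the alphabet for the given coverage ratio. */
    static Set<Integer> alphabet(List<String> corpus, double coverage) {
        Map<Integer, Long> counts = new TreeMap<>();
        long total = 0;
        for (String sentence : corpus) {
            for (int codePoint : sentence.codePoints().toArray()) {
                counts.merge(codePoint, 1L, Long::sum);
                total++;
            }
        }
        // Most frequent first; the stable sort keeps code point order for ties.
        List<Map.Entry<Integer, Long>> sorted = new ArrayList<>(counts.entrySet());
        sorted.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));

        Set<Integer> kept = new LinkedHashSet<>();
        long covered = 0;
        for (Map.Entry<Integer, Long> entry : sorted) {
            if ((double) covered / total >= coverage) {
                break; // everything from here on would map to UNK
            }
            kept.add(entry.getKey());
            covered += entry.getValue();
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(alphabet(Arrays.asList("aaaaaaaaa", "b"), 0.85));
    }
}
```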

getCharacterCoverage

public float getCharacterCoverage()

 Training parameters.
 Uses the characters that cover the corpus at the ratio `chars_coverage`.
 This parameter determines the basic alphabet of sentence pieces.
 The remaining (1.0 - `chars_coverage`) fraction of characters is treated as UNK.
 See also the required_chars field.

optional float character_coverage = 10 [default = 0.9995];

Specified by:

getCharacterCoverage in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

The characterCoverage.

setCharacterCoverage

public SentencepieceModel.TrainerSpec.Builder setCharacterCoverage(float value)

 Training parameters.
 Uses the characters that cover the corpus at the ratio `chars_coverage`.
 This parameter determines the basic alphabet of sentence pieces.
 The remaining (1.0 - `chars_coverage`) fraction of characters is treated as UNK.
 See also the required_chars field.

optional float character_coverage = 10 [default = 0.9995];

Parameters:

value - The characterCoverage to set.

Returns:

This builder for chaining.

clearCharacterCoverage

public SentencepieceModel.TrainerSpec.Builder clearCharacterCoverage()

 Training parameters.
 Uses the characters that cover the corpus at the ratio `chars_coverage`.
 This parameter determines the basic alphabet of sentence pieces.
 The remaining (1.0 - `chars_coverage`) fraction of characters is treated as UNK.
 See also the required_chars field.

optional float character_coverage = 10 [default = 0.9995];

Returns:

This builder for chaining.

hasInputSentenceSize

public boolean hasInputSentenceSize()

 Maximum number of sentences the trainer loads from the `input` parameter.
 The trainer simply loads the `input` files in sequence.
 It is better to shuffle the input corpus randomly.

optional uint64 input_sentence_size = 11 [default = 0];

Specified by:

hasInputSentenceSize in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

Whether the inputSentenceSize field is set.
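
Because the files are read in sequence up to the limit, an unshuffled corpus contributes only its head, which is why shuffling is recommended. The sketch below models each file as an in-memory list of sentences; the class name is hypothetical, and treating 0 as "no limit" is an assumption about the default rather than something stated above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of sequential loading; not the SentencePiece loader. */
final class SentenceLoaderSketch {

    /** Reads files in order and stops once maxSentences sentences are loaded. */
    static List<String> loadFirst(List<List<String>> files, long maxSentences) {
        List<String> loaded = new ArrayList<>();
        for (List<String> file : files) {
            for (String sentence : file) {
                // Assumption: a limit of 0 means "load everything".
                if (maxSentences > 0 && loaded.size() >= maxSentences) {
                    return loaded;
                }
                loaded.add(sentence);
            }
        }
        return loaded;
    }

    public static void main(String[] args) {
        List<List<String>> files = Arrays.asList(
                Arrays.asList("a1", "a2"),
                Arrays.asList("b1", "b2"));
        // Only the first file and the start of the second are seen.
        System.out.println(loadFirst(files, 3));
    }
}
```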

getInputSentenceSize

public long getInputSentenceSize()

 Maximum number of sentences the trainer loads from the `input` parameter.
 The trainer simply loads the `input` files in sequence.
 It is better to shuffle the input corpus randomly.

optional uint64 input_sentence_size = 11 [default = 0];

Specified by:

getInputSentenceSize in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

The inputSentenceSize.

setInputSentenceSize

public SentencepieceModel.TrainerSpec.Builder setInputSentenceSize(long value)

 Maximum number of sentences the trainer loads from the `input` parameter.
 The trainer simply loads the `input` files in sequence.
 It is better to shuffle the input corpus randomly.

optional uint64 input_sentence_size = 11 [default = 0];

Parameters:

value - The inputSentenceSize to set.

Returns:

This builder for chaining.

clearInputSentenceSize

public SentencepieceModel.TrainerSpec.Builder clearInputSentenceSize()

 Maximum number of sentences the trainer loads from the `input` parameter.
 The trainer simply loads the `input` files in sequence.
 It is better to shuffle the input corpus randomly.

optional uint64 input_sentence_size = 11 [default = 0];

Returns:

This builder for chaining.

hasShuffleInputSentence
```
public boolean hasShuffleInputSentence()
```
optional bool shuffle_input_sentence = 19 [default = true];

Specified by:

hasShuffleInputSentence in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

Whether the shuffleInputSentence field is set.

getShuffleInputSentence
```
public boolean getShuffleInputSentence()
```
optional bool shuffle_input_sentence = 19 [default = true];

Specified by:

getShuffleInputSentence in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

The shuffleInputSentence.

setShuffleInputSentence
```
public SentencepieceModel.TrainerSpec.Builder setShuffleInputSentence(boolean value)
```
optional bool shuffle_input_sentence = 19 [default = true];

Parameters:

value - The shuffleInputSentence to set.

Returns:

This builder for chaining.

clearShuffleInputSentence
```
public SentencepieceModel.TrainerSpec.Builder clearShuffleInputSentence()
```
optional bool shuffle_input_sentence = 19 [default = true];

Returns:

This builder for chaining.

hasMiningSentenceSize

@Deprecated
public boolean hasMiningSentenceSize()

Deprecated.

 Maximum number of sentences used to make seed sentence pieces.
 An extended suffix array is constructed to extract frequent
 substrings from the corpus. This uses 20N working space,
 where N is the size of the corpus.

optional int32 mining_sentence_size = 12 [deprecated = true];

Specified by:

hasMiningSentenceSize in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

Whether the miningSentenceSize field is set.

getMiningSentenceSize

@Deprecated
public int getMiningSentenceSize()

Deprecated.

 Maximum number of sentences used to make seed sentence pieces.
 An extended suffix array is constructed to extract frequent
 substrings from the corpus. This uses 20N working space,
 where N is the size of the corpus.

optional int32 mining_sentence_size = 12 [deprecated = true];

Specified by:

getMiningSentenceSize in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

The miningSentenceSize.

setMiningSentenceSize

@Deprecated
public SentencepieceModel.TrainerSpec.Builder setMiningSentenceSize(int value)

Deprecated.

 Maximum number of sentences used to make seed sentence pieces.
 An extended suffix array is constructed to extract frequent
 substrings from the corpus. This uses 20N working space,
 where N is the size of the corpus.

optional int32 mining_sentence_size = 12 [deprecated = true];

Parameters:

value - The miningSentenceSize to set.

Returns:

This builder for chaining.

clearMiningSentenceSize

@Deprecated
public SentencepieceModel.TrainerSpec.Builder clearMiningSentenceSize()

Deprecated.

 Maximum number of sentences used to make seed sentence pieces.
 An extended suffix array is constructed to extract frequent
 substrings from the corpus. This uses 20N working space,
 where N is the size of the corpus.

optional int32 mining_sentence_size = 12 [deprecated = true];

Returns:

This builder for chaining.

hasTrainingSentenceSize
```
@Deprecated
public boolean hasTrainingSentenceSize()
```
Deprecated.
```
 Maximum size of sentences to train sentence pieces.
 
```
optional int32 training_sentence_size = 13 [deprecated = true];
Specified by:

hasTrainingSentenceSize in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

Whether the trainingSentenceSize field is set.

getTrainingSentenceSize
```
@Deprecated
public int getTrainingSentenceSize()
```
Deprecated.
```
 Maximum size of sentences to train sentence pieces.
 
```
optional int32 training_sentence_size = 13 [deprecated = true];
Specified by:

getTrainingSentenceSize in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

The trainingSentenceSize.

setTrainingSentenceSize
```
@Deprecated
public SentencepieceModel.TrainerSpec.Builder setTrainingSentenceSize(int value)
```
Deprecated.
```
 Maximum size of sentences to train sentence pieces.
 
```
optional int32 training_sentence_size = 13 [deprecated = true];
Parameters:

value - The trainingSentenceSize to set.

Returns:

This builder for chaining.

clearTrainingSentenceSize

@Deprecated
public SentencepieceModel.TrainerSpec.Builder clearTrainingSentenceSize()

Deprecated.

 Maximum size of sentences to train sentence pieces.

optional int32 training_sentence_size = 13 [deprecated = true];

Returns:

This builder for chaining.

hasSeedSentencepieceSize
```
public boolean hasSeedSentencepieceSize()
```
```
 The size of seed sentencepieces.
 `seed_sentencepiece_size` must be larger than `vocab_size`.
 
```
optional int32 seed_sentencepiece_size = 14 [default = 1000000];
Specified by:

hasSeedSentencepieceSize in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

Whether the seedSentencepieceSize field is set.
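
The constraint above (`seed_sentencepiece_size` must be larger than `vocab_size`) can be checked before training starts. The helper below is a hypothetical pre-flight check written for illustration; the generated builder is not known to enforce this itself.

```java
/** Hypothetical pre-flight check; not part of SentencePiece or the generated API. */
final class SeedSizeCheck {

    /** Throws if the seed size is not strictly larger than the vocabulary size. */
    static void validate(int seedSentencepieceSize, int vocabSize) {
        if (seedSentencepieceSize <= vocabSize) {
            throw new IllegalArgumentException(
                    "seed_sentencepiece_size (" + seedSentencepieceSize
                            + ") must be larger than vocab_size (" + vocabSize + ")");
        }
    }

    public static void main(String[] args) {
        validate(1_000_000, 8_000); // the documented defaults satisfy the constraint
        System.out.println("defaults are consistent");
    }
}
```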

getSeedSentencepieceSize
```
public int getSeedSentencepieceSize()
```
```
 The size of seed sentencepieces.
 `seed_sentencepiece_size` must be larger than `vocab_size`.
 
```
optional int32 seed_sentencepiece_size = 14 [default = 1000000];
Specified by:

getSeedSentencepieceSize in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

The seedSentencepieceSize.

setSeedSentencepieceSize

public SentencepieceModel.TrainerSpec.Builder setSeedSentencepieceSize(int value)

 The size of seed sentencepieces.
 `seed_sentencepiece_size` must be larger than `vocab_size`.

optional int32 seed_sentencepiece_size = 14 [default = 1000000];

Parameters:

value - The seedSentencepieceSize to set.

Returns:

This builder for chaining.

clearSeedSentencepieceSize

public SentencepieceModel.TrainerSpec.Builder clearSeedSentencepieceSize()

 The size of seed sentencepieces.
 `seed_sentencepiece_size` must be larger than `vocab_size`.

optional int32 seed_sentencepiece_size = 14 [default = 1000000];

Returns:

This builder for chaining.
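The has/get/set/clear methods above follow the usual proto2 pattern for an optional field with a declared default. The sketch below models only that presence behavior, assuming standard proto2 semantics (an unset field reports its default, and clearing restores it). `SeedSizeField` is a hypothetical stand-in written for illustration; it is not the generated builder and does not depend on protobuf.

```java
/**
 * Hypothetical stand-in for one optional proto2 field with a default.
 * It models presence semantics only and is not part of the generated API.
 */
final class SeedSizeField {
    // Default taken from the field declaration: [default = 1000000].
    static final int DEFAULT_SEED_SENTENCEPIECE_SIZE = 1_000_000;

    private boolean isSet = false;
    private int value = DEFAULT_SEED_SENTENCEPIECE_SIZE;

    /** True only after an explicit set, even if the value equals the default. */
    boolean hasSeedSentencepieceSize() {
        return isSet;
    }

    /** Returns the explicit value, or the declared default when unset. */
    int getSeedSentencepieceSize() {
        return value;
    }

    /** Stores the value, marks the field as present, and returns this for chaining. */
    SeedSizeField setSeedSentencepieceSize(int newValue) {
        value = newValue;
        isSet = true;
        return this;
    }

    /** Drops the explicit value so that the getter reports the default again. */
    SeedSizeField clearSeedSentencepieceSize() {
        value = DEFAULT_SEED_SENTENCEPIECE_SIZE;
        isSet = false;
        return this;
    }
}
```

With the generated class, the same calls would be chained on a `SentencepieceModel.TrainerSpec.Builder` instance rather than on this stand-in.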

hasShrinkingFactor

public boolean hasShrinkingFactor()

 In every EM sub-iteration, keeps the top
 `shrinking_factor` * `current sentencepieces size` pieces with respect to
 the loss of the sentence piece. This value should be smaller than 1.0.

optional float shrinking_factor = 15 [default = 0.75];

Specified by:

hasShrinkingFactor in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

Whether the shrinkingFactor field is set.

getShrinkingFactor

public float getShrinkingFactor()

 In every EM sub-iteration, keeps the top
 `shrinking_factor` * `current sentencepieces size` pieces with respect to
 the loss of the sentence piece. This value should be smaller than 1.0.

optional float shrinking_factor = 15 [default = 0.75];

Specified by:

getShrinkingFactor in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

The shrinkingFactor.

setShrinkingFactor

public SentencepieceModel.TrainerSpec.Builder setShrinkingFactor(float value)

 In every EM sub-iteration, keeps the top
 `shrinking_factor` * `current sentencepieces size` pieces with respect to
 the loss of the sentence piece. This value should be smaller than 1.0.

optional float shrinking_factor = 15 [default = 0.75];

Parameters:

value - The shrinkingFactor to set.

Returns:

This builder for chaining.

clearShrinkingFactor

public SentencepieceModel.TrainerSpec.Builder clearShrinkingFactor()

 In every EM sub-iteration, keeps the top
 `shrinking_factor` * `current sentencepieces size` pieces with respect to
 the loss of the sentence piece. This value should be smaller than 1.0.

optional float shrinking_factor = 15 [default = 0.75];

Returns:

This builder for chaining.
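To make the arithmetic behind `shrinking_factor` concrete, the sketch below counts how many pruning rounds a vocabulary would need to shrink from a seed size to a target size when each round keeps `shrinking_factor` times the current number of pieces. This is a simplified model under stated assumptions: the class and method names are hypothetical, the rounding rule (floor) is an assumption, and the actual trainer may stop at a different threshold.

```java
/** Hypothetical helper that models repeated shrinking of a piece inventory. */
final class ShrinkSchedule {
    /**
     * Counts the rounds needed to go from seedSize down to targetSize when each
     * round keeps floor(shrinkingFactor * current) pieces, never fewer than targetSize.
     */
    static int pruningRounds(long seedSize, long targetSize, double shrinkingFactor) {
        // The field comment says the factor should be smaller than 1.0.
        if (!(shrinkingFactor > 0.0 && shrinkingFactor < 1.0)) {
            throw new IllegalArgumentException("shrinkingFactor must be in (0, 1)");
        }
        // Mirrors the documented rule that the seed size must exceed the vocabulary size.
        if (targetSize < 1 || seedSize <= targetSize) {
            throw new IllegalArgumentException("seedSize must be larger than targetSize");
        }
        long current = seedSize;
        int rounds = 0;
        while (current > targetSize) {
            long kept = (long) Math.floor(current * shrinkingFactor);
            current = Math.max(targetSize, kept);
            rounds++;
        }
        return rounds;
    }
}
```

For example, `ShrinkSchedule.pruningRounds(1000, 100, 0.5)` walks through 1000, 500, 250, 125, 100 and therefore returns 4.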

hasMaxSentenceLength

public boolean hasMaxSentenceLength()

 The maximum sentence length in bytes. Sentences longer than
 `max_sentence_length` are simply ignored.
 Longer input tends to bring the following risks:
  * Overflow during EM training (unigram language model only)
  * Performance drop because of O(n log n) cost in BPE.

optional int32 max_sentence_length = 18 [default = 4192];

Specified by:

hasMaxSentenceLength in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

Whether the maxSentenceLength field is set.

getMaxSentenceLength

public int getMaxSentenceLength()

 The maximum sentence length in bytes. Sentences longer than
 `max_sentence_length` are simply ignored.
 Longer input tends to bring the following risks:
  * Overflow during EM training (unigram language model only)
  * Performance drop because of O(n log n) cost in BPE.

optional int32 max_sentence_length = 18 [default = 4192];

Specified by:

getMaxSentenceLength in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

The maxSentenceLength.

setMaxSentenceLength

public SentencepieceModel.TrainerSpec.Builder setMaxSentenceLength(int value)

 The maximum sentence length in bytes. Sentences longer than
 `max_sentence_length` are simply ignored.
 Longer input tends to bring the following risks:
  * Overflow during EM training (unigram language model only)
  * Performance drop because of O(n log n) cost in BPE.

optional int32 max_sentence_length = 18 [default = 4192];

Parameters:

value - The maxSentenceLength to set.

Returns:

This builder for chaining.

clearMaxSentenceLength

public SentencepieceModel.TrainerSpec.Builder clearMaxSentenceLength()

 The maximum sentence length in bytes. Sentences longer than
 `max_sentence_length` are simply ignored.
 Longer input tends to bring the following risks:
  * Overflow during EM training (unigram language model only)
  * Performance drop because of O(n log n) cost in BPE.

optional int32 max_sentence_length = 18 [default = 4192];

Returns:

This builder for chaining.
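Because `max_sentence_length` is measured in bytes rather than characters, non-ASCII sentences reach the limit sooner than their character count suggests. The sketch below illustrates the rule, assuming that "byte" refers to the UTF-8 encoded length of the sentence. `SentenceLengthFilter` and its methods are hypothetical helpers, not part of the generated API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-filter that mimics the documented byte-length rule. */
final class SentenceLengthFilter {
    /** Length of the sentence in bytes once encoded as UTF-8 (an assumption of this sketch). */
    static int utf8Length(String sentence) {
        return sentence.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Keeps sentences whose byte length does not exceed maxSentenceLength. */
    static List<String> keepShortEnough(List<String> sentences, int maxSentenceLength) {
        List<String> kept = new ArrayList<>();
        for (String sentence : sentences) {
            // Sentences with a length larger than the limit are ignored.
            if (utf8Length(sentence) <= maxSentenceLength) {
                kept.add(sentence);
            }
        }
        return kept;
    }
}
```

A three-character Japanese string such as `"\u65e5\u672c\u8a9e"` occupies nine bytes in UTF-8, so it would be dropped under a limit of 4 while `"abc"` would be kept.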

hasNumThreads
```
public boolean hasNumThreads()
```
```
 Number of threads in the training.
 
```
optional int32 num_threads = 16 [default = 16];
Specified by:

hasNumThreads in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

Whether the numThreads field is set.

getNumThreads
```
public int getNumThreads()
```
```
 Number of threads in the training.
 
```
optional int32 num_threads = 16 [default = 16];
Specified by:

getNumThreads in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

The numThreads.

setNumThreads
```
public SentencepieceModel.TrainerSpec.Builder setNumThreads(int value)
```
```
 Number of threads in the training.
 
```
optional int32 num_threads = 16 [default = 16];
Parameters:

value - The numThreads to set.

Returns:

This builder for chaining.

clearNumThreads

public SentencepieceModel.TrainerSpec.Builder clearNumThreads()

 Number of threads in the training.

optional int32 num_threads = 16 [default = 16];

Returns:

This builder for chaining.

hasNumSubIterations
```
public boolean hasNumSubIterations()
```
```
 Number of EM sub iterations.
 
```
optional int32 num_sub_iterations = 17 [default = 2];
Specified by:

hasNumSubIterations in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

Whether the numSubIterations field is set.

getNumSubIterations
```
public int getNumSubIterations()
```
```
 Number of EM sub iterations.
 
```
optional int32 num_sub_iterations = 17 [default = 2];
Specified by:

getNumSubIterations in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

The numSubIterations.

setNumSubIterations
```
public SentencepieceModel.TrainerSpec.Builder setNumSubIterations(int value)
```
```
 Number of EM sub iterations.
 
```
optional int32 num_sub_iterations = 17 [default = 2];
Parameters:

value - The numSubIterations to set.

Returns:

This builder for chaining.

clearNumSubIterations

public SentencepieceModel.TrainerSpec.Builder clearNumSubIterations()

 Number of EM sub iterations.

optional int32 num_sub_iterations = 17 [default = 2];

Returns:

This builder for chaining.

hasMaxSentencepieceLength
```
public boolean hasMaxSentencepieceLength()
```
```
 SentencePiece parameters that control the shape of sentence pieces.
 Maximum length of a sentence piece.
 
```
optional int32 max_sentencepiece_length = 20 [default = 16];
Specified by:

hasMaxSentencepieceLength in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

Whether the maxSentencepieceLength field is set.

getMaxSentencepieceLength

public int getMaxSentencepieceLength()

 SentencePiece parameters that control the shape of sentence pieces.
 Maximum length of a sentence piece.

optional int32 max_sentencepiece_length = 20 [default = 16];

Specified by:

getMaxSentencepieceLength in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

The maxSentencepieceLength.

setMaxSentencepieceLength

public SentencepieceModel.TrainerSpec.Builder setMaxSentencepieceLength(int value)

 SentencePiece parameters that control the shape of sentence pieces.
 Maximum length of a sentence piece.

optional int32 max_sentencepiece_length = 20 [default = 16];

Parameters:

value - The maxSentencepieceLength to set.

Returns:

This builder for chaining.

clearMaxSentencepieceLength

public SentencepieceModel.TrainerSpec.Builder clearMaxSentencepieceLength()

 SentencePiece parameters that control the shape of sentence pieces.
 Maximum length of a sentence piece.

optional int32 max_sentencepiece_length = 20 [default = 16];

Returns:

This builder for chaining.

hasSplitByUnicodeScript

public boolean hasSplitByUnicodeScript()

 Uses Unicode script to split sentence pieces.
 When `split_by_unicode_script` is true, we do not allow a sentence piece to
 include multiple Unicode scripts, e.g. "F1" is not a valid piece.
 Exception: CJ characters (Hiragana/Katakana/Han) are all handled
 as one script type, since a Japanese word can consist of multiple scripts.
 This exception is always applied regardless of the accept-language
 parameter.

optional bool split_by_unicode_script = 21 [default = true];

Specified by:

hasSplitByUnicodeScript in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

Whether the splitByUnicodeScript field is set.

getSplitByUnicodeScript

public boolean getSplitByUnicodeScript()

 Uses Unicode script to split sentence pieces.
 When `split_by_unicode_script` is true, we do not allow a sentence piece to
 include multiple Unicode scripts, e.g. "F1" is not a valid piece.
 Exception: CJ characters (Hiragana/Katakana/Han) are all handled
 as one script type, since a Japanese word can consist of multiple scripts.
 This exception is always applied regardless of the accept-language
 parameter.

optional bool split_by_unicode_script = 21 [default = true];

Specified by:

getSplitByUnicodeScript in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

The splitByUnicodeScript.

setSplitByUnicodeScript

public SentencepieceModel.TrainerSpec.Builder setSplitByUnicodeScript(boolean value)

 Uses Unicode script to split sentence pieces.
 When `split_by_unicode_script` is true, we do not allow a sentence piece to
 include multiple Unicode scripts, e.g. "F1" is not a valid piece.
 Exception: CJ characters (Hiragana/Katakana/Han) are all handled
 as one script type, since a Japanese word can consist of multiple scripts.
 This exception is always applied regardless of the accept-language
 parameter.

optional bool split_by_unicode_script = 21 [default = true];

Parameters:

value - The splitByUnicodeScript to set.

Returns:

This builder for chaining.

clearSplitByUnicodeScript

public SentencepieceModel.TrainerSpec.Builder clearSplitByUnicodeScript()

 Uses Unicode script to split sentence pieces.
 When `split_by_unicode_script` is true, we do not allow a sentence piece to
 include multiple Unicode scripts, e.g. "F1" is not a valid piece.
 Exception: CJ characters (Hiragana/Katakana/Han) are all handled
 as one script type, since a Japanese word can consist of multiple scripts.
 This exception is always applied regardless of the accept-language
 parameter.

optional bool split_by_unicode_script = 21 [default = true];

Returns:

This builder for chaining.
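The script rule described for `split_by_unicode_script` can be approximated with the JDK's own script tables. The sketch below is an illustration under stated assumptions: it uses `Character.UnicodeScript` rather than the library's script data, folds Hiragana and Katakana into Han to mirror the documented exception, and treats every other script change (including a change to the Common script used by digits) as a mix. `ScriptCheck` is a hypothetical helper, and the trainer's handling of edge cases may differ.

```java
/** Hypothetical checker for pieces that span more than one Unicode script. */
final class ScriptCheck {
    /** Maps Hiragana and Katakana onto Han, mirroring the documented exception. */
    static Character.UnicodeScript normalizedScript(int codePoint) {
        Character.UnicodeScript script = Character.UnicodeScript.of(codePoint);
        if (script == Character.UnicodeScript.HIRAGANA
                || script == Character.UnicodeScript.KATAKANA) {
            return Character.UnicodeScript.HAN;
        }
        return script;
    }

    /** Returns true when two adjacent code points belong to different scripts. */
    static boolean mixesScripts(String piece) {
        Character.UnicodeScript previous = null;
        for (int i = 0; i < piece.length(); ) {
            int codePoint = piece.codePointAt(i);
            Character.UnicodeScript current = normalizedScript(codePoint);
            if (previous != null && previous != current) {
                return true;
            }
            previous = current;
            i += Character.charCount(codePoint);
        }
        return false;
    }
}
```

Under these assumptions `ScriptCheck.mixesScripts("F1")` is true because the letter is Latin and the digit is Common, while a string that combines Han, Hiragana, and Katakana characters is reported as a single script.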

hasSplitByNumber

public boolean hasSplitByNumber()

 When `split_by_number` is true, puts a boundary at each transition between
 number and non-number characters. To treat "F1" as one token, set this flag
 to false.

optional bool split_by_number = 23 [default = true];

Specified by:

hasSplitByNumber in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

Whether the splitByNumber field is set.

getSplitByNumber

public boolean getSplitByNumber()

 When `split_by_number` is true, puts a boundary at each transition between
 number and non-number characters. To treat "F1" as one token, set this flag
 to false.

optional bool split_by_number = 23 [default = true];

Specified by:

getSplitByNumber in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

The splitByNumber.

setSplitByNumber

public SentencepieceModel.TrainerSpec.Builder setSplitByNumber(boolean value)

 When `split_by_number` is true, puts a boundary at each transition between
 number and non-number characters. To treat "F1" as one token, set this flag
 to false.

optional bool split_by_number = 23 [default = true];

Parameters:

value - The splitByNumber to set.

Returns:

This builder for chaining.

clearSplitByNumber

public SentencepieceModel.TrainerSpec.Builder clearSplitByNumber()

 When `split_by_number` is true, puts a boundary at each transition between
 number and non-number characters. To treat "F1" as one token, set this flag
 to false.

optional bool split_by_number = 23 [default = true];

Returns:

This builder for chaining.
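The boundary rule for `split_by_number` can be pictured as cutting a string wherever a digit run meets a non-digit run. The sketch below shows only where such boundaries fall. It assumes ASCII digits and uses a hypothetical helper named `NumberBoundary`; in the trainer the rule constrains which candidate pieces are allowed rather than tokenizing text directly.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that cuts text at number/non-number transitions. */
final class NumberBoundary {
    /** Splits text into maximal runs of ASCII digits and maximal runs of other characters. */
    static List<String> splitAtNumberBoundaries(String text) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean previousWasDigit = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean isDigit = c >= '0' && c <= '9';
            // A change between digit and non-digit closes the current run.
            if (current.length() > 0 && isDigit != previousWasDigit) {
                parts.add(current.toString());
                current.setLength(0);
            }
            current.append(c);
            previousWasDigit = isDigit;
        }
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }
}
```

With this helper, `"F1"` is cut into `"F"` and `"1"`, which is the split that the flag enforces when it is true.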

hasSplitByWhitespace
```
public boolean hasSplitByWhitespace()
```
```
 Uses whitespace to split sentence pieces.
 When `split_by_whitespace` is false, a piece may contain
 whitespace in the middle, e.g., "in_the".
 
```
optional bool split_by_whitespace = 22 [default = true];
Specified by:

hasSplitByWhitespace in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

Whether the splitByWhitespace field is set.

getSplitByWhitespace

public boolean getSplitByWhitespace()

 Uses whitespace to split sentence pieces.
 When `split_by_whitespace` is false, a piece may contain
 whitespace in the middle, e.g., "in_the".

optional bool split_by_whitespace = 22 [default = true];

Specified by:

getSplitByWhitespace in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

The splitByWhitespace.

setSplitByWhitespace

public SentencepieceModel.TrainerSpec.Builder setSplitByWhitespace(boolean value)

 Uses whitespace to split sentence pieces.
 When `split_by_whitespace` is false, a piece may contain
 whitespace in the middle, e.g., "in_the".

optional bool split_by_whitespace = 22 [default = true];

Parameters:

value - The splitByWhitespace to set.

Returns:

This builder for chaining.

clearSplitByWhitespace

public SentencepieceModel.TrainerSpec.Builder clearSplitByWhitespace()

 Uses whitespace to split sentence pieces.
 When `split_by_whitespace` is false, a piece may contain
 whitespace in the middle, e.g., "in_the".

optional bool split_by_whitespace = 22 [default = true];

Returns:

This builder for chaining.

hasTreatWhitespaceAsSuffix

public boolean hasTreatWhitespaceAsSuffix()

 Adds the whitespace symbol (_) as a suffix instead of a prefix, e.g., _hello =>
 hello_. When `treat_whitespace_as_suffix` is true,
 NormalizerSpec::add_dummy_prefix will add the dummy whitespace to the end
 of the sentence.

optional bool treat_whitespace_as_suffix = 24 [default = false];

Specified by:

hasTreatWhitespaceAsSuffix in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

Whether the treatWhitespaceAsSuffix field is set.

getTreatWhitespaceAsSuffix

public boolean getTreatWhitespaceAsSuffix()

 Adds the whitespace symbol (_) as a suffix instead of a prefix, e.g., _hello =>
 hello_. When `treat_whitespace_as_suffix` is true,
 NormalizerSpec::add_dummy_prefix will add the dummy whitespace to the end
 of the sentence.

optional bool treat_whitespace_as_suffix = 24 [default = false];

Specified by:

getTreatWhitespaceAsSuffix in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

The treatWhitespaceAsSuffix.

setTreatWhitespaceAsSuffix

public SentencepieceModel.TrainerSpec.Builder setTreatWhitespaceAsSuffix(boolean value)

 Adds the whitespace symbol (_) as a suffix instead of a prefix, e.g., _hello =>
 hello_. When `treat_whitespace_as_suffix` is true,
 NormalizerSpec::add_dummy_prefix will add the dummy whitespace to the end
 of the sentence.

optional bool treat_whitespace_as_suffix = 24 [default = false];

Parameters:

value - The treatWhitespaceAsSuffix to set.

Returns:

This builder for chaining.

clearTreatWhitespaceAsSuffix

public SentencepieceModel.TrainerSpec.Builder clearTreatWhitespaceAsSuffix()

 Adds the whitespace symbol (_) as a suffix instead of a prefix, e.g., _hello =>
 hello_. When `treat_whitespace_as_suffix` is true,
 NormalizerSpec::add_dummy_prefix will add the dummy whitespace to the end
 of the sentence.

optional bool treat_whitespace_as_suffix = 24 [default = false];

Returns:

This builder for chaining.
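The effect of `treat_whitespace_as_suffix` on where the whitespace symbol lands can be shown at the word level. In the sketch below, `WhitespaceMarker` is a hypothetical helper, and the marker character is assumed to be U+2581, which the description above writes as `_`. It illustrates marker placement only and does not reproduce the normalizer or the tokenizer.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that attaches a whitespace marker to each word. */
final class WhitespaceMarker {
    // Assumed marker character; the field description writes it as "_".
    static final char MARKER = '\u2581';

    /**
     * Splits the sentence on whitespace and attaches the marker to each word,
     * in front of the word by default or after it when treatWhitespaceAsSuffix is true.
     */
    static List<String> markWords(String sentence, boolean treatWhitespaceAsSuffix) {
        List<String> marked = new ArrayList<>();
        for (String word : sentence.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // An empty or all-whitespace sentence yields no words.
            }
            marked.add(treatWhitespaceAsSuffix ? word + MARKER : MARKER + word);
        }
        return marked;
    }
}
```

Calling `markWords("hello world", false)` places the marker before each word, and passing `true` moves it to the end of each word, matching the `_hello => hello_` example.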

hasAllowWhitespaceOnlyPieces
```
public boolean hasAllowWhitespaceOnlyPieces()
```
```
 Allows pieces that contain only whitespace, instead of whitespace appearing
 only as a prefix or suffix of other pieces.
 
```
optional bool allow_whitespace_only_pieces = 26 [default = false];
Specified by:

hasAllowWhitespaceOnlyPieces in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

Whether the allowWhitespaceOnlyPieces field is set.

getAllowWhitespaceOnlyPieces
```
public boolean getAllowWhitespaceOnlyPieces()
```
```
 Allows pieces that contain only whitespace, instead of whitespace appearing
 only as a prefix or suffix of other pieces.
 
```
optional bool allow_whitespace_only_pieces = 26 [default = false];
Specified by:

getAllowWhitespaceOnlyPieces in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

The allowWhitespaceOnlyPieces.

setAllowWhitespaceOnlyPieces

public SentencepieceModel.TrainerSpec.Builder setAllowWhitespaceOnlyPieces(boolean value)

 Allows pieces that contain only whitespace, instead of whitespace appearing
 only as a prefix or suffix of other pieces.

optional bool allow_whitespace_only_pieces = 26 [default = false];

Parameters:

value - The allowWhitespaceOnlyPieces to set.

Returns:

This builder for chaining.

clearAllowWhitespaceOnlyPieces

public SentencepieceModel.TrainerSpec.Builder clearAllowWhitespaceOnlyPieces()

 Allows pieces that contain only whitespace, instead of whitespace appearing
 only as a prefix or suffix of other pieces.

optional bool allow_whitespace_only_pieces = 26 [default = false];

Returns:

This builder for chaining.

hasSplitDigits
```
public boolean hasSplitDigits()
```
```
 Split all digits (0-9) into separate pieces.
 
```
optional bool split_digits = 25 [default = false];
Specified by:

hasSplitDigits in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

Whether the splitDigits field is set.

getSplitDigits
```
public boolean getSplitDigits()
```
```
 Split all digits (0-9) into separate pieces.
 
```
optional bool split_digits = 25 [default = false];
Specified by:

getSplitDigits in interface SentencepieceModel.TrainerSpecOrBuilder

Returns:

The splitDigits.

setSplitDigits
```
public SentencepieceModel.TrainerSpec.Builder setSplitDigits(boolean value)
```
```
 Split all digits (0-9) into separate pieces.
 
```
optional bool split_digits = 25 [default = false];
Parameters:

value - The splitDigits to set.

Returns:

This builder for chaining.

clearSplitDigits

public SentencepieceModel.TrainerSpec.Builder clearSplitDigits()

 Split all digits (0-9) into separate pieces.

optional bool split_digits = 25 [default = false];

Returns:

This builder for chaining.
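As an illustration of what `split_digits` asks for, the sketch below breaks every ASCII digit (0-9) into its own piece while leaving runs of other characters intact. `DigitSplitter` is a hypothetical helper written for this example; how the trainer applies the flag internally is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that isolates every digit as a separate piece. */
final class DigitSplitter {
    /** Emits each ASCII digit on its own and keeps runs of non-digits together. */
    static List<String> splitDigits(String text) {
        List<String> parts = new ArrayList<>();
        StringBuilder nonDigits = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= '0' && c <= '9') {
                // Flush any pending non-digit run before emitting the digit.
                if (nonDigits.length() > 0) {
                    parts.add(nonDigits.toString());
                    nonDigits.setLength(0);
                }
                parts.add(String.valueOf(c));
            } else {
                nonDigits.append(c);
            }
        }
        if (nonDigits.length() > 0) {
            parts.add(nonDigits.toString());
        }
        return parts;
    }
}
```

For the input `"ab2024"` the helper yields `"ab"`, `"2"`, `"0"`, `"2"`, `"4"`, so a multi-digit number is never kept as one piece.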