Package sentencepiece

Interface SentencepieceModel.NormalizerSpecOrBuilder

    • Method Summary

      Modifier and Type Method Description
      boolean getAddDummyPrefix()
      Adds dummy whitespace at the beginning of text in order to treat "world" in "world" and "hello world" in the same way.
      boolean getEscapeWhitespaces()
      Replaces whitespace with the meta symbol.
      java.lang.String getName()
      Name of the normalization rule.
      com.google.protobuf.ByteString getNameBytes()
      Name of the normalization rule.
      java.lang.String getNormalizationRuleTsv()
      Custom normalization rule file in TSV format.
      com.google.protobuf.ByteString getNormalizationRuleTsvBytes()
      Custom normalization rule file in TSV format.
      com.google.protobuf.ByteString getPrecompiledCharsmap()
      Pre-compiled normalization rule created by the Builder::GetPrecompiledCharsMap() or Builder::CompileCharsMap() method.
      boolean getRemoveExtraWhitespaces()
      Removes leading, trailing, and duplicate internal whitespace.
      boolean hasAddDummyPrefix()
      Adds dummy whitespace at the beginning of text in order to treat "world" in "world" and "hello world" in the same way.
      boolean hasEscapeWhitespaces()
      Replaces whitespace with the meta symbol.
      boolean hasName()
      Name of the normalization rule.
      boolean hasNormalizationRuleTsv()
      Custom normalization rule file in TSV format.
      boolean hasPrecompiledCharsmap()
      Pre-compiled normalization rule created by the Builder::GetPrecompiledCharsMap() or Builder::CompileCharsMap() method.
      boolean hasRemoveExtraWhitespaces()
      Removes leading, trailing, and duplicate internal whitespace.
      • Methods inherited from interface com.google.protobuf.GeneratedMessageV3.ExtendableMessageOrBuilder

        getDefaultInstanceForType, getExtension, getExtension, getExtension, getExtension, getExtension, getExtension, getExtensionCount, getExtensionCount, getExtensionCount, hasExtension, hasExtension, hasExtension
      • Methods inherited from interface com.google.protobuf.MessageLiteOrBuilder

        isInitialized
      • Methods inherited from interface com.google.protobuf.MessageOrBuilder

        findInitializationErrors, getAllFields, getDescriptorForType, getField, getInitializationErrorString, getOneofFieldDescriptor, getRepeatedField, getRepeatedFieldCount, getUnknownFields, hasField, hasOneof
    • Method Detail

      • hasName

        boolean hasName()
         Name of the normalization rule.
         
        optional string name = 1;
        Returns:
        Whether the name field is set.
      • getName

        java.lang.String getName()
         Name of the normalization rule.
         
        optional string name = 1;
        Returns:
        The name.
      • getNameBytes

        com.google.protobuf.ByteString getNameBytes()
         Name of the normalization rule.
         
        optional string name = 1;
        Returns:
        The bytes for name.
      • hasPrecompiledCharsmap

        boolean hasPrecompiledCharsmap()
         Pre-compiled normalization rule created by the
         Builder::GetPrecompiledCharsMap() or Builder::CompileCharsMap() method.
         Usually this field is set by the Builder::GetNormalizerSpec() method.
         
        optional bytes precompiled_charsmap = 2;
        Returns:
        Whether the precompiledCharsmap field is set.
      • getPrecompiledCharsmap

        com.google.protobuf.ByteString getPrecompiledCharsmap()
         Pre-compiled normalization rule created by the
         Builder::GetPrecompiledCharsMap() or Builder::CompileCharsMap() method.
         Usually this field is set by the Builder::GetNormalizerSpec() method.
         
        optional bytes precompiled_charsmap = 2;
        Returns:
        The precompiledCharsmap.
      • hasAddDummyPrefix

        boolean hasAddDummyPrefix()
         Adds dummy whitespace at the beginning of text in order to
         treat "world" in "world" and "hello world" in the same way.
         
        optional bool add_dummy_prefix = 3 [default = true];
        Returns:
        Whether the addDummyPrefix field is set.
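
        Because add_dummy_prefix declares [default = true], the presence check and the getter can disagree on an unset field: hasAddDummyPrefix() reports false while getAddDummyPrefix() still returns true. The class below is a hypothetical stand-in written for illustration, not the generated SentencepieceModel code; it is a minimal sketch of that proto2 optional-field behavior.

```java
/** Hypothetical stand-in for a proto2 message with one optional bool field. */
final class OptionalBoolFieldSketch {
    // Mirrors the declared [default = true] of add_dummy_prefix.
    private static final boolean DEFAULT_ADD_DUMMY_PREFIX = true;

    // null models "the field was never set".
    private final Boolean addDummyPrefix;

    OptionalBoolFieldSketch(Boolean addDummyPrefix) {
        this.addDummyPrefix = addDummyPrefix;
    }

    /** True only when a value was explicitly set. */
    boolean hasAddDummyPrefix() {
        return addDummyPrefix != null;
    }

    /** The explicit value if present, otherwise the declared default. */
    boolean getAddDummyPrefix() {
        return addDummyPrefix != null ? addDummyPrefix : DEFAULT_ADD_DUMMY_PREFIX;
    }
}
```

        With this sketch, new OptionalBoolFieldSketch(null) reports has = false and get = true, whereas new OptionalBoolFieldSketch(false) reports has = true and get = false. Callers that must tell "explicitly true" apart from "defaulted to true" should check the has method first.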
      • getAddDummyPrefix

        boolean getAddDummyPrefix()
         Adds dummy whitespace at the beginning of text in order to
         treat "world" in "world" and "hello world" in the same way.
         
        optional bool add_dummy_prefix = 3 [default = true];
        Returns:
        The addDummyPrefix.
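
        The effect of the dummy prefix is easiest to see once whitespace has been escaped. The sketch below is not the library's normalizer; it assumes the meta symbol is U+2581 and handles only the ASCII space, and the class and method names are made up for illustration.

```java
/** Illustrative only: shows why a leading dummy space makes "world" look the same in both inputs. */
final class DummyPrefixSketch {
    // Assumed meta symbol (U+2581, LOWER ONE EIGHTH BLOCK).
    static final char META = '\u2581';

    /** Optionally prepends one space, then replaces every ASCII space with the meta symbol. */
    static String normalize(String text, boolean addDummyPrefix) {
        String prefixed = addDummyPrefix ? " " + text : text;
        return prefixed.replace(' ', META);
    }
}
```

        With the prefix enabled, normalize("world", true) yields the meta symbol followed by "world", which is exactly the suffix of normalize("hello world", true). With the prefix disabled, the standalone "world" carries no leading meta symbol and no longer matches the word as it appears mid-sentence.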
      • hasRemoveExtraWhitespaces

        boolean hasRemoveExtraWhitespaces()
         Removes leading, trailing, and duplicate internal whitespace.
         
        optional bool remove_extra_whitespaces = 4 [default = true];
        Returns:
        Whether the removeExtraWhitespaces field is set.
      • getRemoveExtraWhitespaces

        boolean getRemoveExtraWhitespaces()
         Removes leading, trailing, and duplicate internal whitespace.
         
        optional bool remove_extra_whitespaces = 4 [default = true];
        Returns:
        The removeExtraWhitespaces.
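
        As a rough illustration of the behavior described above, the following sketch trims leading and trailing spaces and collapses internal runs to a single space. It is an assumption-laden approximation (ASCII space only, hypothetical names), not the library's implementation.

```java
/** Illustrative only: removes leading, trailing, and duplicate internal ASCII spaces. */
final class ExtraWhitespaceSketch {
    static String removeExtraWhitespaces(String text) {
        StringBuilder out = new StringBuilder();
        boolean pendingSpace = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == ' ') {
                // Remember at most one space, and only after real content was emitted.
                pendingSpace = out.length() > 0;
            } else {
                if (pendingSpace) {
                    out.append(' ');
                }
                out.append(c);
                pendingSpace = false;
            }
        }
        // A trailing pending space is never flushed, which drops trailing whitespace.
        return out.toString();
    }
}
```

        For example, removeExtraWhitespaces("  hello   world  ") returns "hello world".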
      • hasEscapeWhitespaces

        boolean hasEscapeWhitespaces()
         Replaces whitespace with the meta symbol.
         This field must be true to train a SentencePiece model.
         
        optional bool escape_whitespaces = 5 [default = true];
        Returns:
        Whether the escapeWhitespaces field is set.
      • getEscapeWhitespaces

        boolean getEscapeWhitespaces()
         Replaces whitespace with the meta symbol.
         This field must be true to train a SentencePiece model.
         
        optional bool escape_whitespaces = 5 [default = true];
        Returns:
        The escapeWhitespaces.
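
        Escaping whitespace makes it an ordinary character inside pieces, so the original spacing can be recovered by concatenating pieces and reversing the substitution. The sketch below assumes the meta symbol is U+2581 and uses hypothetical helper names; it illustrates the idea rather than the library's code.

```java
/** Illustrative only: a reversible whitespace escape using an assumed meta symbol. */
final class EscapeWhitespaceSketch {
    // Assumed meta symbol (U+2581, LOWER ONE EIGHTH BLOCK).
    static final char META = '\u2581';

    /** Replaces every ASCII space with the meta symbol. */
    static String escape(String text) {
        return text.replace(' ', META);
    }

    /** Joins pieces without separators and turns meta symbols back into spaces. */
    static String detokenize(String... pieces) {
        return String.join("", pieces).replace(META, ' ');
    }
}
```

        Given pieces produced from the escaped form of " hello world", detokenize restores " hello world" regardless of where the piece boundaries fall, because the spacing lives inside the pieces themselves.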
      • hasNormalizationRuleTsv

        boolean hasNormalizationRuleTsv()
         Custom normalization rule file in TSV format.
         https://github.com/google/sentencepiece/blob/master/doc/normalization.md
         This field is only used by the SentencePieceTrainer::Train() method, which
         compiles the rule into the binary rule stored in `precompiled_charsmap`.
         
        optional string normalization_rule_tsv = 6;
        Returns:
        Whether the normalizationRuleTsv field is set.
      • getNormalizationRuleTsv

        java.lang.String getNormalizationRuleTsv()
         Custom normalization rule file in TSV format.
         https://github.com/google/sentencepiece/blob/master/doc/normalization.md
         This field is only used by the SentencePieceTrainer::Train() method, which
         compiles the rule into the binary rule stored in `precompiled_charsmap`.
         
        optional string normalization_rule_tsv = 6;
        Returns:
        The normalizationRuleTsv.
      • getNormalizationRuleTsvBytes

        com.google.protobuf.ByteString getNormalizationRuleTsvBytes()
         Custom normalization rule file in TSV format.
         https://github.com/google/sentencepiece/blob/master/doc/normalization.md
         This field is only used by the SentencePieceTrainer::Train() method, which
         compiles the rule into the binary rule stored in `precompiled_charsmap`.
         
        optional string normalization_rule_tsv = 6;
        Returns:
        The bytes for normalizationRuleTsv.
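
        To give a feel for what a TSV rule file contains, here is a small parsing sketch. It assumes the layout described in the linked normalization document: one rule per line, a source and a target column separated by a tab, and each column listing Unicode code points in hex separated by spaces. Accepting an optional "U+" prefix is an additional assumption. The class and its methods are hypothetical and unrelated to the trainer's actual compiler.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: parses TSV text of the assumed "src code points <TAB> target code points" form. */
final class NormalizationRuleTsvSketch {
    /** One rule: a source character sequence and its replacement. */
    static final class Rule {
        final String source;
        final String target;

        Rule(String source, String target) {
            this.source = source;
            this.target = target;
        }
    }

    /** Decodes space-separated hex code points, with or without a "U+" prefix. */
    static String decodeCodePoints(String field) {
        StringBuilder sb = new StringBuilder();
        for (String token : field.trim().split(" +")) {
            if (token.isEmpty()) {
                continue; // an empty column maps to the empty string
            }
            String hex = token.startsWith("U+") ? token.substring(2) : token;
            sb.appendCodePoint(Integer.parseInt(hex, 16));
        }
        return sb.toString();
    }

    /** Parses every non-blank line; columns after the second are ignored. */
    static List<Rule> parse(String tsv) {
        List<Rule> rules = new ArrayList<>();
        for (String line : tsv.split("\n")) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] fields = line.split("\t", -1);
            String target = fields.length > 1 ? decodeCodePoints(fields[1]) : "";
            rules.add(new Rule(decodeCodePoints(fields[0]), target));
        }
        return rules;
    }
}
```

        Under these assumptions, the line "41 302 300", a tab, then "1EA6" parses to a rule that rewrites "A" followed by two combining marks into the single precomposed character U+1EA6. The sketch works on the text of a rule file; reading the file itself is left out.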