Class RegexTokenizer

  • All Implemented Interfaces:
    Serializable, org.apache.flink.ml.api.AlgoOperator<RegexTokenizer>, org.apache.flink.ml.api.Stage<RegexTokenizer>, org.apache.flink.ml.api.Transformer<RegexTokenizer>, org.apache.flink.ml.common.param.HasInputCol<RegexTokenizer>, org.apache.flink.ml.common.param.HasOutputCol<RegexTokenizer>, RegexTokenizerParams<RegexTokenizer>, org.apache.flink.ml.param.WithParams<RegexTokenizer>

    public class RegexTokenizer
    extends Object
    implements org.apache.flink.ml.api.Transformer<RegexTokenizer>, RegexTokenizerParams<RegexTokenizer>
    A Transformer that converts the input string to lowercase (unless disabled) and then splits it into tokens based on a regular expression, which by default splits on whitespace. It provides two options to extract tokens:
    • If "gaps" is true, the provided pattern is treated as the delimiter and used to split the input string.
    • Otherwise, the provided pattern is repeatedly matched against the input string, and each match becomes a token.
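
    The difference between the two modes can be sketched in plain Java with java.util.regex. This is a minimal illustration, not the Flink ML source; the class and method names (TokenModes, splitOnGaps, matchTokens) are hypothetical. With gaps enabled the pattern describes the separators; with gaps disabled it describes the tokens themselves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the two extraction modes; not the Flink ML implementation.
final class TokenModes {
    private TokenModes() {}

    // gaps = true: the pattern matches the separators, so splitting yields the tokens.
    // String.split keeps a leading empty string if the input starts with a separator.
    static List<String> splitOnGaps(String input, String pattern) {
        return new ArrayList<>(Arrays.asList(input.split(pattern)));
    }

    // gaps = false: every successive match of the pattern is emitted as a token.
    static List<String> matchTokens(String input, String pattern) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = Pattern.compile(pattern).matcher(input);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }
}
```

    For the input "Hello, world  foo", splitOnGaps(input, "\\s+") yields "Hello,", "world" and "foo" (punctuation stays attached), whereas matchTokens(input, "\\w+") yields "Hello", "world" and "foo".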

    It also provides parameters to drop tokens shorter than a minimum length and to control whether the input is converted to lowercase. For each input string, the output is an array of strings, which may be empty.
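
    Putting these pieces together, the per-string behavior described above can be sketched as a single function. This is a minimal sketch under stated assumptions, not the Flink ML code: the class RegexTokenizeSketch and its method are hypothetical, the parameter names (pattern, gaps, minTokenLength, toLowercase) are modeled on the behavior described here, and lowercasing with Locale.ROOT is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical per-string tokenizer mirroring the documented behavior; not the Flink ML source.
final class RegexTokenizeSketch {
    private RegexTokenizeSketch() {}

    static String[] tokenize(
            String input, String pattern, boolean gaps, int minTokenLength, boolean toLowercase) {
        // Optional lowercasing happens before tokenization (Locale.ROOT is an assumption).
        String text = toLowercase ? input.toLowerCase(Locale.ROOT) : input;
        Pattern regex = Pattern.compile(pattern);

        List<String> candidates = new ArrayList<>();
        if (gaps) {
            // The pattern describes the delimiters between tokens.
            for (String piece : regex.split(text)) {
                candidates.add(piece);
            }
        } else {
            // The pattern describes the tokens themselves.
            Matcher matcher = regex.matcher(text);
            while (matcher.find()) {
                candidates.add(matcher.group());
            }
        }

        // Drop tokens shorter than minTokenLength; this also removes empty strings when it is >= 1.
        List<String> tokens = new ArrayList<>();
        for (String token : candidates) {
            if (token.length() >= minTokenLength) {
                tokens.add(token);
            }
        }
        // The result may be an empty array.
        return tokens.toArray(new String[0]);
    }
}
```

    For example, tokenize("Apache Flink ML", "\\s+", true, 3, true) returns "apache" and "flink", because "ml" is shorter than three characters; with a minimum length of 1, an empty input string yields an empty array.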

    See Also:
    Serialized Form
    • Constructor Detail

      • RegexTokenizer

        public RegexTokenizer()
    • Method Detail

      • transform

        public org.apache.flink.table.api.Table[] transform(org.apache.flink.table.api.Table... inputs)
        Specified by:
        transform in interface org.apache.flink.ml.api.AlgoOperator<RegexTokenizer>
      • getParamMap

        public Map<org.apache.flink.ml.param.Param<?>, Object> getParamMap()
        Specified by:
        getParamMap in interface org.apache.flink.ml.param.WithParams<RegexTokenizer>