Class RegexTokenizer
- java.lang.Object
  - org.apache.flink.ml.feature.regextokenizer.RegexTokenizer

All Implemented Interfaces:
Serializable, org.apache.flink.ml.api.AlgoOperator<RegexTokenizer>, org.apache.flink.ml.api.Stage<RegexTokenizer>, org.apache.flink.ml.api.Transformer<RegexTokenizer>, org.apache.flink.ml.common.param.HasInputCol<RegexTokenizer>, org.apache.flink.ml.common.param.HasOutputCol<RegexTokenizer>, RegexTokenizerParams<RegexTokenizer>, org.apache.flink.ml.param.WithParams<RegexTokenizer>

public class RegexTokenizer extends Object implements org.apache.flink.ml.api.Transformer<RegexTokenizer>, RegexTokenizerParams<RegexTokenizer>
A Transformer that optionally converts the input string to lowercase and then splits it into tokens using a regular expression. It provides two options to extract tokens:
- if "gaps" is true: the provided pattern is used as a delimiter to split the input string.
- else: the provided pattern is matched repeatedly against the input string, and each match becomes a token.
Moreover, it provides parameters to drop tokens shorter than a minimum length and to control whether the input is converted to lowercase. The output for each input string is an array of strings, which can be empty.
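
The two extraction modes can be illustrated with a minimal, standalone sketch built only on `java.util.regex`. It is not the Flink ML source: the class `RegexTokenizerSketch` and its `tokenize` helper are hypothetical names introduced for illustration, and the sketch assumes that a token whose length equals the minimum is kept.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative stand-in for the tokenization rules described above; not the Flink ML code. */
public class RegexTokenizerSketch {

    /**
     * Hypothetical helper whose arguments mirror the documented parameters
     * (pattern, gaps, minTokenLength, toLowercase).
     */
    public static List<String> tokenize(
            String input, String pattern, boolean gaps, int minTokenLength, boolean toLowercase) {
        String text = toLowercase ? input.toLowerCase(Locale.ROOT) : input;
        Pattern regex = Pattern.compile(pattern);
        List<String> candidates = new ArrayList<>();
        if (gaps) {
            // gaps == true: the pattern describes the separators between tokens.
            for (String piece : regex.split(text)) {
                candidates.add(piece);
            }
        } else {
            // gaps == false: every successive match of the pattern is a token.
            Matcher matcher = regex.matcher(text);
            while (matcher.find()) {
                candidates.add(matcher.group());
            }
        }
        // Assumption: tokens shorter than minTokenLength are dropped, so the result may be empty.
        List<String> tokens = new ArrayList<>();
        for (String token : candidates) {
            if (token.length() >= minTokenLength) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Split on runs of whitespace, lowercasing first: prints [hello, flink, ml]
        System.out.println(tokenize("Hello  Flink ML", "\\s+", true, 1, true));
        // Match word characters, keep tokens of length >= 3: prints [Hello, Flink]
        System.out.println(tokenize("Hello  Flink ML", "\\w+", false, 3, false));
    }
}
```

With "gaps" set to true the pattern names what to discard; with "gaps" set to false it names what to keep. This is why the same input can call for different patterns in the two modes.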
See Also:
- Serialized Form

Nested Class Summary
- static class RegexTokenizer.RegexTokenizerUdf: The main logic of RegexTokenizer, which converts the input string to an array of tokens.

Field Summary
Fields inherited from interface org.apache.flink.ml.feature.regextokenizer.RegexTokenizerParams:
GAPS, MIN_TOKEN_LENGTH, PATTERN, TO_LOWERCASE

Constructor Summary
- RegexTokenizer()

Method Summary
- Map<org.apache.flink.ml.param.Param<?>,Object> getParamMap()
- static RegexTokenizer load(org.apache.flink.table.api.bridge.java.StreamTableEnvironment tEnv, String path)
- void save(String path)
- org.apache.flink.table.api.Table[] transform(org.apache.flink.table.api.Table... inputs)

Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.flink.ml.feature.regextokenizer.RegexTokenizerParams:
getGaps, getMinTokenLength, getPattern, getToLowercase, setGaps, setMinTokenLength, setPattern, setToLowercase

Method Detail

transform
public org.apache.flink.table.api.Table[] transform(org.apache.flink.table.api.Table... inputs)
- Specified by: transform in interface org.apache.flink.ml.api.AlgoOperator<RegexTokenizer>

save
public void save(String path) throws IOException
- Specified by: save in interface org.apache.flink.ml.api.Stage<RegexTokenizer>
- Throws: IOException

getParamMap
public Map<org.apache.flink.ml.param.Param<?>,Object> getParamMap()
- Specified by: getParamMap in interface org.apache.flink.ml.param.WithParams<RegexTokenizer>

load
public static RegexTokenizer load(org.apache.flink.table.api.bridge.java.StreamTableEnvironment tEnv, String path) throws IOException
- Throws: IOException