Class MinHashLSHModel
- java.lang.Object
-
- org.apache.flink.ml.feature.lsh.MinHashLSHModel
-
- All Implemented Interfaces:
Serializable, org.apache.flink.ml.api.AlgoOperator<MinHashLSHModel>, org.apache.flink.ml.api.Model<MinHashLSHModel>, org.apache.flink.ml.api.Stage<MinHashLSHModel>, org.apache.flink.ml.api.Transformer<MinHashLSHModel>, org.apache.flink.ml.common.param.HasInputCol<MinHashLSHModel>, org.apache.flink.ml.common.param.HasOutputCol<MinHashLSHModel>, LSHModelParams<MinHashLSHModel>, org.apache.flink.ml.param.WithParams<MinHashLSHModel>
public class MinHashLSHModel extends Object
A Model which generates hash values using the model data computed by MinHashLSH.
- See Also:
- Serialized Form
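This page does not describe how the hash values are produced. As a rough illustration of the general MinHash idea only (not the Flink ML implementation), the sketch below treats an input as a set of non-negative element indices and keeps, for each random hash function, the minimum hash over the set. The class name `MinHashSketch`, the prime constant, and the hash form `(a * (e + 1) + b) mod PRIME` are assumptions chosen for this example.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration class; not part of Flink ML.
class MinHashSketch {
    // A prime that fits in an int (assumption for this sketch).
    static final long PRIME = 2038074743L;

    final long[] a; // multiplicative coefficients, one per hash function
    final long[] b; // additive coefficients, one per hash function

    MinHashSketch(int numHashes, long seed) {
        Random rnd = new Random(seed);
        a = new long[numHashes];
        b = new long[numHashes];
        for (int i = 0; i < numHashes; i++) {
            a[i] = 1 + rnd.nextInt((int) PRIME - 1); // in [1, PRIME - 1]
            b[i] = rnd.nextInt((int) PRIME);         // in [0, PRIME - 1]
        }
    }

    // For each hash function, keep the minimum hash over the set's element indices.
    long[] signature(int[] elements) {
        long[] sig = new long[a.length];
        Arrays.fill(sig, Long.MAX_VALUE);
        for (int e : elements) {
            for (int i = 0; i < a.length; i++) {
                long h = (a[i] * (e + 1L) + b[i]) % PRIME;
                sig[i] = Math.min(sig[i], h);
            }
        }
        return sig;
    }

    // Fraction of positions where two signatures agree (an estimate of Jaccard similarity).
    static double estimatedSimilarity(long[] s1, long[] s2) {
        int same = 0;
        for (int i = 0; i < s1.length; i++) {
            if (s1[i] == s2[i]) {
                same++;
            }
        }
        return (double) same / s1.length;
    }

    public static void main(String[] args) {
        MinHashSketch mh = new MinHashSketch(16, 42L);
        long[] s1 = mh.signature(new int[] {1, 3, 5});
        long[] s2 = mh.signature(new int[] {1, 3, 7});
        System.out.println(estimatedSimilarity(s1, s2));
    }
}
```

Because each signature entry is a minimum over the set, the result does not depend on element order, and two equal sets always produce equal signatures.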
-
-
Field Summary
Fields (Modifier and Type, Field):
- protected org.apache.flink.table.api.Table modelDataTable
-
Constructor Summary
Constructors:
- MinHashLSHModel()
-
Method Summary
Methods (Modifier and Type, Method, Description):
- org.apache.flink.table.api.Table approxNearestNeighbors(org.apache.flink.table.api.Table dataset, org.apache.flink.ml.linalg.Vector key, int k): An overloaded version of `approxNearestNeighbors` with "distCol" as default value of `distCol`.
- org.apache.flink.table.api.Table approxNearestNeighbors(org.apache.flink.table.api.Table dataset, org.apache.flink.ml.linalg.Vector key, int k, String distCol): Approximately finds at most k items from a dataset which have the closest distance to a given item.
- org.apache.flink.table.api.Table approxSimilarityJoin(org.apache.flink.table.api.Table datasetA, org.apache.flink.table.api.Table datasetB, double threshold, String idCol): An overloaded version of `approxSimilarityJoin` with "distCol" as default value of `distCol`.
- org.apache.flink.table.api.Table approxSimilarityJoin(org.apache.flink.table.api.Table datasetA, org.apache.flink.table.api.Table datasetB, double threshold, String idCol, String distCol): Joins two datasets to approximately find all pairs of rows whose distance is smaller than or equal to the threshold.
- org.apache.flink.table.api.Table[] getModelData()
- Map<org.apache.flink.ml.param.Param<?>,Object> getParamMap()
- static MinHashLSHModel load(org.apache.flink.table.api.bridge.java.StreamTableEnvironment tEnv, String path): Loads model data from path.
- void save(String path)
- T setModelData(org.apache.flink.table.api.Table... inputs)
- org.apache.flink.table.api.Table[] transform(org.apache.flink.table.api.Table... inputs)
-
-
-
Method Details
-
save
public void save(String path) throws IOException
- Throws:
IOException
-
load
public static MinHashLSHModel load(org.apache.flink.table.api.bridge.java.StreamTableEnvironment tEnv, String path) throws IOException
Loads model data from path.
- Parameters:
tEnv - A StreamTableEnvironment instance.
path - Model path.
- Returns:
- LSH model.
- Throws:
IOException
-
setModelData
public T setModelData(org.apache.flink.table.api.Table... inputs)
- Specified by:
setModelData in interface org.apache.flink.ml.api.Model<T extends org.apache.flink.ml.feature.lsh.LSHModel<T>>
-
getModelData
public org.apache.flink.table.api.Table[] getModelData()
- Specified by:
getModelData in interface org.apache.flink.ml.api.Model<T extends org.apache.flink.ml.feature.lsh.LSHModel<T>>
-
getParamMap
public Map<org.apache.flink.ml.param.Param<?>,Object> getParamMap()
- Specified by:
getParamMap in interface org.apache.flink.ml.param.WithParams<T extends org.apache.flink.ml.feature.lsh.LSHModel<T>>
-
transform
public org.apache.flink.table.api.Table[] transform(org.apache.flink.table.api.Table... inputs)
- Specified by:
transform in interface org.apache.flink.ml.api.AlgoOperator<T extends org.apache.flink.ml.feature.lsh.LSHModel<T>>
-
approxNearestNeighbors
public org.apache.flink.table.api.Table approxNearestNeighbors(org.apache.flink.table.api.Table dataset, org.apache.flink.ml.linalg.Vector key, int k, String distCol)
Approximately finds at most k items from a dataset which have the closest distance to a given item. If the `outputCol` is missing in the given dataset, this method first transforms the dataset with the model.
- Parameters:
dataset - The dataset in which to search for nearest neighbors.
key - The item to search for.
k - The maximum number of nearest neighbors.
distCol - The output column storing the distance between each neighbor and the key.
- Returns:
- A dataset containing at most k items closest to the key with a column named `distCol` appended.
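To make the intended result concrete, the sketch below computes the exact answer that an approximate search would aim for: a brute-force top-k over a small in-memory dataset. It assumes Jaccard distance (the metric MinHash is typically paired with) and represents each item as a set of integers. The class `NearestNeighborsSketch` and its methods are hypothetical and do not use the Flink Table API or any hashing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration class; not part of Flink ML.
class NearestNeighborsSketch {
    // Jaccard distance: 1 - |x ∩ y| / |x ∪ y| (two empty sets are treated as distance 0).
    static double jaccardDistance(Set<Integer> x, Set<Integer> y) {
        Set<Integer> inter = new HashSet<>(x);
        inter.retainAll(y);
        int union = x.size() + y.size() - inter.size();
        return union == 0 ? 0.0 : 1.0 - (double) inter.size() / union;
    }

    // Returns the indices of at most k items, ordered by increasing distance to the key.
    static List<Integer> nearest(List<Set<Integer>> dataset, Set<Integer> key, int k) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < dataset.size(); i++) {
            idx.add(i);
        }
        idx.sort(Comparator.comparingDouble((Integer i) -> jaccardDistance(dataset.get(i), key)));
        return new ArrayList<>(idx.subList(0, Math.min(k, idx.size())));
    }

    public static void main(String[] args) {
        List<Set<Integer>> dataset = new ArrayList<>();
        dataset.add(new HashSet<>(Arrays.asList(1, 2, 3)));
        dataset.add(new HashSet<>(Arrays.asList(1, 2)));
        dataset.add(new HashSet<>(Arrays.asList(7, 8)));
        Set<Integer> key = new HashSet<>(Arrays.asList(1, 2, 3));
        System.out.println(nearest(dataset, key, 2)); // prints [0, 1]
    }
}
```

An LSH-based search would typically examine only items that share a hash bucket with the key, so it may return fewer than k items or miss a true neighbor; the brute-force version above never does, at the cost of scanning every row.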
-
approxNearestNeighbors
public org.apache.flink.table.api.Table approxNearestNeighbors(org.apache.flink.table.api.Table dataset, org.apache.flink.ml.linalg.Vector key, int k)
An overloaded version of `approxNearestNeighbors` with "distCol" as default value of `distCol`.
-
approxSimilarityJoin
public org.apache.flink.table.api.Table approxSimilarityJoin(org.apache.flink.table.api.Table datasetA, org.apache.flink.table.api.Table datasetB, double threshold, String idCol, String distCol)
Joins two datasets to approximately find all pairs of rows whose distance is smaller than or equal to the threshold. If the `outputCol` is missing in either dataset, this method first transforms that dataset.
- Parameters:
datasetA - One dataset.
datasetB - The other dataset.
threshold - The distance threshold.
idCol - A column in the two datasets to identify each row.
distCol - The output column storing the distance between each pair of rows.
- Returns:
- A joined dataset containing pairs of rows. The original rows are in columns "datasetA" and "datasetB", and a column named `distCol` is added to show the distance between each pair.
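The join semantics can be illustrated with an exact, brute-force version on in-memory data: every pair (one row from each side) whose distance is at most the threshold is emitted together with that distance. The sketch assumes Jaccard distance and integer row ids. `SimilarityJoinSketch`, `Pair`, and `join` are hypothetical names, and no Flink API or hashing is involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration class; not part of Flink ML.
class SimilarityJoinSketch {
    // One joined result: the ids of both rows and the distance between them.
    static final class Pair {
        final int idA;
        final int idB;
        final double dist;

        Pair(int idA, int idB, double dist) {
            this.idA = idA;
            this.idB = idB;
            this.dist = dist;
        }
    }

    // Jaccard distance: 1 - |x ∩ y| / |x ∪ y| (two empty sets are treated as distance 0).
    static double jaccardDistance(Set<Integer> x, Set<Integer> y) {
        Set<Integer> inter = new HashSet<>(x);
        inter.retainAll(y);
        int union = x.size() + y.size() - inter.size();
        return union == 0 ? 0.0 : 1.0 - (double) inter.size() / union;
    }

    // Emits every (a, b) pair whose distance is smaller than or equal to the threshold.
    static List<Pair> join(
            Map<Integer, Set<Integer>> datasetA,
            Map<Integer, Set<Integer>> datasetB,
            double threshold) {
        List<Pair> out = new ArrayList<>();
        for (Map.Entry<Integer, Set<Integer>> ea : datasetA.entrySet()) {
            for (Map.Entry<Integer, Set<Integer>> eb : datasetB.entrySet()) {
                double d = jaccardDistance(ea.getValue(), eb.getValue());
                if (d <= threshold) {
                    out.add(new Pair(ea.getKey(), eb.getKey(), d));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, Set<Integer>> a = new LinkedHashMap<>();
        a.put(0, new HashSet<>(Arrays.asList(1, 2, 3)));
        a.put(1, new HashSet<>(Arrays.asList(7, 8)));
        Map<Integer, Set<Integer>> b = new LinkedHashMap<>();
        b.put(10, new HashSet<>(Arrays.asList(1, 2, 3, 4)));
        b.put(11, new HashSet<>(Arrays.asList(9)));
        for (Pair p : join(a, b, 0.3)) {
            System.out.println(p.idA + " " + p.idB + " " + p.dist); // prints 0 10 0.25
        }
    }
}
```

The nested loop compares every pair, which is quadratic in the number of rows. An LSH-based join would instead compare only rows that collide in at least one hash bucket, which is why its result is approximate.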
-
approxSimilarityJoin
public org.apache.flink.table.api.Table approxSimilarityJoin(org.apache.flink.table.api.Table datasetA, org.apache.flink.table.api.Table datasetB, double threshold, String idCol)
An overloaded version of `approxSimilarityJoin` with "distCol" as default value of `distCol`.
-
-