Class MinHashLSHModel

    • Field Summary

      Fields
      Modifier and Type Field Description
      protected org.apache.flink.table.api.Table modelDataTable
      • Fields inherited from interface org.apache.flink.ml.common.param.HasInputCol

        INPUT_COL
      • Fields inherited from interface org.apache.flink.ml.common.param.HasOutputCol

        OUTPUT_COL
    • Method Summary

      All Methods Static Methods Instance Methods Concrete Methods
      Modifier and Type Method Description
      org.apache.flink.table.api.Table approxNearestNeighbors​(org.apache.flink.table.api.Table dataset, org.apache.flink.ml.linalg.Vector key, int k)
      An overloaded version of `approxNearestNeighbors` that uses "distCol" as the default value of `distCol`.
      org.apache.flink.table.api.Table approxNearestNeighbors​(org.apache.flink.table.api.Table dataset, org.apache.flink.ml.linalg.Vector key, int k, String distCol)
      Approximately finds at most k items from a dataset which have the closest distance to a given item.
      org.apache.flink.table.api.Table approxSimilarityJoin​(org.apache.flink.table.api.Table datasetA, org.apache.flink.table.api.Table datasetB, double threshold, String idCol)
      An overloaded version of `approxSimilarityJoin` that uses "distCol" as the default value of `distCol`.
      org.apache.flink.table.api.Table approxSimilarityJoin​(org.apache.flink.table.api.Table datasetA, org.apache.flink.table.api.Table datasetB, double threshold, String idCol, String distCol)
      Joins two datasets to approximately find all pairs of rows whose distance is smaller than or equal to the threshold.
      org.apache.flink.table.api.Table[] getModelData()  
      Map<org.apache.flink.ml.param.Param<?>,​Object> getParamMap()  
      static MinHashLSHModel load​(org.apache.flink.table.api.bridge.java.StreamTableEnvironment tEnv, String path)
      Loads model data from path.
      void save​(String path)  
      T setModelData​(org.apache.flink.table.api.Table... inputs)  
      org.apache.flink.table.api.Table[] transform​(org.apache.flink.table.api.Table... inputs)  
      • Methods inherited from interface org.apache.flink.ml.common.param.HasInputCol

        getInputCol, setInputCol
      • Methods inherited from interface org.apache.flink.ml.common.param.HasOutputCol

        getOutputCol, setOutputCol
      • Methods inherited from interface org.apache.flink.ml.param.WithParams

        get, getParam, set
    • Field Detail

      • modelDataTable

        protected org.apache.flink.table.api.Table modelDataTable
    • Constructor Detail

      • MinHashLSHModel

        public MinHashLSHModel()
    • Method Detail

      • load

        public static MinHashLSHModel load​(org.apache.flink.table.api.bridge.java.StreamTableEnvironment tEnv,
                                           String path)
                                    throws IOException
        Loads model data from path.
        Parameters:
        tEnv - A StreamTableEnvironment instance.
        path - Model path.
        Returns:
        LSH model.
        Throws:
        IOException
      • setModelData

        public T setModelData​(org.apache.flink.table.api.Table... inputs)
        Specified by:
        setModelData in interface org.apache.flink.ml.api.Model<T extends org.apache.flink.ml.feature.lsh.LSHModel<T>>
      • getModelData

        public org.apache.flink.table.api.Table[] getModelData()
        Specified by:
        getModelData in interface org.apache.flink.ml.api.Model<T extends org.apache.flink.ml.feature.lsh.LSHModel<T>>
      • getParamMap

        public Map<org.apache.flink.ml.param.Param<?>,​Object> getParamMap()
        Specified by:
        getParamMap in interface org.apache.flink.ml.param.WithParams<T extends org.apache.flink.ml.feature.lsh.LSHModel<T>>
      • transform

        public org.apache.flink.table.api.Table[] transform​(org.apache.flink.table.api.Table... inputs)
        Specified by:
        transform in interface org.apache.flink.ml.api.AlgoOperator<T extends org.apache.flink.ml.feature.lsh.LSHModel<T>>
      • approxNearestNeighbors

        public org.apache.flink.table.api.Table approxNearestNeighbors​(org.apache.flink.table.api.Table dataset,
                                                                       org.apache.flink.ml.linalg.Vector key,
                                                                       int k,
                                                                       String distCol)
        Approximately finds at most k items from a dataset which have the closest distance to a given item. If the `outputCol` is missing in the given dataset, this method first transforms the dataset with the model.
        Parameters:
        dataset - The dataset in which to search for nearest neighbors.
        key - The item to search for.
        k - The maximum number of nearest neighbors.
        distCol - The output column storing the distance between each neighbor and the key.
        Returns:
        A dataset containing at most k items closest to the key with a column named `distCol` appended.
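
        To make the result contract concrete (at most k rows, ordered by distance to the key), here is a minimal plain-Java sketch. It is not the Flink ML implementation and does not call the Flink ML API: it performs an exact brute-force search rather than the approximate, hash-based search the model performs. It assumes each item is represented as the set of its non-zero indices and that distance is the Jaccard distance, the metric MinHash is designed for. The class name `NearestNeighborsSketch` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of Flink ML.
class NearestNeighborsSketch {

    /** Jaccard distance: 1 - |A intersect B| / |A union B|. */
    static double jaccardDistance(Set<Integer> a, Set<Integer> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0; // convention chosen for this sketch (assumption)
        }
        int intersection = 0;
        for (Integer x : a) {
            if (b.contains(x)) {
                intersection++;
            }
        }
        int union = a.size() + b.size() - intersection;
        return 1.0 - (double) intersection / union;
    }

    /** Returns the row indices of at most k items, nearest to the key first. */
    static List<Integer> nearest(List<Set<Integer>> dataset, Set<Integer> key, int k) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < dataset.size(); i++) {
            indices.add(i);
        }
        // Exact ordering by distance; the real model only approximates this.
        indices.sort(Comparator.comparingDouble(i -> jaccardDistance(dataset.get(i), key)));
        return new ArrayList<>(indices.subList(0, Math.min(k, indices.size())));
    }

    public static void main(String[] args) {
        List<Set<Integer>> dataset = List.of(Set.of(0, 1, 2), Set.of(1, 2, 3), Set.of(7, 8));
        System.out.println(nearest(dataset, Set.of(0, 1, 2), 2));
    }
}
```

        When k exceeds the number of rows, the sketch returns every row, which mirrors the "at most k" wording above.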
      • approxNearestNeighbors

        public org.apache.flink.table.api.Table approxNearestNeighbors​(org.apache.flink.table.api.Table dataset,
                                                                       org.apache.flink.ml.linalg.Vector key,
                                                                       int k)
        An overloaded version of `approxNearestNeighbors` that uses "distCol" as the default value of `distCol`.
      • approxSimilarityJoin

        public org.apache.flink.table.api.Table approxSimilarityJoin​(org.apache.flink.table.api.Table datasetA,
                                                                     org.apache.flink.table.api.Table datasetB,
                                                                     double threshold,
                                                                     String idCol,
                                                                     String distCol)
        Joins two datasets to approximately find all pairs of rows whose distance is smaller than or equal to the threshold. If the `outputCol` is missing in either dataset, this method first transforms that dataset.
        Parameters:
        datasetA - One dataset.
        datasetB - The other dataset.
        threshold - The distance threshold.
        idCol - A column in the two datasets to identify each row.
        distCol - The output column storing the distance between each pair of rows.
        Returns:
        A joined dataset containing pairs of rows. The original rows are in columns "datasetA" and "datasetB", and a column named `distCol` is added to show the distance between each pair.
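
        The join semantics (keep every cross-dataset pair whose distance is at most the threshold, and report each pair by row id plus distance) can be sketched in plain Java as follows. This is an exact brute-force illustration under stated assumptions, not the Flink ML implementation: rows are assumed to be sets of non-zero indices keyed by an integer id, distance is assumed to be the Jaccard distance, and the names `SimilarityJoinSketch`, `Pair`, and `join` are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not part of Flink ML.
class SimilarityJoinSketch {

    /** One joined pair: the two row ids and the distance between the rows. */
    static final class Pair {
        final int idA;
        final int idB;
        final double dist;

        Pair(int idA, int idB, double dist) {
            this.idA = idA;
            this.idB = idB;
            this.dist = dist;
        }
    }

    /** Jaccard distance: 1 - |A intersect B| / |A union B|. */
    static double jaccardDistance(Set<Integer> a, Set<Integer> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0; // convention chosen for this sketch (assumption)
        }
        int intersection = 0;
        for (Integer x : a) {
            if (b.contains(x)) {
                intersection++;
            }
        }
        return 1.0 - (double) intersection / (a.size() + b.size() - intersection);
    }

    /** Compares every row of A with every row of B and keeps pairs within the threshold. */
    static List<Pair> join(Map<Integer, Set<Integer>> datasetA,
                           Map<Integer, Set<Integer>> datasetB,
                           double threshold) {
        List<Pair> result = new ArrayList<>();
        for (Map.Entry<Integer, Set<Integer>> rowA : datasetA.entrySet()) {
            for (Map.Entry<Integer, Set<Integer>> rowB : datasetB.entrySet()) {
                double dist = jaccardDistance(rowA.getValue(), rowB.getValue());
                if (dist <= threshold) { // the threshold is inclusive
                    result.add(new Pair(rowA.getKey(), rowB.getKey(), dist));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Set<Integer>> a = new LinkedHashMap<>();
        a.put(0, Set.of(0, 1, 2));
        a.put(1, Set.of(7, 8));
        Map<Integer, Set<Integer>> b = new LinkedHashMap<>();
        b.put(10, Set.of(1, 2, 3));
        b.put(11, Set.of(7, 8));
        for (Pair p : join(a, b, 0.5)) {
            System.out.println(p.idA + " " + p.idB + " " + p.dist);
        }
    }
}
```

        The brute-force loop costs one distance computation per pair of rows; avoiding that full comparison is the reason an approximate, hash-based join is offered.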
      • approxSimilarityJoin

        public org.apache.flink.table.api.Table approxSimilarityJoin​(org.apache.flink.table.api.Table datasetA,
                                                                     org.apache.flink.table.api.Table datasetB,
                                                                     double threshold,
                                                                     String idCol)
        An overloaded version of `approxSimilarityJoin` that uses "distCol" as the default value of `distCol`.