Class VectorIndexBuilder

    • Method Detail

      • withDistanceFunction

        public VectorIndexBuilder withDistanceFunction(com.github.jelmerk.knn.DistanceFunction distanceFunction)
      • withM

        public VectorIndexBuilder withM(int m)
        Sets the number of bi-directional links created for every new element during construction. A reasonable range for m is 2-100. Higher values of m work better on datasets with high intrinsic dimensionality and/or when high recall is required, while lower values work better on datasets with low intrinsic dimensionality and/or when low recall is acceptable. The parameter also determines the algorithm's memory consumption. For example, for random vectors with d = 4 the optimal m for search is around 6, while high-dimensional datasets (word embeddings, good face descriptors) require higher values (e.g. m = 48 or 64) for optimal performance at high recall. The range m = 12-48 is fine for most use cases. When m is changed, the other parameters have to be updated as well. Nonetheless, the ef and efConstruction parameters can be roughly estimated by assuming that m * efConstruction is a constant.
        Parameters:
        m - the number of bi-directional links created for every new element during construction
        Returns:
        the builder
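
        The rule of thumb that m * efConstruction stays roughly constant can be turned into a small calculation. The sketch below is only an illustration: the class EfConstructionEstimate and the method estimateEfConstruction are hypothetical helpers and not part of VectorIndexBuilder, and the baseline values are arbitrary.

```java
// Hypothetical helper, not part of VectorIndexBuilder.
final class EfConstructionEstimate {

    // Keeps the product m * efConstruction approximately constant when m changes.
    static int estimateEfConstruction(int baselineM, int baselineEfConstruction, int newM) {
        if (baselineM <= 0 || baselineEfConstruction <= 0 || newM <= 0) {
            throw new IllegalArgumentException("all arguments must be positive");
        }
        long product = (long) baselineM * baselineEfConstruction;
        // Round to the nearest integer and never go below 1.
        return (int) Math.max(1L, Math.round((double) product / newM));
    }

    public static void main(String[] args) {
        // Arbitrary baseline: m = 16 with efConstruction = 200, then m is doubled.
        System.out.println(estimateEfConstruction(16, 200, 32)); // prints 100
    }
}
```

        The result is a starting point only; it would still need to be validated against the recall measured on the actual dataset.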
      • withEfConstruction

        public VectorIndexBuilder withEfConstruction(int efConstruction)
        The parameter has the same meaning as ef, but controls the trade-off between index time and index precision. A bigger efConstruction leads to longer construction but better index quality. At some point, increasing efConstruction no longer improves the quality of the index. One way to check whether efConstruction was chosen well is to measure the recall of an m-nearest-neighbor search with ef = efConstruction: if the recall is lower than 0.9, there is room for improvement.
        Parameters:
        efConstruction - controls the index time / index precision
        Returns:
        the builder
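
        The recall check described above can be sketched as follows. This is a minimal illustration that assumes the approximate result ids (from the index) and the exact result ids (for example from a brute-force scan) have already been collected. The class RecallCheck and its methods are hypothetical and not part of this API.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of VectorIndexBuilder.
final class RecallCheck {

    // Fraction of the exact nearest neighbors that the approximate search also returned.
    static double recall(List<String> approximateIds, List<String> exactIds) {
        if (exactIds.isEmpty()) {
            throw new IllegalArgumentException("exactIds must not be empty");
        }
        Set<String> truth = new HashSet<>(exactIds);
        Set<String> found = new HashSet<>(approximateIds);
        found.retainAll(truth); // keep only the ids that are true neighbors
        return (double) found.size() / truth.size();
    }

    // Applies the 0.9 threshold mentioned in the description of withEfConstruction.
    static boolean hasRoomForImprovement(double recall) {
        return recall < 0.9;
    }

    public static void main(String[] args) {
        // Made-up ids: the approximate search found 3 of the 4 true neighbors.
        double r = recall(List.of("a", "b", "c", "e"), List.of("a", "b", "c", "d"));
        System.out.println(r);                        // prints 0.75
        System.out.println(hasRoomForImprovement(r)); // prints true
    }
}
```

        In practice the recall would be averaged over many query vectors rather than computed for a single query.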
      • withEf

        public VectorIndexBuilder withEf(int ef)
        Sets the size of the dynamic list for the nearest neighbors (used during the search). A higher ef leads to a more accurate but slower search. The value of ef can be anything between k (the number of nearest neighbors requested) and the size of the dataset.
        Parameters:
        ef - size of the dynamic list for the nearest neighbors
        Returns:
        the builder
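
        The valid range for ef can be enforced before the value is passed to withEf. The sketch below assumes the caller knows k and the dataset size. The class EfRange and the method clampEf are hypothetical and not part of this API.

```java
// Hypothetical helper, not part of VectorIndexBuilder.
final class EfRange {

    // Clamps the requested ef into the range [k, datasetSize].
    static int clampEf(int requestedEf, int k, int datasetSize) {
        if (k <= 0 || datasetSize < k) {
            throw new IllegalArgumentException("expected 0 < k <= datasetSize");
        }
        return Math.max(k, Math.min(requestedEf, datasetSize));
    }

    public static void main(String[] args) {
        // Arbitrary values: k = 10 neighbors requested, 1000 items in the dataset.
        System.out.println(clampEf(5, 10, 1000));    // prints 10
        System.out.println(clampEf(200, 10, 1000));  // prints 200
        System.out.println(clampEf(5000, 10, 1000)); // prints 1000
    }
}
```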
      • getDimensions

        public int getDimensions()
      • getDistanceFunction

        public com.github.jelmerk.knn.DistanceFunction getDistanceFunction()
      • getDistanceComparator

        public Comparator getDistanceComparator()
      • getM

        public int getM()
      • getEf

        public int getEf()
      • getEfConstruction

        public int getEfConstruction()
      • getMaxItemCount

        public int getMaxItemCount()
      • getVertexType

        public String getVertexType()
      • getIdPropertyName

        public String getIdPropertyName()
      • getDeletedPropertyName

        public String getDeletedPropertyName()
      • getEdgeType

        public String getEdgeType()
      • getVectorPropertyName

        public String getVectorPropertyName()