Package org.apache.lucene.demo.knn
Class KnnVectorDict
- java.lang.Object
  - org.apache.lucene.demo.knn.KnnVectorDict
- All Implemented Interfaces: Closeable, AutoCloseable
public class KnnVectorDict extends Object implements Closeable
Manages a map from token to numeric vector for use with KnnVector indexing and search. The map is stored in two parts: an FST that maps each token to an ordinal, and a dense binary file that holds the vectors.
Constructor Summary
Constructors:
- KnnVectorDict(Directory directory, String dictName): Sole constructor.
Method Summary
Methods (modifier and type, method, description):
- static void build(Path gloveInput, Directory directory, String dictName): Convert from a GloVe-formatted dictionary file to a KnnVectorDict file pair.
- void close()
- void get(BytesRef token, byte[] output): Get the vector corresponding to the given token.
- int getDimension(): Get the dimension of the vectors returned by this dictionary.
- long ramBytesUsed(): Return the size of the dictionary in bytes.
Constructor Detail
KnnVectorDict
public KnnVectorDict(Directory directory, String dictName) throws IOException
Sole constructor.
Parameters:
- directory - Lucene directory from which the knn dictionary should be read.
- dictName - the base name of the directory files that store the knn vector dictionary. A file with extension '.bin' holds the vectors, and a file with extension '.fst' maps tokens to offsets in the '.bin' file.
Throws:
- IOException
Method Detail
get
public void get(BytesRef token, byte[] output) throws IOException
Get the vector corresponding to the given token. NOTE: the vector is written into the caller-supplied output array, whose contents will be overwritten by subsequent calls that reuse it. The caller is responsible for copying the data as needed.
Parameters:
- token - the token to look up.
- output - the array in which to write the corresponding vector. Its length must be getDimension() * Float.BYTES. It will be filled with zeros if the token is not present in the dictionary.
Throws:
- IllegalArgumentException - if the output array is incorrectly sized.
- IOException - if there is a problem reading the dictionary.
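The sizing rule for the output array can be sketched without calling Lucene. The following self-contained example allocates a buffer of dimension * Float.BYTES bytes, decodes it into a float[], and checks for the all-zero result that get() leaves when a token is absent. The class VectorBytes and its method names are hypothetical, and the little-endian byte order is an assumption that this page does not state; verify it against the Lucene version in use.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/** Hypothetical helper (not part of Lucene) for buffers sized the way get() requires. */
final class VectorBytes {

  /** Allocates a zero-filled buffer of dimension * Float.BYTES bytes. */
  static byte[] newOutputBuffer(int dimension) {
    return new byte[dimension * Float.BYTES];
  }

  /** Decodes the buffer into a fresh float[]. ASSUMPTION: floats are little-endian. */
  static float[] decode(byte[] bytes) {
    if (bytes.length % Float.BYTES != 0) {
      throw new IllegalArgumentException("length must be a multiple of " + Float.BYTES);
    }
    float[] vector = new float[bytes.length / Float.BYTES];
    ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).asFloatBuffer().get(vector);
    return vector;
  }

  /** Returns true if every byte is zero, the state get() leaves for an absent token. */
  static boolean isAllZero(byte[] bytes) {
    for (byte b : bytes) {
      if (b != 0) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    byte[] output = newOutputBuffer(3);
    System.out.println("all zero before fill: " + isAllZero(output));

    // Stand-in for a successful lookup: write three floats into the buffer.
    ByteBuffer.wrap(output)
        .order(ByteOrder.LITTLE_ENDIAN)
        .asFloatBuffer()
        .put(new float[] {0.5f, -1.0f, 2.0f});

    // decode() returns a new array, so the result survives reuse of the buffer.
    float[] copy = decode(output);
    System.out.println(Arrays.toString(copy));
  }
}
```

Because decode() copies into a new array, the same output buffer can be reused for the next lookup. Note that an all-zero check cannot distinguish a missing token from a token whose stored vector is genuinely all zeros.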
getDimension
public int getDimension()
Get the dimension of the vectors returned by this dictionary.
Returns:
- the vector dimension
close
public void close() throws IOException
Specified by:
- close in interface AutoCloseable
- close in interface Closeable
Throws:
- IOException
build
public static void build(Path gloveInput, Directory directory, String dictName) throws IOException
Convert from a GloVe-formatted dictionary file to a KnnVectorDict file pair.
Parameters:
- gloveInput - the path to the input dictionary. The dictionary is delimited by newlines, and each line is space-delimited. The first column has the token, and the remaining columns are the vector components, as text. The dictionary must be sorted by its leading tokens (considered as bytes).
- directory - a Lucene directory to write the dictionary to.
- dictName - base name for the knn dictionary files.
Throws:
- IOException
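The input format described for gloveInput can be illustrated with a small, self-contained sketch that splits a line into its token and vector components and checks the required sort order. This is not Lucene's implementation: the class GloveLines and its methods are hypothetical, and the choices of UTF-8 encoding, unsigned byte comparison, and strictly ascending order (no duplicate tokens) are assumptions about what "sorted by its leading tokens (considered as bytes)" means.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of Lucene) mirroring the input format described for build(). */
final class GloveLines {

  /** Returns the token: the first space-delimited column of the line. */
  static String token(String line) {
    int space = line.indexOf(' ');
    return space < 0 ? line : line.substring(0, space);
  }

  /** Parses the remaining space-delimited columns as float vector components. */
  static float[] vector(String line) {
    String[] columns = line.split(" ");
    float[] vector = new float[columns.length - 1];
    for (int i = 1; i < columns.length; i++) {
      vector[i - 1] = Float.parseFloat(columns[i]);
    }
    return vector;
  }

  /**
   * Checks that tokens are strictly ascending when compared as bytes.
   * ASSUMPTIONS: tokens are UTF-8 encoded and bytes are compared as unsigned values.
   */
  static boolean isSortedByTokenBytes(List<String> lines) {
    byte[] previous = null;
    for (String line : lines) {
      byte[] current = token(line).getBytes(StandardCharsets.UTF_8);
      if (previous != null && Arrays.compareUnsigned(previous, current) >= 0) {
        return false;
      }
      previous = current;
    }
    return true;
  }

  public static void main(String[] args) {
    List<String> lines = List.of("apple 0.1 0.2 0.3", "banana -0.4 0.5 0.6");
    System.out.println(token(lines.get(0)));
    System.out.println(Arrays.toString(vector(lines.get(1))));
    System.out.println("sorted: " + isSortedByTokenBytes(lines));
  }
}
```

A check like isSortedByTokenBytes could be run over a GloVe file before calling build(), since byte order can differ from the order produced by a locale-aware or case-insensitive sort.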
ramBytesUsed
public long ramBytesUsed()
Return the size of the dictionary in bytes.