seems to work, but experimental and not generic yet
Bucket keys to use for quickly finding other similar items via locality sensitive hashing
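A minimal sketch of how such bucket keys could be derived, assuming the signature is already decoded into `numBands * numRows` integer hash values. The object name `BucketSketch`, the `Array[Int]` representation, and the use of `java.util.Arrays.hashCode` as the band hash are illustrative assumptions, not the actual implementation.

```scala
object BucketSketch {
  // Hypothetical helper: split the signature into numBands bands of numRows
  // values each and emit one (bandIndex, bandHash) key per band.
  // Two items become candidates for comparison if they share any key.
  def buckets(signature: Array[Int], numBands: Int, numRows: Int): Seq[(Int, Int)] = {
    require(signature.length == numBands * numRows,
      "signature length must equal numBands * numRows")
    signature
      .grouped(numRows)   // one Array[Int] per band
      .zipWithIndex       // keep the band index so bands do not collide with each other
      .map { case (band, index) => (index, java.util.Arrays.hashCode(band)) }
      .toList
  }
}
```

Items whose signatures agree on every row of at least one band produce an identical key for that band, so grouping by key yields the candidate pairs.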
Decode two signatures into hash values, combine them somehow, and produce a new array
Initialize a byte array by generating hash values
useful for understanding the effects of numBands and numRows
the number of bytes used for each hash in the signature
Create a signature for an arbitrary value
Create a signature for a single String value
Create a signature for a single Long value
Maximum value the hash can take on (less than the full unsigned range of hashSize bytes, because of signed types)
For explanation of the "bands" and "rows" see Ullman and Rajaraman
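For quick reference, the standard banding analysis from that text can be summarized as follows. Here $b$ stands for numBands, $r$ for numRows, and $s$ for the true Jaccard similarity of two items; these symbols are notation introduced for this note.

```latex
% Probability that two items with similarity s share at least one bucket:
P(s) = 1 - \left(1 - s^{r}\right)^{b}

% Approximate similarity threshold at which P(s) rises most steeply:
t \approx \left(\frac{1}{b}\right)^{1/r}

% Example: b = 20, r = 5 gives t \approx (1/20)^{1/5} \approx 0.55
```

Increasing $r$ raises the threshold (fewer false positives), while increasing $b$ lowers it (fewer false negatives).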
Set union
useful for understanding the effects of numBands and numRows
Estimate Jaccard similarity (size of intersection / size of union)
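A small sketch of the idea, assuming signatures are plain `Array[Int]` values of equal length. The object name `JaccardSketch` and both method names are hypothetical; the real signatures are byte arrays that would need decoding first.

```scala
object JaccardSketch {
  // Exact Jaccard similarity: |A ∩ B| / |A ∪ B|.
  // Returning 1.0 for two empty sets is a convention chosen for this sketch.
  def exact[T](a: Set[T], b: Set[T]): Double =
    if (a.isEmpty && b.isEmpty) 1.0
    else (a intersect b).size.toDouble / (a union b).size.toDouble

  // MinHash estimate: the fraction of signature positions at which the
  // two minimum hash values agree.
  def estimate(sigA: Array[Int], sigB: Array[Int]): Double = {
    require(sigA.length == sigB.length, "signatures must have the same length")
    require(sigA.nonEmpty, "signatures must not be empty")
    val matches = sigA.indices.count(i => sigA(i) == sigB(i))
    matches.toDouble / sigA.length
  }
}
```

The estimate works because, for a single random hash function, the probability that two sets have the same minimum hash equals their Jaccard similarity; averaging over many hash functions reduces the variance.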
override this if there is a faster way to do this sum than reduceLeftOption on plus
Signature for empty set, needed to be a proper Monoid
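The monoid structure can be illustrated with a simplified sketch, assuming signatures are `Array[Int]` rather than encoded byte arrays. The object name `MinHashMonoidSketch`, the fixed `numHashes`, and the use of `Int.MaxValue` as the maximum hash are assumptions made for illustration.

```scala
object MinHashMonoidSketch {
  // Hypothetical number of hash functions per signature.
  val numHashes: Int = 4

  // Signature of the empty set: every position holds the maximum hash value,
  // so it never wins a minimum and acts as the identity for plus.
  val zero: Array[Int] = Array.fill(numHashes)(Int.MaxValue)

  // Set union: the min-hash of a union is the position-wise minimum
  // of the two signatures.
  def plus(a: Array[Int], b: Array[Int]): Array[Int] = {
    require(a.length == b.length, "signatures must have the same length")
    a.zip(b).map { case (x, y) => math.min(x, y) }
  }
}
```

Because the position-wise minimum is associative and commutative, signatures can be combined in any order, which is what makes the structure usable as a monoid in distributed aggregations.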