public class BloomKFilter extends Object
BloomKFilter is a variation of BloomFilter. Unlike BloomFilter, BloomKFilter spreads the 'k' hash bits within the same cache line for better L1 cache performance. It works as follows: the first hash code is computed from the key and is used to locate the block offset (n longs in the bit set constitute a block). The subsequent 'k' hash codes are used to spread the hash bits within that block. By default the block size is 8 longs, which matches the cache line size (8 longs = 64 bytes = cache line size). Refer to addBytes(byte[]) for more information. This implementation incurs far fewer L1 data cache misses than BloomFilter.
Modifier and Type | Class and Description |
---|---|
static class | BloomKFilter.BitSet - Bare metal bit set implementation. |
Modifier and Type | Field and Description |
---|---|
static float | DEFAULT_FPP |
static int | START_OF_SERIALIZED_LONGS |
Constructor and Description |
---|
BloomKFilter(long maxNumEntries) |
BloomKFilter(long[] bits, int numFuncs) - A constructor to support rebuilding the bloom filter from a serialized representation. |
Modifier and Type | Method and Description |
---|---|
void | add(byte[] val) |
void | addByte(byte val) |
void | addBytes(byte[] val) |
void | addBytes(byte[] val, int offset, int length) |
void | addDouble(double val) |
void | addFloat(float val) |
void | addInt(int val) |
void | addLong(long val) |
void | addString(String val) |
static BloomKFilter | deserialize(InputStream in) - Deserialize a bloom filter: read a byte stream, which was written by serialize(OutputStream, BloomKFilter), into a BloomKFilter. |
long[] | getBitSet() |
int | getBitSize() |
static byte[] | getInitialBytes(long expectedEntries) |
long | getNumBits() |
int | getNumHashFunctions() |
void | merge(BloomKFilter that) - Merge the specified bloom filter with the current bloom filter. |
static void | mergeBloomFilterBytes(byte[] bf1Bytes, int bf1Start, int bf1Length, byte[] bf2Bytes, int bf2Start, int bf2Length) |
static void | mergeBloomFilterBytes(byte[] bf1Bytes, int bf1Start, int bf1Length, byte[] bf2Bytes, int bf2Start, int bf2Length, int mergeStart, int mergeEnd) - Merges BloomKFilter bf2 into bf1. |
void | reset() |
static void | serialize(OutputStream out, BloomKFilter bloomFilter) - Serialize a bloom filter. Serialized BloomKFilter format: 1 byte for the number of hash functions. |
long | sizeInBytes() |
boolean | test(byte[] val) |
boolean | testByte(byte val) |
boolean | testBytes(byte[] val) |
boolean | testBytes(byte[] val, int offset, int length) |
boolean | testDouble(double val) |
boolean | testFloat(float val) |
boolean | testInt(int val) |
boolean | testLong(long val) |
boolean | testString(String val) |
String | toString() |
public static final float DEFAULT_FPP
public static final int START_OF_SERIALIZED_LONGS
public BloomKFilter(long maxNumEntries)
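How a filter sized from an expected entry count and a false-positive probability (compare DEFAULT_FPP) is usually dimensioned can be sketched with the textbook Bloom filter formulas. This is a minimal sketch, not the sizing code of this class: the class name `BloomSizingSketch`, its method names, and the rounding up to whole 8-long blocks are assumptions made for illustration.

```java
/** Textbook Bloom filter sizing; a hypothetical helper, not part of BloomKFilter. */
final class BloomSizingSketch {
    // Bits per block, assuming the default block of 8 longs (64 bytes).
    static final long BLOCK_BITS = 8L * Long.SIZE;

    /** m = -n * ln(p) / (ln 2)^2, rounded up to a whole bit. */
    static long optimalNumBits(long expectedEntries, double fpp) {
        if (expectedEntries <= 0 || fpp <= 0.0 || fpp >= 1.0) {
            throw new IllegalArgumentException("need expectedEntries > 0 and 0 < fpp < 1");
        }
        double ln2 = Math.log(2);
        return (long) Math.ceil(-expectedEntries * Math.log(fpp) / (ln2 * ln2));
    }

    /** k = (m / n) * ln 2, rounded to the nearest integer and at least 1. */
    static int optimalNumHashFunctions(long expectedEntries, long numBits) {
        double k = (double) numBits / expectedEntries * Math.log(2);
        return Math.max(1, (int) Math.round(k));
    }

    /** Assumed padding step: round the bit count up to a multiple of one block. */
    static long roundUpToBlock(long numBits) {
        return ((numBits + BLOCK_BITS - 1) / BLOCK_BITS) * BLOCK_BITS;
    }
}
```

Padding to whole blocks keeps every block aligned to the 64-byte unit the class description refers to; the actual class may round differently.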
public BloomKFilter(long[] bits, int numFuncs)
bits - BloomK sketch data in the form of an array of longs.
numFuncs - Number of hash functions, referred to as K.
public void add(byte[] val)
public void addBytes(byte[] val, int offset, int length)
public void addBytes(byte[] val)
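The block layout that the class description attributes to addBytes(byte[]) can be sketched as follows: one hash selects a block of 8 longs, and the k derived hashes set bits only inside that block. This is an illustrative sketch under stated assumptions, not the library's code: the class `BlockBloomSketch`, the stand-in hash function, and the way the derived hashes are combined are all hypothetical.

```java
/** Illustrative block-based Bloom filter; a sketch, not the BloomKFilter source. */
final class BlockBloomSketch {
    static final int BLOCK_LONGS = 8;   // 8 longs = 64 bytes = one cache line
    final long[] bits;                  // package-private so callers can inspect it
    private final int numBlocks;
    private final int k;

    BlockBloomSketch(int numBlocks, int k) {
        if (numBlocks <= 0 || k <= 0) {
            throw new IllegalArgumentException("numBlocks and k must be positive");
        }
        this.numBlocks = numBlocks;
        this.k = k;
        this.bits = new long[numBlocks * BLOCK_LONGS];
    }

    // Stand-in 64-bit hash (FNV-1a plus a Murmur3-style finalizer); an assumption.
    static long hash64(byte[] data) {
        long h = 0xcbf29ce484222325L;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    void add(byte[] val) {
        long h = hash64(val);
        int h1 = (int) h;
        int h2 = (int) (h >>> 32);
        // The first hash picks the block; every later access stays inside it.
        int base = Math.floorMod(h1 + h2, numBlocks) * BLOCK_LONGS;
        for (int i = 1; i <= k; i++) {
            int c = h1 + (i + 1) * h2;                  // i-th derived hash
            int word = base + (c & (BLOCK_LONGS - 1));  // which long within the block
            int bit = (c >>> 3) & 63;                   // which bit within that long
            bits[word] |= 1L << bit;
        }
    }

    boolean test(byte[] val) {
        long h = hash64(val);
        int h1 = (int) h;
        int h2 = (int) (h >>> 32);
        int base = Math.floorMod(h1 + h2, numBlocks) * BLOCK_LONGS;
        for (int i = 1; i <= k; i++) {
            int c = h1 + (i + 1) * h2;
            int word = base + (c & (BLOCK_LONGS - 1));
            int bit = (c >>> 3) & 63;
            if ((bits[word] & (1L << bit)) == 0) {
                return false;   // a required bit is missing: definitely not present
            }
        }
        return true;            // all bits set: possibly present
    }
}
```

Because all k probes for a key touch the same 64-byte block, a lookup needs at most one cache line, which is the property the class description credits for the reduced L1 data cache misses.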
public void addString(String val)
public void addByte(byte val)
public void addInt(int val)
public void addLong(long val)
public void addFloat(float val)
public void addDouble(double val)
public boolean test(byte[] val)
public boolean testBytes(byte[] val)
public boolean testBytes(byte[] val, int offset, int length)
public boolean testString(String val)
public boolean testByte(byte val)
public boolean testInt(int val)
public boolean testLong(long val)
public boolean testFloat(float val)
public boolean testDouble(double val)
public long sizeInBytes()
public int getBitSize()
public int getNumHashFunctions()
public long getNumBits()
public long[] getBitSet()
public void merge(BloomKFilter that)
that - bloom filter to merge
public void reset()
public static void serialize(OutputStream out, BloomKFilter bloomFilter) throws IOException
out - output stream to write to
bloomFilter - BloomKFilter that needs to be serialized
Throws: IOException
public static BloomKFilter deserialize(InputStream in) throws IOException
Reads a byte stream, which was written by serialize(OutputStream, BloomKFilter), into a BloomKFilter.
in - input byte stream
Throws: IOException
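A serialize/deserialize round trip of this kind can be sketched with `DataOutputStream` and `DataInputStream`. The method summary states only that the format begins with 1 byte for the number of hash functions; the remaining layout used here (a 4-byte big-endian count of longs followed by the longs themselves) is an assumption for illustration, as are the class `BloomSerdeSketch` and its members.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical serialization sketch; only the first byte is documented by the source. */
final class BloomSerdeSketch {
    /** Simple holder for the decoded fields. */
    static final class Decoded {
        final int numHashFunctions;
        final long[] bitSet;

        Decoded(int numHashFunctions, long[] bitSet) {
            this.numHashFunctions = numHashFunctions;
            this.bitSet = bitSet;
        }
    }

    static void write(OutputStream out, int numHashFunctions, long[] bitSet) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeByte(numHashFunctions);   // documented: 1 byte for the number of hash functions
        data.writeInt(bitSet.length);       // assumed: 4-byte big-endian count of longs
        for (long word : bitSet) {
            data.writeLong(word);           // assumed: each long written big-endian
        }
        data.flush();
    }

    static Decoded read(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int numHashFunctions = data.readUnsignedByte();
        int numLongs = data.readInt();
        if (numLongs < 0) {
            throw new IOException("negative bit set length: " + numLongs);
        }
        long[] bitSet = new long[numLongs];
        for (int i = 0; i < numLongs; i++) {
            bitSet[i] = data.readLong();
        }
        return new Decoded(numHashFunctions, bitSet);
    }
}
```

Under this assumed layout the long data starts at a fixed byte offset after the header, which is the kind of value a constant such as START_OF_SERIALIZED_LONGS would record.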
public static void mergeBloomFilterBytes(byte[] bf1Bytes, int bf1Start, int bf1Length, byte[] bf2Bytes, int bf2Start, int bf2Length)
public static void mergeBloomFilterBytes(byte[] bf1Bytes, int bf1Start, int bf1Length, byte[] bf2Bytes, int bf2Start, int bf2Length, int mergeStart, int mergeEnd)
Merges BloomKFilter bf2 into bf1.
bf1Bytes - Data of bloom filter 1.
bf1Start - Start index of BF1.
bf1Length - BF1 length.
bf2Bytes - Data of bloom filter 2.
bf2Start - Start index of BF2.
bf2Length - BF2 length.
public static byte[] getInitialBytes(long expectedEntries)
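For the merge operations listed earlier (merge(BloomKFilter) and mergeBloomFilterBytes), the conventional way to combine two Bloom filters with the same size and the same number of hash functions is a bitwise OR of their bit sets. The following is a minimal sketch of that idea on plain long arrays; the class `BloomMergeSketch` and its method are hypothetical, and the compatibility check is an assumption about what a safe merge requires, not a description of this class's behavior.

```java
/** Hypothetical merge helper operating on raw bit sets. */
final class BloomMergeSketch {
    /**
     * ORs {@code other} into {@code target} in place.
     * Both filters must have been built with the same size and hash count,
     * otherwise the merged bits would not correspond to the same positions.
     */
    static void mergeInto(long[] target, int targetNumFuncs, long[] other, int otherNumFuncs) {
        if (target.length != other.length || targetNumFuncs != otherNumFuncs) {
            throw new IllegalArgumentException("filters are not compatible for merging");
        }
        for (int i = 0; i < target.length; i++) {
            target[i] |= other[i];   // a bit is set if it was set in either filter
        }
    }
}
```

After such a merge, any key that tested positive in either input filter still tests positive in the result, at the cost of a higher false-positive rate.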
Copyright © 2021 The Apache Software Foundation. All rights reserved.