javaewah
Class EWAHCompressedBitmap

java.lang.Object
  extended by javaewah.EWAHCompressedBitmap
All Implemented Interfaces:
Externalizable, Serializable, Cloneable, Iterable<Integer>, BitmapStorage

public final class EWAHCompressedBitmap
extends Object
implements Cloneable, Externalizable, Iterable<Integer>, BitmapStorage

This implements the patent-free(1) EWAH scheme. Roughly speaking, it is a 64-bit variant of the BBC compression scheme used by Oracle for its bitmap indexes.

The objective of this compression type is to provide some compression while using as few CPU cycles as possible.

Because this implementation works on 64-bit words, it assumes a 64-bit CPU together with a 64-bit Java Virtual Machine. The same code may be slower on a 32-bit machine.
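
As a rough illustration of the word-aligned idea, the sketch below splits a bitmap into 64-bit words and collapses consecutive "clean" words (all zeros or all ones) into a single run, while "dirty" words are kept verbatim. This is not the library's actual layout: the class WordRunSketch and its Run type are hypothetical names introduced only for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: NOT the EWAH format itself, only the run-length idea behind it.
class WordRunSketch {
    /** Either a run of `count` identical clean words, or a single dirty word. */
    static final class Run {
        final boolean clean; // true: `count` copies of an all-0 or all-1 word
        final long word;     // the repeated clean word, or the dirty word itself
        final long count;    // run length in 64-bit words (1 for a dirty word)

        Run(boolean clean, long word, long count) {
            this.clean = clean;
            this.word = word;
            this.count = count;
        }
    }

    /** A word is clean when all 64 bits are equal. */
    static boolean isClean(long w) {
        return w == 0L || w == ~0L;
    }

    /** Collapses consecutive identical clean words into one Run each. */
    static List<Run> compress(long[] words) {
        List<Run> runs = new ArrayList<>();
        int i = 0;
        while (i < words.length) {
            long w = words[i];
            if (isClean(w)) {
                int j = i;
                while (j < words.length && words[j] == w) {
                    j++;
                }
                runs.add(new Run(true, w, j - i));
                i = j;
            } else {
                runs.add(new Run(false, w, 1));
                i++;
            }
        }
        return runs;
    }
}
```

With this sketch, six input words such as three zero words, one dirty word and two all-ones words shrink to three runs, which is why sparse or very dense bitmaps compress well under this kind of scheme.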

For more details, see the following paper:

A 32-bit version of the compressed format was described by Wu et al. and named WBC:

Probably, the best prior art is the Oracle bitmap compression scheme (BBC):

(1) The authors do not know of any patent infringed by this implementation. However, similar schemes, such as WAH, are covered by patents.

Since:
0.1.0
See Also:
Serialized Form

Field Summary
static int wordinbits
          The Constant wordinbits represents the number of bits in a long.
 
Constructor Summary
EWAHCompressedBitmap()
          Creates an empty bitmap (no bit set to true).
EWAHCompressedBitmap(int buffersize)
          Sets explicitly the buffer size (in 64-bit words).
 
Method Summary
 int add(long newdata)
          Adding words directly to the bitmap (for expert use).
 int add(long newdata, int bitsthatmatter)
          Adding words directly to the bitmap (for expert use).
 long addStreamOfDirtyWords(long[] data, long start, long number)
          If you have several dirty words to copy over, this might be faster than adding them one by one.
 int addStreamOfEmptyWords(boolean v, long number)
          For experts: You want to add many zeroes or ones? This is the method you use.
 long addStreamOfNegatedDirtyWords(long[] data, long start, long number)
          Same as addStreamOfDirtyWords, but the words are negated.
static EWAHCompressedBitmap and(EWAHCompressedBitmap... bitmaps)
          Returns a new compressed bitmap containing the bitwise AND values of the provided bitmaps.
 EWAHCompressedBitmap and(EWAHCompressedBitmap a)
          Returns a new compressed bitmap containing the bitwise AND values of the current bitmap with some other bitmap.
static int andCardinality(EWAHCompressedBitmap... bitmaps)
          Returns the cardinality of the result of a bitwise AND of the values of the provided bitmaps.
 int andCardinality(EWAHCompressedBitmap a)
          Returns the cardinality of the result of a bitwise AND of the values of the current bitmap with some other bitmap.
 EWAHCompressedBitmap andNot(EWAHCompressedBitmap a)
          Returns a new compressed bitmap containing the bitwise AND NOT values of the current bitmap with some other bitmap.
 int andNotCardinality(EWAHCompressedBitmap a)
          Returns the cardinality of the result of a bitwise AND NOT of the values of the current bitmap with some other bitmap.
 int cardinality()
          Reports the number of bits set to true.
 void clear()
          Clears all set bits and resets the size in bits to 0.
 Object clone()
           
 void deserialize(DataInput in)
          Deserialize.
protected static void discharge(BufferedRunningLengthWord initialWord, EWAHIterator iterator, BitmapStorage container)
          For internal use.
 boolean equals(Object o)
          Check to see whether the two compressed bitmaps contain the same data.
 List<Integer> getPositions()
          Gets the positions of the true values as a single list.
 int hashCode()
          Returns a customized hash code (based on Karp-Rabin).
 boolean intersects(EWAHCompressedBitmap a)
          Returns true if the two bitmaps have at least one true bit in the same position.
 IntIterator intIterator()
          Iterator over the set bits (this is what most people will want to use to browse the content).
 Iterator<Integer> iterator()
          Iterates over the positions of the true values.
 void not()
          Negate (bitwise) the current bitmap.
static EWAHCompressedBitmap or(EWAHCompressedBitmap... bitmaps)
          Returns a new compressed bitmap containing the bitwise OR values of the provided bitmaps.
 EWAHCompressedBitmap or(EWAHCompressedBitmap a)
          Returns a new compressed bitmap containing the bitwise OR values of the current bitmap with some other bitmap.
static int orCardinality(EWAHCompressedBitmap... bitmaps)
          Returns the cardinality of the result of a bitwise OR of the values of the provided bitmaps.
 int orCardinality(EWAHCompressedBitmap a)
          Returns the cardinality of the result of a bitwise OR of the values of the current bitmap with some other bitmap.
 void readExternal(ObjectInput in)
           
 void serialize(DataOutput out)
          Serialize.
 int serializedSizeInBytes()
          Reports the size in bytes required to serialize this bitmap.
 boolean set(int i)
          Sets the bit at position i to true; bits must be set in increasing order.
 void setSizeInBits(int size)
          Sets the size in bits.
 boolean setSizeInBits(int size, boolean defaultvalue)
          Change the reported size in bits of the *uncompressed* bitmap represented by this compressed bitmap.
 int sizeInBits()
          Returns the size in bits of the *uncompressed* bitmap represented by this compressed bitmap.
 int sizeInBytes()
          Report the *compressed* size of the bitmap (equivalent to memory usage, after accounting for some overhead).
 String toDebugString()
          A more detailed string describing the bitmap (useful for debugging).
 String toString()
          A string describing the bitmap.
 void writeExternal(ObjectOutput out)
           
 EWAHCompressedBitmap xor(EWAHCompressedBitmap a)
          Returns a new compressed bitmap containing the bitwise XOR values of the current bitmap with some other bitmap.
 int xorCardinality(EWAHCompressedBitmap a)
          Returns the cardinality of the result of a bitwise XOR of the values of the current bitmap with some other bitmap.
 
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

wordinbits

public static final int wordinbits
The Constant wordinbits represents the number of bits in a long.

See Also:
Constant Field Values
Constructor Detail

EWAHCompressedBitmap

public EWAHCompressedBitmap()
Creates an empty bitmap (no bit set to true).


EWAHCompressedBitmap

public EWAHCompressedBitmap(int buffersize)
Explicitly sets the buffer size (in 64-bit words). The initial memory usage will be buffersize * 64 bits (that is, buffersize * 8 bytes). For large, poorly compressible bitmaps, using large values may improve performance.

Parameters:
buffersize - number of 64-bit words reserved when the object is created
Method Detail

xor

public EWAHCompressedBitmap xor(EWAHCompressedBitmap a)
Returns a new compressed bitmap containing the bitwise XOR values of the current bitmap with some other bitmap. The running time is proportional to the sum of the compressed sizes (as reported by sizeInBytes()).

Parameters:
a - the other bitmap
Returns:
the EWAH compressed bitmap

and

public EWAHCompressedBitmap and(EWAHCompressedBitmap a)
Returns a new compressed bitmap containing the bitwise AND values of the current bitmap with some other bitmap. The running time is proportional to the sum of the compressed sizes (as reported by sizeInBytes()).

Parameters:
a - the other bitmap
Returns:
the EWAH compressed bitmap
Since:
0.4.3

and

public static EWAHCompressedBitmap and(EWAHCompressedBitmap... bitmaps)
Returns a new compressed bitmap containing the bitwise AND values of the provided bitmaps.

Parameters:
bitmaps - bitmaps to AND together
Returns:
result of the AND
Since:
0.4.3

andCardinality

public static int andCardinality(EWAHCompressedBitmap... bitmaps)
Returns the cardinality of the result of a bitwise AND of the values of the provided bitmaps. Avoids needing to allocate an intermediate bitmap to hold the result of the AND.
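
To illustrate why a cardinality can be computed without an intermediate bitmap, here is a minimal sketch that works on plain, uncompressed long[] words rather than on the library's compressed form. The class CardinalitySketch and both method signatures are assumptions made for this example; the real methods take EWAHCompressedBitmap arguments.

```java
// Hypothetical sketch on uncompressed 64-bit words; not the library's implementation.
class CardinalitySketch {
    /** Counts the bits of (a AND b) word by word; no result array is allocated. */
    static int andCardinality(long[] a, long[] b) {
        int n = Math.min(a.length, b.length); // words missing on one side act as zero
        int count = 0;
        for (int i = 0; i < n; i++) {
            count += Long.bitCount(a[i] & b[i]);
        }
        return count;
    }

    /** Counts the bits of (a OR b) word by word; no result array is allocated. */
    static int orCardinality(long[] a, long[] b) {
        int n = Math.max(a.length, b.length);
        int count = 0;
        for (int i = 0; i < n; i++) {
            long x = i < a.length ? a[i] : 0L;
            long y = i < b.length ? b[i] : 0L;
            count += Long.bitCount(x | y);
        }
        return count;
    }
}
```

Each combined word is consumed by Long.bitCount and then discarded, so the only state kept is a running counter.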

Parameters:
bitmaps - bitmaps to AND
Returns:
the cardinality
Since:
0.4.3

intersects

public boolean intersects(EWAHCompressedBitmap a)
Returns true if the two bitmaps have at least one true bit in the same position. Equivalently, you could call and(a) and check whether the result has a set bit, but intersects will run faster if you do not need the result of the AND operation.
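
The difference between the two approaches can be sketched on sorted arrays of set-bit positions (an assumption made for brevity; the library works on compressed words). The class IntersectSketch and its methods are hypothetical: intersects stops at the first common position, whereas and has to build the complete result.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper working on sorted position arrays, not on compressed bitmaps.
class IntersectSketch {
    /** Returns true as soon as one common position is found (early exit). */
    static boolean intersects(int[] a, int[] b) {
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                return true;
            }
            if (a[i] < b[j]) {
                i++;
            } else {
                j++;
            }
        }
        return false;
    }

    /** Builds the full intersection; allocates a result even if only emptiness matters. */
    static List<Integer> and(int[] a, int[] b) {
        List<Integer> common = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                common.add(a[i]);
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i++;
            } else {
                j++;
            }
        }
        return common;
    }
}
```

For any pair of inputs, intersects(a, b) agrees with !and(a, b).isEmpty(), but it allocates nothing and may return long before either input is exhausted.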

Parameters:
a - the other bitmap
Returns:
whether they intersect
Since:
0.3.2

andNot

public EWAHCompressedBitmap andNot(EWAHCompressedBitmap a)
Returns a new compressed bitmap containing the bitwise AND NOT values of the current bitmap with some other bitmap. The running time is proportional to the sum of the compressed sizes (as reported by sizeInBytes()).

Parameters:
a - the other bitmap
Returns:
the EWAH compressed bitmap

not

public void not()
Negates (bitwise) the current bitmap in place. Because not() returns void, a negated copy is obtained in two steps: EWAHCompressedBitmap copy = (EWAHCompressedBitmap) mybitmap.clone(); copy.not(); The running time is proportional to the compressed size (as reported by sizeInBytes()).
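
Since not() modifies the bitmap in place and returns void, the clone has to be stored in a variable before it is negated. The sketch below shows the same clone-then-negate pattern using java.util.BitSet as a stand-in, so that it runs without the library; the class NegatedCopySketch is hypothetical, and limiting the negation to the first sizeInBits positions is an assumption of this example.

```java
import java.util.BitSet;

// Analogy using java.util.BitSet; EWAHCompressedBitmap itself is not used here.
class NegatedCopySketch {
    /** Returns a negated copy over positions [0, sizeInBits); `original` is left untouched. */
    static BitSet negatedCopy(BitSet original, int sizeInBits) {
        BitSet copy = (BitSet) original.clone(); // step 1: keep the clone in a variable
        copy.flip(0, sizeInBits);                // step 2: negate the clone in place
        return copy;
    }
}
```

With the real class, note that clone() is declared to throw CloneNotSupportedException, so the calling code has to handle or declare it.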


or

public EWAHCompressedBitmap or(EWAHCompressedBitmap a)
Returns a new compressed bitmap containing the bitwise OR values of the current bitmap with some other bitmap. The running time is proportional to the sum of the compressed sizes (as reported by sizeInBytes()).

Parameters:
a - the other bitmap
Returns:
the EWAH compressed bitmap

orCardinality

public int orCardinality(EWAHCompressedBitmap a)
Returns the cardinality of the result of a bitwise OR of the values of the current bitmap with some other bitmap. Avoids needing to allocate an intermediate bitmap to hold the result of the OR.

Parameters:
a - the other bitmap
Returns:
the cardinality
Since:
0.4.0

andCardinality

public int andCardinality(EWAHCompressedBitmap a)
Returns the cardinality of the result of a bitwise AND of the values of the current bitmap with some other bitmap. Avoids needing to allocate an intermediate bitmap to hold the result of the AND.

Parameters:
a - the other bitmap
Returns:
the cardinality
Since:
0.4.0

andNotCardinality

public int andNotCardinality(EWAHCompressedBitmap a)
Returns the cardinality of the result of a bitwise AND NOT of the values of the current bitmap with some other bitmap. Avoids needing to allocate an intermediate bitmap to hold the result of the AND NOT.

Parameters:
a - the other bitmap
Returns:
the cardinality
Since:
0.4.0

xorCardinality

public int xorCardinality(EWAHCompressedBitmap a)
Returns the cardinality of the result of a bitwise XOR of the values of the current bitmap with some other bitmap. Avoids needing to allocate an intermediate bitmap to hold the result of the XOR.

Parameters:
a - the other bitmap
Returns:
the cardinality
Since:
0.4.0

or

public static EWAHCompressedBitmap or(EWAHCompressedBitmap... bitmaps)
Returns a new compressed bitmap containing the bitwise OR values of the provided bitmaps.

Parameters:
bitmaps - bitmaps to OR together
Returns:
result of the OR
Since:
0.4.0

orCardinality

public static int orCardinality(EWAHCompressedBitmap... bitmaps)
Returns the cardinality of the result of a bitwise OR of the values of the provided bitmaps. Avoids needing to allocate an intermediate bitmap to hold the result of the OR.

Parameters:
bitmaps - bitmaps to OR
Returns:
the cardinality
Since:
0.4.0

discharge

protected static void discharge(BufferedRunningLengthWord initialWord,
                                EWAHIterator iterator,
                                BitmapStorage container)
For internal use.

Parameters:
initialWord - the initial word
iterator - the iterator
container - the container

set

public boolean set(int i)
Sets the bit at position i to true. Bits must be set in increasing order: for example, set(15) followed by set(7) will fail; you must call set(7) and then set(15).
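
The increasing-order contract can be mimicked with a small stand-in class. AppendOnlySketch below is hypothetical and backed by java.util.BitSet; it simply rejects every position below the current size, which is a simplifying assumption (the description only guarantees success when i >= sizeInBits()).

```java
import java.util.BitSet;

// Hypothetical stand-in that mimics the documented append-only behaviour of set(int).
class AppendOnlySketch {
    private final BitSet bits = new BitSet();
    private int sizeInBits = 0;

    /** Sets bit i if i >= sizeInBits(); otherwise refuses and returns false. */
    boolean set(int i) {
        if (i < sizeInBits) {
            return false; // out of order: this position has already been passed
        }
        bits.set(i);
        sizeInBits = i + 1; // the size grows automatically with each accepted bit
        return true;
    }

    int sizeInBits() {
        return sizeInBits;
    }

    int cardinality() {
        return bits.cardinality();
    }
}
```

In practice this means that positions coming from an unsorted source should be sorted before they are inserted, and the boolean result is worth checking.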

Parameters:
i - the index
Returns:
true if the value was set (always true when i >= sizeInBits()).

add

public int add(long newdata)
Adds a word directly to the bitmap (for expert use). This is normally how you add data to the array: bits are added in chunks of 64 (8*8) bits, one long at a time.
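
To make the notion of a 64-bit word concrete, the following sketch packs bit positions into long words of the kind that could then be handed over one at a time. The class WordPackingSketch is hypothetical, and the bit order (position p goes to bit p % 64 of word p / 64, least significant bit first) is an assumption of this example.

```java
// Hypothetical helper: turns bit positions into uncompressed 64-bit words.
class WordPackingSketch {
    /** Packs the given positions (each below sizeInBits) into ceil(sizeInBits / 64) words. */
    static long[] toWords(int[] positions, int sizeInBits) {
        long[] words = new long[(sizeInBits + 63) / 64];
        for (int p : positions) {
            // Assumption: least significant bit of each word holds the lowest position.
            words[p / 64] |= 1L << (p % 64);
        }
        return words;
    }
}
```

For example, positions 0, 3, 64 and 127 in a 128-bit bitmap give two words: the first with bits 0 and 3 set, the second with bits 0 and 63 set.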

Specified by:
add in interface BitmapStorage
Parameters:
newdata - the word
Returns:
the number of words added to the buffer

addStreamOfEmptyWords

public int addStreamOfEmptyWords(boolean v,
                                 long number)
For experts: You want to add many zeroes or ones? This is the method you use.

Specified by:
addStreamOfEmptyWords in interface BitmapStorage
Parameters:
v - the boolean value
number - the number
Returns:
the number of words added to the buffer

addStreamOfNegatedDirtyWords

public long addStreamOfNegatedDirtyWords(long[] data,
                                         long start,
                                         long number)
Same as addStreamOfDirtyWords, but the words are negated.

Specified by:
addStreamOfNegatedDirtyWords in interface BitmapStorage
Parameters:
data - the dirty words
start - the starting point in the array
number - the number of dirty words to add
Returns:
how many (compressed) words were added to the bitmap

addStreamOfDirtyWords

public long addStreamOfDirtyWords(long[] data,
                                  long start,
                                  long number)
If you have several dirty words to copy over, this might be faster than adding them one by one.

Specified by:
addStreamOfDirtyWords in interface BitmapStorage
Parameters:
data - the dirty words
start - the starting point in the array
number - the number of dirty words to add
Returns:
how many (compressed) words were added to the bitmap

add

public int add(long newdata,
               int bitsthatmatter)
Adding words directly to the bitmap (for expert use).

Parameters:
newdata - the word
bitsthatmatter - the number of significant bits (by default it should be 64)
Returns:
the number of words added to the buffer

sizeInBits

public int sizeInBits()
Returns the size in bits of the *uncompressed* bitmap represented by this compressed bitmap. Initially, the sizeInBits is zero. It is extended automatically when you set bits to true.

Returns:
the size in bits

setSizeInBits

public void setSizeInBits(int size)
Sets the size in bits.

Specified by:
setSizeInBits in interface BitmapStorage
Parameters:
size - number of bits
Since:
0.4.0

setSizeInBits

public boolean setSizeInBits(int size,
                             boolean defaultvalue)
Change the reported size in bits of the *uncompressed* bitmap represented by this compressed bitmap. It is not possible to reduce the sizeInBits, but it can be extended. The new bits are set to false or true depending on the value of defaultvalue.
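
The extend-only rule can be sketched with a stand-in class. ResizeSketch is hypothetical and backed by java.util.BitSet; it only mirrors the behaviour described above (no shrinking, new bits filled with the default value) and says nothing about how the library implements it.

```java
import java.util.BitSet;

// Hypothetical stand-in that mirrors the documented behaviour of setSizeInBits(int, boolean).
class ResizeSketch {
    private final BitSet bits = new BitSet();
    private int sizeInBits = 0;

    /** Extends the logical size; new positions take `defaultvalue`. Shrinking is refused. */
    boolean setSizeInBits(int size, boolean defaultvalue) {
        if (size < sizeInBits) {
            return false; // reducing the size is not possible
        }
        if (defaultvalue) {
            bits.set(sizeInBits, size); // fill [old size, new size) with ones
        }
        sizeInBits = size;
        return true;
    }

    int sizeInBits() {
        return sizeInBits;
    }

    int cardinality() {
        return bits.cardinality();
    }
}
```

Extending from 10 to 20 bits with a default of true therefore adds exactly ten set bits, while a later request for 5 bits returns false and leaves the size at 20.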

Parameters:
size - the size in bits
defaultvalue - the default boolean value
Returns:
true if the update was possible

sizeInBytes

public int sizeInBytes()
Report the *compressed* size of the bitmap (equivalent to memory usage, after accounting for some overhead).

Returns:
the size in bytes

cardinality

public int cardinality()
Reports the number of bits set to true. The running time is proportional to the compressed size (as reported by sizeInBytes()).

Returns:
the number of bits set to true

toString

public String toString()
A string describing the bitmap.

Overrides:
toString in class Object
Returns:
the string

toDebugString

public String toDebugString()
A more detailed string describing the bitmap (useful for debugging).

Returns:
the string

intIterator

public IntIterator intIterator()
Iterator over the set bits (this is what most people will want to use to browse the content). The location of the set bits is returned, in increasing order.

Returns:
the int iterator

iterator

public Iterator<Integer> iterator()
Iterates over the positions of the true values. This is similar to intIterator(), but it returns boxed Integer values through the standard java.util.Iterator interface.

Specified by:
iterator in interface Iterable<Integer>
Returns:
the iterator

getPositions

public List<Integer> getPositions()
Gets the positions of the true values as a single list. This may use more memory than iterator().
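
As an illustration of how set-bit positions can be collected in increasing order, the sketch below scans plain 64-bit words with Long.numberOfTrailingZeros. It works on an uncompressed long[] and assumes the least significant bit of each word holds the lowest position; the class SetBitScanSketch is a hypothetical name, not part of the library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch on uncompressed words; the library scans its compressed form instead.
class SetBitScanSketch {
    /** Returns the positions of all set bits, in increasing order. */
    static List<Integer> positions(long[] words) {
        List<Integer> out = new ArrayList<>();
        for (int w = 0; w < words.length; w++) {
            long word = words[w];
            while (word != 0L) {
                int bit = Long.numberOfTrailingZeros(word); // index of the lowest set bit
                out.add(w * 64 + bit);
                word &= word - 1; // clear that lowest set bit and continue
            }
        }
        return out;
    }
}
```

The resulting list holds one boxed Integer per set bit, which is the memory cost hinted at in the description; an iterator can produce the same positions one at a time instead.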

Returns:
the positions

equals

public boolean equals(Object o)
Check to see whether the two compressed bitmaps contain the same data.

Overrides:
equals in class Object
See Also:
Object.equals(java.lang.Object)

hashCode

public int hashCode()
Returns a customized hash code (based on Karp-Rabin). Naturally, if the bitmaps are equal, they will hash to the same value.
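
A Karp-Rabin style hash treats the data as the coefficients of a polynomial evaluated at a fixed base. The sketch below applies that idea to an array of 64-bit words; the class PolyHashSketch, the base 31 and the way each word is folded to 32 bits are all assumptions of this example and are not taken from the library.

```java
// Hypothetical polynomial (Karp-Rabin style) hash over 64-bit words.
class PolyHashSketch {
    /** Computes h = 31 * h + fold(word) over all words, with int overflow as the modulus. */
    static int hash(long[] words) {
        int h = 0;
        for (long w : words) {
            int folded = (int) (w ^ (w >>> 32)); // fold the 64-bit word into 32 bits
            h = 31 * h + folded;
        }
        return h;
    }
}
```

Equal word sequences always produce equal hash values, and the order of the words matters: the sequences {1, 2} and {2, 1} hash to different values here.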

Overrides:
hashCode in class Object

clone

public Object clone()
             throws CloneNotSupportedException
Overrides:
clone in class Object
Throws:
CloneNotSupportedException

readExternal

public void readExternal(ObjectInput in)
                  throws IOException
Specified by:
readExternal in interface Externalizable
Throws:
IOException

deserialize

public void deserialize(DataInput in)
                 throws IOException
Deserialize.

Parameters:
in - the DataInput stream
Throws:
IOException - Signals that an I/O exception has occurred.

writeExternal

public void writeExternal(ObjectOutput out)
                   throws IOException
Specified by:
writeExternal in interface Externalizable
Throws:
IOException

serialize

public void serialize(DataOutput out)
               throws IOException
Serialize.
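
Any DataOutput implementation, such as java.io.DataOutputStream, can be passed to this method. The sketch below shows a round trip through DataOutputStream and DataInputStream using a made-up layout (size in bits, word count, then the words); this layout and the class RoundTripSketch are hypothetical and do not describe the library's serialized form.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical layout for illustration only: int sizeInBits, int wordCount, then long words.
class RoundTripSketch {
    static byte[] serialize(int sizeInBits, long[] words) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(sizeInBits);
            out.writeInt(words.length);
            for (long w : words) {
                out.writeLong(w);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for in-memory streams
        }
        return bytes.toByteArray();
    }

    static long[] deserializeWords(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            in.readInt(); // sizeInBits, not needed to rebuild the words
            long[] words = new long[in.readInt()];
            for (int i = 0; i < words.length; i++) {
                words[i] = in.readLong();
            }
            return words;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Size of the hypothetical layout: two ints plus eight bytes per word. */
    static int serializedSizeInBytes(long[] words) {
        return 4 + 4 + 8 * words.length;
    }
}
```

Knowing the serialized size in advance, as serializedSizeInBytes() reports for the real class, makes it possible to size a buffer before writing.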

Parameters:
out - the DataOutput stream
Throws:
IOException - Signals that an I/O exception has occurred.

serializedSizeInBytes

public int serializedSizeInBytes()
Reports the size in bytes required to serialize this bitmap.

Returns:
the size in bytes

clear

public void clear()
Clears all set bits and resets the size in bits to 0.



Copyright © 2012. All Rights Reserved.