javaewah
Class EWAHCompressedBitmap

java.lang.Object
  extended by javaewah.EWAHCompressedBitmap
All Implemented Interfaces:
Externalizable, Serializable, Cloneable, Iterable<Integer>

public final class EWAHCompressedBitmap
extends Object
implements Cloneable, Externalizable, Iterable<Integer>

This implements the patent-free(1) EWAH scheme. Roughly speaking, it is a 64-bit variant of the BBC compression scheme used by Oracle for its bitmap indexes.

The objective of this compression scheme is to provide some compression while using as few CPU cycles as possible.

Because this implementation is 64-bit, it assumes a 64-bit CPU together with a 64-bit Java Virtual Machine. The same code may run more slowly on a 32-bit machine.
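
To make the word-aligned idea concrete, here is a toy sketch. It is not the library's actual EWAH layout. It assumes a simplified encoding in which runs of identical all-zero or all-one 64-bit words collapse into one (word, count) pair, while every other word is stored verbatim. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy word-aligned run-length encoder. Hypothetical; NOT the real EWAH format. */
final class WordRunSketch {

    /** Returns {word, count} pairs that cover the input words in order. */
    static List<long[]> compress(long[] words) {
        List<long[]> runs = new ArrayList<>();
        for (long w : words) {
            boolean clean = (w == 0L) || (w == -1L); // all zeros or all ones
            long[] last = runs.isEmpty() ? null : runs.get(runs.size() - 1);
            if (clean && last != null && last[0] == w) {
                last[1]++; // extend the current run of identical clean words
            } else {
                runs.add(new long[] {w, 1L}); // new run, or a literal (mixed) word
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        long[] words = {0L, 0L, 0L, 5L, -1L, -1L};
        for (long[] run : compress(words)) {
            System.out.println(Long.toHexString(run[0]) + " x " + run[1]);
        }
    }
}
```

Because whole 64-bit words are handled at a time, such a scheme trades some compression ratio for simple, cheap word-level operations, which matches the objective stated above.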

For more details, see the following paper:

It was first described by Wu et al. and named WBC:

We can view this scheme as a 64-bit equivalent to the Oracle bitmap compression scheme:

1- The author (D. Lemire) does not know of any patent infringed by this implementation. However, similar schemes, such as WAH, are covered by patents.

See Also:
Serialized Form

Field Summary
static int wordinbits
          The Constant wordinbits represents the number of bits in a long.
 
Constructor Summary
EWAHCompressedBitmap()
          Creates an empty bitmap (no bit set to true).
EWAHCompressedBitmap(int buffersize)
          Sets the buffer size explicitly (in 64-bit words).
 
Method Summary
 int add(long newdata)
          Adds a word directly to the bitmap (for expert use).
 int add(long newdata, int bitsthatmatter)
          Adds a word directly to the bitmap (for expert use).
 int addStreamOfEmptyWords(boolean v, long number)
          For experts: You want to add many zeroes or ones? This is the method you use.
 EWAHCompressedBitmap and(EWAHCompressedBitmap a)
          Returns a new compressed bitmap containing the bitwise AND values of the current bitmap with some other bitmap.
 EWAHCompressedBitmap andNot(EWAHCompressedBitmap a)
          Returns a new compressed bitmap containing the bitwise AND NOT values of the current bitmap with some other bitmap.
 int cardinality()
          Reports the number of bits set to true.
 void clear()
          Clears any set bits and sets the size in bits back to 0.
 Object clone()
           
 void deserialize(DataInput in)
          Deserialize.
 boolean equals(Object o)
          Check to see whether the two compressed bitmaps contain the same data.
 List<Integer> getPositions()
          Gets the positions of the true values as a single list.
 int hashCode()
          Returns a customized hash code (based on Karp-Rabin).
 boolean intersects(EWAHCompressedBitmap a)
          Returns true if the two bitmaps have at least one true bit in the same position.
 IntIterator intIterator()
          Iterator over the set bits (this is what most people will want to use to browse the content).
 Iterator<Integer> iterator()
          Iterates over the positions of the true values.
 void not()
          Negate (bitwise) the current bitmap.
 EWAHCompressedBitmap or(EWAHCompressedBitmap a)
          Returns a new compressed bitmap containing the bitwise OR values of the current bitmap with some other bitmap.
 void readExternal(ObjectInput in)
           
 void serialize(DataOutput out)
          Serialize.
 boolean set(int i)
          Sets the bit at position i to true. Bits must be set in increasing order.
 boolean setSizeInBits(int size, boolean defaultvalue)
          Change the reported size in bits of the *uncompressed* bitmap represented by this compressed bitmap.
 int sizeInBits()
          Returns the size in bits of the *uncompressed* bitmap represented by this compressed bitmap.
 int sizeInBytes()
          Report the *compressed* size of the bitmap (equivalent to memory usage, after accounting for some overhead).
 String toDebugString()
          A more detailed string describing the bitmap (useful for debugging).
 String toString()
          A string describing the bitmap.
 void writeExternal(ObjectOutput out)
           
 EWAHCompressedBitmap xor(EWAHCompressedBitmap a)
          Returns a new compressed bitmap containing the bitwise XOR values of the current bitmap with some other bitmap.
 
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

wordinbits

public static final int wordinbits
The Constant wordinbits represents the number of bits in a long.

See Also:
Constant Field Values

Constructor Detail

EWAHCompressedBitmap

public EWAHCompressedBitmap()
Creates an empty bitmap (no bit set to true).


EWAHCompressedBitmap

public EWAHCompressedBitmap(int buffersize)
Sets the buffer size explicitly (in 64-bit words). The initial memory usage will be buffersize * 64 bits (that is, buffersize * 8 bytes). For large, poorly compressible bitmaps, using a large value may improve performance.

Parameters:
buffersize - number of 64-bit words reserved when the object is created

Method Detail

xor

public EWAHCompressedBitmap xor(EWAHCompressedBitmap a)
Returns a new compressed bitmap containing the bitwise XOR values of the current bitmap with some other bitmap. The running time is proportional to the sum of the compressed sizes (as reported by sizeInBytes()).

Parameters:
a - the other bitmap
Returns:
the EWAH compressed bitmap

and

public EWAHCompressedBitmap and(EWAHCompressedBitmap a)
Returns a new compressed bitmap containing the bitwise AND values of the current bitmap with some other bitmap. The running time is proportional to the sum of the compressed sizes (as reported by sizeInBytes()).

Parameters:
a - the other bitmap
Returns:
the EWAH compressed bitmap

intersects

public boolean intersects(EWAHCompressedBitmap a)
Returns true if the two bitmaps have at least one true bit in the same position. Equivalently, you could call and() and check whether the result has a set bit, but intersects() runs faster when you do not need the result of the AND operation.

Parameters:
a - the other bitmap
Returns:
whether they intersect
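
As a rough illustration of why an intersection test can be cheaper than a full AND, the sketch below works on plain uncompressed long arrays rather than on the compressed representation. The early exit and the missing result allocation are one plausible source of the speedup; the class and its methods are hypothetical and are not the library's code.

```java
/** Hypothetical sketch on uncompressed words; not the library's implementation. */
final class IntersectsSketch {

    /** Returns true as soon as one pair of aligned words shares a set bit. */
    static boolean intersects(long[] a, long[] b) {
        int n = Math.min(a.length, b.length);
        for (int k = 0; k < n; k++) {
            if ((a[k] & b[k]) != 0L) {
                return true; // early exit: no result array is built
            }
        }
        return false;
    }

    /** Full AND: always allocates and fills a result array. */
    static long[] and(long[] a, long[] b) {
        long[] out = new long[Math.min(a.length, b.length)];
        for (int k = 0; k < out.length; k++) {
            out[k] = a[k] & b[k];
        }
        return out;
    }

    public static void main(String[] args) {
        long[] a = {0b1010L, 0L};
        long[] b = {0b0010L, 0L};
        System.out.println(intersects(a, b));
    }
}
```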

andNot

public EWAHCompressedBitmap andNot(EWAHCompressedBitmap a)
Returns a new compressed bitmap containing the bitwise AND NOT values of the current bitmap with some other bitmap. The running time is proportional to the sum of the compressed sizes (as reported by sizeInBytes()).

Parameters:
a - the other bitmap
Returns:
the EWAH compressed bitmap

not

public void not()
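Negates (bitwise) the current bitmap in place. Because not() returns void, get a negated copy by cloning first and then negating the clone: EWAHCompressedBitmap copy = (EWAHCompressedBitmap) mybitmap.clone(); copy.not(); The running time is proportional to the compressed size (as reported by sizeInBytes()).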


or

public EWAHCompressedBitmap or(EWAHCompressedBitmap a)
Returns a new compressed bitmap containing the bitwise OR values of the current bitmap with some other bitmap. The running time is proportional to the sum of the compressed sizes (as reported by sizeInBytes()).

Parameters:
a - the other bitmap
Returns:
the EWAH compressed bitmap

set

public boolean set(int i)
Sets the bit at position i to true. Bits must be set in increasing order: for example, set(15) followed by set(7) will fail. You must call set(7) and then set(15).

Parameters:
i - the index
Returns:
true if the value was set (always true when i>= sizeInBits()).
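
The increasing-order rule can be modeled with a small append-only bitmap. This is a hypothetical, uncompressed stand-in that assumes a rejected call simply returns false and that i is non-negative. It mirrors the documented contract of set() and sizeInBits(), not the library's internals.

```java
import java.util.Arrays;

/** Hypothetical uncompressed model of the append-only set() contract. */
final class AppendOnlyBitsSketch {
    private long[] words = new long[1];
    private int sizeInBits = 0;

    /** Sets bit i if i >= sizeInBits(); otherwise rejects the call. */
    boolean set(int i) {
        if (i < sizeInBits) {
            return false; // position already passed: the call is rejected
        }
        int w = i / 64;
        if (w >= words.length) {
            words = Arrays.copyOf(words, Math.max(w + 1, 2 * words.length));
        }
        words[w] |= 1L << (i % 64);
        sizeInBits = i + 1; // the size grows automatically
        return true;
    }

    int sizeInBits() {
        return sizeInBits;
    }

    public static void main(String[] args) {
        AppendOnlyBitsSketch bits = new AppendOnlyBitsSketch();
        System.out.println(bits.set(7));
        System.out.println(bits.set(15));
        System.out.println(bits.set(7)); // out of order in this model
    }
}
```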

add

public int add(long newdata)
Adds a word directly to the bitmap (for expert use). This is normally how data is added to the array: bits are appended in chunks of 8*8 = 64 bits.

Parameters:
newdata - the word
Returns:
the number of words added to the buffer
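
For readers unsure what such a word looks like, the following sketch packs bit positions into 64-bit words. It assumes that position p maps to bit (p % 64) of word (p / 64), counting from the least significant bit. That ordering is an assumption made for illustration, and the helper name is hypothetical.

```java
/** Hypothetical helper: packs bit positions into 64-bit words. */
final class WordPackingSketch {

    /** Assumes every position lies in [0, sizeInBits). */
    static long[] toWords(int[] positions, int sizeInBits) {
        long[] words = new long[(sizeInBits + 63) / 64]; // round up to whole words
        for (int p : positions) {
            words[p / 64] |= 1L << (p % 64); // assumed bit order: LSB first
        }
        return words;
    }

    public static void main(String[] args) {
        for (long w : toWords(new int[] {0, 3, 64}, 128)) {
            System.out.println(Long.toBinaryString(w));
        }
    }
}
```

Each resulting long would then be a candidate argument for a word-level call such as add(long).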

addStreamOfEmptyWords

public int addStreamOfEmptyWords(boolean v,
                                 long number)
For experts: You want to add many zeroes or ones? This is the method you use.

Parameters:
v - the boolean value
number - the number of words to add
Returns:
the number of words added to the buffer

add

public int add(long newdata,
               int bitsthatmatter)
Adds a word directly to the bitmap (for expert use).

Parameters:
newdata - the word
bitsthatmatter - the number of significant bits (by default it should be 64)
Returns:
the number of words added to the buffer

sizeInBits

public int sizeInBits()
Returns the size in bits of the *uncompressed* bitmap represented by this compressed bitmap. Initially, the sizeInBits is zero. It is extended automatically when you set bits to true.

Returns:
the size in bits

setSizeInBits

public boolean setSizeInBits(int size,
                             boolean defaultvalue)
Change the reported size in bits of the *uncompressed* bitmap represented by this compressed bitmap. It is not possible to reduce the sizeInBits, but it can be extended. The new bits are set to false or true depending on the value of defaultvalue.

Parameters:
size - the size in bits
defaultvalue - the default boolean value
Returns:
true if the update was possible

sizeInBytes

public int sizeInBytes()
Report the *compressed* size of the bitmap (equivalent to memory usage, after accounting for some overhead).

Returns:
the size in bytes

cardinality

public int cardinality()
Reports the number of bits set to true. The running time is proportional to the compressed size (as reported by sizeInBytes()).

Returns:
the number of bits set to true
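
One way to see why counting can scale with the compressed size rather than the uncompressed size: if identical words are stored once together with a repeat count, a single population count covers the whole run. The sketch below assumes a toy {word, count} run layout, which is hypothetical and not the actual EWAH encoding.

```java
/** Hypothetical sketch: cardinality over toy {word, count} runs. */
final class CardinalitySketch {

    /** Each run is {word, count}: the word repeated count times. */
    static long cardinality(long[][] runs) {
        long total = 0;
        for (long[] run : runs) {
            total += Long.bitCount(run[0]) * run[1]; // one popcount per stored run
        }
        return total;
    }

    public static void main(String[] args) {
        long[][] runs = {{0L, 3L}, {5L, 1L}, {-1L, 2L}};
        System.out.println(cardinality(runs));
    }
}
```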

toString

public String toString()
A string describing the bitmap.

Overrides:
toString in class Object
Returns:
the string

toDebugString

public String toDebugString()
A more detailed string describing the bitmap (useful for debugging).

Returns:
the string

intIterator

public IntIterator intIterator()
Iterator over the set bits (this is what most people will want to use to browse the content). The location of the set bits is returned, in increasing order.

Returns:
the int iterator
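
The idea of visiting set bits in increasing order can be sketched on uncompressed words with Long.numberOfTrailingZeros. This hypothetical helper assumes the least significant bit of word k is position k * 64. It illustrates the traversal order only, not the library's iterator.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: lists set-bit positions of uncompressed words. */
final class SetBitsSketch {

    static List<Integer> positions(long[] words) {
        List<Integer> out = new ArrayList<>();
        for (int k = 0; k < words.length; k++) {
            long w = words[k];
            while (w != 0L) {
                out.add(k * 64 + Long.numberOfTrailingZeros(w));
                w &= w - 1; // clear the lowest set bit
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(positions(new long[] {9L, 1L}));
    }
}
```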

iterator

public Iterator<Integer> iterator()
Iterates over the positions of the true values. This is similar to intIterator(), but it uses Java generics.

Specified by:
iterator in interface Iterable<Integer>
Returns:
the iterator

getPositions

public List<Integer> getPositions()
Gets the positions of the true values as a single list. This may use more memory than iterator().

Returns:
the positions

equals

public boolean equals(Object o)
Check to see whether the two compressed bitmaps contain the same data.

Overrides:
equals in class Object
See Also:
Object.equals(java.lang.Object)

hashCode

public int hashCode()
Returns a customized hash code (based on Karp-Rabin). Naturally, if the bitmaps are equal, they will hash to the same value.

Overrides:
hashCode in class Object
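
A Karp-Rabin-style hash treats the data as digits of a polynomial evaluated at a fixed base. The sketch below applies that idea to an array of 64-bit words. The base of 31 and the split into 32-bit halves are assumptions chosen for illustration, so the values it produces should not be expected to match this class's hashCode().

```java
/** Hypothetical Karp-Rabin-style hash over 64-bit words. */
final class KarpRabinSketch {
    private static final int BASE = 31; // assumed base, for illustration only

    static int hash(long[] words) {
        int h = 0;
        for (long w : words) {
            h = BASE * h + (int) (w & 0xFFFFFFFFL); // low 32 bits
            h = BASE * h + (int) (w >>> 32);        // high 32 bits
        }
        return h; // int overflow wraps around, which is fine for hashing
    }

    public static void main(String[] args) {
        System.out.println(hash(new long[] {1L, 2L}));
    }
}
```

Equal word sequences always produce equal hashes under this scheme, which is the property the contract with equals() requires.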

clone

public Object clone()
             throws CloneNotSupportedException
Overrides:
clone in class Object
Throws:
CloneNotSupportedException

readExternal

public void readExternal(ObjectInput in)
                  throws IOException
Specified by:
readExternal in interface Externalizable
Throws:
IOException

deserialize

public void deserialize(DataInput in)
                 throws IOException
Deserialize.

Parameters:
in - the DataInput stream
Throws:
IOException - Signals that an I/O exception has occurred.

writeExternal

public void writeExternal(ObjectOutput out)
                   throws IOException
Specified by:
writeExternal in interface Externalizable
Throws:
IOException

serialize

public void serialize(DataOutput out)
               throws IOException
Serialize.

Parameters:
out - the DataOutput stream
Throws:
IOException - Signals that an I/O exception has occurred.
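
The DataOutput and DataInput pairing can be exercised with in-memory streams. The sketch below round-trips an array of words using a made-up layout (a length followed by the words). That layout and the class are hypothetical and say nothing about the bytes this class actually writes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical round trip through DataOutput and DataInput. */
final class WordSerializationSketch {

    /** Made-up layout: an int length, then each word as a long. */
    static void serialize(long[] words, DataOutput out) throws IOException {
        out.writeInt(words.length);
        for (long w : words) {
            out.writeLong(w);
        }
    }

    static long[] deserialize(DataInput in) throws IOException {
        long[] words = new long[in.readInt()];
        for (int k = 0; k < words.length; k++) {
            words[k] = in.readLong();
        }
        return words;
    }

    /** Writes to a byte array and reads the same bytes back. */
    static long[] roundTrip(long[] words) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            serialize(words, new DataOutputStream(bytes));
            return deserialize(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new long[] {0L, 5L, -1L}).length);
    }
}
```

The same pattern, with a DataOutputStream wrapping a file or socket stream, would apply to any object that exposes serialize(DataOutput) and deserialize(DataInput) methods.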

clear

public void clear()
Clears any set bits and sets the size in bits back to 0.



Copyright © 2012. All Rights Reserved.