CellReader reads the data written by CellWriter. The output of the CellWriter.writeTo()
method must be made available as a ByteBuffer. While this provides relatively efficient random access, it is
optimized for sequential access by caching the last decompressed block of both the index (which is
block-compressed) and the data.
A random access incurs the following costs:
1. seek to the compressed block's location in the index
2. decompress the index block
3. read the data location from the index
4. decompress the data block
5. wrap or copy the data from the uncompressed block (a copy is required when the data spans more than one block)
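The steps above can be sketched as follows. This is a simplified illustration, not the actual CellReader
implementation: blocks here are plain slices of an uncompressed array, and all names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch of the random-access read path: locate the cell,
// decompress the block(s) it lives in, then copy the payload out.
class ReadPathSketch
{
  static final int BLOCK_SIZE = 4; // illustrative; real blocks are much larger

  // stand-in for steps 1-4: "seek to and decompress block n"
  static byte[] decompressBlock(byte[] data, int blockId)
  {
    int start = blockId * BLOCK_SIZE;
    return Arrays.copyOfRange(data, start, Math.min(start + BLOCK_SIZE, data.length));
  }

  // step 5: wrap or copy the cell bytes; a cell spanning blocks forces a copy
  static byte[] getCell(byte[] data, int offset, int length)
  {
    byte[] out = new byte[length];
    int copied = 0;
    while (copied < length) {
      int pos = offset + copied;
      byte[] block = decompressBlock(data, pos / BLOCK_SIZE);
      int inBlock = pos % BLOCK_SIZE;
      int n = Math.min(length - copied, block.length - inBlock);
      System.arraycopy(block, inBlock, out, copied, n);
      copied += n;
    }
    return out;
  }
}
```

A cell contained in a single block could be wrapped without copying; the sketch copies unconditionally for brevity.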
Sequential access amortizes the decompression cost by storing the last decompressed block (effectively, a cache
of size 1).
Note also that the index itself is compressed, so random access potentially incurs an additional decompression
step for large datasets.
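This size-1 caching behavior can be illustrated with a minimal sketch (the names are hypothetical, not
CellReader internals): sequential reads within one block decompress it once, while alternating between blocks
evicts on every access.

```java
import java.util.function.IntFunction;

// Minimal sketch of a size-1 decompressed-block cache, as CellReader
// effectively keeps for both the index and the data.
class SingleBlockCache
{
  private final IntFunction<byte[]> decompressor; // stand-in for block load + decompress
  private int cachedBlockId = -1;
  private byte[] cachedBlock;
  int decompressCount = 0;

  SingleBlockCache(IntFunction<byte[]> decompressor)
  {
    this.decompressor = decompressor;
  }

  byte[] block(int blockId)
  {
    if (blockId != cachedBlockId) { // miss: decompress and replace the cached block
      cachedBlock = decompressor.apply(blockId);
      cachedBlockId = blockId;
      decompressCount++;
    }
    return cachedBlock;
  }
}
```

Reading blocks in the order 0, 0, 1, 1 decompresses twice; the order 0, 1, 0 decompresses three times.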
{@code
ByteBuffer byteBuffer = ....; // ByteBuffer created from the writableChannel output of CellWriter.writeTo()

try (CellReader cellReader = new CellReader.Builder(byteBuffer).build()) {
  for (int i = 0; i < numPayloads; i++) {
    byte[] payload = cellReader.getCell(i);
    processPayload(payload); // may deserialize and perform work
  }
}
}
While you may allocate your own 64k buffers, it is recommended that you use {@code NativeClearedByteBufferProvider},
which provides direct 64k ByteBuffers from a pool, wrapped in a ResourceHolder. These objects may be
registered with a Closer.
To improve future random access, a decompressed-block cache of some size k (e.g., k = 10) may be added.
At present, we effectively have a block cache of size 1.
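One way such a size-k cache could be sketched is with an access-ordered LinkedHashMap evicting the
least-recently-used block. This is a hypothetical illustration of the idea, not a committed design:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Sketch of a size-k LRU cache of decompressed blocks; an access-ordered
// LinkedHashMap with removeEldestEntry gives LRU eviction for free.
class BlockCache
{
  private final IntFunction<byte[]> decompressor; // stand-in for block load + decompress
  private final LinkedHashMap<Integer, byte[]> cache;
  int decompressCount = 0;

  BlockCache(int k, IntFunction<byte[]> decompressor)
  {
    this.decompressor = decompressor;
    // accessOrder = true: iteration order is least- to most-recently accessed
    this.cache = new LinkedHashMap<Integer, byte[]>(16, 0.75f, true)
    {
      @Override
      protected boolean removeEldestEntry(Map.Entry<Integer, byte[]> eldest)
      {
        return size() > k; // evict the LRU block once we exceed k entries
      }
    };
  }

  byte[] block(int blockId)
  {
    return cache.computeIfAbsent(blockId, id -> {
      decompressCount++;
      return decompressor.apply(id);
    });
  }
}
```

With k = 1 this degenerates to the current behavior; larger k would help workloads that revisit a small
working set of blocks out of order.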