Class CellWriter

  • All Implemented Interfaces:
    Closeable, AutoCloseable, Serializer

    public class CellWriter
    extends Object
    implements Serializer, Closeable

    usage:

    CellWriter effectively stores a list of byte[] payloads that are retrievable randomly by index. The entirety of the data is block compressed. For reading, see CellReader. Example usage:

    
    
     StagedSerde<Fuu> fuuSerDe = new ...
     // declared outside the try block so that it is still in scope for writeTo() below
     CellWriter cellWriter = new CellWriter.Builder(segmentWriteOutMedium).build();
     // note that cellWriter.close() *must* be called before writeTo() in order to finalize the index
     try {
       fuuList.stream().map(fuuSerDe::serialize).forEach(cellWriter::write);
     }
     finally {
       cellWriter.close();
     }
     // at this point cellWriter contains the index and compressed data

     // transfers the index and compressed data in the format specified below. This method is idempotent and copies
     // the data each time.
     cellWriter.writeTo(writableChannel, fileSmoosher); // 2nd argument currently unused, may be null
    
      

    Note that for use with CellReader, the contents written to the writableChannel must be available as a ByteBuffer.
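
    One way to satisfy the ByteBuffer requirement is to direct the channel at an in-memory sink and wrap the resulting bytes. The following is a minimal sketch using only java.nio and java.io; ChannelCaptureSketch and writeSerializedForm are hypothetical names, and writeSerializedForm merely stands in for a call such as cellWriter.writeTo(channel, null).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;

final class ChannelCaptureSketch {

    // Hypothetical stand-in for cellWriter.writeTo(channel, null): writes some bytes to the channel.
    static void writeSerializedForm(WritableByteChannel channel) throws IOException {
        ByteBuffer source = ByteBuffer.wrap("serialized cells".getBytes(StandardCharsets.UTF_8));
        while (source.hasRemaining()) {
            channel.write(source);
        }
    }

    // Collects everything written to the channel in memory and exposes it as a ByteBuffer.
    static ByteBuffer captureAsByteBuffer() {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (WritableByteChannel channel = Channels.newChannel(sink)) {
            writeSerializedForm(channel);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return ByteBuffer.wrap(sink.toByteArray());
    }

    public static void main(String[] args) {
        ByteBuffer buffer = captureAsByteBuffer();
        System.out.println("captured bytes: " + buffer.remaining());
    }
}
```

    For large outputs, writing to a file and memory-mapping it would avoid holding a second copy on the heap; the in-memory sink is used here only to keep the sketch short.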

    Internal Storage Details

     serialized data is of the form:
    
        [cell index]
        [payload storage]
    
     each of these items is stored in compressed streams of blocks with a block index.
    
     A BlockCompressedPayloadWriter stores byte[] payloads. These may be accessed by creating a
     BlockCompressedPayloadReader over the produced ByteBuffer. Reads may be done by giving a location in the
     uncompressed stream and a size.
    
     NOTE: BlockCompressedPayloadBuffer does not store nulls on write(). However, the cellIndex stores an entry
     with a size of 0 for nulls, and CellReader will return null for any null written.
    
      [blockIndexSize:int]
     |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
     |      block index
     |      compressed block # -> block start in compressed stream position (relative to data start)
     |
     |      0: [block position: int]
     |      1: [block position: int]
     |      ...
     |      i: [block position: int]
     |      ...
     |      n: [block position: int]
     |      n+1: [total compressed size] // stored to simplify the invariant position(i+1) - position(i) = length(i)
     |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
     [dataSize:int]
     |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
     | [compressed payload block 1]
     | [compressed payload block 2]
     | ...
     | [compressed payload block n]
     |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
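
     The block index invariant above can be sketched as follows. This is an illustration over an in-memory int array, not the library's reader; BlockIndexSketch and its method names are hypothetical.

```java
final class BlockIndexSketch {

    // positions holds one start offset per compressed block, followed by one
    // trailing entry with the total compressed size.
    static int blockCount(int[] positions) {
        return positions.length - 1;
    }

    // Start of a block, relative to the start of the compressed data.
    static int blockStart(int[] positions, int block) {
        return positions[block];
    }

    // Because of the trailing entry, every block (including the last one)
    // gets its length from the same subtraction.
    static int blockLength(int[] positions, int block) {
        return positions[block + 1] - positions[block];
    }

    public static void main(String[] args) {
        int[] positions = {0, 10, 25, 40}; // three blocks, 40 compressed bytes in total
        for (int block = 0; block < blockCount(positions); block++) {
            System.out.println(block + ": start=" + blockStart(positions, block)
                + " length=" + blockLength(positions, block));
        }
    }
}
```

     Without the trailing entry, the last block would need a special case that consults the data size instead.
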
     the CellIndexWriter stores an array of longs using the BlockCompressedPayloadWriter
    
     logically this is an array of longs
    
     |    0: start_0 : long
     |    1: start_1 : long
     |    ...
     |    n: start_n : long
     |    n+1: start_n + length_n : long  // i.e., the next position that would have been written to
     |                                   // used again for the invariant length_i = start_(i+1) - start_i
     |
     |    but this will be stored as block compressed. Reads are done by addressing it as an array of longs laid out as bytes
     |
     |    [block index size]
     |    [block index]
     |
     |    [data stream size]
     |    [block compressed payload stream]
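
     A minimal sketch of how a cell lookup could use the logical array of longs, including the size-0 entry that stands for null. It works on plain in-memory arrays rather than the block compressed streams; CellIndexSketch and readCell are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class CellIndexSketch {

    // starts holds one start offset per cell plus a trailing "next write position" entry.
    // uncompressed is the decompressed payload stream that the offsets point into.
    static byte[] readCell(long[] starts, byte[] uncompressed, int cell) {
        long start = starts[cell];
        long length = starts[cell + 1] - start;
        if (length == 0) {
            // An entry of size 0 is how a null is recorded, so null is returned.
            return null;
        }
        return Arrays.copyOfRange(uncompressed, (int) start, (int) (start + length));
    }

    public static void main(String[] args) {
        byte[] payloads = "abcde".getBytes(StandardCharsets.UTF_8);
        long[] starts = {0, 3, 3, 5}; // cell 0 = "abc", cell 1 = null, cell 2 = "de"
        for (int cell = 0; cell < starts.length - 1; cell++) {
            byte[] value = readCell(starts, payloads, cell);
            System.out.println(cell + ": " + (value == null ? "null" : new String(value, StandardCharsets.UTF_8)));
        }
    }
}
```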
    
     resulting in
    
     |    [cell index size]
     | ----cell index------------------------
     |    [block index size]
     |    [block index]
     |    [data stream size]
     |    [block compressed payload stream]
     | -------------------------------------
     |    [data stream size]
     | ----data stream------------------------
     |    [block index size]
     |    [block index]
     |    [data stream size]
     |    [block compressed payload stream]
     | -------------------------------------
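
     The outer layout above is two size-prefixed sections, which can be split without copying. The sketch below assumes each size is a 4-byte int counting the bytes of the section that follows; the width and byte order of the real size fields are not stated here, so treat both as assumptions. LayoutSplitSketch, nextSection and sampleBuffer are hypothetical names.

```java
import java.nio.ByteBuffer;

final class LayoutSplitSketch {

    // Reads a 4-byte size prefix (assumed), returns a view over the next `size` bytes,
    // and advances `buffer` past that section.
    static ByteBuffer nextSection(ByteBuffer buffer) {
        int size = buffer.getInt();
        ByteBuffer section = buffer.slice(); // view starting at the current position
        section.limit(size);
        buffer.position(buffer.position() + size);
        return section;
    }

    // Builds a toy buffer: a 3-byte "cell index" followed by a 2-byte "data stream".
    static ByteBuffer sampleBuffer() {
        ByteBuffer buffer = ByteBuffer.allocate(4 + 3 + 4 + 2);
        buffer.putInt(3).put(new byte[]{1, 2, 3});
        buffer.putInt(2).put(new byte[]{4, 5});
        buffer.flip();
        return buffer;
    }

    public static void main(String[] args) {
        ByteBuffer serialized = sampleBuffer();
        ByteBuffer cellIndex = nextSection(serialized);
        ByteBuffer dataStream = nextSection(serialized);
        System.out.println("cell index bytes: " + cellIndex.remaining());
        System.out.println("data stream bytes: " + dataStream.remaining());
    }
}
```

     Each returned section would itself begin with a block index size, a block index, a data stream size and the compressed payload stream, as shown in the diagram.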