Class GenericIndexed<T>

  • All Implemented Interfaces:
    Closeable, AutoCloseable, Iterable<T>, HotLoopCallee, CloseableIndexed<T>, Indexed<T>, Serializer

    public abstract class GenericIndexed<T>
    extends Object
    implements CloseableIndexed<T>, Serializer
    A generic, flat storage mechanism. Use the static methods fromArray() or fromIterable() to construct instances. If the input is sorted, binary-search (reverse) index lookups are supported. If the input is not sorted, only array-like (positional) index lookups are supported.
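    The two lookup modes can be contrasted with a small plain-Java sketch. `SortedLookupSketch` is a hypothetical class, not Druid's implementation. It assumes "sorted" means strictly increasing, and for a missing value it returns `-(insertionPoint + 1)`, mirroring `java.util.Collections.binarySearch` rather than any confirmed GenericIndexed behavior.

```java
import java.util.List;

// Hypothetical sketch, not Druid code: contrasts positional get(i), which always works,
// with a binary-search indexOf that is only allowed when the input was sorted.
final class SortedLookupSketch {
    private final List<String> values;
    private final boolean sorted;

    SortedLookupSketch(List<String> values) {
        this.values = values;
        this.sorted = isStrictlyIncreasing(values);
    }

    // Assumption: "sorted" means strictly increasing, so duplicates disable reverse lookup.
    private static boolean isStrictlyIncreasing(List<String> values) {
        for (int i = 1; i < values.size(); i++) {
            if (values.get(i - 1).compareTo(values.get(i)) >= 0) {
                return false;
            }
        }
        return true;
    }

    // Array-like lookup by position: supported for sorted and unsorted input.
    String get(int index) {
        return values.get(index);
    }

    // Reverse lookup by value: binary search, only valid for sorted input.
    // Returns the index if found, otherwise -(insertionPoint + 1).
    int indexOf(String value) {
        if (!sorted) {
            throw new UnsupportedOperationException("Reverse lookup is not allowed for unsorted input");
        }
        int low = 0;
        int high = values.size() - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            int cmp = values.get(mid).compareTo(value);
            if (cmp < 0) {
                low = mid + 1;
            } else if (cmp > 0) {
                high = mid - 1;
            } else {
                return mid;
            }
        }
        return -(low + 1);
    }
}
```

    Rejecting reverse lookups on unsorted input keeps indexOf honest: a binary search over unsorted data would silently return wrong answers rather than fail.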

    V1 Storage Format:

    byte 1: version (0x1)
    byte 2 == 0x1 => allowReverseLookup
    bytes 3-6 => numBytesUsed
    bytes 7-10 => numElements
    bytes 11-((numElements * 4) + 10): integers representing *end* offsets of byte serialized values
    bytes ((numElements * 4) + 11)-(numBytesUsed + 2): 4-byte integer representing the length of the value, followed by the bytes of the value. The stored length has no meaning if the next offset is strictly greater than the current offset. If the two offsets are the same, -1 in this field means null, and 0 means some object (potentially non-null, e.g. in the string case, a value serialized as an empty sequence of bytes).
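    A minimal round-trip sketch of the V1 layout, for illustration only. `V1Sketch` is hypothetical and not Druid's writer. It assumes big-endian ints (the `java.nio.ByteBuffer` default), end offsets measured from the start of the values section, and that numBytesUsed counts every byte after the numBytesUsed field itself (numElements, offsets, and values); that last point is an assumption about what the count covers.

```java
import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical V1 layout sketch; V1Sketch is not part of Druid.
// Assumptions: big-endian ints, end offsets relative to the values section,
// numBytesUsed covering numElements + offsets + values.
final class V1Sketch {
    static final byte VERSION_ONE = 0x1;
    static final int NULL_MARKER = -1;

    static ByteBuffer write(List<byte[]> values, boolean allowReverseLookup) {
        int n = values.size();
        int valuesSize = 0;
        for (byte[] v : values) {
            valuesSize += Integer.BYTES + (v == null ? 0 : v.length);
        }
        int numBytesUsed = Integer.BYTES + n * Integer.BYTES + valuesSize;
        ByteBuffer buf = ByteBuffer.allocate(2 + Integer.BYTES + numBytesUsed);
        buf.put(VERSION_ONE);
        buf.put((byte) (allowReverseLookup ? 0x1 : 0x0));
        buf.putInt(numBytesUsed);
        buf.putInt(n);
        // Offsets section: cumulative *end* offset of each [length][bytes] record.
        int end = 0;
        for (byte[] v : values) {
            end += Integer.BYTES + (v == null ? 0 : v.length);
            buf.putInt(end);
        }
        // Values section: 4-byte length field (-1 for null), then the value's bytes.
        for (byte[] v : values) {
            buf.putInt(v == null ? NULL_MARKER : v.length);
            if (v != null) {
                buf.put(v);
            }
        }
        buf.flip();
        return buf;
    }

    static byte[] get(ByteBuffer buf, int index) {
        int numElements = buf.getInt(6);   // bytes 7-10 (1-indexed)
        if (index < 0 || index >= numElements) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        int offsetsStart = 10;             // byte 11 (1-indexed)
        int valuesStart = offsetsStart + numElements * Integer.BYTES;
        int prevEnd = index == 0 ? 0 : buf.getInt(offsetsStart + (index - 1) * Integer.BYTES);
        int start = prevEnd + Integer.BYTES; // skip this record's 4-byte length field
        int end = buf.getInt(offsetsStart + index * Integer.BYTES);
        if (start == end) {
            // Equal offsets: the length field decides between null (-1) and an empty value (0).
            return buf.getInt(valuesStart + prevEnd) == NULL_MARKER ? null : new byte[0];
        }
        byte[] out = new byte[end - start];
        ByteBuffer view = buf.duplicate();
        view.position(valuesStart + start);
        view.get(out);
        return out;
    }
}
```

    Because each record's end offset is stored, a lookup costs two offset reads and one copy; the length field is only consulted to tell null apart from an empty value.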

    V2 Storage Format:

    Meta, header and value files are separate, and the header file is stored in native-endian byte order.

    Meta File:
    byte 1: version (0x2)
    byte 2 == 0x1 => allowReverseLookup
    bytes 3-6 => numberOfElementsPerValueFile, expressed as a power of 2 (the stored integer is the exponent). All value files therefore contain the same number of elements, except the last value file, which may contain fewer.
    bytes 7-10 => numElements
    bytes 11-14 => columnNameLength
    bytes 15-(columnNameLength + 14) => columnName
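    The meta file can be illustrated with a write-and-parse sketch. `V2Meta` is a hypothetical class, not Druid code. It assumes big-endian ints (the `ByteBuffer` default) and a UTF-8 encoded column name; neither is stated above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the V2 meta file; V2Meta is not a Druid class.
// Assumptions: big-endian ints and a UTF-8 column name.
final class V2Meta {
    static final byte VERSION_TWO = 0x2;

    final boolean allowReverseLookup;
    final int logBase2ElementsPerValueFile; // bytes 3-6: the exponent, not the count
    final int numElements;
    final String columnName;

    V2Meta(boolean allowReverseLookup, int logBase2ElementsPerValueFile, int numElements, String columnName) {
        this.allowReverseLookup = allowReverseLookup;
        this.logBase2ElementsPerValueFile = logBase2ElementsPerValueFile;
        this.numElements = numElements;
        this.columnName = columnName;
    }

    int elementsPerValueFile() {
        return 1 << logBase2ElementsPerValueFile;
    }

    ByteBuffer toBytes() {
        byte[] name = columnName.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(14 + name.length); // 14 fixed bytes + name
        buf.put(VERSION_TWO);
        buf.put((byte) (allowReverseLookup ? 0x1 : 0x0));
        buf.putInt(logBase2ElementsPerValueFile);
        buf.putInt(numElements);
        buf.putInt(name.length);
        buf.put(name);
        buf.flip();
        return buf;
    }

    static V2Meta fromBytes(ByteBuffer buf) {
        byte version = buf.get();
        if (version != VERSION_TWO) {
            throw new IllegalArgumentException("Unexpected version: " + version);
        }
        boolean allow = buf.get() == 0x1;
        int log2 = buf.getInt();
        int n = buf.getInt();
        int nameLength = buf.getInt();
        byte[] name = new byte[nameLength];
        buf.get(name);
        return new V2Meta(allow, log2, n, new String(name, StandardCharsets.UTF_8));
    }
}
```

    Storing the exponent rather than the count guarantees a power-of-2 file size, which lets a reader locate an element with shifts and masks instead of division.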

    Header file name: StringUtils.format("%s_header", columnName)
    Value file names: StringUtils.format("%s_value_%d", columnName, fileNumber)
    Number of value files: ceil(numElements / numberOfElementsPerValueFile), since the last value file may be only partially filled.

    The version EncodedStringDictionaryWriter.VERSION is reserved and must never be used as the GenericIndexed version byte; otherwise it will interfere with string column deserialization.
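    The naming scheme and file count can be sketched as follows. `V2Files` is a hypothetical helper, and `String.format` with `Locale.ROOT` stands in for Druid's `StringUtils.format`. The index mapping (element i lives in file `i >> n` at position `i & (2^n - 1)`) is derived from every value file except the last holding exactly 2^n elements; it is not a quote of Druid's code.

```java
import java.util.Locale;

// Hypothetical helper, not Druid code. String.format(Locale.ROOT, ...) stands in
// for StringUtils.format; log2 is the exponent stored in bytes 3-6 of the meta file.
final class V2Files {
    static String headerFileName(String columnName) {
        return String.format(Locale.ROOT, "%s_header", columnName);
    }

    static String valueFileName(String columnName, int fileNumber) {
        return String.format(Locale.ROOT, "%s_value_%d", columnName, fileNumber);
    }

    // Ceiling division by 2^log2, computed in long to avoid int overflow.
    static int numValueFiles(int numElements, int log2) {
        long perFile = 1L << log2;
        return (int) ((numElements + perFile - 1) >> log2);
    }

    // Which value file holds element `index`.
    static int fileNumber(int index, int log2) {
        return index >>> log2;
    }

    // Position of element `index` within its value file.
    static int indexInFile(int index, int log2) {
        return index & ((1 << log2) - 1);
    }
}
```

    For example, with 10 elements and log2 = 2 (4 elements per file), the sketch yields three value files holding 4, 4 and 2 elements.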

    See Also:
    GenericIndexedWriter