Class GenericIndexed<T>
- java.lang.Object
-
- org.apache.druid.segment.data.GenericIndexed<T>
-
- All Implemented Interfaces:
Closeable, AutoCloseable, Iterable<T>, HotLoopCallee, CloseableIndexed<T>, Indexed<T>, Serializer
public abstract class GenericIndexed<T> extends Object implements CloseableIndexed<T>, Serializer
A generic, flat storage mechanism. Use static methods fromArray() or fromIterable() to construct. If input is sorted, supports binary search index lookups. If input is not sorted, only supports array-like index lookups.
V1 Storage Format:
byte 1: version (0x1)
byte 2 == 0x1 => allowReverseLookup
bytes 3-6 => numBytesUsed
bytes 7-10 => numElements
bytes 10-((numElements * 4) + 10): integers representing *end* offsets of byte serialized values
bytes ((numElements * 4) + 10)-(numBytesUsed + 2): 4-byte integer representing length of value, followed by bytes for value. The stored length has no meaning if the next offset is strictly greater than the current offset. If the two offsets are the same, -1 in this field means null, and 0 means some object (potentially non-null, e.g. in the string case, one that is serialized as an empty sequence of bytes).
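The V1 layout above can be sketched with plain java.nio. This is a simplified illustration, not Druid's implementation: the class V1LayoutSketch and its methods are hypothetical, values are restricted to UTF-8 strings, and two details are assumptions rather than statements from the format description: that end offsets are measured from the start of the values region and include each value's 4-byte length prefix, and that numBytesUsed counts the numElements field, the offsets and the values.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the V1 layout; this is not Druid's implementation.
final class V1LayoutSketch {
    static final byte VERSION_ONE = 0x1;
    static final int NULL_MARKER = -1;

    // Writes: version, allowReverseLookup, numBytesUsed, numElements,
    // one end offset per element, then the length-prefixed values.
    static ByteBuffer encode(List<String> values, boolean allowReverseLookup) {
        int n = values.size();
        byte[][] encoded = new byte[n][];
        int valueBytes = 0;
        for (int i = 0; i < n; i++) {
            String v = values.get(i);
            encoded[i] = (v == null) ? new byte[0] : v.getBytes(StandardCharsets.UTF_8);
            valueBytes += Integer.BYTES + encoded[i].length;
        }
        // Assumption: numBytesUsed covers numElements, the offsets and the values.
        int numBytesUsed = Integer.BYTES + n * Integer.BYTES + valueBytes;
        ByteBuffer out = ByteBuffer.allocate(2 + Integer.BYTES + numBytesUsed);
        out.put(VERSION_ONE);
        out.put((byte) (allowReverseLookup ? 0x1 : 0x0));
        out.putInt(numBytesUsed);
        out.putInt(n);
        int end = 0;
        for (byte[] bytes : encoded) {
            end += Integer.BYTES + bytes.length; // assumption: offset includes the prefix
            out.putInt(end);
        }
        for (int i = 0; i < n; i++) {
            // -1 marks null; 0 marks a non-null value serialized as zero bytes.
            out.putInt(values.get(i) == null ? NULL_MARKER : encoded[i].length);
            out.put(encoded[i]);
        }
        out.flip();
        return out;
    }

    // Reads the values back using only the end offsets and the null marker.
    static List<String> decode(ByteBuffer in) {
        ByteBuffer buf = in.duplicate(); // independent position, big-endian
        if (buf.get() != VERSION_ONE) {
            throw new IllegalArgumentException("unexpected version byte");
        }
        buf.get();    // allowReverseLookup flag, unused here
        buf.getInt(); // numBytesUsed, unused here
        int n = buf.getInt();
        int headerStart = buf.position();
        int valuesStart = headerStart + n * Integer.BYTES;
        List<String> result = new ArrayList<>(n);
        int prevEnd = 0;
        for (int i = 0; i < n; i++) {
            int end = buf.getInt(headerStart + i * Integer.BYTES);
            int start = prevEnd + Integer.BYTES; // skip the length prefix
            int size = end - start;
            if (size == 0 && buf.getInt(valuesStart + prevEnd) == NULL_MARKER) {
                result.add(null);
            } else {
                byte[] bytes = new byte[size];
                for (int j = 0; j < size; j++) {
                    bytes[j] = buf.get(valuesStart + start + j);
                }
                result.add(new String(bytes, StandardCharsets.UTF_8));
            }
            prevEnd = end;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "", null, "bc");
        System.out.println(decode(encode(input, false)));
    }
}
```

Note that the decoder consults the length field only when the start and end offsets coincide, which is the one case where the format description gives that field a meaning.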
V2 Storage Format
Meta, header and value files are separate, and the header file is stored in native endian byte order.
Meta File:
byte 1: version (0x2)
byte 2 == 0x1 => allowReverseLookup
bytes 3-6: numberOfElementsPerValueFile expressed as power of 2. This means all value files contain the same number of items, except the last value file, which may have fewer elements.
bytes 7-10 => numElements
bytes 11-14 => columnNameLength
bytes 15-columnNameLength => columnName
Header file name is identified as: StringUtils.format("%s_header", columnName)
Value files are identified as: StringUtils.format("%s_value_%d", columnName, fileNumber)
number of value files == numElements/numberOfElementsPerValueFile
The version EncodedStringDictionaryWriter.VERSION is reserved and must never be specified as the GenericIndexed version byte, else it will interfere with string column deserialization.
See Also:
GenericIndexedWriter
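The V2 file naming and the power-of-2 element count from the class description can be sketched as follows. The class V2FileLayoutSketch and its method names are hypothetical. String.format with Locale.ROOT stands in for Druid's StringUtils.format, the file count is rounded up on the assumption that the last value file holds the remainder, and the shift-and-mask mapping from element index to value file is an assumption that follows from the power-of-2 encoding rather than something the description states.

```java
import java.util.Locale;

// Hypothetical helper illustrating the V2 naming scheme; not part of Druid.
final class V2FileLayoutSketch {
    static String headerFileName(String columnName) {
        return String.format(Locale.ROOT, "%s_header", columnName);
    }

    static String valueFileName(String columnName, int fileNumber) {
        return String.format(Locale.ROOT, "%s_value_%d", columnName, fileNumber);
    }

    // The meta file stores the exponent, so the per-file count is 2^log2.
    static int elementsPerValueFile(int log2ElementsPerValueFile) {
        return 1 << log2ElementsPerValueFile;
    }

    // Assumption: round up so the last, possibly shorter, file is counted.
    static int numValueFiles(int numElements, int log2ElementsPerValueFile) {
        int perFile = elementsPerValueFile(log2ElementsPerValueFile);
        return (numElements + perFile - 1) / perFile;
    }

    // Assumption: element i lives in file i / perFile at position i % perFile,
    // which reduces to a shift and a mask for a power-of-2 file size.
    static int fileNumberOf(int index, int log2ElementsPerValueFile) {
        return index >> log2ElementsPerValueFile;
    }

    static int indexWithinFile(int index, int log2ElementsPerValueFile) {
        return index & (elementsPerValueFile(log2ElementsPerValueFile) - 1);
    }

    public static void main(String[] args) {
        int log2 = 2; // 4 elements per value file
        System.out.println(headerFileName("col"));
        System.out.println(valueFileName("col", fileNumberOf(9, log2)));
        System.out.println(numValueFiles(10, log2));
    }
}
```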
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description class
GenericIndexed.BufferIndexed
Single-threaded view.
-
Field Summary
Fields Modifier and Type Field Description protected boolean
allowReverseLookup
protected int
size
protected ObjectStrategy<T>
strategy
static ObjectStrategy<String>
STRING_STRATEGY
static ObjectStrategy<ByteBuffer>
UTF8_STRATEGY
An ObjectStrategy that returns a big-endian ByteBuffer pointing to original data.
-
Constructor Summary
Constructors Constructor Description GenericIndexed(ObjectStrategy<T> strategy, boolean allowReverseLookup, int size)
-
Method Summary
Methods Modifier and Type Method Description protected void
checkIndex(int index)
Checks if index is a valid `element index` in GenericIndexed.
void
close()
protected T
copyBufferAndGet(ByteBuffer valueBuffer, int startOffset, int endOffset)
static <T> GenericIndexed<T>
fromArray(T[] objects, ObjectStrategy<T> strategy)
static <T> GenericIndexed<T>
fromIterable(Iterable<T> objectsIterable, ObjectStrategy<T> strategy)
Class<? extends T>
getClazz()
abstract long
getSerializedSize()
Returns the number of bytes that this Serializer will write to the output _channel_ (not smoosher) on a Serializer.writeTo(java.nio.channels.WritableByteChannel, org.apache.druid.java.util.common.io.smoosh.FileSmoosher) call.
int
indexOf(T value)
Returns the index of "value" in this GenericIndexed object, or (-(insertion point) - 1) if the value is not present, in the manner of Arrays.binarySearch.
boolean
isSorted()
Indicates if this value set is sorted, the implication being that the contract of Indexed.indexOf(T) is strengthened to return a negative number equal to (-(insertion point) - 1) when the value is not present in the set.
Iterator<T>
iterator()
static GenericIndexed<ResourceHolder<ByteBuffer>>
ofCompressedByteBuffers(Iterable<ByteBuffer> buffers, CompressionStrategy compression, int bufferSize, ByteOrder order, Closer closer)
static <T> GenericIndexed<T>
read(ByteBuffer buffer, ObjectStrategy<T> strategy)
static <T> GenericIndexed<T>
read(ByteBuffer buffer, ObjectStrategy<T> strategy, SmooshedFileMapper fileMapper)
abstract GenericIndexed.BufferIndexed
singleThreaded()
int
size()
Number of elements in the value set
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.druid.query.monomorphicprocessing.HotLoopCallee
inspectRuntimeShape
-
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Methods inherited from interface org.apache.druid.segment.serde.Serializer
writeTo
-
-
-
-
Field Detail
-
UTF8_STRATEGY
public static final ObjectStrategy<ByteBuffer> UTF8_STRATEGY
An ObjectStrategy that returns a big-endian ByteBuffer pointing to original data. The returned ByteBuffer is a fresh read-only instance, so it is OK for callers to modify its position, limit, etc. However, it does point to the original data, so callers must take care not to use it if the original data may have been freed. The compare method of this instance uses StringUtils.compareUtf8UsingJavaStringOrdering(byte[], byte[]) so that behavior is consistent with STRING_STRATEGY.
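The semantics described for UTF8_STRATEGY (a fresh, read-only, big-endian buffer that still points at the original data) can be illustrated with plain java.nio. This is only a sketch of those semantics: the class ReadOnlyViewSketch and its view method are hypothetical, and whether Druid builds its buffers this way is an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of a read-only view over shared data; not Druid code.
final class ReadOnlyViewSketch {
    // Returns a new read-only buffer covering [start, end) of the original.
    static ByteBuffer view(ByteBuffer original, int start, int end) {
        // asReadOnlyBuffer() creates a new buffer object whose byte order is
        // BIG_ENDIAN and whose content is shared with the original.
        ByteBuffer v = original.asReadOnlyBuffer();
        v.limit(end);
        v.position(start);
        return v;
    }

    public static void main(String[] args) {
        ByteBuffer original = ByteBuffer
            .wrap("hello world".getBytes(StandardCharsets.UTF_8))
            .order(ByteOrder.LITTLE_ENDIAN);
        ByteBuffer v = view(original, 6, 11);

        // Moving the view's position does not move the original's position.
        v.get();
        System.out.println(original.position());
        System.out.println(v.order());

        // The content is shared, so later changes to the original are visible.
        original.put(6, (byte) 'W');
        System.out.println((char) v.get(6));
    }
}
```

Because the content is shared, the view is only safe to use for as long as the underlying data remains valid, which is the caveat stated above.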
-
STRING_STRATEGY
public static final ObjectStrategy<String> STRING_STRATEGY
-
strategy
protected final ObjectStrategy<T> strategy
-
allowReverseLookup
protected final boolean allowReverseLookup
-
size
protected final int size
-
-
Constructor Detail
-
GenericIndexed
public GenericIndexed(ObjectStrategy<T> strategy, boolean allowReverseLookup, int size)
-
-
Method Detail
-
read
public static <T> GenericIndexed<T> read(ByteBuffer buffer, ObjectStrategy<T> strategy)
-
read
public static <T> GenericIndexed<T> read(ByteBuffer buffer, ObjectStrategy<T> strategy, SmooshedFileMapper fileMapper)
-
fromArray
public static <T> GenericIndexed<T> fromArray(T[] objects, ObjectStrategy<T> strategy)
-
ofCompressedByteBuffers
public static GenericIndexed<ResourceHolder<ByteBuffer>> ofCompressedByteBuffers(Iterable<ByteBuffer> buffers, CompressionStrategy compression, int bufferSize, ByteOrder order, Closer closer)
-
fromIterable
public static <T> GenericIndexed<T> fromIterable(Iterable<T> objectsIterable, ObjectStrategy<T> strategy)
-
singleThreaded
public abstract GenericIndexed.BufferIndexed singleThreaded()
-
getSerializedSize
public abstract long getSerializedSize()
Description copied from interface: Serializer
Returns the number of bytes that this Serializer will write to the output _channel_ (not smoosher) on a Serializer.writeTo(java.nio.channels.WritableByteChannel, org.apache.druid.java.util.common.io.smoosh.FileSmoosher) call.
- Specified by:
getSerializedSize
in interface Serializer
-
checkIndex
protected void checkIndex(int index)
Checks if index is a valid `element index` in GenericIndexed. Similar to Preconditions.checkElementIndex(), except this method throws IAE with a custom error message. Used here to get the existing behavior (same error message and exception) of V1 GenericIndexed.
- Parameters:
index
- index identifying an element of a GenericIndexed.
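A bounds check of this kind can be sketched as follows. The class CheckIndexSketch is hypothetical, IllegalArgumentException stands in for Druid's IAE, and the message texts are illustrative assumptions, not the messages Druid actually emits.

```java
import java.util.Locale;

// Hypothetical stand-in for an element-index check; not Druid's implementation.
final class CheckIndexSketch {
    // Throws if index is outside [0, size); messages are illustrative only.
    static void checkIndex(int index, int size) {
        if (index < 0) {
            throw new IllegalArgumentException(
                String.format(Locale.ROOT, "Index[%d] < 0", index));
        }
        if (index >= size) {
            throw new IllegalArgumentException(
                String.format(Locale.ROOT, "Index[%d] >= size[%d]", index, size));
        }
    }

    public static void main(String[] args) {
        checkIndex(0, 3); // valid, returns normally
        try {
            checkIndex(3, 3); // one past the last element
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```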
-
size
public int size()
Description copied from interface: Indexed
Number of elements in the value set
-
indexOf
public int indexOf(@Nullable T value)
Returns the index of "value" in this GenericIndexed object, or (-(insertion point) - 1) if the value is not present, in the manner of Arrays.binarySearch. This strengthens the contract of Indexed, which only guarantees that values-not-found will return some negative number.
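The (-(insertion point) - 1) convention is the one used by java.util.Arrays.binarySearch, so it can be demonstrated with the JDK alone. The class InsertionPointSketch and its helper are hypothetical and operate on a plain sorted array, not on a GenericIndexed.

```java
import java.util.Arrays;

// Hypothetical helper showing how to interpret a binary-search style result.
final class InsertionPointSketch {
    // Converts a negative "not found" result back into the insertion point.
    static int insertionPoint(int searchResult) {
        if (searchResult >= 0) {
            throw new IllegalArgumentException("value was found at index " + searchResult);
        }
        return -(searchResult + 1);
    }

    public static void main(String[] args) {
        String[] sorted = {"apple", "cherry", "plum"};

        int found = Arrays.binarySearch(sorted, "cherry");
        System.out.println(found); // index of the existing value

        int missing = Arrays.binarySearch(sorted, "banana");
        // A negative result encodes where the value would be inserted
        // to keep the array sorted.
        System.out.println(insertionPoint(missing));
    }
}
```

A caller that only needs a presence check can test whether the result is non-negative; the insertion point matters when the caller wants the position of the nearest greater element.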
-
isSorted
public boolean isSorted()
Description copied from interface: Indexed
Indicates if this value set is sorted, the implication being that the contract of Indexed.indexOf(T) is strengthened to return a negative number equal to (-(insertion point) - 1) when the value is not present in the set.
-
copyBufferAndGet
@Nullable protected T copyBufferAndGet(ByteBuffer valueBuffer, int startOffset, int endOffset)
-
close
public void close()
- Specified by:
close
in interface AutoCloseable
- Specified by:
close
in interface Closeable
-
-