Class TextIndexBunchedSerializer
- java.lang.Object
-
- com.apple.foundationdb.record.provider.foundationdb.indexes.TextIndexBunchedSerializer
-
- All Implemented Interfaces:
BunchedSerializer<Tuple,List<Integer>>
@API(EXPERIMENTAL) public class TextIndexBunchedSerializer extends Object implements BunchedSerializer<Tuple,List<Integer>>
Serializer used by the TextIndexMaintainer to write entries into a BunchedMap. This is specifically designed for writing out the mapping from document ID to position list. As a result, it requires that the lists it serializes be monotonically increasing non-negative integers (which is true for position lists). This allows it to delta compress the integers in its list, which can be a significant space savings.
Keys are serialized using default Tuple packing. Bunches are serialized as follows: each bunch begins with a prefix (that can be used to version the serialization format), and then each entry in the bunch is serialized by writing the length of the key (using a base-128 variable length integer encoding) followed by the serialized key bytes followed by the length of the serialized position list followed by the (delta compressed) entries of each position list. Additionally, the key of the first entry in the bunch is omitted as that can be determined by using the sign-post key within the BunchedMap.
For example, suppose one attempts to serialize two entries into a single bunch, one with key (1066,) and value [1, 3, 5, 8] and another with key (1415,) and value [0, 600, 605]. The tuple (1066,) serializes to 16 04 2A (in hex), and the tuple (1415,) serializes to 16 05 87. Most of the deltas are small, but 600 is encoded by taking its binary representation, 1001011000, and separating the lower order groups of 7 bits into their own bytes and then using the most significant bit as a continuation flag, so it becomes 10000100 01011000 = 84 58. So, the full entry is (with 20 as the prefix):
20 (04 (01 02 02 03)) (03 (16 05 87) 04 (00 (84 58) 05))
The parentheses are added for clarity and separate each entry as well as grouping variable length integers together. Note that to add a new entry to the end of a serialized list, one can take the serialized entry and append it to the end of that list rather than deserializing the entry list, appending the new entry, and then serializing the new list.
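The worked example can be reproduced with a small, dependency-free sketch. This is not the library's code: the class `BunchEncodingSketch` and its helpers are hypothetical names, the key bytes are hard-coded from the example instead of being produced by Tuple packing, and the sketch assumes that repeated positions are acceptable (the documentation only says "monotonically increasing").

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.List;

// Hypothetical, stdlib-only illustration of the format described above.
final class BunchEncodingSketch {
    // Writes a non-negative int as base-128 groups, most significant group first,
    // with the high bit set on every byte except the last.
    static void writeVarInt(ByteArrayOutputStream out, int value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative value: " + value);
        }
        int shift = 0;
        while (shift < 28 && (value >>> (shift + 7)) != 0) {
            shift += 7;
        }
        for (; shift > 0; shift -= 7) {
            out.write(0x80 | ((value >>> shift) & 0x7F));
        }
        out.write(value & 0x7F);
    }

    // Delta compresses a position list: each value is stored as the
    // difference from the previous one (the first is stored as itself).
    static byte[] encodePositions(List<Integer> positions) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int position : positions) {
            if (position < previous) {
                throw new IllegalArgumentException("positions must be non-negative and increasing");
            }
            writeVarInt(out, position - previous);
            previous = position;
        }
        return out.toByteArray();
    }

    // Writes one entry; the key is left out for the first entry of a bunch.
    static void writeEntry(ByteArrayOutputStream out, byte[] keyBytes,
                           List<Integer> positions, boolean includeKey) {
        if (includeKey) {
            writeVarInt(out, keyBytes.length);
            out.write(keyBytes, 0, keyBytes.length);
        }
        byte[] encoded = encodePositions(positions);
        writeVarInt(out, encoded.length);
        out.write(encoded, 0, encoded.length);
    }

    static byte[] exampleBunch() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(0x20); // prefix used in the example
        writeEntry(out, new byte[] {0x16, 0x04, 0x2A}, Arrays.asList(1, 3, 5, 8), false);
        writeEntry(out, new byte[] {0x16, 0x05, (byte) 0x87}, Arrays.asList(0, 600, 605), true);
        return out.toByteArray();
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints 20 04 01 02 02 03 03 16 05 87 04 00 84 58 05
        System.out.println(toHex(exampleBunch()));
    }
}
```

The output matches the parenthesized byte sequence in the example once the parentheses are removed.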
- See Also:
TextIndexMaintainer, BunchedMap
-
-
Method Summary
Modifier and Type, Method, Description:
- boolean canAppend(): Returns true as this serialization format supports appending.
- List<Map.Entry<Tuple,List<Integer>>> deserializeEntries(Tuple key, byte[] data): Deserializes an entry list from bytes.
- Tuple deserializeKey(byte[] data, int offset, int length): Deserializes a key using standard Tuple unpacking.
- List<Tuple> deserializeKeys(Tuple key, byte[] data): Deserializes the keys from a serialized entry list.
- static TextIndexBunchedSerializer instance(): Get the serializer singleton.
- byte[] serializeEntries(List<Map.Entry<Tuple,List<Integer>>> entries): Packs an entry list into a single byte array.
- byte[] serializeEntry(Tuple key, List<Integer> value): Packs a key and value into a byte array.
- byte[] serializeKey(Tuple key): Packs a key using standard Tuple encoding.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface com.apple.foundationdb.map.BunchedSerializer
deserializeKey, deserializeKey, serializeEntry
-
Method Detail
-
instance
public static TextIndexBunchedSerializer instance()
Get the serializer singleton. This serializer maintains no state between serializing different values, so it is safe to maintain as a singleton.
- Returns:
the TextIndexBunchedSerializer singleton
-
serializeKey
@Nonnull public byte[] serializeKey(@Nonnull Tuple key)
Packs a key using standard Tuple encoding. Note that Tuples pack in a way that preserves order, which is a requirement of BunchedSerializers.
- Specified by:
serializeKey in interface BunchedSerializer<Tuple,List<Integer>>
- Parameters:
key - key to serialize to bytes
- Returns:
the key packed to bytes
- Throws:
BunchedSerializationException - if packing the tuple fails
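To illustrate what order preservation means here, the following sketch compares the two packed keys quoted in the class-level example as unsigned byte strings. The class name `PackedKeyOrderSketch` is hypothetical, and the bytes are hard-coded from that example instead of being produced by the Tuple layer, so this only demonstrates the property on those values.

```java
import java.util.Arrays;

// Hypothetical illustration: packed keys compare in the same order as the tuples.
final class PackedKeyOrderSketch {
    // Packed forms of (1066,) and (1415,) as quoted in the class-level example.
    static final byte[] KEY_1066 = {0x16, 0x04, 0x2A};
    static final byte[] KEY_1415 = {0x16, 0x05, (byte) 0x87};

    // Lexicographic comparison that treats every byte as unsigned.
    static int comparePacked(byte[] left, byte[] right) {
        return Arrays.compareUnsigned(left, right);
    }

    public static void main(String[] args) {
        // Both comparisons agree: 1066 sorts before 1415.
        System.out.println(comparePacked(KEY_1066, KEY_1415) < 0);
        System.out.println(Integer.compare(1066, 1415) < 0);
    }
}
```

The comparison has to be unsigned: the byte 87 (hex) is negative as a Java `byte`, so a signed comparison could order some keys incorrectly.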
-
serializeEntry
@Nonnull public byte[] serializeEntry(@Nonnull Tuple key, @Nonnull List<Integer> value)
Packs a key and value into a byte array. This will write out the tuple and position list in a way consistent with the way each entry is serialized by serializeEntries(List). Because this serializer supports appending, one can take the output of this function and append it to the end of an already serialized entry list to produce the serialized form of that list with this entry appended to the end.
- Specified by:
serializeEntry in interface BunchedSerializer<Tuple,List<Integer>>
- Parameters:
key - the key of the map entry
value - the value of the map entry
- Returns:
the serialized map entry
- Throws:
BunchedSerializationException - if the value is not monotonically increasing non-negative integers or if packing the tuple fails
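The append property can be sketched without the library, using the byte layout from the class-level example. Everything here is an assumption made for illustration: the class `AppendEntrySketch` and its helpers are hypothetical, the key bytes are hard-coded, and the entry layout (key length, key, list length, deltas) is taken from the class-level description, not from the actual implementation.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of appending one serialized entry to a serialized list.
final class AppendEntrySketch {
    // Base-128 integer, most significant group first, high bit marks continuation.
    static void writeVarInt(ByteArrayOutputStream out, int value) {
        int shift = 0;
        while (shift < 28 && (value >>> (shift + 7)) != 0) {
            shift += 7;
        }
        for (; shift > 0; shift -= 7) {
            out.write(0x80 | ((value >>> shift) & 0x7F));
        }
        out.write(value & 0x7F);
    }

    // Encodes a single non-first entry: key length, key bytes, list length, deltas.
    static byte[] encodeEntry(byte[] keyBytes, List<Integer> positions) {
        ByteArrayOutputStream deltas = new ByteArrayOutputStream();
        int previous = 0;
        for (int position : positions) {
            if (position < previous) {
                throw new IllegalArgumentException("positions must be non-negative and increasing");
            }
            writeVarInt(deltas, position - previous);
            previous = position;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarInt(out, keyBytes.length);
        out.write(keyBytes, 0, keyBytes.length);
        writeVarInt(out, deltas.size());
        out.write(deltas.toByteArray(), 0, deltas.size());
        return out.toByteArray();
    }

    // Appending is plain concatenation; the existing bytes are never decoded.
    static byte[] append(byte[] serializedList, byte[] serializedEntry) {
        byte[] combined = Arrays.copyOf(serializedList, serializedList.length + serializedEntry.length);
        System.arraycopy(serializedEntry, 0, combined, serializedList.length, serializedEntry.length);
        return combined;
    }

    public static void main(String[] args) {
        byte[] existing = {0x20, 0x04, 0x01, 0x02, 0x02, 0x03};
        byte[] entry = encodeEntry(new byte[] {0x16, 0x05, (byte) 0x87}, Arrays.asList(0, 600, 605));
        System.out.println(append(existing, entry).length); // prints 15
    }
}
```

Because the new entry carries its own length markers, the concatenated array has the same shape as a list serialized in one pass.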
-
serializeEntries
@Nonnull public byte[] serializeEntries(@Nonnull List<Map.Entry<Tuple,List<Integer>>> entries)
Packs an entry list into a single byte array. This does so by combining the serialized forms of each key and value in the entry list with their lengths. There is a more in-depth explanation of the serialization format in the class-level documentation.
- Specified by:
serializeEntries in interface BunchedSerializer<Tuple,List<Integer>>
- Parameters:
entries - the list of entries to serialize
- Returns:
the serialized entry list
- Throws:
BunchedSerializationException - if the entries are invalid, for example if the list is empty or contains a position list that is not monotonically increasing
-
deserializeKey
@Nonnull public Tuple deserializeKey(@Nonnull byte[] data, int offset, int length)
Deserializes a key using standard Tuple unpacking.
- Specified by:
deserializeKey in interface BunchedSerializer<Tuple,List<Integer>>
- Parameters:
data - source data to deserialize
offset - beginning offset of serialized key (indexed from 0)
length - length of serialized key
- Returns:
the deserialized key
- Throws:
BunchedSerializationException - if the byte array is malformed
-
deserializeEntries
@Nonnull public List<Map.Entry<Tuple,List<Integer>>> deserializeEntries(@Nonnull Tuple key, @Nonnull byte[] data)
Deserializes an entry list from bytes.
- Specified by:
deserializeEntries in interface BunchedSerializer<Tuple,List<Integer>>
- Parameters:
key - key under which the serialized entry list was stored
data - source list to deserialize
- Returns:
the deserialized entry list
- Throws:
BunchedSerializationException - if the byte array is malformed
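A minimal decoding sketch for the format described in the class-level documentation follows. It is not the library's implementation: `BunchDecodingSketch` is a hypothetical name, keys are returned as raw bytes because Tuple unpacking is left out, a single-byte prefix of 20 (hex) is assumed as in the example, and the first key is supplied by the caller in the role of the sign-post key.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical, stdlib-only reader for the bunch layout in the class-level example.
final class BunchDecodingSketch {
    private final byte[] data;
    private int pos = 1; // start just past the one-byte prefix

    private BunchDecodingSketch(byte[] data) {
        this.data = data;
    }

    // Reads a base-128 integer whose high bit marks "more bytes follow".
    private int readVarInt() {
        int value = 0;
        while (pos < data.length) {
            int b = data[pos++] & 0xFF;
            value = (value << 7) | (b & 0x7F);
            if ((b & 0x80) == 0) {
                return value;
            }
        }
        throw new IllegalArgumentException("truncated variable-length integer");
    }

    // Reads a length-prefixed list of deltas and rebuilds the absolute positions.
    private List<Integer> readPositions() {
        int length = readVarInt();
        int end = pos + length;
        if (end > data.length) {
            throw new IllegalArgumentException("position list runs past end of data");
        }
        List<Integer> positions = new ArrayList<>();
        int previous = 0;
        while (pos < end) {
            previous += readVarInt();
            positions.add(previous);
        }
        if (pos != end) {
            throw new IllegalArgumentException("position list length mismatch");
        }
        return positions;
    }

    static List<Map.Entry<byte[], List<Integer>>> decode(byte[] firstKey, byte[] data) {
        if (data.length == 0 || data[0] != 0x20) {
            throw new IllegalArgumentException("missing or unexpected prefix");
        }
        BunchDecodingSketch reader = new BunchDecodingSketch(data);
        List<Map.Entry<byte[], List<Integer>>> entries = new ArrayList<>();
        // The first entry has no key in the data; it comes from the caller.
        entries.add(new AbstractMap.SimpleImmutableEntry<>(firstKey, reader.readPositions()));
        while (reader.pos < data.length) {
            int keyLength = reader.readVarInt();
            if (keyLength > data.length - reader.pos) {
                throw new IllegalArgumentException("key runs past end of data");
            }
            byte[] key = Arrays.copyOfRange(data, reader.pos, reader.pos + keyLength);
            reader.pos += keyLength;
            entries.add(new AbstractMap.SimpleImmutableEntry<>(key, reader.readPositions()));
        }
        return entries;
    }
}
```

Applied to the bytes of the class-level example, this yields the position lists [1, 3, 5, 8] and [0, 600, 605], with the second key read back as 16 05 87.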
-
deserializeKeys
@Nonnull public List<Tuple> deserializeKeys(@Nonnull Tuple key, @Nonnull byte[] data)
Deserializes the keys from a serialized entry list. Because the serialization format contains markers with the length of the entries, it can skip the position list while reading through the data, so it is more efficient (in terms of time and memory) to call this method than deserializeEntries() if one only needs to know the keys.
- Specified by:
deserializeKeys in interface BunchedSerializer<Tuple,List<Integer>>
- Parameters:
key - key under which the serialized entry list was stored
data - source data to deserialize
- Returns:
the list of keys in the serialized data array
- Throws:
BunchedSerializationException - if the byte array is malformed
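The skipping behavior can be sketched as follows, again under assumptions: `KeySkippingSketch` is a hypothetical class, a one-byte prefix is assumed as in the class-level example, and keys are returned as raw bytes instead of being unpacked into Tuples. The point is that each position list is jumped over using its length marker and never decoded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of reading only the keys out of a serialized bunch.
final class KeySkippingSketch {
    // Reads one base-128 integer at cursor[0] and advances the cursor past it.
    static int readVarInt(byte[] data, int[] cursor) {
        int value = 0;
        while (cursor[0] < data.length) {
            int b = data[cursor[0]++] & 0xFF;
            value = (value << 7) | (b & 0x7F);
            if ((b & 0x80) == 0) {
                return value;
            }
        }
        throw new IllegalArgumentException("truncated variable-length integer");
    }

    // Advances the cursor by length bytes without looking at them.
    static void skip(byte[] data, int[] cursor, int length) {
        if (length > data.length - cursor[0]) {
            throw new IllegalArgumentException("length marker runs past end of data");
        }
        cursor[0] += length;
    }

    static List<byte[]> readKeys(byte[] firstKey, byte[] data) {
        if (data.length == 0) {
            throw new IllegalArgumentException("empty data");
        }
        int[] cursor = {1}; // step over the one-byte prefix
        List<byte[]> keys = new ArrayList<>();
        keys.add(firstKey); // the first key is not stored in the data
        skip(data, cursor, readVarInt(data, cursor)); // first position list
        while (cursor[0] < data.length) {
            int keyLength = readVarInt(data, cursor);
            int start = cursor[0];
            skip(data, cursor, keyLength);
            keys.add(Arrays.copyOfRange(data, start, cursor[0]));
            skip(data, cursor, readVarInt(data, cursor)); // position list is never decoded
        }
        return keys;
    }
}
```

No list of integers is allocated along the way, which is where the saving over a full deserialization comes from in this sketch.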
-
canAppend
public boolean canAppend()
Returns true as this serialization format supports appending.
- Specified by:
canAppend in interface BunchedSerializer<Tuple,List<Integer>>
- Returns:
true
-
-