java.lang.Object

io.confluent.parallelconsumer.offsets.OffsetSimultaneousEncoder

public class OffsetSimultaneousEncoder extends Object

Encode with multiple strategies at the same time.

Have results in an accessible structure, easily selecting the highest compression.

See Also:

invoke()

Field Summary

Fields

Modifier and Type

Field

Description

static final String

COMPRESSION_FORCED_RESOURCE_LOCK

Used to prevent tests that depend on setting static state in this class from running in parallel.

static boolean

compressionForced

Force the encoder to also add the compressed versions.

static final int

LARGE_INPUT_MAP_SIZE_THRESHOLD

Size threshold, in bytes, of the source array beyond which compressed variants of the encodings are also generated and compared, as the extra compression step typically seems worthwhile for inputs larger than this.
Constructor Summary

Constructors

Constructor

Description

OffsetSimultaneousEncoder(long baseOffsetToCommit, long highestSucceededOffset, Set<Long> incompleteOffsets)
Method Summary

Modifier and Type

Method

Description

Map<OffsetEncoding,byte[]>

getEncodingMap()

Map of different encoding types for the same offset data, used for retrieving the data for the encoding type

Set<Long>

getIncompleteOffsets()

The offsets which have not yet been fully completed and can't have their offset committed

SortedSet<EncodedOffsetPair>

getSortedEncodings()

Ordered set of the different encodings, used to quickly retrieve the most compressed encoding

OffsetSimultaneousEncoder

invoke()

The high-water mark is already encoded in the string (see OffsetMapCodecManager.makeOffsetMetadataPayload(long, io.confluent.parallelconsumer.state.PartitionState<K, V>)), so encoding the BitSet run length may not be needed, or could be swapped.

byte[]

packSmallest()

Select the smallest encoding, and pack it.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- LARGE_INPUT_MAP_SIZE_THRESHOLD
  
  public static final int LARGE_INPUT_MAP_SIZE_THRESHOLD
  
  Size threshold, in bytes, of the source array beyond which compressed variants of the encodings are also generated and compared, as the extra compression step typically seems worthwhile for inputs larger than this.
  See Also:
  
  Constant Field Values
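The threshold rule described above can be sketched roughly as follows. This is only an illustration, not the library's implementation: the threshold value, the use of GZIP as the codec, and the names `CompressionThresholdSketch`, `ASSUMED_THRESHOLD_BYTES`, `gzip` and `candidates` are all assumptions made for the example. The local `compressionForced` flag merely mimics the static field of the same name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.GZIPOutputStream;

public class CompressionThresholdSketch {

    // Placeholder value for illustration only; not the library's actual constant.
    public static final int ASSUMED_THRESHOLD_BYTES = 200;

    // Stand-in for the class's static compressionForced flag.
    public static boolean compressionForced = false;

    /** Compresses the input with GZIP (an assumed codec for this sketch). */
    public static byte[] gzip(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The GZIP stream is closed here, so the trailer has been written.
        return out.toByteArray();
    }

    /**
     * Returns the candidate encodings for one raw encoding: always the raw bytes,
     * plus a compressed variant when the input exceeds the threshold or
     * compression is forced.
     */
    public static Map<String, byte[]> candidates(String name, byte[] raw) {
        Map<String, byte[]> result = new LinkedHashMap<>();
        result.put(name, raw);
        if (compressionForced || raw.length > ASSUMED_THRESHOLD_BYTES) {
            result.put(name + "Compressed", gzip(raw));
        }
        return result;
    }

    public static void main(String[] args) {
        // A small input yields only the raw encoding; a large one adds the compressed variant.
        System.out.println(candidates("BitSet", new byte[16]).keySet());
        System.out.println(candidates("BitSet", new byte[4096]).keySet());
    }
}
```

Setting `compressionForced` to `true` in this sketch makes the compressed variant appear for small inputs as well, which is the behaviour the field below describes for tests.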
- compressionForced
  
  public static boolean compressionForced
  
  Force the encoder to also add the compressed versions. Visible for, and useful in, testing.
- COMPRESSION_FORCED_RESOURCE_LOCK
  
  public static final String COMPRESSION_FORCED_RESOURCE_LOCK
  
  Used to prevent tests that depend on setting static state in this class from running in parallel. Manipulation of static state in tests needs to be removed so that this isn't necessary.
  See Also:
  
  Constant Field Values
Constructor Details
- OffsetSimultaneousEncoder
  
  public OffsetSimultaneousEncoder(long baseOffsetToCommit, long highestSucceededOffset, Set<Long> incompleteOffsets)
Method Details
- invoke
  
  public OffsetSimultaneousEncoder invoke()
  The high-water mark is already encoded in the string (see OffsetMapCodecManager.makeOffsetMetadataPayload(long, io.confluent.parallelconsumer.state.PartitionState<K, V>)), so encoding the BitSet run length may not be needed, or could be swapped.
  Simultaneously encodes:
  
  OffsetEncoding.BitSet
  
  OffsetEncoding.RunLength
  
  Conditionally encodes compression variants:
  
  OffsetEncoding.BitSetCompressed
  
  OffsetEncoding.RunLengthCompressed
  
  Currently commented out is OffsetEncoding.ByteArray as there doesn't seem to be an advantage over BitSet encoding.
  TODO: optimisation - inline this into the partition iteration loop in WorkManager
  TODO: optimisation - could double the run-length range from Short.MAX_VALUE (32,767) to roughly Short.MAX_VALUE * 2 (~65,500) by using unsigned shorts instead (the highest representable relative offset is Short.MAX_VALUE because each run-length entry is a Short)
  TODO: very large offset ranges (Integer.MAX_VALUE) are slow - encoding scans could be avoided by passing in the map of incompletes, which should already be known
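The idea of feeding several encoders in one pass over the offset range might look roughly like the sketch below. It is not the library's code. The class name `SimultaneousEncodingSketch` is hypothetical, and the conventions are assumed for illustration: a set bit means "completed", and run lengths alternate starting with a (possibly zero-length) run of incomplete offsets. Run lengths are kept as plain integers here rather than shorts.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class SimultaneousEncodingSketch {

    public final BitSet bitSet = new BitSet();
    public final List<Integer> runLengths = new ArrayList<>();

    /**
     * Single pass over [baseOffset, highestSucceeded], updating both encodings.
     * Assumed convention: set bit = completed; runs alternate, starting with
     * a run of incomplete offsets (which may have length zero).
     */
    public SimultaneousEncodingSketch encode(long baseOffset, long highestSucceeded,
                                             Set<Long> incompleteOffsets) {
        boolean currentRunCompleted = false; // the first run counts incomplete offsets
        int currentRunLength = 0;
        for (long offset = baseOffset; offset <= highestSucceeded; offset++) {
            boolean completed = !incompleteOffsets.contains(offset);
            int relative = (int) (offset - baseOffset);

            // BitSet encoder: one bit per offset, relative to the base offset.
            if (completed) {
                bitSet.set(relative);
            }

            // Run-length encoder: extend the current run or start a new one.
            if (completed == currentRunCompleted) {
                currentRunLength++;
            } else {
                runLengths.add(currentRunLength);
                currentRunCompleted = completed;
                currentRunLength = 1;
            }
        }
        runLengths.add(currentRunLength); // flush the final run
        return this;
    }

    public static void main(String[] args) {
        Set<Long> incomplete = new TreeSet<>(List.of(10L, 11L, 14L));
        SimultaneousEncodingSketch result =
                new SimultaneousEncodingSketch().encode(10, 17, incomplete);
        System.out.println("bitset:      " + result.bitSet);
        System.out.println("run lengths: " + result.runLengths);
    }
}
```

Because both structures are built in the same loop, the range is scanned once regardless of how many encodings are produced, which is the point of encoding "simultaneously".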
- packSmallest
  
  public byte[] packSmallest() throws NoEncodingPossibleException
  
  Select the smallest encoding, and pack it.
  Throws:
  
  NoEncodingPossibleException
  
  See Also:
  
  packEncoding(EncodedOffsetPair)
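Selecting and packing the smallest encoding could be sketched as below. This is a simplified illustration under stated assumptions, not the library's implementation: `Encoded` is a hypothetical stand-in for EncodedOffsetPair, the one-byte identifier prefix and its values are invented for the example, and `IllegalStateException` stands in for NoEncodingPossibleException.

```java
import java.nio.ByteBuffer;
import java.util.Comparator;
import java.util.SortedSet;
import java.util.TreeSet;

public class PackSmallestSketch {

    /** Hypothetical stand-in for EncodedOffsetPair: an encoding id plus its bytes. */
    public static final class Encoded {
        public final byte magicByte; // assumed one-byte identifier of the encoding
        public final String name;
        public final byte[] data;

        public Encoded(byte magicByte, String name, byte[] data) {
            this.magicByte = magicByte;
            this.name = name;
            this.data = data;
        }
    }

    /** Orders by encoded size, then by name so equal-sized encodings are both kept. */
    public static final Comparator<Encoded> BY_SIZE =
            Comparator.comparingInt((Encoded e) -> e.data.length)
                      .thenComparing(e -> e.name);

    /** Takes the first (smallest) entry and prefixes it with its identifier byte. */
    public static byte[] packSmallest(SortedSet<Encoded> sortedEncodings) {
        if (sortedEncodings.isEmpty()) {
            // Stand-in for NoEncodingPossibleException in this sketch.
            throw new IllegalStateException("No encoding possible");
        }
        Encoded smallest = sortedEncodings.first();
        return ByteBuffer.allocate(1 + smallest.data.length)
                .put(smallest.magicByte)
                .put(smallest.data)
                .array();
    }

    public static void main(String[] args) {
        SortedSet<Encoded> encodings = new TreeSet<>(BY_SIZE);
        encodings.add(new Encoded((byte) 'b', "BitSet", new byte[]{1, 2, 3, 4}));
        encodings.add(new Encoded((byte) 'r', "RunLength", new byte[]{9, 9}));
        byte[] packed = packSmallest(encodings);
        System.out.println("packed length: " + packed.length);
    }
}
```

Keeping the encodings in a set ordered by size means the smallest candidate is always the first element, so no separate search is needed at pack time.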
- getIncompleteOffsets
  
  public Set<Long> getIncompleteOffsets()
  
  The offsets which have not yet been fully completed and can't have their offset committed
- getEncodingMap
  
  public Map<OffsetEncoding,byte[]> getEncodingMap()
  
  Map of different encoding types for the same offset data, used for retrieving the data for the encoding type
- getSortedEncodings
  
  public SortedSet<EncodedOffsetPair> getSortedEncodings()
  
  Ordered set of the different encodings, used to quickly retrieve the most compressed encoding
  See Also:
  
  packSmallest()
