Interface BucketNumberedShardSpec<T extends BuildingShardSpec>

  • All Superinterfaces:
    ShardSpec
  • All Known Implementing Classes:
    DimensionRangeBucketShardSpec, HashBucketShardSpec, SingleDimensionRangeBucketShardSpec

    public interface BucketNumberedShardSpec<T extends BuildingShardSpec>
    extends ShardSpec
    This is one of the special shardSpecs that are used temporarily during batch ingestion. In Druid, there is a concept of a core partition set: a set of segments that atomically become queryable together in Brokers. The core partition set is represented as a range of partitionIds, i.e., [0, ShardSpec.getNumCorePartitions()).

    When you run a batch ingestion task with a non-linear partitioning scheme, the task populates all possible buckets upfront (see CachingLocalSegmentAllocator) and uses them to partition input rows. However, some of the buckets can still be empty after the task has consumed all of its input if the data is highly skewed. Since Druid doesn't create empty segments, the partitionId should be allocated dynamically, when a bucket is actually in use, so that a packed core partition set can always be created without missing partitionIds.

    BucketNumberedShardSpec exists for this use case. A task with a non-linear partitioning scheme uses it to postpone partitionId allocation until all empty buckets have been identified. See ParallelIndexSupervisorTask.groupGenericPartitionLocationsPerPartition and CachingLocalSegmentAllocator for parallel and sequential ingestion, respectively.

    Note that SegmentId requires a partitionId. Since the segmentId is used everywhere during ingestion, this interface provides a default getPartitionNum() that returns the bucketId instead. This is fine because the segmentId is only used to identify each segment until it is pushed to deep storage, and the bucketId is enough to uniquely identify a segment up to that point. However, when a segment is pushed to deep storage, the partitionId is used to build the segment's storage path (see DataSegmentPusher.getDefaultStorageDir(org.apache.druid.timeline.DataSegment, boolean)), which must use the correctly allocated partitionId. As a result, this shardSpec must not be used when pushing segments.

    This class should be Jackson-serializable, as the subtasks can send it to the parallel task during parallel ingestion.

    This interface doesn't really have to extend ShardSpec. The only reason it does is that ShardSpec is used in many places, such as DataSegment, and modifying all of those places to accept types other than ShardSpec would be quite invasive. Maybe we can clean up this mess someday in the future.
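    For illustration, here is a minimal sketch of the deferred allocation described above (the class, method, and variable names are assumed, not actual Druid code): once the empty buckets are known, each remaining bucket is converted to a BuildingShardSpec with a packed partitionId via convert(int), so the core partition set [0, getNumCorePartitions()) has no gaps.

      import java.util.ArrayList;
      import java.util.Comparator;
      import java.util.List;
      import org.apache.druid.timeline.partition.BucketNumberedShardSpec;
      import org.apache.druid.timeline.partition.BuildingShardSpec;

      final class PackedPartitionIdAllocator
      {
        // Hypothetical helper: Druid performs this step inside
        // CachingLocalSegmentAllocator and ParallelIndexSupervisorTask
        // rather than in a standalone class like this one.
        static List<BuildingShardSpec> allocate(List<BucketNumberedShardSpec<?>> nonEmptyBuckets)
        {
          final List<BucketNumberedShardSpec<?>> sorted = new ArrayList<>(nonEmptyBuckets);
          sorted.sort(Comparator.comparingInt(bucket -> bucket.getBucketId()));
          final List<BuildingShardSpec> assigned = new ArrayList<>(sorted.size());
          int nextPartitionId = 0;  // packed: empty (skipped) buckets leave no holes
          for (BucketNumberedShardSpec<?> bucket : sorted) {
            assigned.add(bucket.convert(nextPartitionId++));
          }
          return assigned;
        }
      }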
    See Also:
    BuildingShardSpec
    • Method Detail

      • getBucketId

        int getBucketId()
      • convert

        T convert(int partitionId)
      • getPartitionNum

        default int getPartitionNum()
        Description copied from interface: ShardSpec
        Returns the partition ID of this segment.
        Specified by:
        getPartitionNum in interface ShardSpec
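        As a sketch of what the class-level documentation prescribes (hedged; the exact default body in Druid may differ), the override simply reports the bucketId so that a temporary SegmentId can be formed during ingestion:

          @Override
          default int getPartitionNum()
          {
            return getBucketId();  // the bucketId stands in for the partitionId until push time
          }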
      • getDomainDimensions

        default List<String> getDomainDimensions()
        Description copied from interface: ShardSpec
        Get the dimensions that have a known possible value range for the rows this shard contains.
        Specified by:
        getDomainDimensions in interface ShardSpec
        Returns:
        list of dimensions that have a known possible range; dimensions whose possible range is unknown are not listed
      • possibleInDomain

        default boolean possibleInDomain(Map<String, com.google.common.collect.RangeSet<String>> domain)
        Description copied from interface: ShardSpec
        Returns false if the given domain ranges are not possible in this shard; otherwise returns true.
        Specified by:
        possibleInDomain in interface ShardSpec
        Returns:
        whether rows within the given domain can possibly exist in this shard
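        For context, a hedged caller-side sketch of how this check can prune shards before querying them; the ShardPruner helper and the "country" dimension below are illustrative, not Druid API:

          import com.google.common.collect.ImmutableRangeSet;
          import com.google.common.collect.Range;
          import com.google.common.collect.RangeSet;
          import java.util.List;
          import java.util.Map;
          import java.util.stream.Collectors;
          import org.apache.druid.timeline.partition.ShardSpec;

          final class ShardPruner
          {
            // Keep only the shards that may contain rows whose "country"
            // value falls within ["FR", "US"].
            static List<ShardSpec> prune(List<ShardSpec> shards)
            {
              final Map<String, RangeSet<String>> domain =
                  Map.of("country", ImmutableRangeSet.of(Range.closed("FR", "US")));
              return shards.stream()
                           .filter(shard -> shard.possibleInDomain(domain))
                           .collect(Collectors.toList());
            }
          }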