Interface BuildingShardSpec<T extends ShardSpec>

  • All Superinterfaces:
    ShardSpec
  • All Known Implementing Classes:
    BuildingDimensionRangeShardSpec, BuildingHashBasedNumberedShardSpec, BuildingNumberedShardSpec, BuildingSingleDimensionShardSpec

    public interface BuildingShardSpec<T extends ShardSpec>
    extends ShardSpec
    This is one of the special shardSpecs that are temporarily used during batch ingestion. In Druid, there is a concept of a core partition set: a set of segments that atomically become queryable together in Brokers. The core partition set is represented as a range of partitionIds, i.e., [0, ShardSpec.getNumCorePartitions()).

    In streaming ingestion, the core partition set size cannot be determined, since it is impossible to know upfront how many segments will be created per time chunk. However, in batch ingestion with time chunk locking, the core partition set is the set of segments created by an initial task or an overwriting task. Because the core partition set is determined only when the task publishes segments at the end, the task postpones creating a proper ShardSpec until then. This BuildingShardSpec is used for that case: a non-appending batch task can use this shardSpec until it finally publishes its segments. At publish time, it should convert the buildingShardSpec of each segment into a proper shardSpec of the target type T. See SegmentPublisherHelper#annotateShardSpec for converting shardSpecs; a usage sketch follows the See Also entry below.

    Note that when the segment lock is used, the Overlord coordinates segment allocation and this class is never used. Instead, the task sends a PartialShardSpec to the Overlord to allocate a new segment. The resulting segment can have either a ShardSpec (for root-generation segments) or an OverwriteShardSpec (for non-root-generation segments).

    This class should be Jackson-serializable, as the subtasks can send it to the parallel task in parallel ingestion.

    This interface doesn't really have to extend ShardSpec; it does so only because ShardSpec is used in many places, such as DataSegment, and modifying those places to allow types other than ShardSpec would be fairly invasive. Maybe we can clean up this mess someday.
    See Also:
    BucketNumberedShardSpec
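
    The following is a minimal sketch of the build-then-convert lifecycle described above, using the BuildingNumberedShardSpec implementation. The single-argument constructor and the NumberedShardSpec return type of convert are assumptions based on this description; check the concrete classes for the exact signatures.

      import java.util.ArrayList;
      import java.util.List;

      import org.apache.druid.timeline.partition.BuildingNumberedShardSpec;
      import org.apache.druid.timeline.partition.NumberedShardSpec;

      public class BuildingShardSpecSketch
      {
        public static void main(String[] args)
        {
          // While creating segments, the task only knows each segment's partition id;
          // the size of the core partition set is not known yet.
          final List<BuildingNumberedShardSpec> buildingSpecs = new ArrayList<>();
          for (int partitionId = 0; partitionId < 3; partitionId++) {
            buildingSpecs.add(new BuildingNumberedShardSpec(partitionId)); // assumed constructor
          }

          // At publish time the core partition set is finally known, so every building spec
          // is converted to its final shardSpec type (compare SegmentPublisherHelper#annotateShardSpec).
          final int numCorePartitions = buildingSpecs.size();
          final List<NumberedShardSpec> publishedSpecs = new ArrayList<>();
          for (BuildingNumberedShardSpec buildingSpec : buildingSpecs) {
            publishedSpecs.add(buildingSpec.convert(numCorePartitions));
          }
        }
      }
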
    • Method Detail

      • getBucketId

        int getBucketId()
      • convert

        T convert​(int numCorePartitions)
      • getDomainDimensions

        default List<String> getDomainDimensions()
        Description copied from interface: ShardSpec
        Get the dimensions that have a known possible range for the rows this shard contains.
        Specified by:
        getDomainDimensions in interface ShardSpec
        Returns:
        the list of dimensions that have a known possible range; dimensions with an unknown possible range are not listed
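
        The description above is the contract inherited from ShardSpec; a building shardSpec is only a temporary placeholder, so the sketch below uses SingleDimensionShardSpec (the finished counterpart of BuildingSingleDimensionShardSpec) to illustrate what the contract means. The five-argument constructor is an assumption and may differ between Druid versions.

          import java.util.List;

          import org.apache.druid.timeline.partition.SingleDimensionShardSpec;

          public class DomainDimensionsSketch
          {
            public static void main(String[] args)
            {
              // A shard covering rows whose "country" value falls in ["a", "f");
              // the constructor signature is assumed.
              SingleDimensionShardSpec shardSpec =
                  new SingleDimensionShardSpec("country", "a", "f", 0, 1);

              // Only "country" has a known possible range, so it is the only listed dimension.
              List<String> domainDimensions = shardSpec.getDomainDimensions();
              System.out.println(domainDimensions); // expected: [country]
            }
          }
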
      • possibleInDomain

        default boolean possibleInDomain​(Map<String,​com.google.common.collect.RangeSet<String>> domain)
        Description copied from interface: ShardSpec
        Returns false if the given domain ranges cannot occur in this shard; otherwise returns true.
        Specified by:
        possibleInDomain in interface ShardSpec
        Returns:
        whether rows matching the given domain can possibly exist in this shard
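
        The sketch below shows how a domain (a map from dimension name to a RangeSet of values, as in the signature above) can be tested against a shard, e.g. for pruning. As in the previous sketch, SingleDimensionShardSpec stands in as a concrete implementation and its constructor signature is an assumption.

          import java.util.Map;

          import com.google.common.collect.ImmutableMap;
          import com.google.common.collect.Range;
          import com.google.common.collect.RangeSet;
          import com.google.common.collect.TreeRangeSet;
          import org.apache.druid.timeline.partition.SingleDimensionShardSpec;

          public class PossibleInDomainSketch
          {
            public static void main(String[] args)
            {
              // A shard covering "country" values in ["a", "f"); constructor signature assumed.
              SingleDimensionShardSpec shardSpec =
                  new SingleDimensionShardSpec("country", "a", "f", 0, 1);

              // A domain asking for "country" values between "b" and "c" overlaps the shard's range.
              RangeSet<String> overlapping = TreeRangeSet.create();
              overlapping.add(Range.closed("b", "c"));
              Map<String, RangeSet<String>> domain = ImmutableMap.of("country", overlapping);
              System.out.println(shardSpec.possibleInDomain(domain)); // expected: true

              // A domain asking for values between "x" and "z" cannot occur in this shard.
              RangeSet<String> disjoint = TreeRangeSet.create();
              disjoint.add(Range.closed("x", "z"));
              System.out.println(
                  shardSpec.possibleInDomain(ImmutableMap.of("country", disjoint))); // expected: false
            }
          }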