Interface ColumnConfig

    • Field Detail

      • DEFAULT_SKIP_VALUE_RANGE_INDEX_SCALE

        static final double DEFAULT_SKIP_VALUE_RANGE_INDEX_SCALE
        This value was chosen by testing bound filters on double columns over a variety of ranges. At this ratio of the number of bitmaps to the total number of rows, indexes appeared to stop being consistently faster than a full scan with a value matcher.
        See Also:
        Constant Field Values
      • DEFAULT_SKIP_VALUE_PREDICATE_INDEX_SCALE

        static final double DEFAULT_SKIP_VALUE_PREDICATE_INDEX_SCALE
        See Also:
        Constant Field Values
      • ALWAYS_USE_INDEXES

        static final ColumnConfig ALWAYS_USE_INDEXES
    • Method Detail

      • skipValueRangeIndexScale

        default double skipValueRangeIndexScale()
        If the total number of rows in a column multiplied by this value is smaller than the number of bitmap index operations required to use LexicographicalRangeIndexes or NumericRangeIndexes, then any ColumnIndexSupplier that participates in this config will skip computing the index. It signals this by returning null from the 'forRange' methods, which forces the filter to be processed with a scan using a ValueMatcher instead.

        For range indexes on columns where every value has an index, the number of bitmap operations is the number of individual values that fall within the range, which is a subset of the column's total cardinality.
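        The decision rule above can be sketched in a few lines of Java. This is a minimal, hypothetical illustration, not Druid's implementation: it assumes the column's value dictionary is a sorted long[] of distinct values, counts one bitmap operation per dictionary value inside the range, and uses the invented names RangeIndexSkipSketch and forRangeOrNull to mirror the null-return convention of the 'forRange' methods.

```java
import java.util.Arrays;

// Hypothetical sketch of the range-index skip rule; not Druid's implementation.
// Assumes the value dictionary is a sorted long[] of distinct values.
final class RangeIndexSkipSketch
{
  // First dictionary id whose value is >= key.
  static int lowerBound(long[] sortedDictionary, long key)
  {
    int idx = Arrays.binarySearch(sortedDictionary, key);
    // binarySearch returns -(insertionPoint) - 1 when the key is absent
    return idx >= 0 ? idx : -(idx + 1);
  }

  // First dictionary id whose value is > key (values are distinct, so a hit is unique).
  static int upperBound(long[] sortedDictionary, long key)
  {
    int idx = Arrays.binarySearch(sortedDictionary, key);
    return idx >= 0 ? idx + 1 : -(idx + 1);
  }

  // One bitmap per dictionary value inside the closed range [lower, upper].
  static int bitmapsInRange(long[] sortedDictionary, long lower, long upper)
  {
    if (lower > upper) {
      return 0;
    }
    return upperBound(sortedDictionary, upper) - lowerBound(sortedDictionary, lower);
  }

  // Returns the [start, end) dictionary id range whose bitmaps would be unioned,
  // or null to signal "skip the index and scan with a matcher instead".
  static int[] forRangeOrNull(long[] sortedDictionary, long lower, long upper, int numRows, double scale)
  {
    int bitmapOps = bitmapsInRange(sortedDictionary, lower, upper);
    if (numRows * scale < bitmapOps) {
      return null;
    }
    int start = lowerBound(sortedDictionary, lower);
    return new int[]{start, start + bitmapOps};
  }
}
```

        With ten dictionary values and a range covering four of them, a 1000-row column keeps the index at a scale of 0.08 (80 is not smaller than 4), while a 40-row column skips it (3.2 is smaller than 4). The scale values here are arbitrary and only illustrate the comparison.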

        Currently only the NestedCommonFormatColumn implementations of ColumnIndexSupplier support this behavior.

        This can make some standalone filters faster in cases where the overhead of walking the value dictionary and combining bitmaps to construct a BitmapOffset or BitmapVectorOffset can exceed the cost of simply doing a full scan with a ValueMatcher.
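        To make the two costs concrete, the following hypothetical sketch models a dictionary-encoded column with one java.util.BitSet per value and evaluates the same predicate both ways. The index path walks the dictionary and ORs in a bitmap for each matching value; the scan path tests every row, the way a ValueMatcher would. The class and method names are invented for illustration, and Druid's real bitmap and offset types are not modeled.

```java
import java.util.BitSet;
import java.util.function.LongPredicate;

// Hypothetical illustration of the two evaluation strategies; not Druid code.
// Assumes a dictionary-encoded column with one row bitmap per dictionary id.
final class IndexVersusScanSketch
{
  // Builds the per-value bitmaps from the row -> dictionary id mapping.
  static BitSet[] buildBitmaps(int[] rowToDictionaryId, int cardinality)
  {
    BitSet[] bitmaps = new BitSet[cardinality];
    for (int id = 0; id < cardinality; id++) {
      bitmaps[id] = new BitSet();
    }
    for (int row = 0; row < rowToDictionaryId.length; row++) {
      bitmaps[rowToDictionaryId[row]].set(row);
    }
    return bitmaps;
  }

  // Index path: walk the dictionary and OR together the bitmap of every matching value.
  static BitSet viaIndex(long[] dictionary, BitSet[] bitmapsPerValue, LongPredicate predicate)
  {
    BitSet result = new BitSet();
    for (int id = 0; id < dictionary.length; id++) {
      if (predicate.test(dictionary[id])) {
        result.or(bitmapsPerValue[id]); // one bitmap operation per matching value
      }
    }
    return result;
  }

  // Scan path: test each row's value directly, as a value matcher would.
  static BitSet viaScan(int[] rowToDictionaryId, long[] dictionary, LongPredicate predicate)
  {
    BitSet result = new BitSet(rowToDictionaryId.length);
    for (int row = 0; row < rowToDictionaryId.length; row++) {
      if (predicate.test(dictionary[rowToDictionaryId[row]])) {
        result.set(row);
      }
    }
    return result;
  }
}
```

        Both paths produce the same rows; they differ in cost. The index path does work proportional to the dictionary size plus the number of matching bitmaps, while the scan path does work proportional to the row count, which is why the ratio between the two governs the choice.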

        This is especially useful when the range index is used as part of an AndFilter, which segment processing partitions into 'pre' filters, which use indexes, and 'post' filters, which use a matcher on the offset created by the indexes to filter the remaining results. This value moves what would have been expensive index computations out of the 'pre' group and into the 'post' group as value matchers, sometimes improving performance by an order of magnitude or more.
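        The pre/post split described above can be modeled with a small hypothetical sketch. SketchFilter is an invented stand-in for a filter that either supplies an index bitmap or returns null when the index was skipped; it is not Druid's Filter interface. Filters with an index are intersected to build the offset, and the remaining filters run as matchers only on the rows that survive.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical model of splitting an AND into 'pre' (index) and 'post' (matcher) filters.
// SketchFilter is an invented stand-in, not Druid's Filter interface.
final class AndPartitionSketch
{
  interface SketchFilter
  {
    // Rows matched via an index, or null when the index was skipped.
    BitSet indexOrNull();

    // Per-row matcher; always available as a fallback.
    IntPredicate matcher();
  }

  static SketchFilter of(BitSet index, IntPredicate matcher)
  {
    return new SketchFilter()
    {
      @Override
      public BitSet indexOrNull()
      {
        return index;
      }

      @Override
      public IntPredicate matcher()
      {
        return matcher;
      }
    };
  }

  static BitSet evaluateAnd(List<SketchFilter> filters, int numRows)
  {
    List<BitSet> pre = new ArrayList<>();
    List<IntPredicate> post = new ArrayList<>();
    for (SketchFilter filter : filters) {
      BitSet index = filter.indexOrNull();
      if (index != null) {
        pre.add(index);
      } else {
        post.add(filter.matcher());
      }
    }

    // 'pre' group: intersect the index bitmaps to build the offset (start from all rows).
    BitSet offset = new BitSet(numRows);
    offset.set(0, numRows);
    for (BitSet bitmap : pre) {
      offset.and(bitmap);
    }

    // 'post' group: run matchers only on rows that survived the 'pre' filters.
    BitSet result = new BitSet(numRows);
    for (int row = offset.nextSetBit(0); row >= 0; row = offset.nextSetBit(row + 1)) {
      boolean matchesAll = true;
      for (IntPredicate matcher : post) {
        if (!matcher.test(row)) {
          matchesAll = false;
          break;
        }
      }
      if (matchesAll) {
        result.set(row);
      }
    }
    return result;
  }
}
```

        The point of the design is visible in the last loop: a filter whose index was skipped costs one matcher call per row that survived the 'pre' filters, rather than a full dictionary walk and bitmap union over the whole column.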

      • skipValuePredicateIndexScale

        default double skipValuePredicateIndexScale()
        If the total number of rows in a column multiplied by this value is smaller than the number of bitmap index operations required to use DruidPredicateIndexes, then any ColumnIndexSupplier that participates in this config will skip computing the index in favor of a full scan using a ValueMatcher. This is signaled by returning null from ColumnIndexSupplier.as(Class), even though the supplier would otherwise have been able to create a BitmapColumnIndex. For predicate indexes on columns with an index for every value, the number of operations is determined by the column's total value cardinality.
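        Because a predicate can match any value, the check above compares against the column's full cardinality rather than a range count. The hypothetical sketch below shows that check behind a generic as(Class) method that returns null to force a scan. SketchIndexSupplier and SketchPredicateIndex are invented stand-ins for ColumnIndexSupplier and DruidPredicateIndexes; the real types are not modeled.

```java
// Hypothetical sketch of the cardinality-based skip check for predicate indexes; not Druid code.
// SketchPredicateIndex stands in for a real predicate index type.
final class PredicateIndexSkipSketch
{
  static final class SketchPredicateIndex
  {
  }

  static final class SketchIndexSupplier
  {
    private final int numRows;
    private final int cardinality;
    private final double skipValuePredicateIndexScale;

    SketchIndexSupplier(int numRows, int cardinality, double skipValuePredicateIndexScale)
    {
      this.numRows = numRows;
      this.cardinality = cardinality;
      this.skipValuePredicateIndexScale = skipValuePredicateIndexScale;
    }

    // Mirrors the convention described above: null means "use a full scan with a matcher".
    <T> T as(Class<T> clazz)
    {
      if (clazz.equals(SketchPredicateIndex.class)) {
        // With a bitmap for every value, a predicate may have to visit all `cardinality` of them.
        if (numRows * skipValuePredicateIndexScale < cardinality) {
          return null;
        }
        return clazz.cast(new SketchPredicateIndex());
      }
      return null; // other index types are not modeled here
    }
  }
}
```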

        Currently only the NestedCommonFormatColumn implementations of ColumnIndexSupplier support this behavior.

        This can make some standalone filters faster in cases where the overhead of walking the value dictionary and combining bitmaps to construct a BitmapOffset or BitmapVectorOffset can exceed the cost of simply doing a full scan with a ValueMatcher.

        This is especially useful when the predicate index is used as part of an AndFilter, which segment processing partitions into 'pre' filters, which use indexes, and 'post' filters, which use a matcher on the offset created by the indexes to filter the remaining results. This value moves what would have been expensive index computations out of the 'pre' group and into the 'post' group as value matchers, sometimes improving performance by an order of magnitude or more.

        This value is separate from skipValueRangeIndexScale() because the cost of computing predicate indexes can differ from that of the much cheaper range calculations (especially for numeric values). A separate control knob allows predicate indexes to be tuned independently of ranges.
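        As a hypothetical illustration of why two knobs are useful, the sketch below mirrors the shape of this interface with two default methods and overrides only one of them in an anonymous class. ScaleConfigSketch, ScaleConfigExamples, and the 0.1 and 0.02 values are invented placeholders, not Druid's defaults or API. The comparison reuses the rule stated above: skip when rows multiplied by the scale is smaller than the number of bitmap operations.

```java
// Hypothetical mirror of ColumnConfig's two independent knobs; not Druid's interface.
// The default values below are placeholders, not Druid's actual defaults.
interface ScaleConfigSketch
{
  double PLACEHOLDER_RANGE_SCALE = 0.1;
  double PLACEHOLDER_PREDICATE_SCALE = 0.1;

  default double skipValueRangeIndexScale()
  {
    return PLACEHOLDER_RANGE_SCALE;
  }

  default double skipValuePredicateIndexScale()
  {
    return PLACEHOLDER_PREDICATE_SCALE;
  }
}

final class ScaleConfigExamples
{
  // Uses the placeholder defaults for both knobs.
  static final ScaleConfigSketch DEFAULTS = new ScaleConfigSketch() {};

  // Lowers only the predicate knob: the skip now fires at a smaller bitmap count,
  // so predicate filters fall back to a scan sooner while ranges are unaffected.
  static final ScaleConfigSketch EAGER_PREDICATE_SKIP = new ScaleConfigSketch()
  {
    @Override
    public double skipValuePredicateIndexScale()
    {
      return 0.02;
    }
  };

  static boolean skipRange(ScaleConfigSketch config, int numRows, int valuesInRange)
  {
    return numRows * config.skipValueRangeIndexScale() < valuesInRange;
  }

  static boolean skipPredicate(ScaleConfigSketch config, int numRows, int cardinality)
  {
    return numRows * config.skipValuePredicateIndexScale() < cardinality;
  }
}
```

        For a 1000-row column with 50 distinct values, the defaults keep both indexes, while EAGER_PREDICATE_SKIP skips only the predicate index (20 is smaller than 50) and leaves the range decision unchanged.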