Class StringDimensionIndexer

    • Constructor Detail

      • StringDimensionIndexer

        public StringDimensionIndexer(DimensionSchema.MultiValueHandling multiValueHandling,
                                      boolean hasBitmapIndexes,
                                      boolean hasSpatialIndexes,
                                      boolean useMaxMemoryEstimates)
    • Method Detail

      • processRowValsToUnsortedEncodedKeyComponent

        public EncodedKeyComponent<int[]> processRowValsToUnsortedEncodedKeyComponent(@Nullable Object dimValues,
                                                                                      boolean reportParseExceptions)
        Description copied from interface: DimensionIndexer
        Encodes the given row value(s) of the dimension to be used within a row key. It also updates the internal state of the DimensionIndexer, e.g. the dimLookup.

        For example, the dictionary-encoded String-type column will return an int[] containing dictionary IDs.

        Parameters:
        dimValues - Value(s) of the dimension in a row. This can either be a single value or a list of values (for multi-valued dimensions)
        reportParseExceptions - true if parse exceptions should be reported, false otherwise
        Returns:
        Encoded dimension value(s) to be used as a component for the row key. Contains an object of the indexer's encoded key component type (int[] for this class) and the effective size of the key component in bytes.
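A minimal sketch of the encoding step described above, assuming a plain HashMap/ArrayList dictionary that hands out IDs in first-seen ("unsorted") order. SimpleStringDictionary, EncodedInts and RowValueEncoder are hypothetical stand-ins, not Druid classes. The sketch ignores MultiValueHandling (input order and duplicates are kept) and uses an assumed size model of 4 bytes per encoded int.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical dictionary: IDs are assigned in first-seen ("unsorted") order.
class SimpleStringDictionary {
  private final Map<String, Integer> idByValue = new HashMap<>(); // allows a null key
  private final List<String> valueById = new ArrayList<>();

  int getOrAdd(String value) {
    Integer id = idByValue.get(value);
    if (id == null) {
      id = valueById.size();
      idByValue.put(value, id);
      valueById.add(value);
    }
    return id;
  }

  String lookup(int id) {
    return valueById.get(id);
  }
}

// Hypothetical holder for the two things the description says are returned:
// the encoded component and its effective size in bytes.
class EncodedInts {
  final int[] ids;
  final long effectiveSizeBytes;

  EncodedInts(int[] ids, long effectiveSizeBytes) {
    this.ids = ids;
    this.effectiveSizeBytes = effectiveSizeBytes;
  }
}

class RowValueEncoder {
  final SimpleStringDictionary dict = new SimpleStringDictionary();

  // Encodes one row's value(s); a List is treated as a multi-value row.
  // Updates the dictionary as a side effect, like the indexer's internal state.
  EncodedInts encode(Object dimValues) {
    List<Object> values = new ArrayList<>();
    if (dimValues instanceof List) {
      values.addAll((List<?>) dimValues);
    } else {
      values.add(dimValues);
    }
    int[] ids = new int[values.size()];
    for (int i = 0; i < ids.length; i++) {
      Object v = values.get(i);
      ids[i] = dict.getOrAdd(v == null ? null : String.valueOf(v));
    }
    // Assumed size model: 4 bytes per encoded int, ignoring object overhead.
    return new EncodedInts(ids, 4L * ids.length);
  }
}
```

Encoding "b" and then ["a", "b"] yields [0] and [1, 0]: "b" keeps the ID it received first, so the IDs reflect arrival order rather than value order.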
      • compareUnsortedEncodedKeyComponents

        public int compareUnsortedEncodedKeyComponents(int[] lhs,
                                                       int[] rhs)
        Description copied from interface: DimensionIndexer
        Compares the row values for this DimensionIndexer's dimension from a Row key. The dimension value arrays within a Row key always use the "unsorted" ordering for encoded values. The row values are passed to this function as an Object; the implementer should cast them to the type appropriate for this dimension. For example, a dictionary-encoded String implementation would cast the Objects to int[] arrays.

        When comparing, if the two arrays have different lengths, the shorter array is ordered first. Otherwise, the implementer should iterate through the unsorted encoded values, converting each to its actual type (e.g., performing a dictionary lookup for a dictionary-encoded String dimension), and compare the actual values until a difference is found. Refer to StringDimensionIndexer.compareUnsortedEncodedKeyComponents() for a reference implementation.

        The comparison rules used by this method must match the rules used by DimensionHandler.getEncodedValueSelectorComparator(); otherwise rows can be incorrectly ordered or merged during ingestion, causing issues such as imperfect rollup.
        Parameters:
        lhs - dimension value array from a Row key
        rhs - dimension value array from a Row key
        Returns:
        comparison of the two arrays
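A sketch of the comparison rule above, assuming the dictionary is a List<String> indexed by unsorted ID and that actual values are ordered nulls first, then by natural String order. That ordering is an assumption for illustration; a real implementation must use whatever DimensionHandler.getEncodedValueSelectorComparator() uses. UnsortedKeyComparator is a hypothetical name.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical comparator over dictionary-encoded rows. "dictionary" maps an
// unsorted ID (the list index) back to its actual String value.
class UnsortedKeyComparator {
  // Assumed ordering for actual values: nulls first, then natural String order.
  private static final Comparator<String> VALUE_ORDER =
      Comparator.nullsFirst(Comparator.naturalOrder());

  private final List<String> dictionary;

  UnsortedKeyComparator(List<String> dictionary) {
    this.dictionary = dictionary;
  }

  int compare(int[] lhs, int[] rhs) {
    // Arrays of different lengths: the shorter one sorts first.
    int lenDiff = Integer.compare(lhs.length, rhs.length);
    if (lenDiff != 0) {
      return lenDiff;
    }
    for (int i = 0; i < lhs.length; i++) {
      if (lhs[i] == rhs[i]) {
        continue; // same ID means same value, so skip the lookup
      }
      // Unsorted IDs carry no value order, so compare the actual values.
      int cmp = VALUE_ORDER.compare(dictionary.get(lhs[i]), dictionary.get(rhs[i]));
      if (cmp != 0) {
        return cmp;
      }
    }
    return 0;
  }
}
```

With a dictionary where ID 0 is "zebra" and ID 1 is "apple", comparing raw IDs would put [0] before [1], whereas comparing actual values correctly puts [1] ("apple") first.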
      • checkUnsortedEncodedKeyComponentsEqual

        public boolean checkUnsortedEncodedKeyComponentsEqual(int[] lhs,
                                                              int[] rhs)
        Description copied from interface: DimensionIndexer
        Check if two row value arrays from Row keys are equal.
        Parameters:
        lhs - dimension value array from a Row key
        rhs - dimension value array from a Row key
        Returns:
        true if the two arrays are equal
      • getUnsortedEncodedKeyComponentHashCode

        public int getUnsortedEncodedKeyComponentHashCode(int[] key)
        Description copied from interface: DimensionIndexer
        Given a row value array from a Row key, generate a hashcode.
        Parameters:
        key - dimension value array from a Row key
        Returns:
        hashcode of the array
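Equality and hashing can be sketched together, since the hash code has to be consistent with equality. This assumes that, within one indexer, each dictionary ID stands for exactly one value, so comparing IDs element by element is enough and no dictionary lookup is needed. EncodedKeyEquality is a hypothetical helper built on java.util.Arrays, not the Druid implementation.

```java
import java.util.Arrays;

// Hypothetical helper: equality and hashing for encoded row values of one dimension.
final class EncodedKeyEquality {
  private EncodedKeyEquality() {}

  // Element-by-element, order-sensitive comparison of dictionary IDs.
  static boolean equal(int[] lhs, int[] rhs) {
    return Arrays.equals(lhs, rhs);
  }

  // Consistent with equal(): equal arrays always produce the same hash code.
  static int hash(int[] key) {
    return Arrays.hashCode(key);
  }
}
```

Because the comparison is order-sensitive, [3, 1] and [1, 3] are not equal under this sketch.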
      • makeDimensionSelector

        public DimensionSelector makeDimensionSelector(DimensionSpec spec,
                                                       IncrementalIndexRowHolder currEntry,
                                                       IncrementalIndex.DimensionDesc desc)
        Description copied from interface: DimensionIndexer
        Return an object used to read values from this indexer's column as Strings.
        Parameters:
        spec - Specifies the output name of a dimension and any extraction functions to be applied.
        currEntry - Provides access to the current Row object in the Cursor
        desc - Descriptor object for this dimension within an IncrementalIndex
        Returns:
        A new object that reads rows from currEntry
      • convertUnsortedEncodedKeyComponentToActualList

        @Nullable
        public Object convertUnsortedEncodedKeyComponentToActualList(int[] key)
        Description copied from interface: DimensionIndexer
        Given a row value array from a Row key, as described in the documentation for DimensionIndexer.compareUnsortedEncodedKeyComponents(EncodedKeyComponentType, EncodedKeyComponentType), convert the unsorted encoded values to a list of actual values. If the key has one element, this method should return a single Object instead of a list.
        Parameters:
        key - dimension value array from a Row key
        Returns:
        single value or list containing the actual values corresponding to the encoded values in the input array
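A sketch of the decoding rule above, assuming a List<String> dictionary indexed by unsorted ID. Returning null for a null or empty key is an assumption suggested by the @Nullable annotation; the description itself does not state it. KeyDecoder is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical decoder: maps unsorted IDs back to their actual values.
final class KeyDecoder {
  private KeyDecoder() {}

  static Object toActual(int[] key, List<String> dictionary) {
    // Assumption: a null or empty key decodes to null.
    if (key == null || key.length == 0) {
      return null;
    }
    // A single-element key yields the bare value rather than a list.
    if (key.length == 1) {
      return dictionary.get(key[0]);
    }
    List<String> values = new ArrayList<>(key.length);
    for (int id : key) {
      values.add(dictionary.get(id));
    }
    return values;
  }
}
```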
      • fillBitmapsFromUnsortedEncodedKeyComponent

        public void fillBitmapsFromUnsortedEncodedKeyComponent(int[] key,
                                                               int rowNum,
                                                               MutableBitmap[] bitmapIndexes,
                                                               BitmapFactory factory)
        Description copied from interface: DimensionIndexer
        Helper function for building bitmap indexes for integer-encoded dimensions. Called by IncrementalIndexAdapter as it iterates through its sequence of rows. Given a row value array from a Row key, with the current row number indicated by "rowNum", set bit "rowNum" in the bitmap index for each value that appears in the row value array. For example, if key is an int[] array with values [1,3,4] for a dictionary-encoded String dimension, and rowNum is 27, this function would set bit 27 in bitmapIndexes[1], bitmapIndexes[3], and bitmapIndexes[4]. See StringDimensionIndexer.fillBitmapsFromUnsortedEncodedKeyComponent() for a reference implementation. If a dimension type does not support bitmap indexes, this function will not be called and can be left unimplemented.
        Parameters:
        key - dimension value array from a Row key
        rowNum - current row number
        bitmapIndexes - array of bitmaps, indexed by integer dimension value
        factory - bitmap factory
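A sketch of the bitmap-filling loop described above. It uses java.util.BitSet in place of Druid's MutableBitmap and creates each bitmap lazily with new BitSet() instead of going through a BitmapFactory; both are substitutions for illustration. BitmapFiller is a hypothetical name.

```java
import java.util.BitSet;

// Hypothetical helper: BitSet stands in for MutableBitmap/BitmapFactory.
final class BitmapFiller {
  private BitmapFiller() {}

  // For each dictionary ID in the row's key, set bit rowNum in that ID's bitmap,
  // creating the bitmap the first time the ID is seen.
  static void fill(int[] key, int rowNum, BitSet[] bitmapIndexes) {
    for (int id : key) {
      if (bitmapIndexes[id] == null) {
        bitmapIndexes[id] = new BitSet();
      }
      bitmapIndexes[id].set(rowNum);
    }
  }
}
```

Calling fill with key [1,3,4] and rowNum 27 sets bit 27 in bitmaps 1, 3 and 4 and leaves the others untouched, matching the example in the description.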