Class

org.apache.spark.sql.execution.aggregate

VectorizedHashMapGenerator

class VectorizedHashMapGenerator extends HashMapGenerator

This is a helper class that generates an append-only vectorized hash map. The map acts as a 'cache' for extremely fast key-value lookups while evaluating aggregates, and falls back to the BytesToBytesMap for any key it cannot find or insert. Its code is generated inside HashAggregate to speed up aggregations with grouping keys.

It is backed by a power-of-2-sized array for index lookups and a columnar batch that stores the key-value pairs. Index lookups in the array use linear probing (with a small maximum number of tries) and an inexpensive hash function, which makes the majority of lookups very efficient. However, linear probing and an inexpensive hash function also make the map less robust than the BytesToBytesMap, especially for a large number of keys or for certain key distributions, so we must fall back on the latter for correctness. We also use a secondary columnar batch that logically projects over the original columnar batch and is equivalent to the BytesToBytesMap aggregate buffer.
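The index arithmetic described above can be sketched in Java, the language of the generated code shown on this page. This is a minimal illustration under stated assumptions, not Spark's implementation. The names `ProbeIndex`, `bucketFor` and `probeSequence` are hypothetical. The sketch shows two things: a power-of-2-sized array lets a cheap bit mask replace a modulo, and bounded linear probing visits at most `maxSteps` consecutive buckets before giving up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the index arithmetic; not Spark code.
final class ProbeIndex {
    // Maps a 64-bit hash to a bucket of a power-of-two-sized array. For such
    // sizes, masking with (numBuckets - 1) gives the same result as
    // Math.floorMod((int) h, numBuckets) without a division.
    static int bucketFor(long h, int numBuckets) {
        if (numBuckets <= 0 || Integer.bitCount(numBuckets) != 1) {
            throw new IllegalArgumentException("numBuckets must be a power of two");
        }
        return (int) h & (numBuckets - 1);
    }

    // Lists the buckets examined by bounded linear probing: at most maxSteps
    // consecutive slots, wrapping around the end of the array. A real lookup
    // stops early on a match or an empty slot; if neither occurs, it gives up.
    static List<Integer> probeSequence(long h, int numBuckets, int maxSteps) {
        List<Integer> visited = new ArrayList<>();
        int idx = bucketFor(h, numBuckets);
        for (int step = 0; step < maxSteps; step++) {
            visited.add(idx);
            idx = (idx + 1) & (numBuckets - 1);
        }
        return visited;
    }
}
```

Consider a table with 8 buckets and `maxSteps = 4`. A hash of 6 visits buckets 6, 7, 0 and 1. If none of those slots holds the key or is empty, the lookup gives up and the key is handled by the BytesToBytesMap instead.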

NOTE: This vectorized hash map currently doesn't support nullable keys and falls back to the BytesToBytesMap to store them.

Linear Supertypes
HashMapGenerator, AnyRef, Any

Instance Constructors

  1. new VectorizedHashMapGenerator(ctx: CodegenContext, aggregateExpressions: Seq[AggregateExpression], generatedClassName: String, groupingKeySchema: StructType, bufferSchema: StructType)


Type Members

  1. case class Buffer(dataType: DataType, name: String) extends Product with Serializable

    Definition Classes
    HashMapGenerator

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. val buffVars: Seq[ExprCode]

    Definition Classes
    HashMapGenerator
  6. val bufferValues: Seq[Buffer]

    Definition Classes
    HashMapGenerator
  7. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  8. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  9. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  10. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  11. final def genComputeHash(ctx: CodegenContext, input: String, dataType: DataType, result: String): String

    Attributes
    protected
    Definition Classes
    HashMapGenerator
  12. def generate(): String

    Definition Classes
    HashMapGenerator
  13. final def generateClose(): String

    Attributes
    protected
    Definition Classes
    HashMapGenerator
  14. def generateEquals(): String

    Generates a method that returns true if the row referenced by bucket idx in the associated org.apache.spark.sql.execution.vectorized.ColumnarBatch holds the given group-by keys. For instance, if we have 2 long group-by keys, the generated function would be of the form:

    private boolean equals(int idx, long agg_key, long agg_key1) {
      return batch.column(0).getLong(buckets[idx]) == agg_key &&
        batch.column(1).getLong(buckets[idx]) == agg_key1;
    }
    Attributes
    protected
    Definition Classes
    VectorizedHashMapGenerator → HashMapGenerator
  15. def generateFindOrInsert(): String

    Generates a method that returns a mutable org.apache.spark.sql.execution.vectorized.ColumnarBatch.Row which keeps track of the aggregate value(s) for a given set of keys. If the corresponding row doesn't exist, the generated method adds it to the associated org.apache.spark.sql.execution.vectorized.ColumnarBatch. If probing finds neither the keys nor an empty slot within maxSteps buckets, the method returns null. For instance, if we have 2 long group-by keys, the generated function would be of the form:

    public org.apache.spark.sql.execution.vectorized.ColumnarBatch.Row findOrInsert(
        long agg_key, long agg_key1) {
      long h = hash(agg_key, agg_key1);
      int step = 0;
      int idx = (int) h & (numBuckets - 1);
      while (step < maxSteps) {
        // Insert into an empty slot, or return the row that already holds the key
        if (buckets[idx] == -1) {
          batch.column(0).putLong(numRows, agg_key);
          batch.column(1).putLong(numRows, agg_key1);
          batch.column(2).putLong(numRows, 0);
          buckets[idx] = numRows++;
          return batch.getRow(buckets[idx]);
        } else if (equals(idx, agg_key, agg_key1)) {
          return batch.getRow(buckets[idx]);
        }
        idx = (idx + 1) & (numBuckets - 1);
        step++;
      }
      // Didn't find it
      return null;
    }
    Attributes
    protected
    Definition Classes
    VectorizedHashMapGenerator → HashMapGenerator
  16. final def generateHashFunction(): String

    Generates a method that computes a hash; currently it XORs all individual group-by keys together. For instance, if we have 2 long group-by keys, the generated function would be of the form:

    private long hash(long agg_key, long agg_key1) {
      return agg_key ^ agg_key1;
    }
    Attributes
    protected
    Definition Classes
    HashMapGenerator
  17. def generateRowIterator(): String

    Attributes
    protected
    Definition Classes
    VectorizedHashMapGenerator → HashMapGenerator
  18. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  19. val groupingKeySignature: String

    Definition Classes
    HashMapGenerator
  20. val groupingKeys: Seq[Buffer]

    Definition Classes
    HashMapGenerator
  21. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  22. def initializeAggregateHashMap(): String

    Attributes
    protected
    Definition Classes
    VectorizedHashMapGenerator → HashMapGenerator
  23. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  24. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  25. final def notify(): Unit

    Definition Classes
    AnyRef
  26. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  27. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  28. def toString(): String

    Definition Classes
    AnyRef → Any
  29. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  30. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  31. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
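The generated equals and findOrInsert forms shown for generateEquals and generateFindOrInsert can be approximated by a self-contained Java sketch. Every class name and simplification in it is hypothetical:

- plain `long[]` arrays stand in for the ColumnarBatch columns;
- `findOrInsert` returns a row index (or -1) instead of a ColumnarBatch.Row (or null);
- a fixed row `capacity` is assumed;
- a `java.util.HashMap` keyed by a string stands in for the BytesToBytesMap fallback.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical array-backed stand-in for the generated map: two long
// group-by keys and one long sum buffer. Not Spark's generated code.
final class FastAggMap {
    private final int numBuckets;  // must be a power of two
    private final int maxSteps;    // bound on linear probing
    private final int capacity;    // maximum number of stored rows (assumed)
    private final int[] buckets;   // bucket -> row index, -1 if empty
    private final long[] key0, key1, sum;  // arrays standing in for columns
    private int numRows = 0;

    FastAggMap(int numBuckets, int maxSteps, int capacity) {
        this.numBuckets = numBuckets;
        this.maxSteps = maxSteps;
        this.capacity = capacity;
        this.buckets = new int[numBuckets];
        Arrays.fill(buckets, -1);
        this.key0 = new long[capacity];
        this.key1 = new long[capacity];
        this.sum = new long[capacity];
    }

    // Same XOR combination as the hash shown for generateHashFunction.
    private long hash(long k0, long k1) {
        return k0 ^ k1;
    }

    // Mirrors the generated equals: compare the keys of the row in bucket idx.
    private boolean equals(int idx, long k0, long k1) {
        int row = buckets[idx];
        return key0[row] == k0 && key1[row] == k1;
    }

    // Mirrors the generated findOrInsert, returning a row index instead of a
    // Row; -1 means the key was neither found nor inserted.
    int findOrInsert(long k0, long k1) {
        int idx = (int) hash(k0, k1) & (numBuckets - 1);
        for (int step = 0; step < maxSteps; step++) {
            if (buckets[idx] == -1) {
                if (numRows == capacity) {
                    return -1;  // no room left in the "batch"
                }
                key0[numRows] = k0;
                key1[numRows] = k1;
                sum[numRows] = 0;  // initial aggregate buffer value
                buckets[idx] = numRows++;
                return buckets[idx];
            } else if (equals(idx, k0, k1)) {
                return buckets[idx];
            }
            idx = (idx + 1) & (numBuckets - 1);
        }
        return -1;  // probing gave up after maxSteps buckets
    }

    int numRows() { return numRows; }
    long keyAt0(int row) { return key0[row]; }
    long keyAt1(int row) { return key1[row]; }
    long sumAt(int row) { return sum[row]; }
    void add(int row, long v) { sum[row] += v; }
}

final class TwoTierSum {
    // Sums input[i][2] per (input[i][0], input[i][1]): fast map first,
    // HashMap fallback (standing in for BytesToBytesMap) otherwise.
    static Map<String, Long> sumByKey(long[][] input, FastAggMap fast) {
        Map<String, Long> fallback = new HashMap<>();
        for (long[] r : input) {
            int row = fast.findOrInsert(r[0], r[1]);
            if (row >= 0) {
                fast.add(row, r[2]);
            } else {
                fallback.merge(r[0] + "," + r[1], r[2], Long::sum);
            }
        }
        // Combine both tiers into a single result.
        Map<String, Long> result = new HashMap<>(fallback);
        for (int row = 0; row < fast.numRows(); row++) {
            result.merge(fast.keyAt0(row) + "," + fast.keyAt1(row), fast.sumAt(row), Long::sum);
        }
        return result;
    }
}
```

Take a deliberately tiny map with 4 buckets, 2 probe steps and 2 rows. The keys (3, 3) and (5, 5) cannot be placed and go to the fallback map, yet the merged sums are the same as with a map large enough to hold every key. That is the correctness guarantee the fallback is meant to provide.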
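The XOR hash shown for generateHashFunction is cheap, but it maps some key patterns to the same value. That is one reason the class description calls the map less robust for certain key distributions. A minimal Java sketch makes this concrete; the class `XorHash` and its helpers are hypothetical. It shows that (a, b) and (b, a) always collide and that every pair (x, x) hashes to 0.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of an XOR-combining hash over two long keys.
final class XorHash {
    // Combines two long group-by keys by XOR.
    static long hash(long aggKey, long aggKey1) {
        return aggKey ^ aggKey1;
    }

    // Counts how many distinct buckets of a power-of-two table the given key
    // pairs map to; a low count means many keys share one probe chain.
    static int distinctBuckets(long[][] keys, int numBuckets) {
        Set<Integer> seen = new HashSet<>();
        for (long[] k : keys) {
            seen.add((int) hash(k[0], k[1]) & (numBuckets - 1));
        }
        return seen.size();
    }

    // Builds the pairs (i, i) for i in [0, n); every such pair hashes to 0.
    static long[][] diagonalKeys(int n) {
        long[][] keys = new long[n][];
        for (int i = 0; i < n; i++) {
            keys[i] = new long[] {i, i};
        }
        return keys;
    }
}
```

For example, the 100 pairs (0, 0) through (99, 99) all start probing at the same bucket of a 64-bucket table. With linear probing limited to a small maxSteps, only the first few of them could be placed before probing gives up and lookups fall back to the BytesToBytesMap.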
