org.apache.spark.sql.catalyst.expressions.aggregate
HyperLogLogPlusPlus
Companion object HyperLogLogPlusPlus
case class HyperLogLogPlusPlus(child: Expression, relativeSD: Double = 0.05, mutableAggBufferOffset: Int = 0, inputAggBufferOffset: Int = 0) extends ImperativeAggregate with Product with Serializable
HyperLogLog++ (HLL++) is a state-of-the-art cardinality estimation algorithm. This class implements the dense version of the HLL++ algorithm as an aggregate function.
This implementation is based on the following papers:
- HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm. http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf
- HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm. http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/40671.pdf
- Appendix to HyperLogLog in Practice: Algorithmic Engineering of a State of the Art Cardinality Estimation Algorithm. https://docs.google.com/document/d/1gyjfMHy43U9OWBXxfaeG-3MjGzejW1dlpyMwEYAAWEI/view?fullscreen#
- child
the expression to estimate the cardinality of.
- relativeSD
the maximum estimation error allowed.
- Annotations
- @ExpressionDescription()
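To give a feel for how relativeSD controls the size of the sketch, the following minimal Scala sketch derives the number of index bits p and the number of registers m = 2^p. It assumes the sizing rule p = ceil(2 * log2(1.106 / relativeSD)), i.e. a standard error of roughly 1.106 / sqrt(m); treat that constant as an assumption to check against HyperLogLogPlusPlusHelper. The object and method names are hypothetical and not part of Spark's API.

```scala
object HllPrecisionSketch {
  // Hypothetical helper: smallest p such that 1.106 / sqrt(2^p) <= relativeSD
  // (assumed sizing rule, not taken verbatim from Spark).
  def precision(relativeSD: Double): Int = {
    val p = math.ceil(2.0 * math.log(1.106 / relativeSD) / math.log(2.0)).toInt
    // With fewer than 4 index bits the estimator is too coarse to be useful.
    require(p >= 4, s"relativeSD $relativeSD is too large; use at most about 0.39")
    p
  }

  // Number of registers in the dense sketch: m = 2^p.
  def numRegisters(relativeSD: Double): Int = 1 << precision(relativeSD)
}
```

Under this assumption, the default relativeSD of 0.05 gives p = 9 and 512 registers, and tightening relativeSD to 0.01 raises p to 14.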
- Linear Supertypes
- Serializable, Serializable, ImperativeAggregate, CodegenFallback, AggregateFunction, Expression, TreeNode, Product, Equals, AnyRef, Any
Instance Constructors
- new HyperLogLogPlusPlus(child: Expression, relativeSD: Expression)
- new HyperLogLogPlusPlus(child: Expression)
-
new
HyperLogLogPlusPlus(child: Expression, relativeSD: Double = 0.05, mutableAggBufferOffset: Int = 0, inputAggBufferOffset: Int = 0)
- child
the expression to estimate the cardinality of.
- relativeSD
the maximum estimation error allowed.
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
val
aggBufferAttributes: Seq[AttributeReference]
Allocate enough words to store all registers.
- Definition Classes
- HyperLogLogPlusPlus → AggregateFunction
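The buffer stores registers packed into words rather than one field per register. A minimal sketch of such packing, assuming 6-bit registers stored ten to a 64-bit Long word; the constants, the numWords formula, and the RegisterWords helper are assumptions for illustration, not Spark's API:

```scala
object RegisterWords {
  val RegisterSize = 6                        // bits per register (assumed)
  val RegistersPerWord = 64 / RegisterSize    // 10 registers per Long word
  val RegisterMask = (1L << RegisterSize) - 1 // 0b111111

  // Enough words for m registers; the extra word covers any remainder
  // (it is simply unused when m is a multiple of RegistersPerWord).
  def numWords(m: Int): Int = m / RegistersPerWord + 1

  // Reads register idx from its word by shifting and masking.
  def get(words: Array[Long], idx: Int): Int = {
    val word = words(idx / RegistersPerWord)
    val shift = RegisterSize * (idx % RegistersPerWord)
    ((word >>> shift) & RegisterMask).toInt
  }

  // Overwrites register idx, leaving the other registers in the word intact.
  def set(words: Array[Long], idx: Int, value: Int): Unit = {
    val w = idx / RegistersPerWord
    val shift = RegisterSize * (idx % RegistersPerWord)
    words(w) = (words(w) & ~(RegisterMask << shift)) | ((value.toLong & RegisterMask) << shift)
  }
}
```

With 512 registers this layout needs 52 words, which is far smaller than one field per register.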
-
def
aggBufferSchema: StructType
The schema of the aggregation buffer.
- Definition Classes
- HyperLogLogPlusPlus → AggregateFunction
-
def
apply(number: Int): TreeNode[_]
Returns the tree node at the specified number, used primarily for interactive debugging. Numbers for each node can be found in the numberedTreeString.
Note that this cannot return BaseType because a logical plan node might return a physical plan for innerChildren; e.g., the in-memory relation logical plan node has a reference to the physical plan node it is referencing.
- Definition Classes
- TreeNode
-
def
argString(maxFields: Int): String
Returns a string representing the arguments to this node, minus any children
- Definition Classes
- TreeNode
-
def
asCode: String
Returns a 'scala code' representation of this TreeNode and its children. Intended for use when debugging where the prettier toString function is obfuscating the actual structure. In the case of 'pure' TreeNodes that only contain primitives and other TreeNodes, the result can be pasted in the REPL to build an equivalent Tree.
- Definition Classes
- TreeNode
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
lazy val
canonicalized: Expression
Returns an expression where a best-effort attempt has been made to transform this in a way that preserves the result but removes cosmetic variations (case sensitivity, ordering for commutative operations, etc.). See Canonicalize for more details. Deterministic expressions where this.canonicalized == other.canonicalized will always evaluate to the same result.
- Definition Classes
- Expression
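To illustrate the idea only (these are not Spark's Canonicalize rules), here is a toy expression tree in which canonicalization lower-cases attribute names and puts the operands of a commutative Add into a fixed order, so that cosmetically different trees compare equal. The Expr, Attr, and Add types are hypothetical.

```scala
object CanonicalizeSketch {
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class Add(left: Expr, right: Expr) extends Expr

  // Removes cosmetic differences: attribute-name case and the operand order of Add.
  def canonicalize(e: Expr): Expr = e match {
    case Attr(n) => Attr(n.toLowerCase)
    case Add(l, r) =>
      val cl = canonicalize(l)
      val cr = canonicalize(r)
      // Addition is commutative, so sort the operands by a stable (string) key.
      if (cl.toString.compareTo(cr.toString) <= 0) Add(cl, cr) else Add(cr, cl)
  }

  // Two expressions are treated as equivalent when their canonical forms are equal.
  def semanticEquals(a: Expr, b: Expr): Boolean = canonicalize(a) == canonicalize(b)
}
```

Sorting by the string form is a deliberate simplification; any total, deterministic ordering of operands would serve the same purpose.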
-
def
checkInputDataTypes(): TypeCheckResult
Checks the input data types, returns TypeCheckResult.success if it's valid, or returns a TypeCheckResult with an error message if invalid. Note: it's not valid to call this method until childrenResolved == true.
- Definition Classes
- Expression
- val child: Expression
-
def
children: Seq[Expression]
Returns a Seq of the children of this node. Children should not change. Immutability is required for the containsChild optimization.
- Definition Classes
- HyperLogLogPlusPlus → TreeNode
-
def
childrenResolved: Boolean
Returns true if all the children of this expression have been resolved to a specific schema and false if any still contains any unresolved placeholders.
- Definition Classes
- Expression
-
def
clone(): Expression
- Definition Classes
- TreeNode → AnyRef
-
def
collect[B](pf: PartialFunction[Expression, B]): Seq[B]
Returns a Seq containing the result of applying a partial function to all elements in this tree on which the function is defined.
- Definition Classes
- TreeNode
-
def
collectFirst[B](pf: PartialFunction[Expression, B]): Option[B]
Finds and returns the first TreeNode of the tree for which the given partial function is defined (pre-order), and applies the partial function to it.
-
def
collectLeaves(): Seq[Expression]
Returns a Seq containing the leaves in this tree.
- Definition Classes
- TreeNode
-
lazy val
containsChild: Set[TreeNode[_]]
- Definition Classes
- TreeNode
-
def
copyTagsFrom(other: Expression): Unit
- Attributes
- protected
- Definition Classes
- TreeNode
-
def
dataType: DataType
Returns the DataType of the result of evaluating this expression. It is invalid to query the dataType of an unresolved expression (i.e., when resolved == false).
- Definition Classes
- HyperLogLogPlusPlus → Expression
-
def
defaultResult: Option[Literal]
Result of the aggregate function when the input is empty. This is currently only used for the proper rewriting of distinct aggregate functions.
- Definition Classes
- AggregateFunction
-
lazy val
deterministic: Boolean
Returns true when the current expression always returns the same result for fixed inputs from children. Non-deterministic expressions should not change in number and order, and they should not be evaluated during query planning.
Note that this means an expression should be considered non-deterministic if:
- it relies on some mutable internal state, or
- it relies on some implicit input that is not part of the children expression list, or
- it has one or more non-deterministic children, or
- it assumes the input satisfies some certain condition via the child operator.
An example would be SparkPartitionID, which relies on the partition id returned by TaskContext. By default, leaf expressions are deterministic, as Nil.forall(_.deterministic) returns true.
- Definition Classes
- Expression
-
def
doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode
Returns Java source code that can be compiled to evaluate this expression. The default behavior is to call the eval method of the expression. Concrete expression implementations should override this to do actual code generation.
- ctx
- ev
an ExprCode with unique terms.
- returns
an ExprCode containing the Java source code to generate the given expression
- Attributes
- protected
- Definition Classes
- CodegenFallback → Expression
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
eval(buffer: InternalRow): Any
Compute the HyperLogLog estimate.
- Definition Classes
- HyperLogLogPlusPlus → Expression
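As a rough picture of what computing the estimate involves, the following minimal sketch applies the original HyperLogLog estimator (harmonic mean of 2^register, scaled by alpha_m) and falls back to linear counting for small cardinalities. It works on unpacked registers and deliberately omits the empirical bias correction that HLL++ adds on top; HllEstimateSketch is a hypothetical name, not Spark's implementation.

```scala
object HllEstimateSketch {
  // Bias-correction constant alpha_m from the original HyperLogLog paper.
  def alpha(m: Int): Double = m match {
    case 16 => 0.673
    case 32 => 0.697
    case 64 => 0.709
    case _  => 0.7213 / (1.0 + 1.079 / m)
  }

  def estimate(registers: Array[Int]): Double = {
    val m = registers.length
    // Harmonic-mean indicator: sum of 2^(-register) over all registers.
    val zSum = registers.map(r => math.pow(2.0, -r)).sum
    val raw = alpha(m) * m.toDouble * m / zSum
    val zeros = registers.count(_ == 0)
    // For small cardinalities, linear counting over empty registers is more accurate.
    if (raw <= 2.5 * m && zeros > 0) m * math.log(m.toDouble / zeros)
    else raw
  }
}
```

An all-zero sketch estimates 0, and a 16-register sketch with half its registers still empty falls into the linear-counting branch, giving 16 * ln(2).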
-
def
fastEquals(other: TreeNode[_]): Boolean
Faster version of equality which short-circuits when two treeNodes are the same instance. We don't just override Object.equals, as doing so prevents the scala compiler from generating case class equals methods.
- Definition Classes
- TreeNode
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
def
find(f: (Expression) ⇒ Boolean): Option[Expression]
Find the first TreeNode that satisfies the condition specified by f.
-
def
flatArguments: Iterator[Any]
- Attributes
- protected
- Definition Classes
- Expression
-
def
flatMap[A](f: (Expression) ⇒ TraversableOnce[A]): Seq[A]
Returns a Seq by applying a function to all nodes in this tree and using the elements of the resulting collections.
- Definition Classes
- TreeNode
-
final
def
foldable: Boolean
An aggregate function is not foldable.
- Definition Classes
- AggregateFunction → Expression
-
def
foreach(f: (Expression) ⇒ Unit): Unit
Runs the given function on this node and then recursively on children.
-
def
foreachUp(f: (Expression) ⇒ Unit): Unit
Runs the given function recursively on children then on this node.
-
def
genCode(ctx: CodegenContext): ExprCode
Returns an ExprCode that contains the Java source code to generate the result of evaluating the expression on an input row.
- ctx
a CodegenContext
- returns
ExprCode
- Definition Classes
- Expression
-
def
generateTreeString(depth: Int, lastChildren: Seq[Boolean], append: (String) ⇒ Unit, verbose: Boolean, prefix: String = "", addSuffix: Boolean = false, maxFields: Int, printNodeId: Boolean): Unit
Appends the string representation of this node and its children via the given append function. The i-th element in lastChildren indicates whether the ancestor of the current node at depth i + 1 is the last child of its own parent node. The depth of the root node is 0, and lastChildren for the root node should be empty.
Note that this traversal (numbering) order must be the same as getNodeNumbered.
- Definition Classes
- TreeNode
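The role of lastChildren is easiest to see in a small sketch. The version below mimics the ":- " and "+- " prefixes seen in Spark plan strings: every ancestor except the last contributes either ":  " (more siblings follow) or "   " (it was a last child), and the node's own connector depends on whether it is the last child. The Node type and both functions are hypothetical, not TreeNode's code.

```scala
object TreeStringSketch {
  final case class Node(name: String, children: Seq[Node] = Nil)

  def generateTreeString(node: Node, depth: Int, lastChildren: Seq[Boolean], append: String => Unit): Unit = {
    if (depth > 0) {
      // Ancestors above the parent: draw a continuing bar only if more siblings follow.
      lastChildren.init.foreach(isLast => append(if (isLast) "   " else ":  "))
      // This node's own connector.
      append(if (lastChildren.last) "+- " else ":- ")
    }
    append(node.name)
    append("\n")
    if (node.children.nonEmpty) {
      node.children.init.foreach(c => generateTreeString(c, depth + 1, lastChildren :+ false, append))
      generateTreeString(node.children.last, depth + 1, lastChildren :+ true, append)
    }
  }

  def treeString(root: Node): String = {
    val sb = new StringBuilder
    // The root starts at depth 0 with an empty lastChildren, as the description requires.
    generateTreeString(root, 0, Nil, s => { sb.append(s); () })
    sb.toString
  }
}
```

For a root with two children, where the first child has one child of its own, this produces "Aggregate", ":- Project", ":  +- Scan", "+- Literal" on successive lines.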
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
getTagValue[T](tag: TreeNodeTag[T]): Option[T]
- Definition Classes
- TreeNode
-
def
hashCode(): Int
- Definition Classes
- TreeNode → AnyRef → Any
- val hllppHelper: HyperLogLogPlusPlusHelper
-
def
initialize(buffer: InternalRow): Unit
Fill all words with zeros.
- Definition Classes
- HyperLogLogPlusPlus → ImperativeAggregate
-
def
innerChildren: Seq[TreeNode[_]]
All the nodes that should be shown as an inner nested tree of this node. For example, this can be used to show sub-queries.
- Definition Classes
- TreeNode
-
val
inputAggBufferAttributes: Seq[AttributeReference]
Attributes of fields in input aggregation buffers (immutable aggregation buffers that are merged with mutable aggregation buffers in the merge() function or merge expressions). These attributes are created automatically by cloning the aggBufferAttributes.
- Definition Classes
- HyperLogLogPlusPlus → AggregateFunction
-
val
inputAggBufferOffset: Int
The offset of this function's start buffer value in the underlying shared input aggregation buffer. An input aggregation buffer is used when we merge two aggregation buffers together in the merge() function and is immutable (we merge an input aggregation buffer and a mutable aggregation buffer and then store the new buffer values to the mutable aggregation buffer).
An input aggregation buffer may contain extra fields, such as grouping keys, at its start, so mutableAggBufferOffset and inputAggBufferOffset are often different.
For example, say we have a grouping expression, key, and two aggregate functions, avg(x) and avg(y). In the shared input aggregation buffer, the position of the first buffer value of avg(x) will be 1 and the position of the first buffer value of avg(y) will be 3 (position 0 is used for the value of key):
avg(x) inputAggBufferOffset = 1
             |
             v
+--------+--------+--------+--------+--------+
|  key   |  sum1  | count1 |  sum2  | count2 |
+--------+--------+--------+--------+--------+
                               ^
                               |
                avg(y) inputAggBufferOffset = 3
- Definition Classes
- HyperLogLogPlusPlus → ImperativeAggregate
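The offsets in the diagram follow from a running sum over each function's buffer size, starting after any prefix fields such as grouping keys. A minimal sketch of that arithmetic; BufferOffsetsSketch and its parameters are hypothetical names used only for illustration:

```scala
object BufferOffsetsSketch {
  // Given how many buffer slots each aggregate function uses, returns the index of
  // each function's first slot in a shared buffer that begins with `prefix` extra
  // fields (e.g. grouping keys). The final running total is dropped with init.
  def offsets(bufferSizes: Seq[Int], prefix: Int): Seq[Int] =
    bufferSizes.scanLeft(prefix)(_ + _).init
}
```

For avg(x) and avg(y), each needing two slots (sum and count), a prefix of 1 for key gives input offsets 1 and 3, while a prefix of 0 gives 0 and 2, matching the mutable-buffer layout.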
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
def
jsonFields: List[JField]
- Attributes
- protected
- Definition Classes
- TreeNode
-
def
makeCopy(newArgs: Array[AnyRef]): Expression
Creates a copy of this type of tree node after a transformation. Must be overridden by child classes that have constructor arguments that are not present in the productIterator.
- newArgs
the new product arguments.
- Definition Classes
- TreeNode
-
def
map[A](f: (Expression) ⇒ A): Seq[A]
Returns a Seq containing the result of applying the given function to each node in this tree in a preorder traversal.
- f
the function to be applied.
- Definition Classes
- TreeNode
-
def
mapChildren(f: (Expression) ⇒ Expression): Expression
Returns a copy of this node where f has been applied to all the nodes in children.
- Definition Classes
- TreeNode
-
def
mapProductIterator[B](f: (Any) ⇒ B)(implicit arg0: ClassTag[B]): Array[B]
Efficient alternative to productIterator.map(f).toArray.
- Attributes
- protected
- Definition Classes
- TreeNode
-
def
merge(buffer1: InternalRow, buffer2: InternalRow): Unit
Merge the HLL++ buffers.
- Definition Classes
- HyperLogLogPlusPlus → ImperativeAggregate
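Two dense HLL sketches built with the same precision are combined by keeping, for each register, the larger of the two values, which is equivalent to having seen the union of both inputs. A minimal sketch on unpacked registers (the real buffers pack registers into words); HllMergeSketch is a hypothetical name:

```scala
object HllMergeSketch {
  // Merges `other` into `into` in place by taking the per-register maximum.
  def merge(into: Array[Int], other: Array[Int]): Unit = {
    require(into.length == other.length, "sketches must have the same number of registers")
    var i = 0
    while (i < into.length) {
      if (other(i) > into(i)) into(i) = other(i)
      i += 1
    }
  }
}
```

Because max is commutative and associative, partial buffers can be merged in any order, which is what allows the aggregate to be computed per partition and combined afterwards.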
-
val
mutableAggBufferOffset: Int
The offset of this function's first buffer value in the underlying shared mutable aggregation buffer.
For example, we have two aggregate functions avg(x) and avg(y), which share the same aggregation buffer. In this shared buffer, the position of the first buffer value of avg(x) will be 0 and the position of the first buffer value of avg(y) will be 2:
avg(x) mutableAggBufferOffset = 0
    |
    v
+--------+--------+--------+--------+
|  sum1  | count1 |  sum2  | count2 |
+--------+--------+--------+--------+
                      ^
                      |
       avg(y) mutableAggBufferOffset = 2
- Definition Classes
- HyperLogLogPlusPlus → ImperativeAggregate
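The point of the offset is that each function reads and writes only its own slots of the shared buffer. A minimal sketch using a toy average over an Array[Double] buffer; ToyAvg and its methods only loosely echo ImperativeAggregate and are hypothetical, not Spark's API:

```scala
object SharedBufferSketch {
  // A toy average that owns two slots (sum, count) starting at mutableAggBufferOffset.
  final case class ToyAvg(inputIndex: Int, mutableAggBufferOffset: Int) {
    def initialize(buffer: Array[Double]): Unit = {
      buffer(mutableAggBufferOffset) = 0.0     // sum
      buffer(mutableAggBufferOffset + 1) = 0.0 // count
    }

    // Adds one input value; slots owned by other functions are never touched.
    def update(buffer: Array[Double], input: Array[Double]): Unit = {
      buffer(mutableAggBufferOffset) += input(inputIndex)
      buffer(mutableAggBufferOffset + 1) += 1.0
    }

    def eval(buffer: Array[Double]): Double =
      buffer(mutableAggBufferOffset) / buffer(mutableAggBufferOffset + 1)
  }
}
```

With avg(x) at offset 0 and avg(y) at offset 2, both functions can update the same four-slot buffer row by row without interfering.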
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
nodeName: String
Returns the name of this type of TreeNode. Defaults to the class name. Note that we remove the "Exec" suffix for physical operators here.
- Definition Classes
- TreeNode
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
def
nullable: Boolean
- Definition Classes
- HyperLogLogPlusPlus → Expression
-
def
numberedTreeString: String
Returns a string representation of the nodes in this tree, where each operator is numbered. The numbers can be used with TreeNode.apply to easily access specific subtrees.
The numbers are based on depth-first traversal of the tree (with innerChildren traversed first before children).
- Definition Classes
- TreeNode
-
val
origin: Origin
- Definition Classes
- TreeNode
-
def
otherCopyArgs: Seq[AnyRef]
Args to the constructor that should be copied, but not transformed. These are appended to the transformed args automatically by makeCopy.
- Attributes
- protected
- Definition Classes
- TreeNode
-
def
p(number: Int): Expression
Returns the tree node at the specified number, used primarily for interactive debugging. Numbers for each node can be found in the numberedTreeString.
This is a variant of apply that returns the node as BaseType (if the type matches).
- Definition Classes
- TreeNode
-
def
prettyJson: String
- Definition Classes
- TreeNode
-
def
prettyName: String
Returns a user-facing string representation of this expression's name. This should usually match the name of the function in SQL.
- Definition Classes
- HyperLogLogPlusPlus → Expression
-
def
references: AttributeSet
- Definition Classes
- Expression
- val relativeSD: Double
-
lazy val
resolved: Boolean
Returns true if this expression and all its children have been resolved to a specific schema and input data type checking has passed, and false if it still contains any unresolved placeholders or has a data type mismatch. Implementations of expressions should override this if the resolution of this type of expression involves more than just the resolution of its children and type checking.
- Definition Classes
- Expression
-
def
semanticEquals(other: Expression): Boolean
Returns true when two expressions will always compute the same result, even if they differ cosmetically (i.e. capitalization of names in attributes may be different).
See Canonicalize for more details.
- Definition Classes
- Expression
-
def
semanticHash(): Int
Returns a hashCode for the calculation performed by this expression. Unlike the standard hashCode, an attempt has been made to eliminate cosmetic differences. See Canonicalize for more details.
- Definition Classes
- Expression
-
def
setTagValue[T](tag: TreeNodeTag[T], value: T): Unit
- Definition Classes
- TreeNode
-
def
simpleString(maxFields: Int): String
ONE line description of this node.
- maxFields
Maximum number of fields that will be converted to strings. Any elements beyond the limit will be dropped.
- Definition Classes
- Expression → TreeNode
-
def
simpleStringWithNodeId(): String
ONE line description of this node containing the node identifier.
- Definition Classes
- Expression → TreeNode
-
def
sql(isDistinct: Boolean): String
- Definition Classes
- AggregateFunction
-
def
sql: String
Returns the SQL representation of this expression. For expressions extending NonSQLExpression, this method may return an arbitrary user-facing string.
- Definition Classes
- Expression
-
def
stringArgs: Iterator[Any]
The arguments that should be included in the arg string. Defaults to the productIterator.
- Attributes
- protected
- Definition Classes
- TreeNode
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toAggString(isDistinct: Boolean): String
String representation used in explain plans.
- Definition Classes
- AggregateFunction
-
def
toAggregateExpression(isDistinct: Boolean): AggregateExpression
Wraps this AggregateFunction in an AggregateExpression and sets the isDistinct flag of the AggregateExpression to the given value, because AggregateExpression is the container of an AggregateFunction, the aggregation mode, and the flag indicating whether this aggregation is a distinct aggregation. An AggregateFunction should not be used without being wrapped in an AggregateExpression.
- Definition Classes
- AggregateFunction
-
def
toAggregateExpression(): AggregateExpression
Creates an AggregateExpression with the isDistinct flag disabled.
- Definition Classes
- AggregateFunction
- See also
toAggregateExpression(isDistinct: Boolean)
for detailed description
-
def
toJSON: String
- Definition Classes
- TreeNode
-
def
toString(): String
- Definition Classes
- Expression → TreeNode → AnyRef → Any
-
def
transform(rule: PartialFunction[Expression, Expression]): Expression
Returns a copy of this node where rule has been recursively applied to the tree. When rule does not apply to a given node, it is left unchanged. Users should not expect a specific directionality. If a specific directionality is needed, transformDown or transformUp should be used.
- rule
the function used to transform this node's children
- Definition Classes
- TreeNode
-
def
transformDown(rule: PartialFunction[Expression, Expression]): Expression
Returns a copy of this node where rule has been recursively applied to it and all of its children (pre-order). When rule does not apply to a given node, it is left unchanged.
- rule
the function used to transform this node's children
- Definition Classes
- TreeNode
-
def
transformUp(rule: PartialFunction[Expression, Expression]): Expression
Returns a copy of this node where rule has been recursively applied first to all of its children and then itself (post-order). When rule does not apply to a given node, it is left unchanged.
- rule
the function used to transform this node's children
- Definition Classes
- TreeNode
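The difference between pre-order and post-order rewriting shows up clearly with a constant-folding rule. The following minimal sketch uses a toy tree (the MiniTree object, its Lit/Add types, and foldAdd are hypothetical, not Spark classes) with transformDown and transformUp written in the same spirit as the descriptions above: transformDown applies the rule to a node before visiting the children of the result, while transformUp rewrites the children first.

```scala
object MiniTree {
  sealed trait Expr
  final case class Lit(value: Int) extends Expr
  final case class Add(left: Expr, right: Expr) extends Expr

  def children(e: Expr): Seq[Expr] = e match {
    case Lit(_)    => Nil
    case Add(l, r) => Seq(l, r)
  }

  // Rebuilds a node with new children (leaves are returned unchanged).
  def withNewChildren(e: Expr, cs: Seq[Expr]): Expr = e match {
    case lit: Lit  => lit
    case Add(_, _) => Add(cs(0), cs(1))
  }

  // Pre-order: apply the rule to this node first, then recurse into the result's children.
  def transformDown(e: Expr)(rule: PartialFunction[Expr, Expr]): Expr = {
    val afterRule = rule.applyOrElse(e, identity[Expr])
    withNewChildren(afterRule, children(afterRule).map(c => transformDown(c)(rule)))
  }

  // Post-order: recurse into the children first, then apply the rule to the rebuilt node.
  def transformUp(e: Expr)(rule: PartialFunction[Expr, Expr]): Expr = {
    val withNewKids = withNewChildren(e, children(e).map(c => transformUp(c)(rule)))
    rule.applyOrElse(withNewKids, identity[Expr])
  }

  // Constant folding that only fires on an Add whose operands are both literals.
  val foldAdd: PartialFunction[Expr, Expr] = { case Add(Lit(a), Lit(b)) => Lit(a + b) }
}
```

On (1 + 2) + 3, transformUp folds the whole expression to 6, because the inner sum becomes a literal before the outer Add is visited. transformDown only reaches Add(Lit(3), Lit(3)), since the root was checked while its left operand was still an Add.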
-
def
treeString(append: (String) ⇒ Unit, verbose: Boolean, addSuffix: Boolean, maxFields: Int, printOperatorId: Boolean): Unit
- Definition Classes
- TreeNode
-
final
def
treeString(verbose: Boolean, addSuffix: Boolean = false, maxFields: Int = SQLConf.get.maxToStringFields, printOperatorId: Boolean = false): String
- Definition Classes
- TreeNode
-
final
def
treeString: String
Returns a string representation of the nodes in this tree
- Definition Classes
- TreeNode
-
def
unsetTagValue[T](tag: TreeNodeTag[T]): Unit
- Definition Classes
- TreeNode
-
def
update(buffer: InternalRow, input: InternalRow): Unit
Update the HLL++ buffer.
- Definition Classes
- HyperLogLogPlusPlus → ImperativeAggregate
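Updating a dense HLL buffer comes down to splitting a 64-bit hash of the input value into a register index (the top p bits) and a rank (one plus the number of leading zeros in the remaining bits), then keeping the larger rank. The minimal sketch below starts from an already computed hash (the hash function is not shown) and works on unpacked registers. The padding bit, which caps the rank at 64 - p + 1, is an assumed detail. HllUpdateSketch is a hypothetical name.

```scala
object HllUpdateSketch {
  // Splits a 64-bit hash into (register index, rank) for precision p.
  def indexAndRank(hash: Long, p: Int): (Int, Int) = {
    val idx = (hash >>> (64 - p)).toInt
    // After shifting out the index bits, a padding bit keeps an all-zero
    // remainder from producing a rank larger than 64 - p + 1.
    val padding = 1L << (p - 1)
    val rank = java.lang.Long.numberOfLeadingZeros((hash << p) | padding) + 1
    (idx, rank)
  }

  // registers must have 1 << p entries; each register keeps the maximum rank seen.
  def update(registers: Array[Int], hash: Long, p: Int): Unit = {
    val (idx, rank) = indexAndRank(hash, p)
    if (rank > registers(idx)) registers(idx) = rank
  }
}
```

With p = 9 the largest possible rank is 56, which still fits in a 6-bit register.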
-
final
def
verboseString(maxFields: Int): String
ONE line description of this node with more information
- Definition Classes
- Expression → TreeNode
-
def
verboseStringWithSuffix(maxFields: Int): String
ONE line description of this node with some suffix information
- Definition Classes
- TreeNode
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
def
withNewChildren(newChildren: Seq[Expression]): Expression
Returns a copy of this node with the children replaced. TODO: Validate somewhere (in debug mode?) that children are ordered correctly.
- Definition Classes
- TreeNode
-
def
withNewInputAggBufferOffset(newInputAggBufferOffset: Int): ImperativeAggregate
Returns a copy of this ImperativeAggregate with an updated inputAggBufferOffset. This new copy's attributes may have different ids than the original.
- Definition Classes
- HyperLogLogPlusPlus → ImperativeAggregate
-
def
withNewMutableAggBufferOffset(newMutableAggBufferOffset: Int): ImperativeAggregate
Returns a copy of this ImperativeAggregate with an updated mutableAggBufferOffset. This new copy's attributes may have different ids than the original.
- Definition Classes
- HyperLogLogPlusPlus → ImperativeAggregate