A function that gets the absolute value of a numeric value.
Returns the date that is num_months after start_date.
Used to assign a new name to a computation.
Checks if the array (left) has the element (right).
Returns the numeric value of the first character of str.
A function that throws an exception if 'condition' is not true.
A predicate that is evaluated to be true if there are at least n non-null and non-NaN values.
A reference to an attribute produced by another operator in the tree.
Helper functions for working with Seq[Attribute].
A Set designed to hold AttributeReference objects that performs equality checking using expression id instead of standard Java equality.
Round an expression to d decimal places using HALF_EVEN rounding mode, also known as Gaussian rounding or bankers' rounding.
Converts the argument from binary to a base 64 string.
An extended version of InternalRow that implements all special getters, toString and equals/hashCode by genericGet.
An expression with two inputs and one output.
A binary expression specifically for math functions that take two Doubles as input and return a Double.
A BinaryExpression that is an operator, with two properties:
A function that calculates the bitwise AND (&) of two numbers.
A function that calculates the bitwise NOT (~) of a number.
A function that calculates the bitwise OR (|) of two numbers.
A function that calculates the bitwise XOR of two numbers.
A bound reference points to a specific slot in the input tuple, allowing the actual value to be retrieved more efficiently.
An expression that invokes a method on a class via reflection.
Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
Abstract parent class for common logic in CaseWhen and CaseWhenCodegen.
CaseWhen expression used when code generation condition is satisfied.
Cast the child expression to the target data type.
Rounds the decimal to the given scale and checks whether the decimal fits in the provided precision; returns null if it does not.
An expression that is evaluated to the first non-null input.
An expression that concatenates multiple input strings into a single string.
An expression that concatenates multiple input strings or arrays of strings into a single string, using a given separator (the first child).
A function that returns true if the string left contains the string right.
Converts a number from one base to another.
A function that computes a cyclic redundancy check value and returns it as a bigint, for input of type BinaryType.
Returns an Array containing the evaluation of all children expressions.
Returns a catalyst Map containing the evaluation of all children expressions as keys and values.
Creates a struct with the given field names and values.
Common base class for both CreateNamedStruct and CreateNamedStructUnsafe.
Creates a struct with the given field names and values.
The CumeDist function computes the position of a value relative to all values in the partition.
Expression representing the current batch time, which is used by StreamExecution to 1.
Returns the current database of the SessionCatalog.
Returns the current date at the start of query evaluation.
Returns the current timestamp at the start of query evaluation.
Adds a number of days to startdate.
Returns the number of days from startDate to endDate.
Subtracts a number of days from startdate.
Decodes the first argument into a String using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16').
The DenseRank function computes the rank of a value in a group of values.
Encodes the first argument into a BINARY using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16').
A function that returns true if the string left ends with the string right.
This class is used to compute equality of (sub)expression trees.
Euler's number.
The Exists expression checks if a row exists in a subquery given some correlated condition.
A trait that is mixed in to define the expected input types of an expression.
Given an input array, produces a sequence of rows for each value in the array.
A base class for Explode and PosExplode.
A globally unique id for a given named expression.
An expression in Catalyst.
A Set where membership is determined based on a canonical representation of an Expression (i.
A function that returns the index (1-based) of the given string (left) in the comma-delimited list (right).
Formats the number X to a format like '#,###,###.
Returns the input formatted according to printf-style format strings.
The trait used to represent the type of a Window Frame Boundary.
The trait used to represent the type of a Window Frame.
Given a timestamp, which corresponds to a certain time of day in UTC, returns another timestamp that corresponds to the same time of day in the given timezone.
Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the given format.
An expression that produces zero or more rows given a single input row.
An internal row implementation that uses an array of objects as the underlying storage.
A row implementation that uses an array of objects as the underlying storage.
Returns the field at ordinal in the Array child.
For a child whose data type is an array of structs, extracts the ordinal-th fields of all array elements, and returns them as a new array.
Extracts a JSON object from a JSON string based on the specified JSON path, and returns the extracted JSON object as a JSON string.
Returns the value of key key in Map child.
Returns the value of fields in the Struct child.
A function that returns the greatest value of all parameters, skipping null values.
Indicates whether a specified column expression in a GROUP BY list is aggregated or not.
GroupingID is a function that computes the level of grouping.
A placeholder expression for cube/rollup, which will be replaced by the analyzer.
A function that calculates hash value for a group of expressions.
If the argument is an INT or binary, hex returns the number as a STRING in hexadecimal format.
Simulates Hive's hashing function at org.
A mixin for the analyzer to perform implicit type casting using org.apache.spark.sql.catalyst.analysis.TypeCoercion.ImplicitTypeCasts.
Evaluates to true if list contains value.
Optimized version of In clause, when all filter values of In clause are static.
Returns string, with the first letter of each word in uppercase, all other letters in lowercase.
Explodes an array of structs into a table.
Expression that returns the name of the current file being read.
Base class for interpreted hash functions.
A MutableProjection that is calculated by calling eval on each of the specified expressions.
An interpreted row ordering comparator.
A Projection that is calculated by calling the eval of each of the specified expressions.
Evaluates to true iff it's NaN.
An expression that is evaluated to true if the input is not null.
An expression that is evaluated to true if the input is null.
A mutable wrapper that makes two rows appear as a single concatenated row.
Converts a JSON input string to a StructType with the specified schema.
The Lag function returns the value of input at the offset-th row before the current row in the window.
Returns the last day of the month which the date belongs to.
The Lead function returns the value of input at the offset-th row after the current row in the window.
A leaf expression, i.
A leaf expression specifically for math constants.
A function that returns the least value of all parameters, skipping null values.
A function that returns the length of the given string or binary expression.
A function that returns the Levenshtein distance between the two given strings.
Simple RegEx pattern matching function.
A ListQuery expression defines the query which we want to search in an IN subquery expression.
In order to do type checking, use Literal.
Computes the logarithm of a number.
A function that converts the characters of a string to lowercase.
Create a Decimal from an unscaled Long value.
Returns an unordered array containing the keys of the map.
Returns an unordered array containing the values of the map.
A function that calculates an MD5 128-bit checksum and returns it as a hex string, for input of type BinaryType.
Returns monotonically increasing 64-bit integers.
Returns number of months between dates date1 and date2.
A MurMur3 Hash expression.
Converts an InternalRow to another Row given a sequence of expressions that define each column of the new row.
A parent class for mutable container objects that are reused when the values are changed, resulting in less garbage.
The NTile function divides the rows for each window partition into n buckets ranging from 1 to at most n.
An Expression that evaluates to left iff it's not NaN, or evaluates to right otherwise.
An Expression that is named.
Returns the first date which is later than startDate and named as dayOfWeek.
Expressions that don't have SQL representation should extend this trait.
An expression that is nondeterministic.
When an expression inherits this, meaning the expression is null intolerant (i.
An offset window function is a window function that returns the value of the input column offset by a number of rows within the partition.
A placeholder used to hold a reference that has been resolved to a field outside of the current plan.
Extracts a part from a URL.
The PercentRank function computes the percentage ranking of a value in a group of values.
Pi.
An interface for expressions that contain a QueryPlan.
Given an input array, produces a sequence of rows for each position and value in the array.
Expression used internally to convert the TimestampType to Long without losing precision, i.
An Expression that returns a boolean value.
A predicate subquery checks the existence of a value in a sub-query.
A placeholder used when printing expressions without debugging information such as the expression id or the unresolved indicator.
Print the result of an expression to stderr (used for debugging codegen).
Converts an InternalRow to another Row given a sequence of expressions that define each column of the new row.
An expression used to wrap the children when promoting the precision of DecimalType, to avoid promoting multiple times.
A Random distribution generating expression.
Generate a random column with i.
Generate a random column with i.
The Rank function computes the rank of a value in a group of values.
A RankLike function is a WindowFunction that changes its value based on a change in the value of the order of the window in which it is processed.
A special expression that evaluates BoundReferences by given expressions instead of the input row.
Extracts a specific (idx) group identified by a Java regex.
Replaces all substrings of str that match regexp with rep.
Round an expression to d decimal places using HALF_UP rounding mode.
Round the child's result to scale decimal places when scale >= 0, or round at the integral part when scale < 0.
The RowNumber function computes a unique, sequential number for each row, starting with one, according to the ordering of rows within the window partition.
An expression that gets replaced at runtime (currently by the optimizer) into a different expression for evaluation.
User-defined function.
A subquery that will return only one row and one column.
Splits a string into arrays of sentences, where each sentence is an array of words.
A function that calculates a SHA-1 hash value and returns it as a hex string, for input of type BinaryType or StringType.
A function that calculates the SHA-2 family of functions (SHA-224, SHA-256, SHA-384, and SHA-512) and returns it as a hex string.
Bitwise left shift.
Bitwise (signed) right shift.
Bitwise unsigned right shift, for integer and long data type.
Given an array or map, returns its size.
A SizeBasedWindowFunction needs the size of the current window for its calculation.
Sorts the input array in ascending / descending order according to the natural ordering of the array elements and returns it.
An expression that can be used to sort a tuple.
An expression to generate a 64-bit long prefix used in sorting.
A function that returns the Soundex code of the given string expression.
Expression that returns the current partition id.
A row type that holds an array of specialized container objects, of type MutableValue, chosen based on the dataTypes of each column.
A specified Window Frame.
Separate v1, .
A function that returns true if the string left starts with the string right.
A function that returns the position of the first occurrence of substr in the given string.
Returns str, left-padded with pad to a length of len.
A function that returns the position of the first occurrence of substr in the given string after position pos.
A base trait for functions that compare two strings, returning a boolean.
Returns str, right-padded with pad to a length of len.
Returns the string that repeats the given string value n times.
Returns the reversed given string.
Returns a string consisting of n spaces.
Splits str around pat (pattern is a regular expression).
Creates a map after splitting the input text into key/value pairs using delimiters.
A function that translates any character in the srcExpr by a character in replaceExpr.
A function that trims the spaces from both ends of the specified string.
A function that trims the spaces from the left end of the given string.
A function that trims the spaces from the right end of the given string.
Converts a StructType to a JSON output string.
A base interface for expressions that contain a LogicalPlan.
A function that takes a substring of its first argument starting at a given position.
Returns the substring from string str before count occurrences of the delimiter delim.
An expression with three inputs and one output.
Adds an interval to timestamp.
Subtracts an interval from timestamp.
Returns the date part of a timestamp or string.
Given a timestamp, which corresponds to a certain time of day in the given timezone, returns another timestamp that corresponds to the same time of day in UTC.
Converts a time string with the given pattern.
Returns date truncated to the unit specified by the format.
Converts the argument from a base 64 string to BINARY.
An expression with one input and one output.
A unary expression specifically for math functions.
An expression that cannot be evaluated.
Performs the inverse operation of HEX.
Converts a time string with the given pattern.
A projection that returns UnsafeRow.
Return the unscaled Long value of a Decimal, assuming it fits in a Long.
Cast the child expression to the target data type, but will throw error if the cast might truncate, e.
A function that converts the characters of a string to uppercase.
A generator that produces its output using the provided lambda function.
<value> FOLLOWING boundary.
<value> PRECEDING boundary.
The trait used to represent a Window Frame.
A window function is a function that can only be evaluated in the context of a window operator.
The trait of the Window Specification (specified in the OVER clause or WINDOW clause) for Window Functions.
The specification for a window function.
A Window specification reference that refers to the WindowSpecDefinition defined under the name name.
A xxHash64 64-bit hash expression.
Builds a map that is keyed by an Attribute's expression id.
Rewrites an expression using rules that are guaranteed to preserve the result while attempting to remove cosmetic variations.
Case statements of the form "CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END".
Factory methods for CaseWhen.
Returns a Row containing the evaluation of all children expressions.
CURRENT ROW boundary.
Extractor and other utility methods for decimal literals.
Used as input into expressions whose output does not depend on any input value.
An extractor that matches both standard 3VL equality and null-safe equality.
Extractor for making working with frame boundaries easier.
A projection that could turn UnsafeRow into GenericInternalRow.
Extractor for retrieving Int literals.
An expression representing a not yet available attribute name.
An extractor that matches non-null literal values.
RangeFrame treats rows in a partition as groups of peers.
RowFrame treats rows in a partition individually.
UNBOUNDED FOLLOWING boundary.
UNBOUNDED PRECEDING boundary.
Used as a placeholder when a frame specification is not defined.
A collection of generators that build custom bytecode at runtime for performing the evaluation of catalyst expressions.
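Two entries in the list above round to d decimal places and differ only in rounding mode: HALF_EVEN (bankers' rounding) and HALF_UP. The difference shows up only on exact ties. The following is a minimal sketch in plain Python using the standard decimal module; it illustrates the two modes and is not the Catalyst implementation. The helper name round_to is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def round_to(value: str, d: int, mode: str) -> Decimal:
    """Round a decimal string to d decimal places using the given rounding mode."""
    # Build the target quantum, e.g. d=2 gives Decimal('0.01').
    quantum = Decimal(1).scaleb(-d)
    return Decimal(value).quantize(quantum, rounding=mode)


ties = ("0.5", "1.5", "2.5")

# HALF_EVEN sends a tie to the neighbour whose last digit is even.
half_even = [round_to(v, 0, ROUND_HALF_EVEN) for v in ties]

# HALF_UP always sends a tie away from zero.
half_up = [round_to(v, 0, ROUND_HALF_UP) for v in ties]
```

With these inputs the two modes disagree on 0.5 and 2.5 and agree on 1.5, which is why HALF_EVEN is often preferred when many rounded values are later summed.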
A set of classes that can be used to represent trees of relational expressions. A key goal of the expression library is to hide the details of naming and scoping from developers who want to manipulate trees of relational operators. As such, the library defines a special type of expression, a NamedExpression in addition to the standard collection of expressions.
Standard Expressions
A library of standard expressions (e.g., Add, EqualTo), aggregates (e.g., SUM, COUNT), and other computations (e.g. UDFs). Each expression type is capable of determining its output schema as a function of its children's output schema.
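To illustrate how an expression can determine its output type from its children, here is a toy sketch in plain Python. The class names mirror Add, EqualTo, and Literal from the text, but the classes, the data_type property, the string type tags, and the widening rule are all assumptions made for illustration, not the Catalyst API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    value: object
    data_type: str  # hypothetical type tag such as "int" or "double"


@dataclass(frozen=True)
class Add:
    left: object
    right: object

    @property
    def data_type(self) -> str:
        # Toy rule (an assumption): "double" if either child is, otherwise "int".
        child_types = {self.left.data_type, self.right.data_type}
        return "double" if "double" in child_types else "int"


@dataclass(frozen=True)
class EqualTo:
    left: object
    right: object

    @property
    def data_type(self) -> str:
        # A comparison is boolean regardless of its children's types.
        return "boolean"


# (1 + 2.5) = 3.5, built as a small expression tree.
expr = EqualTo(Add(Literal(1, "int"), Literal(2.5, "double")), Literal(3.5, "double"))
```

Each node answers the type question by asking only its direct children, so the type of the whole tree follows from the leaves without any global bookkeeping.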
Named Expressions
Some expressions are named and thus can be referenced by later operators in the dataflow graph. The two types of named expressions are AttributeReferences and Aliases. AttributeReferences refer to attributes of the input tuple for a given operator and form the leaves of some expression trees. Aliases assign a name to intermediate computations. For example, in the SQL statement SELECT a+b AS c FROM ..., the expressions a and b would be represented by AttributeReferences and c would be represented by an Alias.
During analysis, all named expressions are assigned a globally unique expression id, which can be used for equality comparisons. While the original names are kept around for debugging purposes, they should never be used to check if two attributes refer to the same value, as plan transformations can result in the introduction of naming ambiguity. For example, consider a plan that contains subqueries, both of which are reading from the same table. If an optimization removes the subqueries, scoping information would be destroyed, eliminating the ability to reason about which subquery produced a given attribute.
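The point about comparing by expression id rather than by name can be sketched as follows. This is a plain Python illustration under stated assumptions: the AttributeReference class here, its expr_id field, and the counter used to hand out ids are hypothetical stand-ins, not the Catalyst classes.

```python
import itertools
from dataclasses import dataclass, field

# Hypothetical global counter standing in for a source of unique expression ids.
_next_id = itertools.count()


@dataclass(frozen=True)
class AttributeReference:
    # The name is kept for debugging only and is excluded from equality and hashing.
    name: str = field(compare=False)
    # Identity is carried by the expression id alone.
    expr_id: int = field(default_factory=lambda: next(_next_id))


# Two attributes that share a name, as if read from the same table by two subqueries.
left_x = AttributeReference("x")
right_x = AttributeReference("x")

# The same attribute after a rename keeps its id.
renamed = AttributeReference("y", expr_id=left_x.expr_id)
```

Here left_x and right_x have equal names but compare as different attributes, while renamed compares equal to left_x despite the different name. A set of such objects therefore deduplicates by id, which is the behaviour the AttributeSet entry describes.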
Evaluation
The result of expressions can be evaluated using the Expression.apply(Row) method.
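As a rough picture of evaluating an expression tree against a row, the sketch below walks a tiny tree for a+b in plain Python. The eval method name, the tuple used as a row, and the null handling in Add are assumptions for illustration; the class names only echo BoundReference, Literal, and Add from the list above and are not the Catalyst classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundReference:
    ordinal: int  # slot in the input row

    def eval(self, row):
        # Read the value directly from its slot in the row.
        return row[self.ordinal]


@dataclass(frozen=True)
class Literal:
    value: object

    def eval(self, row):
        # A literal ignores the input row.
        return self.value


@dataclass(frozen=True)
class Add:
    left: object
    right: object

    def eval(self, row):
        l = self.left.eval(row)
        r = self.right.eval(row)
        # Simplified null handling (an assumption): a null input gives a null result.
        if l is None or r is None:
            return None
        return l + r


# a + b, with a bound to slot 0 and b bound to slot 1.
a_plus_b = Add(BoundReference(0), BoundReference(1))
result = a_plus_b.eval((3, 4))
```

Evaluation is a recursive walk: each node evaluates its children against the same row and combines the results, and the leaves either read a slot or return a constant.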