Class ProcessTableFunction<T>

java.lang.Object
  org.apache.flink.table.functions.UserDefinedFunction
    org.apache.flink.table.functions.ProcessTableFunction<T>

Type Parameters:
  T - The type of the output row. Either an explicit composite type or an atomic type that is implicitly wrapped into a row consisting of one field.

All Implemented Interfaces:
  Serializable, FunctionDefinition

@PublicEvolving
public abstract class ProcessTableFunction<T> extends UserDefinedFunction
Base class for a user-defined process table function. A process table function (PTF) maps zero, one, or multiple tables to zero, one, or multiple rows (or structured types). Scalar arguments are also supported. If the output record consists of only one field, the wrapper can be omitted, and a scalar value can be emitted that will be implicitly wrapped into a row by the runtime.

PTFs are the most powerful function kind for Flink SQL and Table API. They enable implementing user-defined operators that can be as feature-rich as built-in operations. PTFs have access to Flink's managed state, event-time and timer services, and the underlying table changelogs, and they can take multiple ordered and/or partitioned tables to produce a new table.
Table Semantics and Virtual Processors
PTFs can produce a new table by consuming tables as arguments. For scalability, input tables are distributed across so-called "virtual processors". A virtual processor, as defined by the SQL standard, executes a PTF instance and has access only to a portion of the entire table. The argument declaration determines the size of that portion and the co-location of data. Conceptually, tables can be processed either "as row" (i.e. with row semantics) or "as set" (i.e. with set semantics).
Table Argument with Row Semantics
A PTF that takes a table with row semantics assumes that there is no correlation between rows and that each row can be processed independently. The framework is free to decide how to distribute rows across virtual processors, and each virtual processor has access only to the currently processed row.
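For illustration (not part of the original Javadoc), a PTF with a row-semantics table argument could be sketched as follows. The class name is hypothetical, and the ArgumentTrait.TABLE_AS_ROW constant from org.apache.flink.table.annotation is assumed here; the exact trait name may differ between Flink versions.

import org.apache.flink.table.annotation.ArgumentHint;
import org.apache.flink.table.annotation.ArgumentTrait;
import org.apache.flink.table.functions.ProcessTableFunction;
import org.apache.flink.types.Row;

// Sketch: each input row is processed independently, so the framework may
// distribute rows across virtual processors in any way it chooses.
public class EchoPerRowFunction extends ProcessTableFunction<String> {
  public void eval(@ArgumentHint(ArgumentTrait.TABLE_AS_ROW) Row input) {
    collect("Processed independently: " + input);
  }
}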
Table Argument with Set Semantics
A PTF that takes a table with set semantics assumes that there is a correlation between rows. When calling the function, the PARTITION BY clause defines the columns for correlation. The framework ensures that all rows belonging to the same set are co-located. A PTF instance can access all rows belonging to the same set. In other words, the virtual processor is scoped by a key context.
It is also possible not to provide a key (ArgumentTrait.OPTIONAL_PARTITION_BY), in which case only one virtual processor handles the entire table, thereby losing scalability benefits.
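As an illustration (not part of the original Javadoc), a PTF with a set-semantics table argument could be sketched as follows. The class name is hypothetical, the ArgumentTrait.TABLE_AS_SET constant is assumed (exact trait names may differ between Flink versions), and the optional ProcessTableFunction.Context parameter is taken from the nested class described below.

import org.apache.flink.table.annotation.ArgumentHint;
import org.apache.flink.table.annotation.ArgumentTrait;
import org.apache.flink.table.functions.ProcessTableFunction;
import org.apache.flink.types.Row;

// Sketch: all rows sharing the same PARTITION BY key are routed to the same
// virtual processor, so this instance sees the complete key-scoped set.
public class PerKeyEchoFunction extends ProcessTableFunction<String> {
  public void eval(Context ctx, @ArgumentHint(ArgumentTrait.TABLE_AS_SET) Row input) {
    collect("Row within key-scoped set: " + input);
  }
}

A call would then typically pass the table argument with a PARTITION BY clause, for example SELECT * FROM PerKeyEcho(input => TABLE t PARTITION BY k) in SQL, where the function name, table t, and column k are placeholders chosen only for this sketch.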
Implementation

The behavior of a ProcessTableFunction can be defined by implementing a custom evaluation method. The evaluation method must be declared publicly, not static, and named eval. Overloading is not supported.

For storing a user-defined function in a catalog, the class must have a default constructor and must be instantiable during runtime. Anonymous functions in Table API can only be persisted if the function object is not stateful (i.e. containing only transient and static fields).
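For illustration (not part of the original Javadoc), registering such a function could look like the following sketch. It assumes the AdditionFunction from the Data Types examples below and uses the current catalog and database; the registered name is chosen only for this example.

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class RegisterPtfExample {
  public static void main(String[] args) {
    TableEnvironment env = TableEnvironment.create(EnvironmentSettings.inStreamingMode());
    // The function class must have a default constructor so it can be instantiated at runtime.
    env.createFunction("AdditionFunction", AdditionFunction.class);
  }
}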
Data Types

By default, input and output data types are automatically extracted using reflection. This includes the generic argument T of the class for determining an output data type. Input arguments are derived from the eval() method. If the reflective information is not sufficient, it can be supported and enriched with FunctionHint, ArgumentHint, and DataTypeHint annotations.

The following examples show how to specify data types:
// Function that accepts two scalar INT arguments and emits them as an implicit ROW<INT>
class AdditionFunction extends ProcessTableFunction<Integer> {
  public void eval(Integer a, Integer b) {
    collect(a + b);
  }
}

// Function that produces an explicit ROW<i INT, s STRING> from arguments; the function hint helps in
// declaring the row's fields
@FunctionHint(output = @DataTypeHint("ROW<i INT, s STRING>"))
class DuplicatorFunction extends ProcessTableFunction<Row> {
  public void eval(Integer i, String s) {
    collect(Row.of(i, s));
    collect(Row.of(i, s));
  }
}

See Also:
- Serialized Form
Nested Class Summary

static interface ProcessTableFunction.Context
  Context that can be added as a first argument to the eval() method for additional information about the input tables and other services provided by the framework.
Constructor Summary

ProcessTableFunction()
Method Summary

protected void collect(T row)
  Emits an (implicit or explicit) output row.

FunctionKind getKind()
  Returns the kind of function this definition describes.

TypeInference getTypeInference(DataTypeFactory typeFactory)
  Returns the logic for performing type inference of a call to this function definition.

void setCollector(org.apache.flink.util.Collector<T> collector)
  Internal use.
Methods inherited from class org.apache.flink.table.functions.UserDefinedFunction
  close, functionIdentifier, open, toString

Methods inherited from class java.lang.Object
  clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Methods inherited from interface org.apache.flink.table.functions.FunctionDefinition
  getRequirements, isDeterministic, supportsConstantFolding
Method Detail
setCollector

public final void setCollector(org.apache.flink.util.Collector<T> collector)

Internal use. Sets the current collector.
collect

protected final void collect(T row)

Emits an (implicit or explicit) output row. If null is emitted as an explicit row, it will be skipped by the runtime. For implicit rows, the row's field will be null.

Parameters:
  row - the output row
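As an illustration (not part of the original Javadoc), the difference between explicit and implicit null output could be sketched as follows; both class names and the ROW<i INT> hint are chosen only for this example.

import org.apache.flink.table.annotation.DataTypeHint;
import org.apache.flink.table.annotation.FunctionHint;
import org.apache.flink.table.functions.ProcessTableFunction;
import org.apache.flink.types.Row;

// Explicit row output: emitting null skips the record entirely.
@FunctionHint(output = @DataTypeHint("ROW<i INT>"))
class ExplicitRowFunction extends ProcessTableFunction<Row> {
  public void eval(Integer i) {
    collect(null); // no output row is produced
  }
}

// Implicit output: the emitted value becomes the single field of the row,
// so emitting null yields a row whose only field is null.
class ImplicitRowFunction extends ProcessTableFunction<Integer> {
  public void eval(Integer i) {
    collect(null); // produces a row with a null field
  }
}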
getKind

public final FunctionKind getKind()

Description copied from interface: FunctionDefinition

Returns the kind of function this definition describes.
getTypeInference

public TypeInference getTypeInference(DataTypeFactory typeFactory)

Description copied from class: UserDefinedFunction

Returns the logic for performing type inference of a call to this function definition.

The type inference process is responsible for inferring unknown types of input arguments, validating input arguments, and producing result types. The type inference process happens independent of a function body. The output of the type inference is used to search for a corresponding runtime implementation.

Instances of type inference can be created by using TypeInference.newBuilder(). See BuiltInFunctionDefinitions for concrete usage examples.

The type inference for user-defined functions is automatically extracted using reflection. It does this by analyzing implementation methods such as eval() or accumulate() and the generic parameters of a function class if present. If the reflective information is not sufficient, it can be supported and enriched with DataTypeHint and FunctionHint annotations.

Note: Overriding this method is only recommended for advanced users. If a custom type inference is specified, it is the responsibility of the implementer to make sure that the output of the type inference process matches with the implementation method:

- The implementation method must comply with each DataType.getConversionClass() returned by the type inference. For example, if DataTypes.TIMESTAMP(3).bridgedTo(java.sql.Timestamp.class) is an expected argument type, the method must accept a call eval(java.sql.Timestamp).
- Regular Java calling semantics (including type widening and autoboxing) are applied when calling an implementation method, which means that the signature can be eval(java.lang.Object).
- The runtime will take care of converting the data to the data format specified by the DataType.getConversionClass() coming from the type inference logic.

Specified by:
  getTypeInference in interface FunctionDefinition
Specified by:
  getTypeInference in class UserDefinedFunction
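As an illustration (not part of the original Javadoc), overriding the type inference could be sketched roughly as follows. The class name and the chosen INT argument and result types are assumptions, and the builder calls (TypeInference.newBuilder(), typedArguments, outputTypeStrategy, TypeStrategies.explicit) are shown only to illustrate the general pattern; process table functions may require additional declarations depending on the Flink version.

import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.catalog.DataTypeFactory;
import org.apache.flink.table.functions.ProcessTableFunction;
import org.apache.flink.table.types.inference.TypeInference;
import org.apache.flink.table.types.inference.TypeStrategies;

// Sketch: a function that declares its argument and result types explicitly
// instead of relying on reflective extraction.
public class ExplicitTypesFunction extends ProcessTableFunction<Integer> {

  public void eval(Integer a, Integer b) {
    collect(a + b);
  }

  @Override
  public TypeInference getTypeInference(DataTypeFactory typeFactory) {
    return TypeInference.newBuilder()
        // Two INT arguments; INT's conversion class is java.lang.Integer,
        // matching the eval(Integer, Integer) signature above.
        .typedArguments(DataTypes.INT(), DataTypes.INT())
        // Declare the result type explicitly.
        .outputTypeStrategy(TypeStrategies.explicit(DataTypes.INT()))
        .build();
  }
}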