Class ProcessTableFunction<T>

  • Type Parameters:
    T - The type of the output row. Either an explicit composite type or an atomic type that is implicitly wrapped into a row consisting of one field.
    All Implemented Interfaces:
    Serializable, FunctionDefinition

    @PublicEvolving
    public abstract class ProcessTableFunction<T>
    extends UserDefinedFunction
    Base class for a user-defined process table function. A process table function (PTF) maps zero, one, or multiple tables to zero, one, or multiple rows (or structured types). Scalar arguments are also supported. If the output record consists of only one field, the wrapper can be omitted, and a scalar value can be emitted that will be implicitly wrapped into a row by the runtime.

    PTFs are the most powerful function kind for Flink SQL and Table API. They enable implementing user-defined operators that can be as feature-rich as built-in operations. PTFs have access to Flink's managed state, event-time and timer services, underlying table changelogs, and can take multiple ordered and/or partitioned tables to produce a new table.

    Table Semantics and Virtual Processors

    PTFs can produce a new table by consuming tables as arguments. For scalability, input tables are distributed across so-called "virtual processors". A virtual processor, as defined by the SQL standard, executes a PTF instance and has access only to a portion of the entire table. The argument declaration determines the size of that portion and the co-location of data. Conceptually, tables can be processed either "as row" (i.e. with row semantics) or "as set" (i.e. with set semantics).

    Table Argument with Row Semantics

    A PTF that takes a table with row semantics assumes that there is no correlation between rows and that each row can be processed independently. The framework is free to distribute rows across virtual processors in any way, and each virtual processor has access only to the currently processed row.
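
    For illustration, the following is a minimal sketch of a PTF declaring a table argument with row semantics. It assumes the ArgumentHint and ArgumentTrait annotations from org.apache.flink.table.annotation; the class and column names are made up:

     import org.apache.flink.table.annotation.ArgumentHint;
     import org.apache.flink.table.annotation.ArgumentTrait;
     import org.apache.flink.table.functions.ProcessTableFunction;
     import org.apache.flink.types.Row;

     // Hypothetical PTF with row semantics: rows are processed independently,
     // so the framework may distribute them across virtual processors freely.
     public class GreetingFunction extends ProcessTableFunction<String> {
       public void eval(@ArgumentHint(ArgumentTrait.TABLE_AS_ROW) Row input) {
         // Only the currently processed row is visible to this virtual processor.
         collect("Hello " + input.getField("name"));
       }
     }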

    Table Argument with Set Semantics

    A PTF that takes a table with set semantics assumes that there is a correlation between rows. When calling the function, the PARTITION BY clause defines the columns for correlation. The framework ensures that all rows belonging to the same set are co-located, and a PTF instance is able to access all rows belonging to the same set. In other words, the virtual processor is scoped by a key context.

    It is also possible not to provide a key (ArgumentTrait.OPTIONAL_PARTITION_BY), in which case only one virtual processor handles the entire table, thereby losing scalability benefits.
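
    As a sketch, a stateful counting PTF with set semantics might look as follows. It assumes the ArgumentHint, ArgumentTrait, and StateHint annotations from org.apache.flink.table.annotation; the class and field names are illustrative:

     import org.apache.flink.table.annotation.ArgumentHint;
     import org.apache.flink.table.annotation.ArgumentTrait;
     import org.apache.flink.table.annotation.StateHint;
     import org.apache.flink.table.functions.ProcessTableFunction;
     import org.apache.flink.types.Row;

     // Hypothetical PTF with set semantics: all rows sharing the same
     // PARTITION BY key end up in the same virtual processor instance.
     public class CountingFunction extends ProcessTableFunction<String> {

       // POJO managed as keyed state by the framework, one instance per set.
       public static class CountState {
         public long count = 0L;
       }

       public void eval(
           @StateHint CountState state,
           @ArgumentHint(ArgumentTrait.TABLE_AS_SET) Row input) {
         state.count++;
         collect("Row " + state.count + " of this set");
       }
     }

    Declaring the argument with @ArgumentHint({ArgumentTrait.TABLE_AS_SET, ArgumentTrait.OPTIONAL_PARTITION_BY}) would make the PARTITION BY clause optional, with the single-processor trade-off described above when no key is given.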

    Implementation

    The behavior of a ProcessTableFunction can be defined by implementing a custom evaluation method. The evaluation method must be declared public, must not be static, and must be named eval. Overloading is not supported.

    For storing a user-defined function in a catalog, the class must have a default constructor and must be instantiable at runtime. Anonymous functions in Table API can only be persisted if the function object is not stateful (i.e. it contains only transient and static fields).
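
    For example, registering and calling a PTF might look like the following sketch, reusing the hypothetical GreetingFunction from above. The registration call (TableEnvironment.createTemporarySystemFunction) exists as shown, while the SQL invocation with a named table argument is only indicative and depends on the Flink version:

     import org.apache.flink.table.api.EnvironmentSettings;
     import org.apache.flink.table.api.TableEnvironment;

     public class PtfRegistrationExample {
       public static void main(String[] args) {
         TableEnvironment env =
             TableEnvironment.create(EnvironmentSettings.inStreamingMode());
         // The function class needs a default constructor to be stored in a
         // catalog and instantiated at runtime.
         env.createTemporarySystemFunction("Greeting", GreetingFunction.class);
         // Indicative PTF call with a named table argument (hypothetical table
         // "Names"); the exact call syntax depends on the Flink version.
         env.executeSql("SELECT * FROM Greeting(input => TABLE Names)").print();
       }
     }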

    Data Types

    By default, input and output data types are automatically extracted using reflection. This includes the generic argument T of the class for determining an output data type. Input arguments are derived from the eval() method. If the reflective information is not sufficient, it can be supported and enriched with FunctionHint, ArgumentHint, and DataTypeHint annotations.

    The following examples show how to specify data types:

     // Function that accepts two scalar INT arguments and emits them as an implicit ROW<INT>
     class AdditionFunction extends ProcessTableFunction<Integer> {
       public void eval(Integer a, Integer b) {
         collect(a + b);
       }
     }

     // Function that produces an explicit ROW<i INT, s STRING> from arguments; the function hint
     // helps in declaring the row's fields
     @FunctionHint(output = @DataTypeHint("ROW<i INT, s STRING>"))
     class DuplicatorFunction extends ProcessTableFunction<Row> {
       public void eval(Integer i, String s) {
         collect(Row.of(i, s));
         collect(Row.of(i, s));
       }
     }
    • Constructor Detail

      • ProcessTableFunction

        public ProcessTableFunction()
    • Method Detail

      • setCollector

        public final void setCollector​(org.apache.flink.util.Collector<T> collector)
        Internal use. Sets the current collector.
      • collect

        protected final void collect​(T row)
        Emits an (implicit or explicit) output row.

        If null is emitted as an explicit row, it will be skipped by the runtime. For implicit rows, the row's field will be null.

        Parameters:
        row - the output row
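
        For example, the null handling described above plays out as follows (a sketch; the types are illustrative):

         // Implicit row (atomic output type STRING): emitting null produces a
         // ROW<STRING> whose single field is NULL.
         class ImplicitNullFunction extends ProcessTableFunction<String> {
           public void eval(Integer i) {
             collect(null);
           }
         }

         // Explicit row (output type Row): an emitted null row is skipped
         // entirely by the runtime.
         class ExplicitNullFunction extends ProcessTableFunction<Row> {
           public void eval(Integer i) {
             collect(null);
           }
         }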
      • getKind

        public final FunctionKind getKind()
        Description copied from interface: FunctionDefinition
        Returns the kind of function this definition describes.
      • getTypeInference

        public TypeInference getTypeInference​(DataTypeFactory typeFactory)
        Description copied from class: UserDefinedFunction
        Returns the logic for performing type inference of a call to this function definition.

        The type inference process is responsible for inferring unknown types of input arguments, validating input arguments, and producing result types. The type inference process happens independently of a function body. The output of the type inference is used to search for a corresponding runtime implementation.

        Instances of type inference can be created by using TypeInference.newBuilder().

        See BuiltInFunctionDefinitions for concrete usage examples.

        The type inference for user-defined functions is automatically extracted using reflection. It does this by analyzing implementation methods such as eval() or accumulate() and the generic parameters of a function class if present. If the reflective information is not sufficient, it can be supported and enriched with DataTypeHint and FunctionHint annotations.

        Note: Overriding this method is only recommended for advanced users. If a custom type inference is specified, it is the responsibility of the implementer to make sure that the output of the type inference process matches with the implementation method:

        • The implementation method must comply with each DataType.getConversionClass() returned by the type inference. For example, if DataTypes.TIMESTAMP(3).bridgedTo(java.sql.Timestamp.class) is an expected argument type, the method must accept a call eval(java.sql.Timestamp).
        • Regular Java calling semantics (including type widening and autoboxing) are applied when calling an implementation method, which means that the signature can be eval(java.lang.Object).

        The runtime will take care of converting the data to the data format specified by the DataType.getConversionClass() coming from the type inference logic.
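
        As an illustration only, a custom type inference could be assembled like this inside the function class (a sketch using TypeInference.newBuilder(); the argument and output types are placeholders):

         import java.util.Arrays;
         import org.apache.flink.table.api.DataTypes;
         import org.apache.flink.table.catalog.DataTypeFactory;
         import org.apache.flink.table.types.inference.TypeInference;
         import org.apache.flink.table.types.inference.TypeStrategies;

         @Override
         public TypeInference getTypeInference(DataTypeFactory typeFactory) {
           // Declare two typed INT arguments and a fixed STRING output type.
           return TypeInference.newBuilder()
               .typedArguments(Arrays.asList(DataTypes.INT(), DataTypes.INT()))
               .outputTypeStrategy(TypeStrategies.explicit(DataTypes.STRING()))
               .build();
         }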

        Specified by:
        getTypeInference in interface FunctionDefinition
        Specified by:
        getTypeInference in class UserDefinedFunction