Interface Procedure
@PublicEvolving public interface Procedure
Base interface representing a stored procedure that can be executed by Flink. A stored procedure accepts zero, one, or multiple input parameters and returns the execution result of the stored procedure.

The behavior of a Procedure is defined by implementing a custom call method. A call method must be declared public, must not be static, and must be named call. Call methods can also be overloaded by implementing multiple methods named call. Currently, Flink does not allow users to define their own procedures directly; a custom Procedure can only be provided by a Catalog. To provide a Procedure, the Catalog must implement Catalog.getProcedure(ObjectPath).

When calling a stored procedure, Flink always passes an org.apache.flink.table.procedure.ProcedureContext, which currently provides the StreamExecutionEnvironment, as the first parameter of the call method. Therefore, a custom call method must accept org.apache.flink.table.procedure.ProcedureContext as its first parameter, and the remaining parameters of the call method are the actual parameters of the stored procedure.

By default, input and output data types are automatically extracted using reflection. The input arguments are derived from one or more call() methods. If the reflective information is not sufficient, it can be supplemented with DataTypeHint and ProcedureHint. If ProcedureHint is used to hint input arguments, it should only hint the arguments starting from the second one, since the first argument is always ProcedureContext, which does not need a data type hint.

Note: The return type of the call() method must always be T[], where T can be an atomic type, a Row, or a POJO. An atomic type is implicitly wrapped into a row consisting of one field. Also, a DataTypeHint for the output data type hints T.

The following pseudocode examples show how to write a stored procedure:
```java
// A stored procedure that rewrites data files for Iceberg. It accepts a STRING
// and returns an array of explicit ROW<STRING, STRING>.
// loadTable, planScanTask, caseSensitive(), fileIO(), encryptionManager(), parallelism,
// replaceDataFiles and startingSnapshotId are placeholders assumed to exist elsewhere.
class IcebergRewriteDataFilesProcedure implements Procedure {

    @DataTypeHint("ROW<rewritten_data_files_count STRING, added_data_files_count STRING>")
    public Row[] call(ProcedureContext procedureContext, String tableName) {
        // plan for scanning the table to do rewriting
        Table table = loadTable(tableName);
        List<CombinedScanTask> combinedScanTasks = planScanTask(table);

        // now, rewrite the files according to the planning task
        StreamExecutionEnvironment env = procedureContext.getExecutionEnvironment();
        DataStream<CombinedScanTask> dataStream = env.fromCollection(combinedScanTasks);
        RowDataRewriter rowDataRewriter =
                new RowDataRewriter(table, caseSensitive(), fileIO(), encryptionManager());
        List<DataFile> addedDataFiles;
        try {
            addedDataFiles = rowDataRewriter.rewriteDataForTasks(dataStream, parallelism);
        } catch (Exception e) {
            throw new RuntimeException("Rewrite data file error.", e);
        }

        // replace the current files
        List<DataFile> currentDataFiles = combinedScanTasks.stream()
                .flatMap(tasks -> tasks.files().stream().map(FileScanTask::file))
                .collect(Collectors.toList());
        replaceDataFiles(currentDataFiles, addedDataFiles, startingSnapshotId);

        // return the result for rewriting; counts are converted to match the STRING fields
        return new Row[] {
            Row.of(String.valueOf(currentDataFiles.size()), String.valueOf(addedDataFiles.size()))
        };
    }
}

// A stored procedure that accepts <STRING, LONG> and returns an array of STRING
// without a data type hint.
class RollbackToSnapShotProcedure implements Procedure {

    public String[] call(ProcedureContext procedureContext, String tableName, Long snapshot) {
        Table table = loadTable(tableName);
        // remember the snapshot id before rolling back
        Long previousSnapShotId = table.currentSnapshot().snapshotId();
        table.manageSnapshots().rollbackTo(snapshot).commit();
        return new String[] {
            "previous_snapshot_id: " + previousSnapShotId,
            "current_snapshot_id: " + snapshot
        };
    }
}
```
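As a rough illustration of the call-method rules described above (public, non-static, named call, ProcedureContext as the first parameter, T[] as the return type), the following plain-Java sketch filters a class's methods with reflection. ProcedureContext and Procedure here are hypothetical stand-ins so the sketch compiles without Flink, and EchoProcedure and validSignatures are illustrative names; this is not Flink's actual type-extraction logic.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CallMethodRules {

    // Hypothetical stand-ins for Flink's ProcedureContext and Procedure types.
    interface ProcedureContext {}
    interface Procedure {}

    // Two valid overloads plus two methods that break the rules.
    public static class EchoProcedure implements Procedure {
        public String[] call(ProcedureContext ctx, String s) {
            return new String[] {s};
        }

        public Long[] call(ProcedureContext ctx, Long a, Long b) {
            return new Long[] {a + b};
        }

        // Ignored: static.
        public static String[] call(ProcedureContext ctx, Integer i) {
            return new String[] {String.valueOf(i)};
        }

        // Ignored: the first parameter is not the context.
        public String[] call(String s) {
            return new String[] {s};
        }
    }

    /**
     * Describes every valid call method as "(inputs) -> T[]". The listed inputs
     * exclude the leading context, i.e. they are the procedure's own parameters.
     */
    static List<String> validSignatures(Class<?> procedureClass) {
        List<String> signatures = new ArrayList<>();
        for (Method m : procedureClass.getMethods()) { // public methods only
            int mod = m.getModifiers();
            Class<?>[] params = m.getParameterTypes();
            boolean valid = m.getName().equals("call")
                    && !Modifier.isStatic(mod)
                    && params.length >= 1
                    && params[0] == ProcedureContext.class
                    && m.getReturnType().isArray();
            if (valid) {
                String inputs = Arrays.stream(params, 1, params.length)
                        .map(Class::getSimpleName)
                        .collect(Collectors.joining(", "));
                String output = m.getReturnType().getComponentType().getSimpleName();
                signatures.add("(" + inputs + ") -> " + output + "[]");
            }
        }
        signatures.sort(null); // natural order, for stable output
        return signatures;
    }

    public static void main(String[] args) {
        validSignatures(EchoProcedure.class).forEach(System.out::println);
    }
}
```

Running main prints the two overloads that satisfy the rules, `(Long, Long) -> Long[]` and `(String) -> String[]`; the static method and the method without a leading context are skipped.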
In terms of the API, a stored procedure can be used as follows:
```java
// for SQL users
TableEnvironment tEnv = ...;
tEnv.executeSql("CALL rollback_to_snapshot('t', 1001)");
```
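To make the CALL flow more concrete, here is a toy model of what happens conceptually: the procedure is looked up from a catalog (standing in for Catalog.getProcedure(ObjectPath)), and a matching call method is invoked reflectively with the context as the first argument. ToyCatalog, invoke, the "database.name" string key and the RollbackToSnapshot class are all hypothetical; this is a sketch under those assumptions, not Flink's planner implementation, and the real context would also carry a StreamExecutionEnvironment.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class CallResolutionSketch {

    // Hypothetical stand-ins for Flink types (no StreamExecutionEnvironment here).
    interface ProcedureContext {}
    interface Procedure {}

    // Hypothetical catalog keyed by "database.name", loosely mirroring getProcedure(ObjectPath).
    static class ToyCatalog {
        private final Map<String, Procedure> procedures = new HashMap<>();

        void register(String database, String name, Procedure procedure) {
            procedures.put(database + "." + name, procedure);
        }

        Procedure getProcedure(String database, String name) {
            Procedure p = procedures.get(database + "." + name);
            if (p == null) {
                throw new IllegalArgumentException("Procedure not found: " + database + "." + name);
            }
            return p;
        }
    }

    // Toy procedure: reports its arguments instead of touching a real table.
    public static class RollbackToSnapshot implements Procedure {
        public String[] call(ProcedureContext ctx, String tableName, Long snapshot) {
            return new String[] {"table: " + tableName, "current_snapshot_id: " + snapshot};
        }
    }

    /** Looks up the procedure and invokes the call method whose inputs match args, context first. */
    static Object[] invoke(ToyCatalog catalog, String db, String name,
                           ProcedureContext ctx, Object... args) {
        Procedure procedure = catalog.getProcedure(db, name);
        for (Method m : procedure.getClass().getMethods()) {
            Class<?>[] params = m.getParameterTypes();
            if (!m.getName().equals("call") || Modifier.isStatic(m.getModifiers())
                    || params.length != args.length + 1 || params[0] != ProcedureContext.class) {
                continue;
            }
            boolean matches = true;
            for (int i = 0; i < args.length; i++) {
                if (!params[i + 1].isInstance(args[i])) {
                    matches = false;
                    break;
                }
            }
            if (matches) {
                Object[] callArgs = new Object[args.length + 1];
                callArgs[0] = ctx; // the context always goes first
                System.arraycopy(args, 0, callArgs, 1, args.length);
                try {
                    return (Object[]) m.invoke(procedure, callArgs);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("call method failed", e);
                }
            }
        }
        throw new IllegalArgumentException("No matching call method for " + Arrays.toString(args));
    }

    public static void main(String[] args) {
        ToyCatalog catalog = new ToyCatalog();
        catalog.register("default_db", "rollback_to_snapshot", new RollbackToSnapshot());
        ProcedureContext ctx = new ProcedureContext() {};
        // Roughly what CALL rollback_to_snapshot('t', 1001) resolves to in this toy model.
        Object[] result = invoke(catalog, "default_db", "rollback_to_snapshot", ctx, "t", 1001L);
        System.out.println(Arrays.toString(result));
    }
}
```

Matching on runtime argument types is a simplification chosen to keep the sketch short; Flink resolves overloads from the extracted or hinted data types instead.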