Interface Procedure
-
@PublicEvolving public interface ProcedureBase interface representing a stored procedure that can be executed by Flink. An stored procedure accepts zero, one, or multiple input parameters and then return the execution result of the stored procedure.The behavior of
Procedurecan be defined by implements a custom call method. An call method must be declared publicly, not static, and namedcall. Call methods can also be overloaded by implementing multiple methods namedcall. Currently, it doesn't allow users to custom their own procedure, the customerProcedurecan only be provided byCatalog. To provideProcedure,Catalogmust implementCatalog.getProcedure(ObjectPath).When calling a stored procedure, Flink will always pass the
org.apache.flink.table.procedure.ProcedureContextwhich provides StreamExecutionEnvironment currently as the first parameter of thecallmethod. So, the customcallmethod must accept theorg.apache.flink.table.procedure.ProcedureContextas the first parameter, and the other parameters of thecallmethod are the actual parameter of the stored procedure.By default, input and output data types are automatically extracted using reflection. The input arguments are derived from one or more
call()methods. If the reflective information is not sufficient, it can be supported and enriched withDataTypeHintandProcedureHint. IfProcedureHintis used to hint input arguments, it should only hint the input arguments that start from the second argument since the first argument is alwaysProcedureContextwhich doesn't need to be annotated with data type hint.Note: The return type of the
call()method should always be T[] where T can be an atomic type, Row, Pojo. An atomic type will be implicitly wrapped into a row consisting of one field. Also, theDataTypeHintfor output data type is used to hint T.The following examples with pseudocode show how to write a stored procedure:
// a stored procedure that tries to rewrite data files for iceberg, it accept STRING // and return an array of explicit ROW < STRING, STRING >. class IcebergRewriteDataFilesProcedure implements Procedure { public @DataTypeHint("ROW< rewritten_data_files_count STRING, added_data_files_count STRING >") Row[] call(ProcedureContext procedureContext, String tableName) { // plan for scanning the table to do rewriting Table table = loadTable(tableName); List<CombinedScanTask> combinedScanTasks = planScanTask(table); // now, rewrite the files according to the planning task StreamExecutionEnvironment env = procedureContext.getExecutionEnvironment(); DataStream<CombinedScanTask> dataStream = env.fromCollection(combinedScanTasks); RowDataRewriter rowDataRewriter = new RowDataRewriter(table(), caseSensitive(), fileIO(), encryptionManager()); List<DataFile> addedDataFiles; try { addedDataFiles = rowDataRewriter.rewriteDataForTasks(dataStream, parallelism); } catch (Exception e) { throw new RuntimeException("Rewrite data file error.", e); } // replace the current files List<DataFile> currentDataFiles = combinedScanTasks.stream() .flatMap(tasks -> tasks.files().stream().map(FileScanTask::file)) .collect(Collectors.toList()); replaceDataFiles(currentDataFiles, addedDataFiles, startingSnapshotId); // return the result for rewriting return new Row[] {Row.of(currentDataFiles.size(), addedDataFiles.size())}; } } // a stored procedure that accepts < STRING, LONG > and // return an array of STRING without datatype hint. class RollbackToSnapShotProcedure implements Procedure { public String[] call(ProcedureContext procedureContext, String tableName, Long snapshot) { Table table = loadTable(tableName); Long previousSnapShotId = table.currentSnapshot(); table.manageSnapshots().rollbackTo(snapshotId).commit(); return new String[] { "previous_snapshot_id: " + previousSnapShotId, "current_snapshot_id " + snapshot }; } }In term of the API, a stored procedure can be used as follows:
// for SQL users TableEnvironment tEnv = ... tEnv.executeSql("CALL rollback_to_snapshot('t', 1001)");