Class SqlTransform
- java.lang.Object
  - org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PInput,org.apache.beam.sdk.values.PCollection<org.apache.beam.sdk.values.Row>>
    - org.apache.beam.sdk.extensions.sql.SqlTransform
- All Implemented Interfaces:
java.io.Serializable, org.apache.beam.sdk.transforms.display.HasDisplayData
public abstract class SqlTransform extends org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PInput,org.apache.beam.sdk.values.PCollection<org.apache.beam.sdk.values.Row>>
SqlTransform is the DSL interface of Beam SQL. It translates a SQL query into a PTransform, so developers can use standard SQL queries in a Beam pipeline.
Beam SQL DSL usage:
A typical pipeline with Beam SQL DSL is:
    PipelineOptions options = PipelineOptionsFactory.create();
    Pipeline p = Pipeline.create(options);

    // Create tables from TextIO.
    PCollection<Row> inputTableA = p.apply(TextIO.read().from("/my/input/patha")).apply(...);
    PCollection<Row> inputTableB = p.apply(TextIO.read().from("/my/input/pathb")).apply(...);

    // Run a simple query, and register the output as a table in BeamSql.
    String sql1 = "select MY_FUNC(c1), c2 from PCOLLECTION";
    PCollection<Row> outputTableA = inputTableA.apply(
        SqlTransform
            .query(sql1)
            .registerUdf("MY_FUNC", MY_FUNC.class));

    // Run a JOIN with one table from TextIO and one table from another query.
    PCollection<Row> outputTableB =
        PCollectionTuple
            .of(new TupleTag<>("TABLE_O_A"), outputTableA)
            .and(new TupleTag<>("TABLE_B"), inputTableB)
            .apply(SqlTransform.query("select * from TABLE_O_A JOIN TABLE_B where ..."));

    // Output the final result with TextIO.
    outputTableB.apply(...).apply(TextIO.write().to("/my/output/path"));

    p.run().waitUntilFinish();

A typical pipeline with Beam SQL DDL and DSL is:
    PipelineOptions options = PipelineOptionsFactory.create();
    Pipeline p = Pipeline.create(options);

    String sql1 = "INSERT INTO pubsub_sink SELECT * FROM pubsub_source";

    String ddlSource = "CREATE EXTERNAL TABLE pubsub_source("
        + "attributes MAP<VARCHAR, VARCHAR>, payload ROW<name VARCHAR, size INTEGER>) "
        + "TYPE pubsub LOCATION 'projects/myproject/topics/topic1'";

    String ddlSink = "CREATE EXTERNAL TABLE pubsub_sink("
        + "attributes MAP<VARCHAR, VARCHAR>, payload ROW<name VARCHAR, size INTEGER>) "
        + "TYPE pubsub LOCATION 'projects/myproject/topics/mytopic'";

    p.apply(SqlTransform.query(sql1).withDdlString(ddlSource).withDdlString(ddlSink));

    p.run().waitUntilFinish();

- See Also:
- Serialized Form
-
-
Field Summary
Fields
- static java.lang.String PCOLLECTION_NAME
-
Constructor Summary
Constructors
- SqlTransform()
-
Method Summary
Methods
- org.apache.beam.sdk.values.PCollection<org.apache.beam.sdk.values.Row> expand(org.apache.beam.sdk.values.PInput input)
- static SqlTransform query(java.lang.String queryString): Returns a SqlTransform representing an equivalent execution plan.
- SqlTransform registerUdaf(java.lang.String functionName, org.apache.beam.sdk.transforms.Combine.CombineFn combineFn): Registers a Combine.CombineFn as a UDAF function used in this query.
- SqlTransform registerUdf(java.lang.String functionName, java.lang.Class<? extends BeamSqlUdf> clazz): Registers a UDF function used in this query.
- SqlTransform registerUdf(java.lang.String functionName, org.apache.beam.sdk.transforms.SerializableFunction sfn): Registers a SerializableFunction as a UDF function used in this query.
- SqlTransform withAutoLoading(boolean autoLoading)
- SqlTransform withDdlString(java.lang.String ddlString)
- SqlTransform withDefaultTableProvider(java.lang.String name, TableProvider tableProvider)
- SqlTransform withErrorsTransformer(org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PCollection<org.apache.beam.sdk.values.Row>,? extends org.apache.beam.sdk.values.POutput> errorsTransformer)
- SqlTransform withNamedParameters(java.util.Map<java.lang.String,?> parameters)
- SqlTransform withPositionalParameters(java.util.List<?> parameters)
- SqlTransform withQueryPlannerClass(java.lang.Class<? extends QueryPlanner> clazz)
- SqlTransform withTableProvider(java.lang.String name, TableProvider tableProvider)
Methods inherited from class org.apache.beam.sdk.transforms.PTransform
addAnnotation, compose, compose, getAdditionalInputs, getAnnotations, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, getResourceHints, populateDisplayData, setDisplayData, setResourceHints, toString, validate, validate
-
Field Detail
-
PCOLLECTION_NAME
public static final java.lang.String PCOLLECTION_NAME
- See Also:
- Constant Field Values
-
-
Method Detail
-
expand
public org.apache.beam.sdk.values.PCollection<org.apache.beam.sdk.values.Row> expand(org.apache.beam.sdk.values.PInput input)
- Specified by:
expand in class org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PInput,org.apache.beam.sdk.values.PCollection<org.apache.beam.sdk.values.Row>>
-
query
public static SqlTransform query(java.lang.String queryString)
Returns a SqlTransform representing an equivalent execution plan.
The SqlTransform can be applied to a PCollection or PCollectionTuple representing all the input tables.
The PTransform outputs a PCollection of Row.
If the PTransform is applied to a PCollection, the collection is registered under the name PCOLLECTION.
If the PTransform is applied to a PCollectionTuple, TupleTag.getId() is used as the name of the corresponding PCollection.
- If the SQL query only uses a subset of tables from the upstream PCollectionTuple, this is valid;
- If the SQL query references a table not included in the upstream PCollectionTuple, an IllegalStateException is thrown during query validation;
- Tables from the upstream PCollectionTuple are always valid only in the scope of the current query call.
Any available implementation of QueryPlanner can be used as the query planner in SqlTransform. An implementation can be specified globally for the entire pipeline with BeamSqlPipelineOptions.getPlannerName(). The global planner can be overridden per-transform with withQueryPlannerClass(Class).
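The table-naming rules above can be illustrated with a small toy model. This is a sketch in plain Java that does not call Beam: the class TableScopeSketch and its methods are hypothetical, inputs are represented as plain strings, and the value "PCOLLECTION" is assumed to match the name mentioned in the text.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the table-naming rules; hypothetical, it does not call Beam. */
final class TableScopeSketch {
    // Assumed to match the name under which a single input is registered.
    static final String PCOLLECTION_NAME = "PCOLLECTION";

    /** A single input is visible to the query under the fixed name PCOLLECTION. */
    static Map<String, String> scopeForSingleInput(String input) {
        Map<String, String> scope = new LinkedHashMap<>();
        scope.put(PCOLLECTION_NAME, input);
        return scope;
    }

    /** A tuple input exposes each member under its tag id. */
    static Map<String, String> scopeForTuple(Map<String, String> inputsByTagId) {
        return new LinkedHashMap<>(inputsByTagId);
    }

    /** Using a subset of the tables is fine; an unknown table is rejected. */
    static void validate(Map<String, String> scope, List<String> referencedTables) {
        for (String table : referencedTables) {
            if (!scope.containsKey(table)) {
                throw new IllegalStateException("Table not found in input: " + table);
            }
        }
    }
}
```

In this model, a query over a tuple tagged TABLE_O_A and TABLE_B may reference only TABLE_B, while a reference to any other name fails validation with an IllegalStateException.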
-
withTableProvider
public SqlTransform withTableProvider(java.lang.String name, TableProvider tableProvider)
-
withDefaultTableProvider
public SqlTransform withDefaultTableProvider(java.lang.String name, TableProvider tableProvider)
-
withQueryPlannerClass
public SqlTransform withQueryPlannerClass(java.lang.Class<? extends QueryPlanner> clazz)
-
withNamedParameters
public SqlTransform withNamedParameters(java.util.Map<java.lang.String,?> parameters)
-
withPositionalParameters
public SqlTransform withPositionalParameters(java.util.List<?> parameters)
-
withDdlString
public SqlTransform withDdlString(java.lang.String ddlString)
-
withAutoLoading
public SqlTransform withAutoLoading(boolean autoLoading)
-
registerUdf
public SqlTransform registerUdf(java.lang.String functionName, java.lang.Class<? extends BeamSqlUdf> clazz)
Registers a UDF function used in this query.
Refer to BeamSqlUdf for more about how to implement a UDF in BeamSql.
-
registerUdf
public SqlTransform registerUdf(java.lang.String functionName, org.apache.beam.sdk.transforms.SerializableFunction sfn)
Registers a SerializableFunction as a UDF function used in this query. Note that the SerializableFunction must have a constructor without arguments.
-
registerUdaf
public SqlTransform registerUdaf(java.lang.String functionName, org.apache.beam.sdk.transforms.Combine.CombineFn combineFn)
Registers a Combine.CombineFn as a UDAF function used in this query.
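To give a feel for what a UDAF computes, the following sketch mirrors the four-step shape of a CombineFn (create an accumulator, add inputs, merge partial accumulators, extract the output) in plain Java. It deliberately does not use Beam: CombineFnSketch is a hypothetical stand-in interface, and MeanFnSketch is an illustrative aggregate, not part of this API.

```java
/** Hypothetical stand-in for the shape of Combine.CombineFn; not the Beam class. */
interface CombineFnSketch<InputT, AccumT, OutputT> {
    AccumT createAccumulator();
    AccumT addInput(AccumT accumulator, InputT input);
    AccumT mergeAccumulators(Iterable<AccumT> accumulators);
    OutputT extractOutput(AccumT accumulator);
}

/** Illustrative aggregate: the mean of integer inputs, kept as {sum, count}. */
final class MeanFnSketch implements CombineFnSketch<Integer, long[], Double> {
    @Override
    public long[] createAccumulator() {
        return new long[] {0L, 0L};
    }

    @Override
    public long[] addInput(long[] acc, Integer input) {
        // Return a fresh array so earlier accumulators are never mutated.
        return new long[] {acc[0] + input, acc[1] + 1};
    }

    @Override
    public long[] mergeAccumulators(Iterable<long[]> accumulators) {
        long sum = 0L;
        long count = 0L;
        for (long[] acc : accumulators) {
            sum += acc[0];
            count += acc[1];
        }
        return new long[] {sum, count};
    }

    @Override
    public Double extractOutput(long[] acc) {
        // An empty group yields NaN in this sketch; a real UDAF picks its own convention.
        return acc[1] == 0 ? Double.NaN : (double) acc[0] / acc[1];
    }
}
```

With Beam on the classpath, the equivalent class would extend Combine.CombineFn instead of the stand-in interface, and an instance would be passed to registerUdaf together with the function name used in the SQL text.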
-
withErrorsTransformer
public SqlTransform withErrorsTransformer(org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PCollection<org.apache.beam.sdk.values.Row>,? extends org.apache.beam.sdk.values.POutput> errorsTransformer)
-
-