public class BigQueryIO extends Object
PTransforms for reading and writing BigQuery tables.
A fully-qualified BigQuery table name consists of three components:
projectId: the Cloud project id (defaults to GcpOptions.getProject()).
datasetId: the BigQuery dataset id, unique within a project.
tableId: a table id, unique within a dataset.
BigQuery table references are stored as a TableReference, which comes from the BigQuery Java Client API.
Tables can be referred to as Strings, with or without the projectId. The helper function parseTableSpec(String) parses the following string forms into a TableReference:
[project_id]:[dataset_id].[table_id]
[dataset_id].[table_id]
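The two accepted forms can be illustrated with a small, self-contained sketch. It does not use the SDK. TableRef, parse, and format are hypothetical stand-ins for TableReference, parseTableSpec, and toTableSpec, and the regular expression is a simplified assumption, not the SDK's exact validation rules.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TableSpecSketch {

  /** Hypothetical stand-in for TableReference; projectId is null when the spec omits it. */
  static final class TableRef {
    final String projectId;
    final String datasetId;
    final String tableId;

    TableRef(String projectId, String datasetId, String tableId) {
      this.projectId = projectId;
      this.datasetId = datasetId;
      this.tableId = tableId;
    }
  }

  // Simplified pattern (assumption): an optional "project:" prefix followed by
  // "dataset.table". The greedy project group keeps domain-scoped project ids
  // such as "example.com:my-project" intact.
  private static final Pattern TABLE_SPEC =
      Pattern.compile("^(?:(?<project>.+):)?(?<dataset>\\w+)\\.(?<table>\\w+)$");

  static TableRef parse(String spec) {
    Matcher m = TABLE_SPEC.matcher(spec);
    if (!m.matches()) {
      throw new IllegalArgumentException(
          "Expected [project_id]:[dataset_id].[table_id] or [dataset_id].[table_id], got: " + spec);
    }
    // group("project") returns null when the optional prefix did not participate in the match.
    return new TableRef(m.group("project"), m.group("dataset"), m.group("table"));
  }

  /** Inverse of parse: omits the "project:" prefix when no project is set. */
  static String format(TableRef ref) {
    String prefix = ref.projectId == null ? "" : ref.projectId + ":";
    return prefix + ref.datasetId + "." + ref.tableId;
  }

  public static void main(String[] args) {
    TableRef full = parse("clouddataflow-readonly:samples.weather_stations");
    TableRef partial = parse("samples.weather_stations");
    System.out.println(full.projectId);    // clouddataflow-readonly
    System.out.println(partial.projectId); // null
    System.out.println(format(full));      // clouddataflow-readonly:samples.weather_stations
  }
}
```

In this sketch, format(parse(s)) returns s unchanged for well-formed input in either form.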
To read from a BigQuery table, apply a BigQueryIO.Read transformation. This produces a PCollection<TableRow> as output:
PCollection<TableRow> weatherData = pipeline.apply(
    BigQueryIO.Read
        .named("Read")
        .from("clouddataflow-readonly:samples.weather_stations"));
Instead of reading an entire BigQuery table, users may provide a query. If a query is specified, the result of executing it is used as the data of the input transform:
PCollection<TableRow> weatherData = pipeline.apply(
    BigQueryIO.Read
        .named("Read")
        .fromQuery("SELECT year, mean_temp FROM samples.weather_stations"));
When creating a BigQuery input transform, users should provide either a query or a table. Pipeline construction will fail with a validation error if neither or both are specified.
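The table-or-query rule is an exclusive choice. The following hypothetical check shows the logic. ReadSourceCheck and validate are illustrative names, not SDK API, and the SDK's own error messages and the point at which it performs the check may differ.

```java
final class ReadSourceCheck {

  /**
   * Hypothetical check modeled on the rule above: exactly one of table or query
   * must be specified. A null argument means "not specified".
   */
  static void validate(String table, String query) {
    boolean hasTable = table != null;
    boolean hasQuery = query != null;
    if (hasTable && hasQuery) {
      throw new IllegalStateException("Invalid BigQuery read: specify a table or a query, not both");
    }
    if (!hasTable && !hasQuery) {
      throw new IllegalStateException("Invalid BigQuery read: specify either a table or a query");
    }
  }

  public static void main(String[] args) {
    validate("clouddataflow-readonly:samples.weather_stations", null); // accepted
    validate(null, "SELECT year, mean_temp FROM samples.weather_stations"); // accepted
    try {
      validate(null, null); // rejected: neither specified
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```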
To write to a BigQuery table, apply a BigQueryIO.Write transformation. This consumes a PCollection<TableRow> as input:
PCollection<TableRow> quotes = ...
List<TableFieldSchema> fields = new ArrayList<>();
fields.add(new TableFieldSchema().setName("source").setType("STRING"));
fields.add(new TableFieldSchema().setName("quote").setType("STRING"));
TableSchema schema = new TableSchema().setFields(fields);
quotes.apply(BigQueryIO.Write
.named("Write")
.to("my-project:output.output_table")
.withSchema(schema)
.withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE));
See BigQueryIO.Write for details on how to specify whether a write should append to an existing table, replace the table, or verify that the table is empty. Note that the dataset being written to must already exist. Write dispositions are not supported in streaming mode.
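The three behaviors (append, replace, require empty) can be modeled with an in-memory table. In this sketch the enum constants mirror the names in BigQueryIO.Write.WriteDisposition, but the write function and the list-of-strings "table" are hypothetical simplifications, not how the SDK or BigQuery implements them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class WriteDispositionSketch {

  /** Names mirror BigQueryIO.Write.WriteDisposition; the semantics below are a simplified model. */
  enum Disposition { WRITE_APPEND, WRITE_TRUNCATE, WRITE_EMPTY }

  /** Returns the table contents after writing newRows under the given disposition. */
  static List<String> write(List<String> existing, List<String> newRows, Disposition disposition) {
    List<String> result = new ArrayList<>();
    switch (disposition) {
      case WRITE_APPEND:
        result.addAll(existing); // keep the old rows; new rows are added after them
        break;
      case WRITE_TRUNCATE:
        break; // discard the old rows; only the new rows remain
      case WRITE_EMPTY:
        if (!existing.isEmpty()) {
          throw new IllegalStateException("WRITE_EMPTY: destination table is not empty");
        }
        break;
      default:
        throw new AssertionError(disposition);
    }
    result.addAll(newRows);
    return result;
  }

  public static void main(String[] args) {
    List<String> existing = Arrays.asList("old");
    List<String> rows = Arrays.asList("new");
    System.out.println(write(existing, rows, Disposition.WRITE_APPEND));   // [old, new]
    System.out.println(write(existing, rows, Disposition.WRITE_TRUNCATE)); // [new]
    System.out.println(write(Collections.<String>emptyList(), rows, Disposition.WRITE_EMPTY)); // [new]
  }
}
```

Because write dispositions are not supported in streaming mode, a model like this only describes batch writes.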
BigQueryIO.Write.to(SerializableFunction) accepts a function that maps the current window to a table spec. For example, the following code writes one output table per day:
PCollection<TableRow> quotes = ...
quotes.apply(Window.<TableRow>into(CalendarWindows.days(1)))
    .apply(BigQueryIO.Write
        .named("Write")
        .withSchema(schema)
        .to(new SerializableFunction<BoundedWindow, String>() {
          @Override
          public String apply(BoundedWindow window) {
            // CalendarWindows.days(1) assigns IntervalWindows, so this cast is safe.
            String dayString = DateTimeFormat.forPattern("yyyy_MM_dd")
                .withZone(DateTimeZone.UTC)
                .print(((IntervalWindow) window).start());
            return "my-project:output.output_table_" + dayString;
          }
        }));
Per-window tables are not yet supported in batch mode.
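The naming logic inside the window function can be checked in isolation. This sketch uses java.time instead of the Joda-Time DateTimeFormat used in the pipeline example, and assumes window start times are formatted in UTC. DailyTableNameSketch and tableForWindowStart are hypothetical names.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class DailyTableNameSketch {

  // Formatting an Instant requires a zone; UTC is an assumption of this sketch.
  private static final DateTimeFormatter DAY =
      DateTimeFormatter.ofPattern("yyyy_MM_dd").withZone(ZoneOffset.UTC);

  /** Maps a daily window's start instant to a per-day table spec. */
  static String tableForWindowStart(Instant windowStart) {
    return "my-project:output.output_table_" + DAY.format(windowStart);
  }

  public static void main(String[] args) {
    System.out.println(tableForWindowStart(Instant.parse("2016-03-01T00:00:00Z")));
    // my-project:output.output_table_2016_03_01
  }
}
```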
Please see BigQuery Access Control for security and permission-related information specific to BigQuery.
| Modifier and Type | Class and Description |
|---|---|
| static class | BigQueryIO.Read: A PTransform that reads from a BigQuery table and returns a PCollection of TableRows containing each of the rows of the table. |
| static class | BigQueryIO.ShardedKeyCoder<KeyT> |
| static class | BigQueryIO.Write: A PTransform that writes a PCollection of TableRows to a BigQuery table. |
| Modifier and Type | Field and Description |
|---|---|
| static String | SET_PROJECT_FROM_OPTIONS_WARNING |
| Constructor and Description |
|---|
BigQueryIO() |
| Modifier and Type | Method and Description |
|---|---|
| static com.google.api.services.bigquery.model.TableReference | parseTableSpec(String tableSpec): Parse a table specification in the form "[project_id]:[dataset_id].[table_id]" or "[dataset_id].[table_id]". |
| static String | toTableSpec(com.google.api.services.bigquery.model.TableReference ref): Returns a canonical string representation of the TableReference. |
| static void | verifyDatasetPresence(BigQueryOptions options, com.google.api.services.bigquery.model.TableReference table) |
| static void | verifyTablePresence(BigQueryOptions options, com.google.api.services.bigquery.model.TableReference table) |
public static final String SET_PROJECT_FROM_OPTIONS_WARNING
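public static com.google.api.services.bigquery.model.TableReference parseTableSpec(String tableSpec)
Parse a table specification in the form "[project_id]:[dataset_id].[table_id]" or "[dataset_id].[table_id]". If the project id is omitted, the default project id is used.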
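The fallback to the default project can be sketched as a string-level helper. This is an illustration, not the SDK's implementation. qualify is a hypothetical name, defaultProject stands in for the value of GcpOptions.getProject(), and the check relies on the assumption that dataset and table ids never contain ':'.

```java
final class DefaultProjectSketch {

  /**
   * Returns a fully qualified "[project_id]:[dataset_id].[table_id]" spec.
   * Assumption: only the project part of a spec can contain ':', so a spec
   * without ':' has no project and gets defaultProject prepended.
   */
  static String qualify(String tableSpec, String defaultProject) {
    if (tableSpec.indexOf(':') >= 0) {
      return tableSpec; // project id already present
    }
    if (defaultProject == null || defaultProject.isEmpty()) {
      throw new IllegalArgumentException(
          "Table spec '" + tableSpec + "' has no project id and no default project is set");
    }
    return defaultProject + ":" + tableSpec;
  }

  public static void main(String[] args) {
    System.out.println(qualify("samples.weather_stations", "my-project"));
    // my-project:samples.weather_stations
    System.out.println(qualify("other-project:samples.weather_stations", "my-project"));
    // other-project:samples.weather_stations
  }
}
```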
public static String toTableSpec(com.google.api.services.bigquery.model.TableReference ref)
Returns a canonical string representation of the TableReference.
public static void verifyDatasetPresence(BigQueryOptions options, com.google.api.services.bigquery.model.TableReference table)
public static void verifyTablePresence(BigQueryOptions options, com.google.api.services.bigquery.model.TableReference table)