Class BigQueryIO.Write<T>
- java.lang.Object
  - org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PCollection<T>,WriteResult>
    - org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write<T>
- All Implemented Interfaces:
  java.io.Serializable, org.apache.beam.sdk.transforms.display.HasDisplayData
- Enclosing class:
  BigQueryIO
public abstract static class BigQueryIO.Write<T> extends org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PCollection<T>,WriteResult>
Implementation of BigQueryIO.write().
- See Also:
  Serialized Form
Nested Class Summary

Nested Classes
- static class BigQueryIO.Write.CreateDisposition
  An enumeration type for the BigQuery create disposition strings.
- static class BigQueryIO.Write.Method
  Determines the method used to insert data in BigQuery.
- static class BigQueryIO.Write.SchemaUpdateOption
  An enumeration type for the BigQuery schema update options strings.
- static class BigQueryIO.Write.WriteDisposition
  An enumeration type for the BigQuery write disposition strings.
-
Constructor Summary

Constructors
- Write()
-
Method Summary

- WriteResult expand(org.apache.beam.sdk.values.PCollection<T> input)
- @Nullable org.apache.beam.sdk.options.ValueProvider<com.google.api.services.bigquery.model.TableReference> getTable()
  Returns the table reference, or null.
- BigQueryIO.Write<T> ignoreInsertIds()
  Setting this option to true disables insertId based data deduplication offered by BigQuery.
- BigQueryIO.Write<T> ignoreUnknownValues()
  Accept rows that contain values that do not match the schema.
- BigQueryIO.Write<T> optimizedWrites()
  If true, enables new codepaths that are expected to use less resources while writing to BigQuery.
- void populateDisplayData(org.apache.beam.sdk.transforms.display.DisplayData.Builder builder)
- BigQueryIO.Write<T> skipInvalidRows()
  Insert all valid rows of a request, even if invalid rows exist.
- BigQueryIO.Write<T> to(com.google.api.services.bigquery.model.TableReference table)
  Writes to the given table, specified as a TableReference.
- BigQueryIO.Write<T> to(java.lang.String tableSpec)
  Writes to the given table, specified in the format described in BigQueryHelpers.parseTableSpec(java.lang.String).
- BigQueryIO.Write<T> to(DynamicDestinations<T,?> dynamicDestinations)
  Writes to the table and schema specified by the DynamicDestinations object.
- BigQueryIO.Write<T> to(org.apache.beam.sdk.options.ValueProvider<java.lang.String> tableSpec)
  Same as to(String), but with a ValueProvider.
- BigQueryIO.Write<T> to(org.apache.beam.sdk.transforms.SerializableFunction<org.apache.beam.sdk.values.ValueInSingleWindow<T>,TableDestination> tableFunction)
  Writes to the table specified by the given table function.
- BigQueryIO.Write<T> useAvroLogicalTypes()
  Enables interpreting logical types into their corresponding types (i.e. TIMESTAMP) instead of only using their raw types (i.e. LONG).
- BigQueryIO.Write<T> useBeamSchema()
  If true, then the BigQuery schema will be inferred from the input schema.
- void validate(org.apache.beam.sdk.options.PipelineOptions pipelineOptions)
- BigQueryIO.Write<T> withAutoSchemaUpdate(boolean autoSchemaUpdate)
  If true, enables automatically detecting BigQuery table schema updates.
- BigQueryIO.Write<T> withAutoSharding()
  If true, enables using a dynamically determined number of shards to write to BigQuery.
- BigQueryIO.Write<T> withAvroFormatFunction(org.apache.beam.sdk.transforms.SerializableFunction<AvroWriteRequest<T>,org.apache.avro.generic.GenericRecord> avroFormatFunction)
  Formats the user's type into a GenericRecord to be written to BigQuery.
- BigQueryIO.Write<T> withAvroSchemaFactory(org.apache.beam.sdk.transforms.SerializableFunction<@Nullable com.google.api.services.bigquery.model.TableSchema,org.apache.avro.Schema> avroSchemaFactory)
  Uses the specified function to convert a TableSchema to a Schema.
- BigQueryIO.Write<T> withAvroWriter(org.apache.beam.sdk.transforms.SerializableFunction<org.apache.avro.Schema,org.apache.avro.io.DatumWriter<T>> writerFactory)
  Writes the user's type as avro using the supplied DatumWriter.
- <AvroT> BigQueryIO.Write<T> withAvroWriter(org.apache.beam.sdk.transforms.SerializableFunction<AvroWriteRequest<T>,AvroT> avroFormatFunction, org.apache.beam.sdk.transforms.SerializableFunction<org.apache.avro.Schema,org.apache.avro.io.DatumWriter<AvroT>> writerFactory)
  Converts the user's type to an avro record using the supplied avroFormatFunction.
- BigQueryIO.Write<T> withClustering()
  Allows writing to clustered tables when to(SerializableFunction) or to(DynamicDestinations) is used.
- BigQueryIO.Write<T> withClustering(com.google.api.services.bigquery.model.Clustering clustering)
  Specifies the clustering fields to use when writing to a single output table.
- BigQueryIO.Write<T> withCreateDisposition(BigQueryIO.Write.CreateDisposition createDisposition)
  Specifies whether the table should be created if it does not exist.
- BigQueryIO.Write<T> withCustomGcsTempLocation(org.apache.beam.sdk.options.ValueProvider<java.lang.String> customGcsTempLocation)
  Provides a custom location on GCS for storing temporary files to be loaded via BigQuery batch load jobs.
- BigQueryIO.Write<T> withDeterministicRecordIdFn(org.apache.beam.sdk.transforms.SerializableFunction<T,java.lang.String> toUniqueIdFunction)
- BigQueryIO.Write<T> withExtendedErrorInfo()
  Enables extended error information by enabling WriteResult.getFailedInsertsWithErr().
- BigQueryIO.Write<T> withFailedInsertRetryPolicy(InsertRetryPolicy retryPolicy)
  Specifies a policy for handling failed inserts.
- BigQueryIO.Write<T> withFormatFunction(org.apache.beam.sdk.transforms.SerializableFunction<T,com.google.api.services.bigquery.model.TableRow> formatFunction)
  Formats the user's type into a TableRow to be written to BigQuery.
- BigQueryIO.Write<T> withFormatRecordOnFailureFunction(org.apache.beam.sdk.transforms.SerializableFunction<T,com.google.api.services.bigquery.model.TableRow> formatFunction)
  If an insert failure occurs, this function is applied to the originally supplied row T.
- BigQueryIO.Write<T> withJsonSchema(java.lang.String jsonSchema)
  Similar to withSchema(TableSchema) but takes in a JSON-serialized TableSchema.
- BigQueryIO.Write<T> withJsonSchema(org.apache.beam.sdk.options.ValueProvider<java.lang.String> jsonSchema)
  Same as withJsonSchema(String) but using a deferred ValueProvider.
- BigQueryIO.Write<T> withJsonTimePartitioning(org.apache.beam.sdk.options.ValueProvider<java.lang.String> partitioning)
  The same as withTimePartitioning(com.google.api.services.bigquery.model.TimePartitioning), but takes a JSON-serialized object.
- BigQueryIO.Write<T> withKmsKey(java.lang.String kmsKey)
- BigQueryIO.Write<T> withLoadJobProjectId(java.lang.String loadJobProjectId)
  Set the project the BigQuery load job will be initiated from.
- BigQueryIO.Write<T> withLoadJobProjectId(org.apache.beam.sdk.options.ValueProvider<java.lang.String> loadJobProjectId)
- BigQueryIO.Write<T> withMaxBytesPerPartition(long maxBytesPerPartition)
  Control how much data will be assigned to a single BigQuery load job.
- BigQueryIO.Write<T> withMaxFilesPerBundle(int maxFilesPerBundle)
  Control how many files will be written concurrently by a single worker when using BigQuery load jobs before spilling to a shuffle.
- BigQueryIO.Write<T> withMethod(BigQueryIO.Write.Method method)
  Choose the method used to write data to BigQuery.
- BigQueryIO.Write<T> withNumFileShards(int numFileShards)
  Control how many file shards are written when using BigQuery load jobs.
- BigQueryIO.Write<T> withNumStorageWriteApiStreams(int numStorageWriteApiStreams)
  Control how many parallel streams are used when using Storage API writes.
- BigQueryIO.Write<T> withoutValidation()
  Disables BigQuery table validation.
- BigQueryIO.Write<T> withSchema(com.google.api.services.bigquery.model.TableSchema schema)
  Uses the specified schema for rows to be written.
- BigQueryIO.Write<T> withSchema(org.apache.beam.sdk.options.ValueProvider<com.google.api.services.bigquery.model.TableSchema> schema)
  Same as withSchema(TableSchema) but using a deferred ValueProvider.
- BigQueryIO.Write<T> withSchemaFromView(org.apache.beam.sdk.values.PCollectionView<java.util.Map<java.lang.String,java.lang.String>> view)
  Allows the schemas for each table to be computed within the pipeline itself.
- BigQueryIO.Write<T> withSchemaUpdateOptions(java.util.Set<BigQueryIO.Write.SchemaUpdateOption> schemaUpdateOptions)
  Allows the schema of the destination table to be updated as a side effect of the write.
- BigQueryIO.Write<T> withSuccessfulInsertsPropagation(boolean propagateSuccessful)
  If true, enables the propagation of the successfully inserted TableRows on BigQuery as part of the WriteResult object when using BigQueryIO.Write.Method.STREAMING_INSERTS.
- BigQueryIO.Write<T> withTableDescription(java.lang.String tableDescription)
  Specifies the table description.
- BigQueryIO.Write<T> withTestServices(BigQueryServices testServices)
- BigQueryIO.Write<T> withTimePartitioning(com.google.api.services.bigquery.model.TimePartitioning partitioning)
  Allows newly created tables to include a TimePartitioning class.
- BigQueryIO.Write<T> withTimePartitioning(org.apache.beam.sdk.options.ValueProvider<com.google.api.services.bigquery.model.TimePartitioning> partitioning)
  Like withTimePartitioning(TimePartitioning) but using a deferred ValueProvider.
- BigQueryIO.Write<T> withTriggeringFrequency(org.joda.time.Duration triggeringFrequency)
  Choose the frequency at which file writes are triggered.
- BigQueryIO.Write<T> withWriteDisposition(BigQueryIO.Write.WriteDisposition writeDisposition)
  Specifies what to do with existing data in the table, in case the table already exists.
- BigQueryIO.Write<T> withWriteTempDataset(java.lang.String writeTempDataset)
  Specifies an existing dataset in which to create the temporary tables used by file loads.
Method Detail
-
to
public BigQueryIO.Write<T> to(java.lang.String tableSpec)
Writes to the given table, specified in the format described in BigQueryHelpers.parseTableSpec(java.lang.String).
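For illustration, a minimal sketch of this variant; the project, dataset, and table names are placeholders (a spec without the project, e.g. "my_dataset.my_table", uses the default project):

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
import org.apache.beam.sdk.values.PCollection;

// Write TableRows to a table named by a "project:dataset.table" spec string.
static WriteResult writeToSpec(PCollection<TableRow> rows) {
  return rows.apply(
      BigQueryIO.writeTableRows()
          .to("my-project:my_dataset.my_table"));
}
```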
-
to
public BigQueryIO.Write<T> to(com.google.api.services.bigquery.model.TableReference table)
Writes to the given table, specified as a TableReference.
-
to
public BigQueryIO.Write<T> to(org.apache.beam.sdk.options.ValueProvider<java.lang.String> tableSpec)
Same as to(String), but with a ValueProvider.
-
to
public BigQueryIO.Write<T> to(org.apache.beam.sdk.transforms.SerializableFunction<org.apache.beam.sdk.values.ValueInSingleWindow<T>,TableDestination> tableFunction)
Writes to the table specified by the given table function. The table is a function of ValueInSingleWindow, so it can be determined by the value or by the window. If the function produces destinations configured with clustering fields, ensure that withClustering() is also set so that the clustering configurations are properly encoded and decoded.
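As a sketch of this variant, the following routes each element to a per-day table derived from its window; the table naming scheme is illustrative, and it assumes a windowing strategy that produces IntervalWindows (e.g. fixed windows):

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.TableDestination;
import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
import org.apache.beam.sdk.transforms.windowing.IntervalWindow;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.ValueInSingleWindow;
import org.joda.time.format.DateTimeFormat;

static WriteResult writePerWindow(PCollection<TableRow> rows) {
  return rows.apply(
      BigQueryIO.writeTableRows()
          .to((ValueInSingleWindow<TableRow> value) -> {
            // Derive a table suffix from the element's window start time;
            // the cast assumes IntervalWindows.
            String day = DateTimeFormat.forPattern("yyyyMMdd")
                .print(((IntervalWindow) value.getWindow()).start());
            return new TableDestination(
                "my-project:my_dataset.events_" + day, null);
          }));
}
```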
-
to
public BigQueryIO.Write<T> to(DynamicDestinations<T,?> dynamicDestinations)
Writes to the table and schema specified by the DynamicDestinations object. If any of the returned destinations are configured with clustering fields, ensure that the passed DynamicDestinations object returns TableDestinationCoderV3 when DynamicDestinations.getDestinationCoder() is called.
-
withFormatFunction
public BigQueryIO.Write<T> withFormatFunction(org.apache.beam.sdk.transforms.SerializableFunction<T,com.google.api.services.bigquery.model.TableRow> formatFunction)
Formats the user's type into a TableRow to be written to BigQuery.
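A minimal sketch, assuming a hypothetical element type Quote with two string fields; the table and column names are placeholders:

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
import org.apache.beam.sdk.values.PCollection;

// Hypothetical element type.
class Quote {
  String source;
  String text;
}

static WriteResult writeQuotes(PCollection<Quote> quotes) {
  return quotes.apply(
      BigQueryIO.<Quote>write()
          .to("my-project:my_dataset.quotes")
          // Convert each element to the TableRow that lands in BigQuery.
          .withFormatFunction(q ->
              new TableRow().set("source", q.source).set("text", q.text)));
}
```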
-
withFormatRecordOnFailureFunction
public BigQueryIO.Write<T> withFormatRecordOnFailureFunction(org.apache.beam.sdk.transforms.SerializableFunction<T,com.google.api.services.bigquery.model.TableRow> formatFunction)
If an insert failure occurs, this function is applied to the originally supplied row T. The resulting TableRow will be accessed via WriteResult.getFailedInsertsWithErr().
-
withAvroFormatFunction
public BigQueryIO.Write<T> withAvroFormatFunction(org.apache.beam.sdk.transforms.SerializableFunction<AvroWriteRequest<T>,org.apache.avro.generic.GenericRecord> avroFormatFunction)
Formats the user's type into a GenericRecord to be written to BigQuery. The GenericRecords are written as avro using the standard GenericDatumWriter.

This is mutually exclusive with withFormatFunction(org.apache.beam.sdk.transforms.SerializableFunction<T, com.google.api.services.bigquery.model.TableRow>); only one may be set.
-
withAvroWriter
public BigQueryIO.Write<T> withAvroWriter(org.apache.beam.sdk.transforms.SerializableFunction<org.apache.avro.Schema,org.apache.avro.io.DatumWriter<T>> writerFactory)
Writes the user's type as avro using the supplied DatumWriter.

This is mutually exclusive with withFormatFunction(org.apache.beam.sdk.transforms.SerializableFunction<T, com.google.api.services.bigquery.model.TableRow>); only one may be set.

Overwrites withAvroFormatFunction(org.apache.beam.sdk.transforms.SerializableFunction<org.apache.beam.sdk.io.gcp.bigquery.AvroWriteRequest<T>, org.apache.avro.generic.GenericRecord>) if it has been set.
-
withAvroWriter
public <AvroT> BigQueryIO.Write<T> withAvroWriter(org.apache.beam.sdk.transforms.SerializableFunction<AvroWriteRequest<T>,AvroT> avroFormatFunction, org.apache.beam.sdk.transforms.SerializableFunction<org.apache.avro.Schema,org.apache.avro.io.DatumWriter<AvroT>> writerFactory)
Converts the user's type to an avro record using the supplied avroFormatFunction. Records are then written using the writer instances returned from writerFactory.

This is mutually exclusive with withFormatFunction(org.apache.beam.sdk.transforms.SerializableFunction<T, com.google.api.services.bigquery.model.TableRow>); only one may be set.

Overwrites withAvroFormatFunction(org.apache.beam.sdk.transforms.SerializableFunction<org.apache.beam.sdk.io.gcp.bigquery.AvroWriteRequest<T>, org.apache.avro.generic.GenericRecord>) if it has been set.
-
withAvroSchemaFactory
public BigQueryIO.Write<T> withAvroSchemaFactory(org.apache.beam.sdk.transforms.SerializableFunction<@Nullable com.google.api.services.bigquery.model.TableSchema,org.apache.avro.Schema> avroSchemaFactory)
Uses the specified function to convert a TableSchema to a Schema.

If not specified, the TableSchema will automatically be converted to an avro schema.
-
withSchema
public BigQueryIO.Write<T> withSchema(com.google.api.services.bigquery.model.TableSchema schema)
Uses the specified schema for rows to be written.

The schema is required only if writing to a table that does not already exist, and BigQueryIO.Write.CreateDisposition is set to BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED.
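For illustration, a sketch that builds a two-column schema and creates the table if needed; the field names, types, and table name are placeholders:

```java
import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import java.util.Arrays;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
import org.apache.beam.sdk.values.PCollection;

static WriteResult writeWithSchema(PCollection<TableRow> rows) {
  // Two-column schema; names, types, and modes are illustrative.
  TableSchema schema = new TableSchema().setFields(Arrays.asList(
      new TableFieldSchema().setName("source").setType("STRING"),
      new TableFieldSchema().setName("text").setType("STRING").setMode("NULLABLE")));
  return rows.apply(
      BigQueryIO.writeTableRows()
          .to("my-project:my_dataset.quotes")
          .withSchema(schema)
          .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));
}
```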
-
withSchema
public BigQueryIO.Write<T> withSchema(org.apache.beam.sdk.options.ValueProvider<com.google.api.services.bigquery.model.TableSchema> schema)
Same as withSchema(TableSchema) but using a deferred ValueProvider.
-
withJsonSchema
public BigQueryIO.Write<T> withJsonSchema(java.lang.String jsonSchema)
Similar to withSchema(TableSchema) but takes in a JSON-serialized TableSchema.
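The JSON string mirrors the serialized form of a TableSchema, i.e. a top-level "fields" array. A sketch equivalent to the schema example above (names are placeholders):

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
import org.apache.beam.sdk.values.PCollection;

static WriteResult writeWithJsonSchema(PCollection<TableRow> rows) {
  // JSON-serialized TableSchema: a top-level "fields" array.
  String jsonSchema =
      "{\"fields\": ["
          + "{\"name\": \"source\", \"type\": \"STRING\"},"
          + "{\"name\": \"text\", \"type\": \"STRING\", \"mode\": \"NULLABLE\"}]}";
  return rows.apply(
      BigQueryIO.writeTableRows()
          .to("my-project:my_dataset.quotes")
          .withJsonSchema(jsonSchema)
          .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));
}
```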
-
withJsonSchema
public BigQueryIO.Write<T> withJsonSchema(org.apache.beam.sdk.options.ValueProvider<java.lang.String> jsonSchema)
Same as withJsonSchema(String) but using a deferred ValueProvider.
-
withSchemaFromView
public BigQueryIO.Write<T> withSchemaFromView(org.apache.beam.sdk.values.PCollectionView<java.util.Map<java.lang.String,java.lang.String>> view)
Allows the schemas for each table to be computed within the pipeline itself.

The input is a map-valued PCollectionView mapping string tablespecs to JSON-formatted TableSchemas. Tablespecs must be in the same format as taken by to(String).
-
withTimePartitioning
public BigQueryIO.Write<T> withTimePartitioning(com.google.api.services.bigquery.model.TimePartitioning partitioning)
Allows newly created tables to include a TimePartitioning class. Can only be used when writing to a single table. If to(SerializableFunction) or to(DynamicDestinations) is used to write dynamic tables, time partitioning can be directly set in the returned TableDestination.
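A minimal sketch, partitioning newly created tables by day on a TIMESTAMP column; the column name "ts" and the table name are placeholders:

```java
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TimePartitioning;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
import org.apache.beam.sdk.values.PCollection;

static WriteResult writePartitioned(PCollection<TableRow> rows) {
  // Day-partition newly created tables on the "ts" column.
  TimePartitioning partitioning =
      new TimePartitioning().setType("DAY").setField("ts");
  return rows.apply(
      BigQueryIO.writeTableRows()
          .to("my-project:my_dataset.events")
          .withTimePartitioning(partitioning));
}
```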
-
withTimePartitioning
public BigQueryIO.Write<T> withTimePartitioning(org.apache.beam.sdk.options.ValueProvider<com.google.api.services.bigquery.model.TimePartitioning> partitioning)
Like withTimePartitioning(TimePartitioning) but using a deferred ValueProvider.
-
withJsonTimePartitioning
public BigQueryIO.Write<T> withJsonTimePartitioning(org.apache.beam.sdk.options.ValueProvider<java.lang.String> partitioning)
The same as withTimePartitioning(com.google.api.services.bigquery.model.TimePartitioning), but takes a JSON-serialized object.
-
withClustering
public BigQueryIO.Write<T> withClustering(com.google.api.services.bigquery.model.Clustering clustering)
Specifies the clustering fields to use when writing to a single output table. Can only be used when withTimePartitioning(TimePartitioning) is set. If to(SerializableFunction) or to(DynamicDestinations) is used to write to dynamic tables, the fields here will be ignored; call withClustering() instead.
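For illustration, a sketch combining time partitioning with clustering, as this method requires; the column and table names are placeholders:

```java
import com.google.api.services.bigquery.model.Clustering;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TimePartitioning;
import java.util.Arrays;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
import org.apache.beam.sdk.values.PCollection;

static WriteResult writeClustered(PCollection<TableRow> rows) {
  return rows.apply(
      BigQueryIO.writeTableRows()
          .to("my-project:my_dataset.events")
          // Cluster rows within each day partition by two placeholder columns.
          .withTimePartitioning(new TimePartitioning().setType("DAY").setField("ts"))
          .withClustering(new Clustering().setFields(Arrays.asList("source", "text"))));
}
```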
-
withClustering
public BigQueryIO.Write<T> withClustering()
Allows writing to clustered tables when to(SerializableFunction) or to(DynamicDestinations) is used. The returned TableDestination objects should specify the clustering fields per table. If writing to a single table, use withClustering(Clustering) instead to pass a Clustering instance that specifies the static clustering fields to use.

Setting this option enables use of TableDestinationCoderV3, which encodes clustering information. The updated coder is compatible with non-clustered tables, so it can be freely set for newly deployed pipelines, but note that pipelines using an older coder must be drained before setting this option, since TableDestinationCoderV3 will not be able to read state written with a previous version.
-
withCreateDisposition
public BigQueryIO.Write<T> withCreateDisposition(BigQueryIO.Write.CreateDisposition createDisposition)
Specifies whether the table should be created if it does not exist.
-
withWriteDisposition
public BigQueryIO.Write<T> withWriteDisposition(BigQueryIO.Write.WriteDisposition writeDisposition)
Specifies what to do with existing data in the table, in case the table already exists.
-
withSchemaUpdateOptions
public BigQueryIO.Write<T> withSchemaUpdateOptions(java.util.Set<BigQueryIO.Write.SchemaUpdateOption> schemaUpdateOptions)
Allows the schema of the destination table to be updated as a side effect of the write.

This configuration applies only when writing to BigQuery with BigQueryIO.Write.Method.FILE_LOADS as the method.
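A sketch of letting load jobs evolve the destination schema; both options shown are members of BigQueryIO.Write.SchemaUpdateOption, and the table name is a placeholder:

```java
import com.google.api.services.bigquery.model.TableRow;
import java.util.EnumSet;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
import org.apache.beam.sdk.values.PCollection;

static WriteResult writeWithSchemaUpdates(PCollection<TableRow> rows) {
  return rows.apply(
      BigQueryIO.writeTableRows()
          .to("my-project:my_dataset.my_table")
          .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
          // Let load jobs add new fields and relax REQUIRED fields to NULLABLE.
          .withSchemaUpdateOptions(EnumSet.of(
              BigQueryIO.Write.SchemaUpdateOption.ALLOW_FIELD_ADDITION,
              BigQueryIO.Write.SchemaUpdateOption.ALLOW_FIELD_RELAXATION)));
}
```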
-
withTableDescription
public BigQueryIO.Write<T> withTableDescription(java.lang.String tableDescription)
Specifies the table description.
-
withFailedInsertRetryPolicy
public BigQueryIO.Write<T> withFailedInsertRetryPolicy(InsertRetryPolicy retryPolicy)
Specifies a policy for handling failed inserts.

Currently this is only allowed when writing an unbounded collection to BigQuery. Bounded collections are written using batch load jobs, so we don't get per-element failures. Unbounded collections are written using streaming inserts, so we have access to per-element insert results.
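For illustration, a sketch that retries only transient errors, so permanent failures flow to the WriteResult's failed-inserts output instead of being retried indefinitely; the table name is a placeholder:

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.InsertRetryPolicy;
import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
import org.apache.beam.sdk.values.PCollection;

static WriteResult writeWithRetryPolicy(PCollection<TableRow> rows) {
  return rows.apply(
      BigQueryIO.writeTableRows()
          .to("my-project:my_dataset.my_table")
          // Retry errors BigQuery marks as transient; everything else becomes
          // a failed insert available on the WriteResult.
          .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors()));
}
```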
-
withoutValidation
public BigQueryIO.Write<T> withoutValidation()
Disables BigQuery table validation.
-
withMethod
public BigQueryIO.Write<T> withMethod(BigQueryIO.Write.Method method)
Choose the method used to write data to BigQuery. See the Javadoc on BigQueryIO.Write.Method for information on, and restrictions of, the different methods.
-
withLoadJobProjectId
public BigQueryIO.Write<T> withLoadJobProjectId(java.lang.String loadJobProjectId)
Set the project the BigQuery load job will be initiated from. This is only applicable when the write method is set to BigQueryIO.Write.Method.FILE_LOADS. If omitted, the project of the destination table is used.
-
withLoadJobProjectId
public BigQueryIO.Write<T> withLoadJobProjectId(org.apache.beam.sdk.options.ValueProvider<java.lang.String> loadJobProjectId)
-
withTriggeringFrequency
public BigQueryIO.Write<T> withTriggeringFrequency(org.joda.time.Duration triggeringFrequency)
Choose the frequency at which file writes are triggered.

This is only applicable when the write method is set to BigQueryIO.Write.Method.FILE_LOADS, and only when writing an unbounded PCollection.

Every triggeringFrequency duration, a BigQuery load job will be generated for all the data written since the last load job. BigQuery has limits on how many load jobs can be triggered per day, so be careful not to set this duration too low, or you may exceed daily quota. Often this is set to 5 or 10 minutes to ensure that the project stays well under the BigQuery quota. See Quota Policy for more information about BigQuery quotas.
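A sketch of periodic load jobs from an unbounded source; the ten-minute frequency and shard count are illustrative, and unbounded FILE_LOADS also needs withNumFileShards(int) or withAutoSharding():

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

static WriteResult writeStreamViaLoads(PCollection<TableRow> rows) {
  return rows.apply(
      BigQueryIO.writeTableRows()
          .to("my-project:my_dataset.my_table")
          .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
          // One load job per destination roughly every 10 minutes.
          .withTriggeringFrequency(Duration.standardMinutes(10))
          .withNumFileShards(100));
}
```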
-
withNumFileShards
public BigQueryIO.Write<T> withNumFileShards(int numFileShards)
Control how many file shards are written when using BigQuery load jobs. Applicable only when also setting withTriggeringFrequency(org.joda.time.Duration). To let the runner determine the sharding at runtime, set withAutoSharding() instead.
-
withNumStorageWriteApiStreams
public BigQueryIO.Write<T> withNumStorageWriteApiStreams(int numStorageWriteApiStreams)
Control how many parallel streams are used when using Storage API writes. Applicable only when also setting withTriggeringFrequency(org.joda.time.Duration). To let the runner determine the sharding at runtime, set withAutoSharding() instead.
-
withCustomGcsTempLocation
public BigQueryIO.Write<T> withCustomGcsTempLocation(org.apache.beam.sdk.options.ValueProvider<java.lang.String> customGcsTempLocation)
Provides a custom location on GCS for storing temporary files to be loaded via BigQuery batch load jobs. See "Usage with templates" in the BigQueryIO documentation for discussion.
-
withExtendedErrorInfo
public BigQueryIO.Write<T> withExtendedErrorInfo()
Enables extended error information by enabling WriteResult.getFailedInsertsWithErr().

At the moment this only works when using BigQueryIO.Write.Method.STREAMING_INSERTS. See withMethod(Method).
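For illustration, a sketch that captures each failed row together with the error BigQuery returned; the table name is a placeholder:

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryInsertError;
import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
import org.apache.beam.sdk.values.PCollection;

static PCollection<BigQueryInsertError> failedRows(PCollection<TableRow> rows) {
  WriteResult result = rows.apply(
      BigQueryIO.writeTableRows()
          .to("my-project:my_dataset.my_table")
          .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
          .withExtendedErrorInfo());
  // Each element pairs the failed TableRow with the insert error details.
  return result.getFailedInsertsWithErr();
}
```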
-
skipInvalidRows
public BigQueryIO.Write<T> skipInvalidRows()
Insert all valid rows of a request, even if invalid rows exist. This is only applicable when the write method is set to BigQueryIO.Write.Method.STREAMING_INSERTS. The default value is false, which causes the entire request to fail if any invalid rows exist.
-
ignoreUnknownValues
public BigQueryIO.Write<T> ignoreUnknownValues()
Accept rows that contain values that do not match the schema. The unknown values are ignored. Default is false, which treats unknown values as errors.
-
useAvroLogicalTypes
public BigQueryIO.Write<T> useAvroLogicalTypes()
Enables interpreting logical types into their corresponding types (i.e. TIMESTAMP), instead of only using their raw types (i.e. LONG).
-
ignoreInsertIds
public BigQueryIO.Write<T> ignoreInsertIds()
Setting this option to true disables insertId based data deduplication offered by BigQuery. For more information, please see https://cloud.google.com/bigquery/streaming-data-into-bigquery#disabling_best_effort_de-duplication.
-
withKmsKey
public BigQueryIO.Write<T> withKmsKey(java.lang.String kmsKey)
-
optimizedWrites
public BigQueryIO.Write<T> optimizedWrites()
If true, enables new codepaths that are expected to use less resources while writing to BigQuery. Not enabled by default in order to maintain backwards compatibility.
-
useBeamSchema
@Experimental(SCHEMAS) public BigQueryIO.Write<T> useBeamSchema()
If true, then the BigQuery schema will be inferred from the input schema. If no formatFunction is set, then BigQueryIO will automatically turn the input records into TableRows that match the schema.
-
withAutoSharding
@Experimental public BigQueryIO.Write<T> withAutoSharding()
If true, enables using a dynamically determined number of shards to write to BigQuery. This can be used for both BigQueryIO.Write.Method.FILE_LOADS and BigQueryIO.Write.Method.STREAMING_INSERTS. Only applicable to unbounded data. If using BigQueryIO.Write.Method.FILE_LOADS, numFileShards set via withNumFileShards(int) will be ignored.
-
withSuccessfulInsertsPropagation
public BigQueryIO.Write<T> withSuccessfulInsertsPropagation(boolean propagateSuccessful)
If true, enables the propagation of the successfully inserted TableRows on BigQuery as part of the WriteResult object when using BigQueryIO.Write.Method.STREAMING_INSERTS. By default this property is set to true. If a pipeline does not make use of the insert results, this property can be set to false, in which case the pipeline discards the inserted TableRows and reclaims worker resources.
-
withAutoSchemaUpdate
public BigQueryIO.Write<T> withAutoSchemaUpdate(boolean autoSchemaUpdate)
If true, enables automatically detecting BigQuery table schema updates. Table schema updates are usually noticed within several minutes. Only supported when using one of the STORAGE_API insert methods.
-
withDeterministicRecordIdFn
@Experimental public BigQueryIO.Write<T> withDeterministicRecordIdFn(org.apache.beam.sdk.transforms.SerializableFunction<T,java.lang.String> toUniqueIdFunction)
-
withTestServices
public BigQueryIO.Write<T> withTestServices(BigQueryServices testServices)
-
withMaxFilesPerBundle
public BigQueryIO.Write<T> withMaxFilesPerBundle(int maxFilesPerBundle)
Control how many files will be written concurrently by a single worker when using BigQuery load jobs before spilling to a shuffle. When data comes into this transform, it is written to one file per destination per worker. When there are more files than maxFilesPerBundle (default: 20), the data is shuffled (i.e. grouped by destination), and written to files one-by-one per worker. This flag sets the maximum number of files that a single worker can write concurrently before shuffling the data. This flag should be used with caution: setting a high number can increase memory pressure on workers, and setting a low number can make a pipeline slower (due to the need to shuffle data).
-
withMaxBytesPerPartition
public BigQueryIO.Write<T> withMaxBytesPerPartition(long maxBytesPerPartition)
Control how much data will be assigned to a single BigQuery load job. If the amount of data flowing into one BatchLoads partition exceeds this value, that partition will be handled via multiple load jobs.

The default value (11 TiB) respects BigQuery's maximum size per load job limit and is appropriate for most use cases. Reducing the value of this parameter can improve stability when loading to tables with complex schemas containing thousands of fields.
- See Also:
- BigQuery Load Job Limits
-
withWriteTempDataset
public BigQueryIO.Write<T> withWriteTempDataset(java.lang.String writeTempDataset)
Temporary dataset. When writing to BigQuery via large file loads, BigQueryIO.write() will create temporary tables in a dataset to store staging data from partitions. With this option, you can set an existing dataset in which to create the temporary tables. BigQueryIO will create the temporary tables in that dataset and will remove them once they are no longer needed. No other tables in the dataset will be modified. Remember that the dataset must exist, and your job needs permissions to create and remove tables inside that dataset.
-
validate
public void validate(org.apache.beam.sdk.options.PipelineOptions pipelineOptions)
- Overrides:
  validate in class org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PCollection<T>,WriteResult>
-
expand
public WriteResult expand(org.apache.beam.sdk.values.PCollection<T> input)
- Specified by:
  expand in class org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PCollection<T>,WriteResult>
-
populateDisplayData
public void populateDisplayData(org.apache.beam.sdk.transforms.display.DisplayData.Builder builder)
- Specified by:
  populateDisplayData in interface org.apache.beam.sdk.transforms.display.HasDisplayData
- Overrides:
  populateDisplayData in class org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PCollection<T>,WriteResult>
-
getTable
public @Nullable org.apache.beam.sdk.options.ValueProvider<com.google.api.services.bigquery.model.TableReference> getTable()
Returns the table reference, or null.