DatastoreIO
@Deprecated @Experimental(value=SOURCE_SINK) public class DatastoreIO extends Object
DatastoreIO provides an API to Read and Write PCollections of Google Cloud Datastore DatastoreV1.Entity objects.
Google Cloud Datastore is a fully managed NoSQL data storage service. An Entity is an object in Datastore, analogous to a row in a traditional database table.
This API currently requires an authentication workaround. To use DatastoreIO, users
must use the gcloud command line tool to get credentials for Datastore:
$ gcloud auth login
To read a PCollection from a query to Datastore, use source() and
its methods DatastoreIO.Source.withDataset(java.lang.String) and DatastoreIO.Source.withQuery(com.google.api.services.datastore.DatastoreV1.Query) to
specify the dataset to query and the query to read from. You can optionally provide a namespace
to query within using DatastoreIO.Source.withNamespace(java.lang.String) or a Datastore host using
DatastoreIO.Source.withHost(java.lang.String).
For example:
// Read a query from Datastore
PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
Query query = ...;
String datasetId = "...";
String host = "...";
Pipeline p = Pipeline.create(options);
PCollection<Entity> entities = p.apply(
    Read.from(DatastoreIO.source()
        .withDataset(datasetId)
        .withQuery(query)
        .withHost(host)));
or:
// Read a query from Datastore using the default namespace and host
PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
Query query = ...;
String datasetId = "...";
Pipeline p = Pipeline.create(options);
PCollection<Entity> entities = p.apply(DatastoreIO.readFrom(datasetId, query));
p.run();
Note: Normally, a Cloud Dataflow job will read from Cloud Datastore in parallel across
many workers. However, when the DatastoreV1.Query is configured with a limit using
DatastoreV1.Query.Builder.setLimit(int), then
all returned results will be read by a single Dataflow worker in order to ensure correct data.
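One plausible reading of "to ensure correct data" is that a limit applies to the whole result set, so applying it separately on each parallel reader could return too many results. That reasoning is an assumption and is not spelled out in this documentation. The plain-Java sketch below illustrates it with in-memory lists; `LimitSplitSketch` and its methods are hypothetical names and are not part of the SDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; not part of the Dataflow SDK. */
final class LimitSplitSketch {

    /** Applies the limit to one list of results, as a single reader would. */
    static List<Integer> readWithLimit(List<Integer> results, int limit) {
        return new ArrayList<>(results.subList(0, Math.min(limit, results.size())));
    }

    /** Applies the same limit independently to every shard, as naive parallel readers would. */
    static List<Integer> readShardsWithLimit(List<List<Integer>> shards, int limit) {
        List<Integer> out = new ArrayList<>();
        for (List<Integer> shard : shards) {
            out.addAll(readWithLimit(shard, limit));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> all = Arrays.asList(1, 2, 3, 4, 5, 6);
        List<List<Integer>> shards = Arrays.asList(Arrays.asList(1, 2, 3), Arrays.asList(4, 5, 6));
        System.out.println(readWithLimit(all, 2).size());          // 2
        System.out.println(readShardsWithLimit(shards, 2).size()); // 4
    }
}
```

With a limit of 2, the single reader returns 2 results while the two independent shards return 4 in total, which is why a limited query is simpler to read correctly from one worker.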
To write a PCollection to Datastore, use writeTo(java.lang.String),
specifying the dataset to write to:
PCollection<Entity> entities = ...;
entities.apply(DatastoreIO.writeTo(dataset));
p.run();
To optionally change the host that is used to write to the Datastore, use sink() to build a DatastoreIO.Sink and write to it using the Write
transform:
PCollection<Entity> entities = ...;
entities.apply(Write.to(DatastoreIO.sink().withDataset(dataset).withHost(host)));
Entities in the PCollection to be written must have complete
Keys. A complete Key specifies the name or id of the
Entity; an incomplete Key specifies neither. To write to a namespace other than the
project default, specify it in the Entity Keys.
Key.Builder keyBuilder = DatastoreHelper.makeKey(...);
keyBuilder.getPartitionIdBuilder().setNamespace(namespace);
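The completeness rule can be sketched in plain Java. This is a minimal illustration, assuming a key counts as complete when its last path element carries either a name or a numeric id. `KeyCompletenessSketch` and `PathElement` are hypothetical stand-ins, not the DatastoreV1 classes.

```java
/** Hypothetical illustration only; not the DatastoreV1 Key API. */
final class KeyCompletenessSketch {

    /** Stand-in for the last path element of a key: a kind plus an optional name or id. */
    static final class PathElement {
        final String kind;
        final String name; // null when no name is set
        final Long id;     // null when no id is set

        PathElement(String kind, String name, Long id) {
            this.kind = kind;
            this.name = name;
            this.id = id;
        }
    }

    /** Assumed rule: complete means the last element has a name or an id. */
    static boolean isComplete(PathElement last) {
        return last.name != null || last.id != null;
    }

    public static void main(String[] args) {
        System.out.println(isComplete(new PathElement("Person", "alice", null))); // true
        System.out.println(isComplete(new PathElement("Person", null, 42L)));     // true
        System.out.println(isComplete(new PathElement("Person", null, null)));    // false
    }
}
```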
Entities will be committed as upsert (update or insert) mutations. Please read
Entities, Properties, and
Keys for more information about Entity keys.
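Upsert semantics can be pictured with an in-memory map: writing a key that does not exist inserts it, and writing an existing key overwrites the stored value. The sketch below only illustrates that behavior; `UpsertSketch` is a hypothetical class, and the string keys and values stand in for Entity keys and Entities.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical illustration only; an in-memory stand-in for a datastore. */
final class UpsertSketch {
    private final Map<String, String> store = new LinkedHashMap<>();

    /** Inserts the value if the key is new, otherwise overwrites it. Returns true for an insert. */
    boolean upsert(String key, String value) {
        Objects.requireNonNull(value, "value");
        return store.put(key, value) == null;
    }

    String get(String key) {
        return store.get(key);
    }

    int size() {
        return store.size();
    }

    public static void main(String[] args) {
        UpsertSketch s = new UpsertSketch();
        System.out.println(s.upsert("Person/alice", "v1")); // true: inserted
        System.out.println(s.upsert("Person/alice", "v2")); // false: updated
        System.out.println(s.get("Person/alice"));          // v2
    }
}
```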
Permission requirements depend on the PipelineRunner that is used to execute the
Dataflow job. Please refer to the documentation of corresponding PipelineRunners for
more details.
Please see Cloud Datastore Sign Up for security and permission related information specific to Datastore.
| Modifier and Type | Class and Description |
|---|---|
| static class | DatastoreIO.DatastoreReader: Deprecated. A Source.Reader over the records from a query of the datastore. |
| static class | DatastoreIO.Sink: Deprecated. |
| static class | DatastoreIO.Source: Deprecated. A Source that reads the result rows of a Datastore query as Entity objects. |
| Modifier and Type | Field and Description |
|---|---|
| static int | DATASTORE_BATCH_UPDATE_LIMIT: Deprecated. Datastore has a limit of 500 mutations per batch operation, so we flush changes to Datastore every 500 entities. |
| static String | DEFAULT_HOST: Deprecated. |
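The flush-every-500 behavior described for DATASTORE_BATCH_UPDATE_LIMIT can be sketched as simple batching. This is a minimal plain-Java illustration, not the sink's actual code; `BatchFlushSketch`, `BATCH_LIMIT`, and `toBatches` are hypothetical names, and each returned batch stands in for one commit.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not the DatastoreIO sink implementation. */
final class BatchFlushSketch {
    /** Mirrors the 500-mutation batch limit described above. */
    static final int BATCH_LIMIT = 500;

    /** Splits items into consecutive batches of at most BATCH_LIMIT elements. */
    static <T> List<List<T>> toBatches(List<T> items) {
        List<List<T>> batches = new ArrayList<>();
        List<T> current = new ArrayList<>();
        for (T item : items) {
            current.add(item);
            if (current.size() >= BATCH_LIMIT) {
                batches.add(current); // "flush" a full batch
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            batches.add(current); // flush the remainder
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < 1201; i++) {
            items.add(i);
        }
        for (List<Integer> batch : toBatches(items)) {
            System.out.println(batch.size()); // 500, 500, 201
        }
    }
}
```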
| Constructor and Description |
|---|
| DatastoreIO(): Deprecated. |
| Modifier and Type | Method and Description |
|---|---|
| static DatastoreIO.Source | read(): Deprecated. The name and return type do not match. Use source(). |
| static Read.Bounded<DatastoreV1.Entity> | readFrom(String datasetId, DatastoreV1.Query query): Deprecated. Returns a PTransform that reads Datastore entities from the query against the given dataset. |
| static Read.Bounded<DatastoreV1.Entity> | readFrom(String host, String datasetId, DatastoreV1.Query query) |
| static DatastoreIO.Sink | sink(): Deprecated. Returns a new DatastoreIO.Sink builder using the default host. |
| static DatastoreIO.Source | source(): Deprecated. Returns an empty DatastoreIO.Source builder with the default host. |
| static Write.Bound<DatastoreV1.Entity> | writeTo(String datasetId): Deprecated. |
public static final String DEFAULT_HOST
public static final int DATASTORE_BATCH_UPDATE_LIMIT
@Deprecated public static DatastoreIO.Source read()
Deprecated. The name and return type do not match. Use source().
Returns an empty DatastoreIO.Source builder with the default host.
Configure the dataset, query, and namespace using
DatastoreIO.Source.withDataset(java.lang.String), DatastoreIO.Source.withQuery(com.google.api.services.datastore.DatastoreV1.Query),
and DatastoreIO.Source.withNamespace(java.lang.String).
public static DatastoreIO.Source source()
Returns an empty DatastoreIO.Source builder with the default host.
Configure the dataset, query, and namespace using
DatastoreIO.Source.withDataset(java.lang.String), DatastoreIO.Source.withQuery(com.google.api.services.datastore.DatastoreV1.Query),
and DatastoreIO.Source.withNamespace(java.lang.String).
The resulting DatastoreIO.Source object can be passed to Read to create a
PTransform that will read from Datastore.
public static Read.Bounded<DatastoreV1.Entity> readFrom(String datasetId, DatastoreV1.Query query)
Returns a PTransform that reads Datastore entities from the query
against the given dataset.
@Deprecated public static Read.Bounded<DatastoreV1.Entity> readFrom(String host, String datasetId, DatastoreV1.Query query)
Deprecated. Use source() with DatastoreIO.Source.withHost(java.lang.String), DatastoreIO.Source.withDataset(java.lang.String),
and DatastoreIO.Source.withQuery(com.google.api.services.datastore.DatastoreV1.Query) instead.
Returns a PTransform that reads Datastore entities from the query
against the given dataset and host.
public static DatastoreIO.Sink sink()
Returns a new DatastoreIO.Sink builder using the default host.
You need to further configure it using DatastoreIO.Sink.withDataset(java.lang.String), and optionally
DatastoreIO.Sink.withHost(java.lang.String), before using it in a Write transform.
For example: entities.apply(Write.to(DatastoreIO.sink().withDataset(dataset)));
public static Write.Bound<DatastoreV1.Entity> writeTo(String datasetId)