Class DatastoreV1.Read

  • All Implemented Interfaces:
    java.io.Serializable, org.apache.beam.sdk.transforms.display.HasDisplayData
    Enclosing class:
    DatastoreV1

    public abstract static class DatastoreV1.Read
    extends org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PBegin,​org.apache.beam.sdk.values.PCollection<com.google.datastore.v1.Entity>>
    A PTransform that reads the result rows of a Cloud Datastore query as Entity objects.
    See Also:
    DatastoreIO, Serialized Form
    • Field Detail

      • NUM_QUERY_SPLITS_MAX

        public static final int NUM_QUERY_SPLITS_MAX
        An upper bound on the number of splits for a query.
        See Also:
        Constant Field Values
    • Constructor Detail

      • Read

        public Read()
    • Method Detail

      • getProjectId

        public abstract @Nullable org.apache.beam.sdk.options.ValueProvider<java.lang.String> getProjectId()
      • getQuery

        public abstract @Nullable com.google.datastore.v1.Query getQuery()
      • getLiteralGqlQuery

        public abstract @Nullable org.apache.beam.sdk.options.ValueProvider<java.lang.String> getLiteralGqlQuery()
      • getNamespace

        public abstract @Nullable org.apache.beam.sdk.options.ValueProvider<java.lang.String> getNamespace()
      • getNumQuerySplits

        public abstract int getNumQuerySplits()
      • getLocalhost

        public abstract @Nullable java.lang.String getLocalhost()
      • getReadTime

        public abstract @Nullable org.joda.time.Instant getReadTime()
      • toString

        public abstract java.lang.String toString()
        Overrides:
        toString in class org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PBegin,​org.apache.beam.sdk.values.PCollection<com.google.datastore.v1.Entity>>
      • withProjectId

        public DatastoreV1.Read withProjectId​(java.lang.String projectId)
        Returns a new DatastoreV1.Read that reads from the Cloud Datastore for the specified project.
      • withProjectId

        public DatastoreV1.Read withProjectId​(org.apache.beam.sdk.options.ValueProvider<java.lang.String> projectId)
        Same as withProjectId(String) but with a ValueProvider.
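The with* methods are documented as returning a new DatastoreV1.Read rather than modifying the receiver. The sketch below illustrates that contract with a hypothetical stand-in class, ReadConfigSketch. It is not Beam's implementation, and its fields and factory method are assumptions made for this example.

```java
import java.util.Objects;

// Hypothetical stand-in for an immutable, copy-on-configure transform.
// This is NOT DatastoreV1.Read; it only mirrors the documented
// "returns a new instance" behaviour of the with* methods.
final class ReadConfigSketch {
    private final String projectId; // null until configured
    private final String namespace; // null until configured

    private ReadConfigSketch(String projectId, String namespace) {
        this.projectId = projectId;
        this.namespace = namespace;
    }

    // Assumed factory for the sketch: starts with nothing configured.
    static ReadConfigSketch create() {
        return new ReadConfigSketch(null, null);
    }

    // Returns a copy with the project ID set; the receiver is left unchanged.
    ReadConfigSketch withProjectId(String projectId) {
        return new ReadConfigSketch(Objects.requireNonNull(projectId), namespace);
    }

    // Returns a copy with the namespace set; the receiver is left unchanged.
    ReadConfigSketch withNamespace(String namespace) {
        return new ReadConfigSketch(projectId, namespace);
    }

    String getProjectId() {
        return projectId;
    }

    String getNamespace() {
        return namespace;
    }

    public static void main(String[] args) {
        ReadConfigSketch base = ReadConfigSketch.create();
        ReadConfigSketch configured = base.withProjectId("my-project").withNamespace("staging");
        System.out.println(base.getProjectId());       // prints "null"
        System.out.println(configured.getProjectId()); // prints "my-project"
    }
}
```

Because every call yields a fresh object, a partially configured instance can be reused as a base for several differently configured reads.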
      • withQuery

        public DatastoreV1.Read withQuery​(com.google.datastore.v1.Query query)
        Returns a new DatastoreV1.Read that reads the results of the specified query.

        Note: Normally, DatastoreIO will read from Cloud Datastore in parallel across many workers. However, when the Query is configured with a limit using Query.Builder.setLimit(com.google.protobuf.Int32Value), then all results will be read by a single worker in order to ensure correct results.

      • withLiteralGqlQuery

        public DatastoreV1.Read withLiteralGqlQuery​(java.lang.String gqlQuery)
        Returns a new DatastoreV1.Read that reads the results of the specified GQL query. See the GQL Reference for details on the GQL grammar.

        Note: This query is executed with literals allowed, so users should ensure that the query originates from a trusted source to avoid security vulnerabilities such as SQL-style injection.

        Cloud Datastore does not provide a clean way to translate a GQL query string into a Query, so the translation is done by sending a query to the service. That request may read actual data, although only a small amount. This method needs more validation through production use cases before it can be marked as stable.
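Since the GQL string is executed with literals allowed, one way to keep it trustworthy is to assemble it only from values checked against a fixed allow-list. The following is a minimal sketch under that assumption. TrustedGqlSketch, ALLOWED_KINDS, and the kind names are hypothetical and are not part of Beam or Cloud Datastore.

```java
import java.util.Set;

// Hypothetical helper: builds a GQL string only from allow-listed kinds,
// so that untrusted input never reaches the literal query text.
final class TrustedGqlSketch {
    // Assumed allow-list; a real pipeline would define its own kinds.
    private static final Set<String> ALLOWED_KINDS = Set.of("Task", "Project");

    // Returns a GQL query for the given kind, or throws if the kind
    // is not on the allow-list.
    static String selectAllFrom(String kind) {
        if (!ALLOWED_KINDS.contains(kind)) {
            throw new IllegalArgumentException("Untrusted kind: " + kind);
        }
        return "SELECT * FROM " + kind;
    }

    public static void main(String[] args) {
        System.out.println(selectAllFrom("Task")); // prints "SELECT * FROM Task"
    }
}
```

The returned string is the kind of value that would then be passed to withLiteralGqlQuery(String).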

      • withNamespace

        public DatastoreV1.Read withNamespace​(org.apache.beam.sdk.options.ValueProvider<java.lang.String> namespace)
        Same as withNamespace(String) but with a ValueProvider.
      • withNumQuerySplits

        public DatastoreV1.Read withNumQuerySplits​(int numQuerySplits)
        Returns a new DatastoreV1.Read that reads by splitting the given query into numQuerySplits.

        The semantics of query splitting are as follows:

        • Any value less than or equal to 0 will be ignored, and the number of splits will be chosen dynamically at runtime based on the query data size.
        • Any value greater than NUM_QUERY_SPLITS_MAX will be capped at NUM_QUERY_SPLITS_MAX.
        • If the query has a user limit set, then numQuerySplits will be ignored and no split will be performed.
        • In some cases Cloud Datastore cannot split the query into the requested number of splits. In such cases the splits that Cloud Datastore returns are used as-is.
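The first three rules above can be sketched as a small pure function. This is an illustration of the documented semantics only, not Beam's code: QuerySplitSketch and requestedSplits are hypothetical names, the cap is passed in as maxSplits because the value of NUM_QUERY_SPLITS_MAX is not stated on this page, and an unsplit query is modelled as a count of 1. The fourth rule is not modelled, since the number of splits Cloud Datastore actually returns is only known at runtime.

```java
import java.util.OptionalInt;

// Hypothetical sketch of the documented numQuerySplits rules.
final class QuerySplitSketch {
    // Returns the split count that would be requested, or an empty
    // OptionalInt when the count is left to be chosen at runtime.
    static OptionalInt requestedSplits(int numQuerySplits, boolean queryHasLimit, int maxSplits) {
        if (queryHasLimit) {
            // A user limit disables splitting: the query is read unsplit.
            return OptionalInt.of(1);
        }
        if (numQuerySplits <= 0) {
            // Non-positive values are ignored; the count is chosen dynamically.
            return OptionalInt.empty();
        }
        // Values above the cap are clamped to the cap.
        return OptionalInt.of(Math.min(numQuerySplits, maxSplits));
    }

    public static void main(String[] args) {
        // 100 is a placeholder cap for the demo, not the real constant.
        System.out.println(requestedSplits(250, false, 100).getAsInt()); // prints "100"
        System.out.println(requestedSplits(0, false, 100).isPresent());  // prints "false"
    }
}
```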
      • withLocalhost

        public DatastoreV1.Read withLocalhost​(java.lang.String localhost)
        Returns a new DatastoreV1.Read that reads from a Datastore Emulator running at the given localhost address.
      • getNumEntities

        public long getNumEntities​(org.apache.beam.sdk.options.PipelineOptions options,
                                   java.lang.String ourKind,
                                   @Nullable java.lang.String namespace)
        Returns the number of entities available for reading.
      • expand

        public org.apache.beam.sdk.values.PCollection<com.google.datastore.v1.Entity> expand​(org.apache.beam.sdk.values.PBegin input)
        Specified by:
        expand in class org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PBegin,​org.apache.beam.sdk.values.PCollection<com.google.datastore.v1.Entity>>
      • populateDisplayData

        public void populateDisplayData​(org.apache.beam.sdk.transforms.display.DisplayData.Builder builder)
        Specified by:
        populateDisplayData in interface org.apache.beam.sdk.transforms.display.HasDisplayData
        Overrides:
        populateDisplayData in class org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PBegin,​org.apache.beam.sdk.values.PCollection<com.google.datastore.v1.Entity>>