Class ReadConfig

  • All Implemented Interfaces:
    MongoConfig, java.io.Serializable

    public final class ReadConfig
    extends java.lang.Object
    The Read Configuration

    The MongoConfig for reads.

    See Also:
    Serialized Form
    • Field Detail

      • PARTITIONER_DEFAULT

        public static final java.lang.String PARTITIONER_DEFAULT
        The default partitioner if none is set: "com.mongodb.spark.sql.connector.read.partitioner.SamplePartitioner"
        See Also:
        PARTITIONER_CONFIG, Constant Field Values
      • PARTITIONER_OPTIONS_PREFIX

        public static final java.lang.String PARTITIONER_OPTIONS_PREFIX
        The prefix for specific partitioner based configuration.

        Any configuration beginning with this prefix is available via getPartitionerOptions().

        Configuration: "partitioner.options."

        See Also:
        Constant Field Values
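
        To illustrate how prefix-based options behave, here is a minimal plain-Java sketch. It is not the connector's implementation: the class PartitionerOptionsSketch, the helper partitionerOptions and the option name "example.key" are hypothetical, and stripping the prefix from the returned keys is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PartitionerOptionsSketch {
    // Value of PARTITIONER_OPTIONS_PREFIX as documented above.
    static final String PREFIX = "partitioner.options.";

    // Hypothetical helper: keeps only the entries whose key starts with the prefix.
    // Assumption: the prefix is removed from the keys that are returned.
    static Map<String, String> partitionerOptions(Map<String, String> options) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : options.entrySet()) {
            if (entry.getKey().startsWith(PREFIX)) {
                result.put(entry.getKey().substring(PREFIX.length()), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("sampleSize", "500");
        // "example.key" is a made-up partitioner option used only for illustration.
        options.put("partitioner.options.example.key", "64");
        System.out.println(partitionerOptions(options));
    }
}
```

        Options without the prefix, such as "sampleSize" here, are left out of the partitioner-specific view.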
      • INFER_SCHEMA_SAMPLE_SIZE_CONFIG

        public static final java.lang.String INFER_SCHEMA_SAMPLE_SIZE_CONFIG
        The size of the sample of documents from the collection to use when inferring the schema.

        Configuration: "sampleSize"

        Default: 1000

        See Also:
        Constant Field Values
      • INFER_SCHEMA_MAP_TYPE_ENABLED_CONFIG

        public static final java.lang.String INFER_SCHEMA_MAP_TYPE_ENABLED_CONFIG
        Enable Map Types when inferring the schema.

        If enabled, large compatible struct types are inferred as a MapType instead.

        Configuration: "sql.inferSchema.mapTypes.enabled"

        Default: true

        See Also:
        Constant Field Values
      • INFER_SCHEMA_MAP_TYPE_MINIMUM_KEY_SIZE_CONFIG

        public static final java.lang.String INFER_SCHEMA_MAP_TYPE_MINIMUM_KEY_SIZE_CONFIG
        The minimum size of a StructType before it is inferred as a MapType instead.

        Configuration: "sql.inferSchema.mapTypes.minimum.key.size"

        Default: 250. Requires INFER_SCHEMA_MAP_TYPE_ENABLED_CONFIG to be enabled.

        See Also:
        Constant Field Values
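
        The interaction between the two map type settings can be sketched as a simple decision rule. This is a hedged illustration rather than the connector's inference code: the class MapTypeInferenceSketch and the method inferAsMap are hypothetical, "compatible" is assumed to mean that every field has the same type, and the threshold is assumed to be inclusive.

```java
import java.util.Collections;
import java.util.List;

final class MapTypeInferenceSketch {
    // Documented defaults for the two settings described above.
    static final boolean MAP_TYPES_ENABLED_DEFAULT = true; // "sql.inferSchema.mapTypes.enabled"
    static final int MINIMUM_KEY_SIZE_DEFAULT = 250;       // "sql.inferSchema.mapTypes.minimum.key.size"

    // Hypothetical rule: a struct becomes a map only when the feature is enabled,
    // the struct has at least minimumKeySize fields (assumed inclusive), and
    // all field types are identical (assumed meaning of "compatible").
    static boolean inferAsMap(List<String> fieldTypes, boolean enabled, int minimumKeySize) {
        if (!enabled || fieldTypes.isEmpty() || fieldTypes.size() < minimumKeySize) {
            return false;
        }
        String first = fieldTypes.get(0);
        for (String type : fieldTypes) {
            if (!type.equals(first)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> wideStruct = Collections.nCopies(300, "string");
        System.out.println(
            inferAsMap(wideStruct, MAP_TYPES_ENABLED_DEFAULT, MINIMUM_KEY_SIZE_DEFAULT));
    }
}
```

        Under these assumptions the minimum key size has no effect once map type inference is disabled, which matches the stated dependency on INFER_SCHEMA_MAP_TYPE_ENABLED_CONFIG.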
      • AGGREGATION_PIPELINE_CONFIG

        public static final java.lang.String AGGREGATION_PIPELINE_CONFIG
        Provide a custom aggregation pipeline.

        Enables a custom aggregation pipeline to be applied to the collection before sending data to Spark.

        When configured, this should be either an Extended JSON representation of a list of documents:

        
         [{"$match": {"closed": false}}, {"$project": {"status": 1, "name": 1, "description": 1}}]
         
        Or the Extended JSON syntax of a single document:
        
         {"$match": {"closed": false}}
         

        Note: Custom aggregation pipelines must work with the partitioner strategy. Some aggregation stages such as "$group" are not suitable for any partitioner that produces more than one partition.

        Configuration: "aggregation.pipeline"

        Default: no aggregation pipeline.

        See Also:
        Constant Field Values
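
        Because the option accepts two shapes, a caller may want to treat them uniformly. The following plain-Java sketch wraps the single-document form in a list. It is only a string-level illustration under stated assumptions: the class PipelineFormSketch and the method asListForm are hypothetical, no JSON parsing or validation is performed, and this is not how the connector itself reads the value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PipelineFormSketch {
    // Configuration key as documented above.
    static final String AGGREGATION_PIPELINE = "aggregation.pipeline";

    // Hypothetical helper: returns the list form unchanged and wraps a single
    // document in square brackets. The input is not validated as JSON.
    static String asListForm(String pipelineJson) {
        String trimmed = pipelineJson.trim();
        return trimmed.startsWith("[") ? trimmed : "[" + trimmed + "]";
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put(AGGREGATION_PIPELINE, "{\"$match\": {\"closed\": false}}");
        System.out.println(asListForm(options.get(AGGREGATION_PIPELINE)));
    }
}
```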
      • AGGREGATION_PIPELINE_DEFAULT

        public static final java.lang.String AGGREGATION_PIPELINE_DEFAULT
        See Also:
        Constant Field Values
      • AGGREGATION_ALLOW_DISK_USE_CONFIG

        public static final java.lang.String AGGREGATION_ALLOW_DISK_USE_CONFIG
        Allow disk use when running the aggregation.

        Configuration: "aggregation.allowDiskUse"

        Default: true. Set to false to disable writing to disk.

        See Also:
        Constant Field Values
      • STREAM_PUBLISH_FULL_DOCUMENT_ONLY_CONFIG

        public static final java.lang.String STREAM_PUBLISH_FULL_DOCUMENT_ONLY_CONFIG
        Publish Full Document only when streaming.

        Note: Only publishes the actual changed document rather than the full change stream document. Overrides any configured "change.stream.lookup.full.document" values. Also filters the change stream events to include only events with a "fullDocument" field.

        Configuration: "change.stream.publish.full.document.only"

        Default: false.

        See Also:
        Constant Field Values
      • STREAM_LOOKUP_FULL_DOCUMENT_CONFIG

        public static final java.lang.String STREAM_LOOKUP_FULL_DOCUMENT_CONFIG
        Streaming full document configuration.

        Note: Determines what to return for update operations when using a Change Stream. See "Change streams lookup full document for update operations" for further information.

        Set to "updateLookup" to look up the most current majority-committed version of the updated document.

        Configuration: "change.stream.lookup.full.document"

        Default: "default" - the server's default value in the fullDocument field.

        See Also:
        Constant Field Values
    • Method Detail

      • withOption

        public ReadConfig withOption​(java.lang.String key,
                                     java.lang.String value)
        Description copied from interface: MongoConfig
        Return a MongoConfig instance with the extra options applied.

        Existing configurations may be overwritten by the new options.

        Parameters:
        key - the key to add
        value - the value to add
        Returns:
        a new MongoConfig
      • withOptions

        public ReadConfig withOptions​(java.util.Map<java.lang.String,​java.lang.String> options)
        Description copied from interface: MongoConfig
        Return a MongoConfig instance with the extra options applied.

        Existing configurations may be overwritten by the new options.

        Parameters:
        options - the context specific options.
        Returns:
        a new MongoConfig
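
        The semantics of withOption and withOptions (a new instance is returned, and existing keys may be overwritten) can be illustrated with a small stand-in class. This is a sketch under assumptions, not the ReadConfig implementation: SimpleConfigSketch is hypothetical and only models a map of string options.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class SimpleConfigSketch {
    private final Map<String, String> options;

    SimpleConfigSketch(Map<String, String> options) {
        // Defensive copy so later changes to the caller's map do not leak in.
        this.options = Collections.unmodifiableMap(new HashMap<>(options));
    }

    // Returns a new instance; the receiver is left untouched.
    SimpleConfigSketch withOption(String key, String value) {
        return withOptions(Collections.singletonMap(key, value));
    }

    // New options win over existing ones that share the same key.
    SimpleConfigSketch withOptions(Map<String, String> extra) {
        Map<String, String> merged = new HashMap<>(options);
        merged.putAll(extra);
        return new SimpleConfigSketch(merged);
    }

    Map<String, String> getOptions() {
        return options;
    }

    public static void main(String[] args) {
        SimpleConfigSketch base =
            new SimpleConfigSketch(Collections.singletonMap("sampleSize", "1000"));
        SimpleConfigSketch updated = base.withOption("sampleSize", "50");
        System.out.println(base.getOptions() + " -> " + updated.getOptions());
    }
}
```

        Returning a fresh instance keeps the original configuration usable by other readers that still hold a reference to it.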
      • getInferSchemaSampleSize

        public int getInferSchemaSampleSize()
        Returns:
        the configured infer sample size
      • inferSchemaMapType

        public boolean inferSchemaMapType()
        Returns:
        true if large compatible struct types should be inferred as a MapType
      • getInferSchemaMapTypeMinimumKeySize

        public int getInferSchemaMapTypeMinimumKeySize()
        Returns:
        the configured minimum size of a StructType before it is inferred as a MapType
      • getPartitioner

        public Partitioner getPartitioner()
        Returns:
        the partitioner instance
      • getPartitionerOptions

        public MongoConfig getPartitionerOptions()
        Returns:
        any partitioner configuration
      • getAggregationPipeline

        public java.util.List<org.bson.BsonDocument> getAggregationPipeline()
        Returns:
        the aggregation pipeline to filter the collection with
      • getAggregationAllowDiskUse

        public boolean getAggregationAllowDiskUse()
        Returns:
        the aggregation allow disk use value
      • streamPublishFullDocumentOnly

        public boolean streamPublishFullDocumentOnly()
        Returns:
        true if the stream should publish the full document only.
      • getStreamFullDocument

        public com.mongodb.client.model.changestream.FullDocument getStreamFullDocument()
        Returns:
        the stream full document configuration or null if not set.
      • getOriginals

        public java.util.Map<java.lang.String,​java.lang.String> getOriginals()
        Specified by:
        getOriginals in interface MongoConfig
        Returns:
        the original options for this MongoConfig instance
      • getOptions

        public java.util.Map<java.lang.String,​java.lang.String> getOptions()
        Specified by:
        getOptions in interface MongoConfig
        Returns:
        the options for this MongoConfig instance
      • getDatabaseName

        public java.lang.String getDatabaseName()
        Specified by:
        getDatabaseName in interface MongoConfig
        Returns:
        the database name to use for this configuration
      • getCollectionName

        public java.lang.String getCollectionName()
        Specified by:
        getCollectionName in interface MongoConfig
        Returns:
        the collection name to use for this configuration
      • getMongoClient

        public com.mongodb.client.MongoClient getMongoClient()
        Returns a MongoClient.

        Once the MongoClient is no longer required, it MUST be closed by calling mongoClient.close().

        Returns:
        the MongoClient from the cache or create a new one using the MongoClientFactory.
      • withClient

        public <T> T withClient​(java.util.function.Function<com.mongodb.client.MongoClient,​T> function)
        Runs a function against a MongoClient.
        Type Parameters:
        T - The return type
        Parameters:
        function - the function that is passed the MongoClient
        Returns:
        the result of the function
      • doWithClient

        public void doWithClient​(java.util.function.Consumer<com.mongodb.client.MongoClient> consumer)
        Loans a MongoClient to the user; does not return a result.
        Parameters:
        consumer - the consumer of the MongoClient
      • withCollection

        public <T> T withCollection​(java.util.function.Function<com.mongodb.client.MongoCollection<org.bson.BsonDocument>,​T> function)
        Runs a function against a MongoCollection.
        Type Parameters:
        T - The return type
        Parameters:
        function - the function that is passed the MongoCollection
        Returns:
        the result of the function
      • doWithCollection

        public void doWithCollection​(java.util.function.Consumer<com.mongodb.client.MongoCollection<org.bson.BsonDocument>> consumer)
        Loans a MongoCollection to the user; does not return a result.
        Parameters:
        consumer - the consumer of the MongoCollection<BsonDocument>
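
        The withClient, doWithClient, withCollection and doWithCollection methods follow a loan pattern: the caller supplies a function or consumer and never manages the resource directly. The plain-Java sketch below shows the general shape of that pattern. It is not the connector's code: LoanPatternSketch and FakeClient are hypothetical stand-ins, and releasing the resource once the callback returns is an assumption about how such a loan is typically implemented.

```java
import java.util.function.Consumer;
import java.util.function.Function;

final class LoanPatternSketch {
    // Hypothetical stand-in for a client; not a MongoDB class.
    static final class FakeClient implements AutoCloseable {
        boolean closed = false;

        String ping() {
            return "ok";
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // Loans the client to the function and returns the function's result.
    // Assumption: the loaned resource is closed once the function completes.
    static <T> T withClient(FakeClient client, Function<FakeClient, T> function) {
        try (FakeClient loaned = client) {
            return function.apply(loaned);
        }
    }

    // Same loan, but for callers that do not need a result.
    static void doWithClient(FakeClient client, Consumer<FakeClient> consumer) {
        withClient(client, loaned -> {
            consumer.accept(loaned);
            return null;
        });
    }

    public static void main(String[] args) {
        String result = withClient(new FakeClient(), FakeClient::ping);
        System.out.println(result);
        doWithClient(new FakeClient(), loaned -> System.out.println(loaned.ping()));
    }
}
```

        With this shape the caller cannot forget to release the resource, which is the main difference from calling getMongoClient() and closing the client manually.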
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object
      • equals

        @TestOnly
        public boolean equals​(java.lang.Object o)
        Overrides:
        equals in class java.lang.Object
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class java.lang.Object