Class MongoTableProvider

  • All Implemented Interfaces:
    TableProvider, DataSourceRegister

    public final class MongoTableProvider
    extends java.lang.Object
    implements TableProvider, DataSourceRegister
    The MongoDB collection provider.

    Note that TableProvider can only apply data operations to existing tables, such as read, append, delete, and overwrite. It does not support operations that require metadata changes, such as creating or dropping tables (support for table creation and dropping on write is a pending TODO).

    The major responsibility of this interface is to return a MongoTable for read/write.

    Also registers the "mongodb" short name for use via the services API: spark.read().format("mongodb").load();
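    For illustration only, a minimal read sketch using the registered short name is shown below. The option keys and values are placeholders, not a statement of the connector's actual configuration names, which are documented separately.

        import org.apache.spark.sql.Dataset;
        import org.apache.spark.sql.Row;
        import org.apache.spark.sql.SparkSession;

        SparkSession spark = SparkSession.builder()
                .appName("mongodb-read-sketch")   // illustrative application name
                .getOrCreate();

        // Read an existing collection through the "mongodb" short name registered by this provider.
        Dataset<Row> df = spark.read()
                .format("mongodb")
                .option("connection.uri", "mongodb://localhost:27017")  // placeholder option key/value
                .option("database", "test")                             // placeholder option key/value
                .option("collection", "people")                         // placeholder option key/value
                .load();
        df.printSchema();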

    • Constructor Detail

      • MongoTableProvider

        public MongoTableProvider()
        Constructs a new instance.
    • Method Detail

      • inferSchema

        public StructType inferSchema(CaseInsensitiveStringMap options)
        Infer the schema of the table identified by the given options.
        Specified by:
        inferSchema in interface TableProvider
        Parameters:
        options - an immutable case-insensitive string-to-string map that can identify a table, e.g. file path, Kafka topic name, etc.
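        A minimal sketch of a direct call, assuming the option keys below ("database", "collection") identify a collection; the actual keys are those documented for the connector. CaseInsensitiveStringMap is org.apache.spark.sql.util.CaseInsensitiveStringMap.

            // Build the options Spark would pass when the user supplies no explicit schema.
            CaseInsensitiveStringMap options = new CaseInsensitiveStringMap(
                    java.util.Map.of("database", "test", "collection", "people"));  // placeholder keys/values
            MongoTableProvider provider = new MongoTableProvider();
            StructType inferred = provider.inferSchema(options);   // schema derived from the collection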
      • getTable

        public Table getTable(StructType schema,
                              Transform[] partitioning,
                              java.util.Map<java.lang.String,java.lang.String> properties)
        Return a Table instance with the specified table schema, partitioning and properties to perform reads and writes. The returned table should report the same schema and partitioning as the specified ones, or Spark may fail the operation.
        Specified by:
        getTable in interface TableProvider
        Parameters:
        schema - The specified table schema.
        partitioning - The specified table partitioning.
        properties - The specified table properties. It's case preserving (contains exactly what users specified) and implementations are free to use it case sensitively or insensitively. It should be able to identify a table, e.g. file path, Kafka topic name, etc.
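        A minimal sketch, reusing the provider and options from the inferSchema example above; the empty partitioning array and empty properties map are illustrative defaults.

            StructType schema = provider.inferSchema(options);
            // Spark hands the schema (inferred or user-specified) back to obtain a Table
            // that must report the same schema and partitioning.
            Table table = provider.getTable(
                    schema,
                    new Transform[0],                   // no partitioning transforms
                    java.util.Collections.emptyMap());  // no extra table properties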
      • supportsExternalMetadata

        public boolean supportsExternalMetadata()
        Returns true if the source can accept external table metadata when getting tables. The external table metadata includes:
        1. For table reader: the user-specified schema from `DataFrameReader`/`DataStreamReader` and the schema/partitioning stored in the Spark catalog.
        2. For table writer: the schema of the input `DataFrame` of `DataFrameWriter`/`DataStreamWriter`.
        Specified by:
        supportsExternalMetadata in interface TableProvider
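        A minimal sketch, assuming this provider accepts external metadata so a user-specified schema from DataFrameReader is used instead of schema inference; the field names are illustrative.

            StructType userSchema = new StructType()
                    .add("_id", DataTypes.StringType)
                    .add("qty", DataTypes.IntegerType);

            // The user-supplied schema flows through to getTable rather than being inferred.
            Dataset<Row> df = spark.read()
                    .format("mongodb")
                    .schema(userSchema)
                    .load();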