the schema as present in the metastore, used to match up with the raw data in dialects where the data files do not carry their own schema. For example, with a CSV format in Hive, the metastoreSchema is required in order to know what each column represents. We can't use the projection schema for this because its columns might be in a different order.
the schema actually required. This is optional; if it is omitted, the metastoreSchema will be used. The projectionSchema is pushed down to the dialects rather than being applied afterwards because some file formats can read data more efficiently if they know they can omit some fields (e.g. Parquet).
is pushed down to the Parquet reader for efficiency
a list of partition key-values for this file. We require this to repopulate the partition values when creating the final Row.
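The way these parameters interact can be sketched as follows. This is a minimal illustration, not the library's implementation: `Schema`, `PartitionKeyValue`, `CsvProjection` and `toRow` are hypothetical names, rows are modelled as simple name/value pairs, and it assumes the partition columns are not stored in the file itself.

```scala
// Hypothetical, simplified types for illustration only.
final case class Schema(columns: Vector[String])
final case class PartitionKeyValue(key: String, value: String)

object CsvProjection {
  // Names each raw CSV value using the metastore column order, keeps only the
  // projected columns (in projection order), then appends the partition values.
  def toRow(
      line: String,
      metastoreSchema: Schema,
      projectionSchema: Option[Schema],
      partitions: List[PartitionKeyValue]
  ): Vector[(String, String)] = {
    // The CSV line has no column names, so the metastore order supplies them.
    val values = line.split(",", -1).toVector
    val named = metastoreSchema.columns.zip(values).toMap

    // Fall back to the metastore schema when no projection is given.
    val wanted = projectionSchema.getOrElse(metastoreSchema).columns
    val data = wanted.flatMap(name => named.get(name).map(value => name -> value))

    // Assumption: partition values come from outside the file, so they are
    // re-attached here when the final row is built.
    data ++ partitions.map(p => p.key -> p.value)
  }
}
```

With a metastore schema of `id, name, city`, a projection of `city, id` and a single partition `year=2016`, the line `1,sam,london` yields the pairs `city -> london`, `id -> 1`, `year -> 2016`. Matching by position against the projection instead would have labelled `1` as `city`, which is why the metastore order is needed.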
Returns the data contained in this part in the form of an Observable that a subscriber can subscribe to. This function should create a clean Observable on each invocation. By clean, we mean that each separate Observable should provide the full set of data contained in the part, in a thread-safe manner. That is, it should be possible to invoke this method k times, subscribe to those k Observables concurrently, and have each Observable emit the same data.
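One way to satisfy this contract is to keep all iteration state inside the subscription, so that nothing is shared between invocations. The sketch below is an illustration under stated assumptions: `Observable` here is a hand-rolled, synchronous stand-in rather than a real reactive library type, and `InMemoryPart`, `data` and `collectConcurrently` are hypothetical names.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical minimal stand-in for an Observable, for illustration only.
trait Observable[A] {
  def subscribe(onNext: A => Unit): Unit
}

final class InMemoryPart(lines: Vector[String]) {
  // Each call returns a fresh Observable. The cursor is created inside
  // subscribe, so no mutable state is shared between invocations.
  def data(): Observable[String] = new Observable[String] {
    def subscribe(onNext: String => Unit): Unit = {
      val cursor = lines.iterator // per-subscription cursor over immutable data
      while (cursor.hasNext) onNext(cursor.next())
    }
  }
}

object CleanObservableDemo {
  // Invokes data() k times and subscribes to the k results concurrently,
  // collecting what each subscription emitted.
  def collectConcurrently(part: InMemoryPart, k: Int): Seq[Vector[String]] = {
    val futures = (1 to k).map { _ =>
      Future {
        val buffer = Vector.newBuilder[String]
        part.data().subscribe { line => buffer += line; () }
        buffer.result()
      }
    }
    Await.result(Future.sequence(futures), 10.seconds)
  }
}
```

Calling `CleanObservableDemo.collectConcurrently(part, k)` should return k identical vectors, each holding the full contents of the part. A part backed by a file would need to open a new reader per subscription to get the same guarantee.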