Bucketing configuration
number of buckets
columns for bucketing
optional sort columns for bucketing
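The three bucketing parameters above can be modeled as a small value type. This is a minimal sketch, not the library's own class: the names `BucketsConfiguration`, `bucketByArgs` and `sortByArgs` are hypothetical, and the code only shapes the values into the argument lists that Spark's `DataFrameWriter.bucketBy(numBuckets, colName, colNames*)` and `sortBy(colName, colNames*)` expect.

```scala
// Hypothetical model of the bucketing configuration described above.
// Class, field and method names are illustrative assumptions.
final case class BucketsConfiguration(
    number: Int,                           // number of buckets
    bucketColumns: Seq[String],            // columns for bucketing
    sortByColumns: Seq[String] = Seq.empty // optional sort columns
) {
  require(number > 0, s"The number of buckets must be positive, but it was $number")
  require(bucketColumns.nonEmpty, "At least one bucketing column is required")

  // Arguments in the shape expected by Spark's
  // DataFrameWriter.bucketBy(numBuckets: Int, colName: String, colNames: String*)
  def bucketByArgs: (Int, String, Seq[String]) =
    (number, bucketColumns.head, bucketColumns.tail)

  // Arguments for DataFrameWriter.sortBy(colName: String, colNames: String*),
  // or None when no sort columns were configured.
  def sortByArgs: Option[(String, Seq[String])] =
    sortByColumns.headOption.map(first => (first, sortByColumns.tail))
}
```

Splitting the columns into a head and a tail mirrors Spark's varargs signatures, which require at least one column; validating that up front fails at configuration time rather than at write time.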
Common trait for writing an already defined DataFrame to an external resource
Factory trait for DataAwareSinkFactory
Common trait for writing a DataFrame to an external resource
Common marker trait for DataSink configuration
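The relationship between a plain sink and a data-aware sink can be sketched as follows. This is an illustration under stated assumptions, not the library's API: a type parameter `D` stands in for Spark's `DataFrame` so the example runs without Spark, `write` is assumed to return `Unit`, and the in-memory classes are hypothetical helpers used only to exercise the traits.

```scala
import scala.collection.mutable.ArrayBuffer

// Illustrative sketch of the sink traits described above. `D` stands in for
// Spark's DataFrame; the signatures are assumptions, not the library's API.
trait DataSinkConfiguration // marker trait for sink configurations

trait DataSink[Config <: DataSinkConfiguration, D] {
  def configuration: Config
  // Writes the given data to the external resource.
  def write(data: D): Unit
}

// Binds a sink to the data it should write, so writing needs no arguments.
trait DataAwareSink[Config <: DataSinkConfiguration, D] {
  def sink: DataSink[Config, D]
  def data: D
  def write(): Unit = sink.write(data)
}

// Toy in-memory implementation, used only to exercise the traits.
final case class MemorySinkConfiguration(name: String) extends DataSinkConfiguration

final class MemorySink(val configuration: MemorySinkConfiguration)
    extends DataSink[MemorySinkConfiguration, Seq[String]] {
  val written: ArrayBuffer[String] = ArrayBuffer.empty[String]
  def write(data: Seq[String]): Unit = written ++= data
}

final case class MemoryDataAwareSink(sink: MemorySink, data: Seq[String])
    extends DataAwareSink[MemorySinkConfiguration, Seq[String]]
```

The data-aware variant only delegates: it captures the data once, so callers holding it can trigger the write without knowing what is being written.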
Common trait for reading a DataFrame from an external resource
Common marker trait for DataSource configuration
Factory trait for DataSourceFactory
FileDataSink trait that is data aware, so it can perform a write call with no arguments
FileDataSink trait
Output DataFrame sink configuration for Hadoop files.
the path of the target file
the format can be one of csv, json, orc, parquet, com.databricks.spark.avro (or just avro) and com.databricks.spark.xml (or just xml)
the save mode can be overwrite, append, ignore or error; more details are available at https://spark.apache.org/docs/2.3.1/api/java/org/apache/spark/sql/DataFrameWriter.html#mode-java.lang.String-
the number of partitions the data will be repartitioned into; if not given, the number of partitions is left unchanged
optionally, the writer can lay out the data in partitions, similar to Hive partitions
optionally, the writer can bucket the data, similar to Hive bucketing
other sink-specific options
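The file sink parameters above lend themselves to eager validation. The following is a minimal sketch, assuming a hypothetical `FileSinkSettings` case class and `FileSinkValues` object (neither is the library's API); the accepted formats and save modes are exactly the ones listed above, and bucketing is left out for brevity.

```scala
import java.util.Locale

// Hypothetical sketch of the file sink parameters described above, with
// validation of the listed formats and save modes. Names are assumptions.
object FileSinkValues {
  // Formats listed in the description, including the short aliases.
  val Formats: Set[String] = Set(
    "csv", "json", "orc", "parquet",
    "com.databricks.spark.avro", "avro",
    "com.databricks.spark.xml", "xml")

  // The four save modes listed in the description.
  val SaveModes: Set[String] = Set("overwrite", "append", "ignore", "error")
}

final case class FileSinkSettings(
    path: String,                              // the path of the target file
    format: String,                            // one of FileSinkValues.Formats
    mode: Option[String] = None,               // one of FileSinkValues.SaveModes
    partitionFilesNumber: Option[Int] = None,  // None leaves the partitions unchanged
    partitionColumns: Seq[String] = Seq.empty, // Hive-style partition columns
    options: Map[String, String] = Map.empty   // other sink-specific options
) {
  require(path.trim.nonEmpty, "The path must not be empty")
  require(FileSinkValues.Formats(format.toLowerCase(Locale.ROOT)), s"Unsupported format: $format")
  require(
    mode.forall(m => FileSinkValues.SaveModes(m.toLowerCase(Locale.ROOT))),
    s"Unsupported save mode: $mode")
  require(partitionFilesNumber.forall(_ > 0), "The number of partitions must be positive")
}
```

Rejecting an unknown format or save mode when the configuration is built surfaces typos before any data is processed, instead of at the end of a long Spark job.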
Basic configuration for the FileDataSource
For things that should be aware of their format type
Common marker trait for DataSink configuration that also knows the data format
Common marker trait for DataSource configuration that also knows the data format
JdbcDataSink trait that is data aware, so it can perform a write call with no arguments
JdbcDataSink trait
Basic configuration for the JdbcDataSource
Factory for DataSourceConfiguration
Extended Configuration extractor for Schemas.
This extractor first tries to get the schema from an external resource specified through a path. If that fails, it tries to load the schema straight from the given configuration.
It can be used as
config.extract[Option[StructType]]("configuration_path_to_schema")
or as
config.extract[StructType]("configuration_path_to_schema")
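The path-first, configuration-second fallback can be sketched with `scala.util.Try`. This is an assumption-laden illustration rather than the extractor's actual code: `S` stands in for Spark's `StructType`, and `loadFromPath` and `loadFromConfig` are hypothetical loaders supplied by the caller.

```scala
import scala.util.{Failure, Try}

// Hypothetical sketch of the schema fallback described above: try the
// external resource first, then the inline configuration. `S` stands in for
// Spark's StructType and both loaders are assumed helpers, not library calls.
object SchemaExtractionSketch {
  def extractSchema[S](
      schemaPath: Option[String],     // path to an external schema resource, if configured
      loadFromPath: String => Try[S], // e.g. read and parse a JSON schema file
      loadFromConfig: => Try[S]       // parse the schema straight from the configuration
  ): Try[S] =
    schemaPath
      .map(loadFromPath)
      .getOrElse(Failure[S](new NoSuchElementException("No schema path configured")))
      .orElse(loadFromConfig)
}
```

Taking the inline loader by name means the configuration is only parsed when the external resource is missing or unreadable.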
Factory for DataSourceConfiguration
Factory for FormatAwareDataSourceConfiguration
Configuration extractor for FormatType.
It can be used as
config.extract[Option[FormatType]]("configuration_path_to_format")
or as
config.extract[FormatType]("configuration_path_to_format")
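The parsing behind a `FormatType` extractor might look like the sketch below. The case names mirror the formats listed for the file sink, but the whole ADT is an assumption, including the `Custom` case that keeps any unrecognized format string as given; `fromString` and `fromOption` are hypothetical stand-ins for what the two `extract` forms resolve.

```scala
import java.util.Locale

// Hypothetical sketch of a FormatType ADT and its parser. The cases mirror
// the formats listed for the file sink; `Custom` is an assumption.
sealed trait FormatType { def name: String }

object FormatType {
  case object Csv extends FormatType { val name = "csv" }
  case object Json extends FormatType { val name = "json" }
  case object Orc extends FormatType { val name = "orc" }
  case object Parquet extends FormatType { val name = "parquet" }
  case object Avro extends FormatType { val name = "com.databricks.spark.avro" }
  case object Xml extends FormatType { val name = "com.databricks.spark.xml" }
  final case class Custom(name: String) extends FormatType

  // Stand-in for config.extract[FormatType]: short aliases map to full names.
  def fromString(value: String): FormatType =
    value.trim.toLowerCase(Locale.ROOT) match {
      case "csv"                                => Csv
      case "json"                               => Json
      case "orc"                                => Orc
      case "parquet"                            => Parquet
      case "avro" | "com.databricks.spark.avro" => Avro
      case "xml" | "com.databricks.spark.xml"   => Xml
      case _                                    => Custom(value.trim)
    }

  // Stand-in for config.extract[Option[FormatType]]: a missing value yields None.
  def fromOption(value: Option[String]): Option[FormatType] = value.map(fromString)
}
```

A sealed trait lets the compiler check that code matching on formats handles every known case, while `Custom` leaves room for formats not listed here.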
Common IO utilities