Package

org.tupol.spark

io

package io

Common IO utilities

Linear Supertypes
AnyRef, Any

Type Members

  1. case class BucketsConfiguration(number: Int, bucketColumns: Seq[String], sortByColumns: Seq[String] = Seq()) extends Product with Serializable

    Bucketing configuration

    number

    number of buckets

    bucketColumns

    columns for bucketing

    sortByColumns

    optional sort columns for bucketing

  2. trait DataAwareSink[Config <: DataSinkConfiguration, WriteOut] extends AnyRef

    Common trait for writing an already defined DataFrame to an external resource

  3. trait DataAwareSinkFactory extends AnyRef

    Factory trait for DataAwareSink

  4. trait DataSink[Config <: DataSinkConfiguration, WriteOut] extends AnyRef

    Common trait for writing a DataFrame to an external resource

  5. trait DataSinkConfiguration extends AnyRef

    Common marker trait for DataSink configuration

  6. case class DataSinkException(message: String = "", cause: Throwable = None.orNull) extends Exception with Product with Serializable

  7. trait DataSource[Config <: DataSourceConfiguration] extends AnyRef

    Common trait for reading a DataFrame from an external resource

  8. trait DataSourceConfiguration extends AnyRef

    Common marker trait for DataSource configuration

  9. case class DataSourceException(message: String = "", cause: Throwable = None.orNull) extends Exception with Product with Serializable

  10. trait DataSourceFactory extends AnyRef

    Factory trait for DataSource

  11. case class FileDataAwareSink(configuration: FileSinkConfiguration, data: DataFrame) extends DataAwareSink[FileSinkConfiguration, DataFrame] with Product with Serializable

    A FileDataSink that is data aware, so it can perform a write call with no arguments

  12. case class FileDataSink(configuration: FileSinkConfiguration) extends DataSink[FileSinkConfiguration, DataFrame] with Logging with Product with Serializable

    DataSink implementation for writing a DataFrame to Hadoop files

  13. case class FileDataSource(configuration: FileSourceConfiguration) extends DataSource[FileSourceConfiguration] with Logging with Product with Serializable

  14. case class FileSinkConfiguration(path: String, format: FormatType, optionalSaveMode: Option[String] = None, partitionFilesNumber: Option[Int] = None, partitionColumns: Seq[String] = Seq(), buckets: Option[BucketsConfiguration] = None, options: Map[String, String] = Map()) extends FormatAwareDataSinkConfiguration with Product with Serializable

    Output DataFrame sink configuration for Hadoop files.

    path

    the path of the target file

    format

    the output format: csv, json, orc, parquet, com.databricks.spark.avro (or just avro), or com.databricks.spark.xml (or just xml)

    optionalSaveMode

    the save mode: overwrite, append, ignore or error; more details are available at https://spark.apache.org/docs/2.3.1/api/java/org/apache/spark/sql/FileDataSink.html#mode-java.lang.String-

    partitionFilesNumber

    the number of partitions that the data will be repartitioned into; if not given, the number of partitions is left unchanged

    partitionColumns

    optionally, the writer can lay out the data in partitions, similar to Hive partitioning

    buckets

    optionally, the writer can bucket the data, similar to Hive bucketing

    options

    other sink-specific options

  15. case class FileSourceConfiguration(path: String, sourceConfiguration: SourceConfiguration) extends FormatAwareDataSourceConfiguration with Product with Serializable

    Basic configuration for the FileDataSource

  16. trait FormatAware extends AnyRef

    For things that should be aware of their format type

  17. trait FormatAwareDataSinkConfiguration extends DataSinkConfiguration with FormatAware

    Common marker trait for DataSink configuration that also knows the data format

  18. trait FormatAwareDataSourceConfiguration extends DataSourceConfiguration with FormatAware

    Common marker trait for DataSource configuration that also knows the data format

  19. sealed trait FormatType extends AnyRef

  20. case class JdbcDataAwareSink(configuration: JdbcSinkConfiguration, data: DataFrame) extends DataAwareSink[JdbcSinkConfiguration, DataFrame] with Product with Serializable

    A JdbcDataSink that is data aware, so it can perform a write call with no arguments

  21. case class JdbcDataSink(configuration: JdbcSinkConfiguration) extends DataSink[JdbcSinkConfiguration, DataFrame] with Logging with Product with Serializable

    DataSink implementation for writing a DataFrame to a JDBC table

  22. case class JdbcDataSource(configuration: JdbcSourceConfiguration) extends DataSource[JdbcSourceConfiguration] with Logging with Product with Serializable

  23. case class JdbcSinkConfiguration(url: String, table: String, user: Option[String], password: Option[String], driver: Option[String], optionalSaveMode: Option[String], options: Map[String, String]) extends FormatAwareDataSinkConfiguration with Product with Serializable

    Basic configuration for the JdbcDataSink
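
The relationship between DataSink and DataAwareSink, and the default arguments on FileSinkConfiguration and BucketsConfiguration, can be sketched without Spark on the classpath. This is a minimal sketch under stated assumptions, not the library's code: Frame, SinkConfig, Sink, DataAware, Buckets, FileConfig, FileSink and FileAwareSink are hypothetical stand-ins that only mirror the signatures listed above, the `sink` member and the String result of `write` are assumptions made for illustration, and `format` is a plain String where the real class takes a FormatType.

```scala
// Hypothetical stand-in for org.apache.spark.sql.DataFrame, so the sketch runs without Spark.
final case class Frame(rows: Seq[String])

// Marker trait, mirroring the role of DataSinkConfiguration.
trait SinkConfig

// A sink that is handed the data at write time (the DataSink role).
trait Sink[Config <: SinkConfig, WriteOut] {
  def configuration: Config
  def write(data: Frame): WriteOut
}

// A sink that already holds its data, so write takes no arguments (the DataAwareSink role).
// Delegating to an underlying sink is an assumption of this sketch.
trait DataAware[Config <: SinkConfig, WriteOut] {
  def data: Frame
  def sink: Sink[Config, WriteOut]
  def write: WriteOut = sink.write(data)
}

// Mirrors BucketsConfiguration: the sort columns are optional.
final case class Buckets(
  number: Int,
  bucketColumns: Seq[String],
  sortByColumns: Seq[String] = Seq()
)

// Mirrors FileSinkConfiguration: only path and format are mandatory.
final case class FileConfig(
  path: String,
  format: String, // the real class takes a FormatType here
  optionalSaveMode: Option[String] = None,
  partitionFilesNumber: Option[Int] = None,
  partitionColumns: Seq[String] = Seq(),
  buckets: Option[Buckets] = None,
  options: Map[String, String] = Map()
) extends SinkConfig

// Pretends to write by describing what would be written.
final case class FileSink(configuration: FileConfig) extends Sink[FileConfig, String] {
  def write(data: Frame): String =
    s"${data.rows.size} rows -> ${configuration.path} as ${configuration.format}"
}

// Carries both the configuration and the data, so the caller just invokes write.
final case class FileAwareSink(configuration: FileConfig, data: Frame)
    extends DataAware[FileConfig, String] {
  def sink: Sink[FileConfig, String] = FileSink(configuration)
}
```

With these stand-ins, `FileSink(config).write(frame)` and `FileAwareSink(config, frame).write` produce the same result; the data-aware variant simply carries the data alongside the configuration.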

Value Members

  1. object BucketsConfiguration extends Configurator[BucketsConfiguration] with Serializable

  2. implicit val DataAwareSinkFactory: DataAwareSinkFactory

  3. implicit val DataSinkConfigExtractor: DataSinkConfiguration.type

  4. object DataSinkConfiguration extends Configurator[DataSinkConfiguration]

    Factory for DataSinkConfiguration

  5. implicit val DataSourceFactory: DataSourceFactory

  6. implicit val ExtendedStructTypeExtractor: Extractor[StructType]

    Extended Configuration extractor for Schemas.

    This extractor first tries to get the schema from an external resource specified through a path. If that fails, it tries to load the schema straight from the given configuration.

    It can be used as config.extract[Option[StructType]]("configuration_path_to_schema") or as config.extract[StructType]("configuration_path_to_schema")

  7. implicit val FileSinkConfigExtractor: FileSinkConfiguration.type

  8. object FileSinkConfiguration extends Configurator[FileSinkConfiguration] with Logging with Serializable

  9. implicit val FileSourceConfigExtractor: FileSourceConfiguration.type

  10. object FileSourceConfiguration extends Configurator[FileSourceConfiguration] with Serializable

  11. implicit val FileStreamDataSinkConfigurationExtractor: FileStreamDataSinkConfiguration.type

  12. implicit val FileStreamDataSourceConfigurationExtractor: FileStreamDataSourceConfiguration.type

  13. implicit val FormatAwareDataSinkConfigExtractor: FormatAwareDataSinkConfiguration.type

  14. object FormatAwareDataSinkConfiguration extends Configurator[FormatAwareDataSinkConfiguration]

    Factory for FormatAwareDataSinkConfiguration

  15. implicit val FormatAwareDataSourceConfigExtractor: FormatAwareDataSourceConfiguration.type

  16. object FormatAwareDataSourceConfiguration extends Configurator[FormatAwareDataSourceConfiguration]

    Factory for FormatAwareDataSourceConfiguration

  17. implicit val FormatAwareStreamingSinkConfigExtractor: FormatAwareStreamingSinkConfiguration.type

  18. implicit val FormatAwareStreamingSourceConfigExtractor: FormatAwareStreamingSourceConfiguration.type

  19. object FormatType

  20. implicit val FormatTypeExtractor: Extractor[FormatType]

    Configuration extractor for FormatType.

    It can be used as config.extract[Option[FormatType]]("configuration_path_to_format") or as config.extract[FormatType]("configuration_path_to_format")

  21. implicit val GenericStreamDataSinkConfigurationExtractor: GenericStreamDataSinkConfiguration.type

  22. implicit val GenericStreamDataSourceConfigurationExtractor: GenericStreamDataSourceConfiguration.type

  23. implicit val JdbcSinkConfigExtractor: JdbcSinkConfiguration.type

  24. object JdbcSinkConfiguration extends Configurator[JdbcSinkConfiguration] with Serializable

  25. implicit val JdbcSourceConfigExtractor: JdbcSourceConfiguration.type

  26. implicit val KafkaStreamDataSinkConfigurationExtractor: KafkaStreamDataSinkConfiguration.type

  27. implicit val KafkaStreamDataSourceConfigurationExtractor: KafkaStreamDataSourceConfiguration.type

  28. implicit val SourceConfigExtractor: SourceConfiguration.type

  29. package sources

  30. package streaming
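
The lookup order described for ExtendedStructTypeExtractor (external resource first, inline configuration second) can be sketched with scala.util.Try. This is a minimal sketch under stated assumptions, not the library's implementation: SchemaFallbackSketch, fromResource, fromInline and extractSchema are hypothetical names, the schema is kept as a raw String instead of a StructType, external resources are simulated by an in-memory Map, and the inline check (text starting with `{`) is a deliberately crude placeholder for real schema parsing.

```scala
import scala.util.{Failure, Success, Try}

object SchemaFallbackSketch {

  // Hypothetical resource lookup: fails when no resource exists at the given path.
  def fromResource(path: String, resources: Map[String, String]): Try[String] =
    Try(resources(path))

  // Hypothetical inline parse: accepts only text that looks like a JSON object.
  def fromInline(value: String): Try[String] =
    if (value.trim.startsWith("{")) Success(value.trim)
    else Failure(new IllegalArgumentException("Not an inline schema definition"))

  // Treat the configured value as a resource path first; if that fails,
  // fall back to reading the schema from the configured value itself.
  def extractSchema(value: String, resources: Map[String, String]): Try[String] =
    fromResource(value, resources).orElse(fromInline(value))
}
```

Calling `.toOption` on the result gives the optional flavour, in the same spirit as choosing between config.extract[Option[StructType]] and config.extract[StructType].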