ParquetStreams

com.github.mjakubowski84.parquet4s.ParquetStreams$

Holds factories of Akka Streams sources and sinks that allow reading from and writing to Parquet files.

Attributes

Supertypes
class Object
trait Matchable
class Any

Members list

Value members

Concrete methods

Creates an akka.stream.scaladsl.Source that reads Parquet data from the specified path. If there are multiple files at the path, the order in which they are loaded is determined by the underlying filesystem.
The path can refer to a local file, HDFS, AWS S3, Google Storage, Azure, etc. Please refer to the Hadoop client documentation or your data provider to learn how to configure the connection.
Can also read partitioned directories. The filter applies to partition values as well. Partition values are set as fields in read entities at the path defined by the partition name. That path can be a simple column name or a dot-separated path to a nested field. Missing intermediate fields are created automatically for each read record.
Allows turning on a projection over the original file schema in order to boost read performance when not all columns need to be read.
Provides an explicit API for both custom data types and generic records.

Attributes

Returns

Builder of the source.
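A minimal sketch of how the source builder might be used, based on the parquet4s Akka module; the `User` case class, the path, and the filter value are illustrative assumptions, not part of this API's documentation:

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.Sink
import com.github.mjakubowski84.parquet4s.{Col, ParquetStreams, Path}

object ReadExample extends App {
  implicit val system: ActorSystem = ActorSystem()
  import system.dispatcher

  // Hypothetical schema matching the Parquet files to be read;
  // reading as a case class also narrows the columns that are fetched.
  case class User(userId: String, country: String)

  ParquetStreams.fromParquet
    .as[User]                          // read rows as the custom type User
    .filter(Col("country") === "PL")   // the filter also applies to partition values
    .read(Path("file:///data/users"))  // directory or single file; file order is filesystem-dependent
    .runWith(Sink.foreach(println))
    .andThen { case _ => system.terminate() }
}
```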

Creates an akka.stream.scaladsl.Sink that writes Parquet data to a single file at the specified path (including the file name).
The path can refer to a local file, HDFS, AWS S3, Google Storage, Azure, etc. Please refer to the Hadoop client documentation or your data provider to learn how to configure the connection.
Provides an explicit API for both custom data types and generic records.

Attributes

Returns

Builder of a sink that writes a Parquet file.
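A minimal sketch of how the single-file sink might be used, based on the parquet4s Akka module; the `User` case class and the output path are illustrative assumptions:

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.Source
import com.github.mjakubowski84.parquet4s.{ParquetStreams, Path}

object WriteExample extends App {
  implicit val system: ActorSystem = ActorSystem()
  import system.dispatcher

  case class User(userId: String, name: String) // hypothetical record type

  Source(List(User("1", "Ada"), User("2", "Grace")))
    .runWith(
      // Note: the path points at the target file itself, not at a directory.
      ParquetStreams.toParquetSingleFile
        .of[User]
        .write(Path("file:///tmp/users.parquet"))
    )
    .andThen { case _ => system.terminate() }
}
```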

Builds a flow that:

  • Is designed to write Parquet files indefinitely
  • Is able to (optionally) partition data by a list of provided fields
  • Flushes and rotates files after a given number of rows is written to a partition or a given time period elapses
  • Outputs each incoming message after it is written, and can instead write the result of a provided message transformation

Provides an explicit API for both custom data types and generic records.

Attributes

Returns

Builder of the flow.
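A minimal sketch of how the indefinite-writing flow might be assembled, based on the parquet4s Akka module; the `Event` case class, the base path, and the rotation thresholds are illustrative assumptions:

```scala
import scala.concurrent.duration._
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import com.github.mjakubowski84.parquet4s.{Col, ParquetStreams, Path}

object IndefiniteWriteExample extends App {
  implicit val system: ActorSystem = ActorSystem()

  case class Event(id: String, country: String) // hypothetical record type

  val writeFlow = ParquetStreams.viaParquet
    .of[Event]
    .partitionBy(Col("country"))    // optional: partition data by the listed fields
    .maxCount(1024)                 // rotate a file after this many rows in a partition...
    .maxDuration(30.seconds)        // ...or after this much time elapses
    .write(Path("file:///tmp/events"))

  Source(List(Event("1", "PL"), Event("2", "DE")))
    .via(writeFlow)                 // each message is emitted downstream after it is written
    .runWith(Sink.ignore)
}
```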