ParquetStreams

com.github.mjakubowski84.parquet4s.ParquetStreams$

Holds factories of Akka Streams sources and sinks that allow reading from and writing to Parquet files.

Attributes

Supertypes
class Object
trait Matchable
class Any

Members list

Value members

Concrete methods

Creates an akka.stream.scaladsl.Source that reads Parquet data from the specified path. If there are multiple files at the path, the order in which they are loaded is determined by the underlying filesystem.
The path can refer to a local file, HDFS, AWS S3, Google Storage, Azure, etc. Please refer to the Hadoop client documentation or your data provider to learn how to configure the connection.
Can also read partitioned directories. The filter applies to partition values as well. Partition values are set as fields in read entities at the path defined by the partition name. That path can be a simple column name or a dot-separated path to a nested field. Missing intermediate fields are created automatically for each read record.
Allows turning on a projection over the original file schema in order to boost read performance when not all columns need to be read.
Provides an explicit API for both custom data types and generic records.

Attributes

Returns

Builder of the source.
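A minimal sketch of how the source builder might be used, based on the parquet4s Akka module; the `User` case class, the path, and the filter value are illustrative assumptions, not part of this API's documentation:

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.Sink
import com.github.mjakubowski84.parquet4s.{Col, ParquetStreams, Path}

object ReadExample extends App {
  implicit val system: ActorSystem = ActorSystem()
  import system.dispatcher

  // Hypothetical schema matching the Parquet files to be read;
  // reading as a case class also narrows the columns that are fetched.
  case class User(userId: String, country: String)

  ParquetStreams.fromParquet
    .as[User]                          // read rows as the custom type User
    .filter(Col("country") === "PL")   // the filter also applies to partition values
    .read(Path("file:///data/users"))  // directory or single file; file order is filesystem-dependent
    .runWith(Sink.foreach(println))
    .andThen { case _ => system.terminate() }
}
```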

Creates an akka.stream.scaladsl.Sink that writes Parquet data to a single file at the specified path (including the file name).
The path can refer to a local file, HDFS, AWS S3, Google Storage, Azure, etc. Please refer to the Hadoop client documentation or your data provider to learn how to configure the connection.
Provides an explicit API for both custom data types and generic records.

Attributes

Returns

Builder of a sink that writes a Parquet file.
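A minimal sketch of how the single-file sink might be used, based on the parquet4s Akka module; the `User` case class and the output path are illustrative assumptions:

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.Source
import com.github.mjakubowski84.parquet4s.{ParquetStreams, Path}

object WriteExample extends App {
  implicit val system: ActorSystem = ActorSystem()
  import system.dispatcher

  case class User(userId: String, name: String) // hypothetical record type

  Source(List(User("1", "Ada"), User("2", "Grace")))
    .runWith(
      // Note: the path points at the target file itself, not at a directory.
      ParquetStreams.toParquetSingleFile
        .of[User]
        .write(Path("file:///tmp/users.parquet"))
    )
    .andThen { case _ => system.terminate() }
}
```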

Builds a flow that:

  • Is designed to write Parquet files indefinitely
  • Is able to (optionally) partition data by a list of provided fields
  • Flushes and rotates files after a given number of rows is written to a partition or a given time period elapses
  • Outputs each incoming message after it is written, and can instead write the result of a provided message transformation

Provides an explicit API for both custom data types and generic records.

Attributes

Returns

Builder of the flow.
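A minimal sketch of how the indefinite-writing flow might be assembled, based on the parquet4s Akka module; the `Event` case class, the base path, and the rotation thresholds are illustrative assumptions:

```scala
import scala.concurrent.duration._
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import com.github.mjakubowski84.parquet4s.{Col, ParquetStreams, Path}

object IndefiniteWriteExample extends App {
  implicit val system: ActorSystem = ActorSystem()

  case class Event(id: String, country: String) // hypothetical record type

  val writeFlow = ParquetStreams.viaParquet
    .of[Event]
    .partitionBy(Col("country"))    // optional: partition data by the listed fields
    .maxCount(1024)                 // rotate a file after this many rows in a partition...
    .maxDuration(30.seconds)        // ...or after this much time elapses
    .write(Path("file:///tmp/events"))

  Source(List(Event("1", "PL"), Event("2", "DE")))
    .via(writeFlow)                 // each message is emitted downstream after it is written
    .runWith(Sink.ignore)
}
```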