TypedBuilder

com.github.mjakubowski84.parquet4s.ParquetPartitioningFlow$.TypedBuilder
trait TypedBuilder[T, W] extends Builder[T, W, TypedBuilder[T, W]]

Attributes

Supertypes
trait Builder[T, W, TypedBuilder[T, W]]
class Object
trait Matchable
class Any

Members list

Value members

Abstract methods

def preWriteTransformation[X](transformation: T => Iterable[X]): TypedBuilder[T, X]

Type parameters

X

Schema type

Value parameters

transformation

Function called by the flow to transform each input element into records of the final write format. Identity by default.
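As an illustration of the expected shape `T => Iterable[X]`, here is a minimal sketch in plain Scala. The types `Order` and `OrderLine` and the function `toLines` are hypothetical and not part of parquet4s; the sketch only shows that one input element may map to zero, one, or many records.

```scala
object PreWriteSketch {
  // Hypothetical input element type (T) and written record type (X).
  final case class Order(id: Long, items: List[String])
  final case class OrderLine(orderId: Long, item: String)

  // A candidate argument for preWriteTransformation: Order => Iterable[OrderLine].
  // An order without items yields an empty Iterable, so it contributes no records.
  val toLines: Order => Iterable[OrderLine] =
    order => order.items.map(item => OrderLine(order.id, item))

  val lines: Iterable[OrderLine] = toLines(Order(1L, List("pen", "ink")))
  val none: Iterable[OrderLine] = toLines(Order(2L, Nil))
}
```

Passing such a function would change the builder type from TypedBuilder[Order, Order] to TypedBuilder[Order, OrderLine], so the schema and encoder are resolved for OrderLine.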


def write(basePath: Path)(implicit schemaResolver: ParquetSchemaResolver[W], encoder: ParquetRecordEncoder[W]): GraphStage[FlowShape[T, T]]

Builds a final flow
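The following sketch shows how the builder methods on this page might be chained into a complete flow. It assumes the Akka module of parquet4s (for Pekko, the akka imports would change accordingly), a Hadoop client on the classpath, and that the builder is obtained through ParquetStreams.viaParquet. The `Car` type, the limits, and the target path are hypothetical.

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.Source
import com.github.mjakubowski84.parquet4s.{Col, ParquetStreams, ParquetWriter, Path}

import scala.concurrent.duration._

// Hypothetical record type; `color` is a String so it can be a partition field.
final case class Car(id: Long, model: String, color: String)

object WriteExample extends App {
  implicit val system: ActorSystem = ActorSystem("write-example")
  import system.dispatcher

  val cars = Source(List(Car(1L, "A", "blue"), Car(2L, "B", "green")))

  // The schema resolver and record encoder for Car are derived implicitly.
  val flow = ParquetStreams.viaParquet
    .of[Car]
    .maxCount(1024L * 1024L)          // rotate after this many records
    .maxDuration(30.seconds)          // or after this much time
    .options(ParquetWriter.Options()) // default writer options
    .partitionBy(Col("color"))
    .write(Path("file:///tmp/cars"))  // hypothetical base path

  // The flow emits each input element downstream after processing it.
  cars
    .via(flow)
    .runForeach(car => println(s"Processed car ${car.id}"))
    .onComplete(_ => system.terminate())
}
```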


Inherited methods

def maxCount(maxCount: Long): Self

Value parameters

maxCount

max number of records to be written before file rotation

Attributes

Inherited from:
Builder

def maxDuration(maxDuration: FiniteDuration): Self

Value parameters

maxDuration

max time after which partition file is rotated
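Taken together, maxCount and maxDuration describe two independent rotation limits. The sketch below models the decision in plain Scala under the assumption that whichever limit is reached first triggers rotation. The helper `shouldRotate` is hypothetical, and details such as per-file versus per-flow bookkeeping or strict versus non-strict comparison are not stated on this page.

```scala
import scala.concurrent.duration._

object RotationSketch {
  // Hypothetical helper, not part of parquet4s.
  // `written` is the number of records written since the last rotation,
  // `elapsed` is the time passed since the last rotation.
  def shouldRotate(
      written: Long,
      elapsed: FiniteDuration,
      maxCount: Long,
      maxDuration: FiniteDuration
  ): Boolean =
    written >= maxCount || elapsed >= maxDuration
}
```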

Attributes

Inherited from:
Builder

def options(options: Options): Self

Value parameters

options

writer options used by the flow

Attributes

Inherited from:
Builder

def partitionBy(partitionBy: ColumnPath*): Self

Sets the partition paths by which the flow partitions data. Can be empty. A partition path can be a simple string column (e.g. "color") or a path pointing to a nested string field (e.g. "user.address.postcode"). The partition path is used to extract data from the entity and to create a tree of subdirectories for partitioned files. Using the partitions above results in a directory tree such as:

../color=blue
      /user.address.postcode=XY1234/
      /user.address.postcode=AB4321/
  /color=green
      /user.address.postcode=XY1234/
      /user.address.postcode=CV3344/
      /user.address.postcode=GH6732/

Take note:

  • partitionBy must point to a string field.
  • Partitioning removes partition fields from the schema. Their values are stored in the names of subdirectories instead of in the Parquet files.
  • Partitioning must not result in an empty schema. If you remove all fields of the message, you will get an error.
  • Partitioned directories can be filtered effectively during reading.
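To make the directory layout concrete, the sketch below derives the relative subdirectory for a single record from its partition values, in the order the paths were passed to partitionBy. The helper `partitionDir` is hypothetical and only mirrors the `path=value` naming shown in the tree; it is not how parquet4s is implemented.

```scala
object PartitionPathSketch {
  // Hypothetical helper: joins (partition path, value) pairs into a
  // relative directory, one `path=value` segment per partition.
  def partitionDir(partitions: Seq[(String, String)]): String =
    partitions.map { case (path, value) => s"$path=$value" }.mkString("/")

  val dir: String =
    partitionDir(Seq("color" -> "blue", "user.address.postcode" -> "XY1234"))
}
```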

Value parameters

partitionBy

ColumnPaths to partition by

Attributes

Inherited from:
Builder

def postWriteHandler(handler: PostWriteState[T] => Unit): Self

Adds a handler after record writes, exposing some of the internal state of the flow. Intended for lower level monitoring and control.


Please note that the handler is invoked after each input element is processed, not after each write, because preWriteTransformation may produce multiple records for a single input element.

Value parameters

handler

a function called after writing a record, receiving a snapshot of the internal state of the flow as a parameter.
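The per-element invocation can be pictured with a small simulation in plain Scala. Everything here is a hypothetical stand-in: `State` is not PostWriteState and its field names are invented for the sketch, and `run` only imitates a flow that expands each input into records before calling the handler once.

```scala
object HandlerSketch {
  // Hypothetical snapshot type; field names are not taken from parquet4s.
  final case class State(recordsWritten: Long, lastInput: String)

  // Imitates the flow: expand each input into records, count them as written,
  // then call the handler once for that input element.
  def run(inputs: List[String], transform: String => Iterable[String])(
      handler: State => Unit
  ): Unit = {
    var written = 0L
    inputs.foreach { in =>
      written += transform(in).size
      handler(State(written, in))
    }
  }

  val seen = scala.collection.mutable.ListBuffer.empty[State]

  // Two inputs produce three records, yet the handler runs only twice.
  run(List("a b", "c"), _.split(" ").toList) { state => seen += state; () }
}
```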

Attributes

Inherited from:
Builder