ingestion algorithm
ingestion algorithm
Dataset loading strategy (JSON / CSV / ...)
Dataset loading strategy (JSON / CSV / ...)
Spark Dataframe loaded using metadata options
Saves a dataset.
Saves a dataset. If the path is empty (the first time we call metrics on the schema) then we can write.
If there's already parquet files stored in it, then create a temporary directory to compute on, and flush the path to move updated metrics in it
: dataset to be saved
: Path to save the file at
Merged metadata
Partition a dataset using dataset columns.
Partition a dataset using dataset columns. To partition the dataset using the ingestion time, use the reserved column names :
: Input dataset
: list of columns to use for partitioning.
The Spark session used to run this job
Main entry point as required by the Spark Job interface
Main entry point as required by the Spark Job interface
: Spark Session used for the job
Merge new and existing dataset if required Save using overwrite / Append mode
Merge new and existing dataset if required Save using overwrite / Append mode