Creates a persistent snapshot in the staging folder of the Spark data flow and substitutes the dataset behind the label with the one opened from the stored version.
It will not trigger for labels whose datasets are empty.
- list of labels to cache
Creates a persistent snapshot in the staging folder of the Spark data flow and substitutes the dataset behind the label with the one opened from the stored version.
It will not trigger for labels whose datasets are empty.
- list of labels to snapshot
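The snapshot behaviour described above (persist, then reopen from the stored copy) can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: the helper name `snapshot`, the Parquet format, and the `stagingFolder/label` path layout are all assumptions made for the example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SnapshotSketch {
  // Hypothetical helper: the name, the Parquet format and the path layout
  // are assumptions for illustration only.
  def snapshot(spark: SparkSession, df: DataFrame, stagingFolder: String, label: String): DataFrame = {
    val path = s"$stagingFolder/$label"
    // Persist the current state of the dataset to the staging folder.
    df.write.mode("overwrite").parquet(path)
    // Return a dataset opened from the stored version; later steps read
    // from disk instead of recomputing the original lineage.
    spark.read.parquet(path)
  }
}
```

The returned DataFrame would then take the place of the original one behind the label.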
Applies a transformation to the label's data set and replaces it.
Multiple intercept actions can be chained, for example post -> post -> snapshot.
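The idea of replacing a label's dataset with a transformed version, and of chaining several such steps, can be illustrated without Spark. In this sketch the `Flow` type, the `transform` helper and the in-memory `Seq[Int]` "datasets" are hypothetical stand-ins, not part of the library.

```scala
object InterceptSketch {
  // Hypothetical stand-in for a data flow: label -> in-memory "dataset".
  type Flow = Map[String, Seq[Int]]

  // Applies f to the dataset behind the label and stores the result under
  // the same label. A flow without that label is returned unchanged.
  def transform(flow: Flow, label: String)(f: Seq[Int] => Seq[Int]): Flow =
    flow.get(label).fold(flow)(ds => flow.updated(label, f(ds)))

  // Two chained transformations on the same label: keep even numbers,
  // then multiply each by 10.
  def demo: Flow = {
    val flow: Flow = Map("numbers" -> Seq(1, 2, 3, 4))
    val evens = transform(flow, "numbers")(_.filter(_ % 2 == 0))
    transform(evens, "numbers")(_.map(_ * 10))
  }
}
```

Each step sees the output of the previous one, because the label always points at the most recently substituted dataset.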
Takes a value a of type A and a message msg, logs the message at the desired level, and returns a unchanged.
a
Takes a value a of type A and a function message from A to String, and logs the result of message(a) at the level given by the level parameter.
a
logAndReturn(1, (num: Int) => s"number: $num", Info) // The log would contain an entry reading "number: 1"
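A self-contained sketch of how the two logAndReturn variants might behave is shown below. The `Level` hierarchy, the in-memory `lines` buffer and the `[level] message` format are assumptions chosen so the example runs without a logging backend; the library's actual level type and logger will differ.

```scala
// Hypothetical log levels; the real level type may differ.
sealed trait Level
case object Debug extends Level
case object Info extends Level
case object Warning extends Level

object LogSketch {
  // Collects log lines in memory so the example needs no logging backend.
  val lines = scala.collection.mutable.ArrayBuffer.empty[String]

  private def log(level: Level, msg: String): Unit = lines += s"[$level] $msg"

  // Fixed-message variant: logs msg and returns a unchanged.
  def logAndReturn[A](a: A, msg: String, level: Level): A = {
    log(level, msg)
    a
  }

  // Function variant: builds the message from the value, then returns a unchanged.
  def logAndReturn[A](a: A, message: A => String, level: Level): A = {
    log(level, message(a))
    a
  }
}
```

Because both variants return their first argument, a call can be wrapped around any expression in a pipeline without changing its result.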
Caches multiple labels using Spark's built-in caching mechanism
- list of labels to cache
Caches a single label using Spark's built-in caching mechanism
the label to cache
optionally, the number of partitions to repartition the dataset into before caching (will invoke a .repartition call)
the StorageLevel to use
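The parameters above map onto two standard Spark calls, `Dataset.repartition` and `Dataset.persist`. The following is a minimal sketch of that combination; the helper name `cacheDataset` and the default storage level are assumptions for the example, not the library's own signature.

```scala
import org.apache.spark.sql.Dataset
import org.apache.spark.storage.StorageLevel

object CacheSketch {
  // Hypothetical helper: the name and defaults are illustrative assumptions.
  def cacheDataset[T](
      ds: Dataset[T],
      partitions: Option[Int] = None,
      storageLevel: StorageLevel = StorageLevel.MEMORY_AND_DISK
  ): Dataset[T] = {
    // Repartition only when a partition count was supplied.
    val repartitioned = partitions.fold(ds)(n => ds.repartition(n))
    // persist marks the dataset for caching at the given storage level;
    // the data is materialised the first time an action runs on it.
    repartitioned.persist(storageLevel)
  }
}
```

Repartitioning before caching triggers a shuffle, so it is usually worth supplying a partition count only when the existing partitioning is badly skewed or far too fine-grained.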