public class VerifyDynamicWorkRebalancing extends Object
Verifies that dynamic work rebalancing through BoundedSource.BoundedReader.splitAtFraction(double) actually redistributes work (correctness in terms of data consistency can already be verified by the methods in SourceTestUtils).
This test works by blocking at a pre-selected set of sentinel values and making sure the work gets split up such that a thread eventually gets allocated to each of them.
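
The blocking idea can be illustrated without a runner. The following is a minimal sketch using only the JDK, not the SDK's implementation: the class SentinelBlockingSketch, its method allSentinelsReached, and the timeoutMsec parameter are hypothetical names, each element is treated as its own unit of work, and no call to splitAtFraction takes place. It only shows why every sentinel needs its own thread before the check can pass.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; this class is not part of the SDK.
class SentinelBlockingSketch {

    /**
     * Processes every element on a fixed pool of worker threads. A sentinel blocks
     * its thread until all sentinels have been reached; any other element just sleeps.
     * Returns true if all sentinels were reached before the timeout expired.
     */
    static boolean allSentinelsReached(List<Integer> elements, Set<Integer> sentinels,
                                       int threads, long nonSentinelSleepMsec,
                                       long timeoutMsec) {
        CountDownLatch reached = new CountDownLatch(sentinels.size());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (Integer element : elements) {
                pool.submit(() -> {
                    try {
                        if (sentinels.contains(element)) {
                            // Record that this sentinel has a thread, then hold the thread.
                            reached.countDown();
                            reached.await();
                        } else {
                            Thread.sleep(nonSentinelSleepMsec);
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // Succeeds only if each sentinel was picked up by a distinct thread.
            return reached.await(timeoutMsec, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            // Release any threads still blocked at a sentinel.
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        List<Integer> elements = Arrays.asList(0, 1, 2, 3, 4, 5, 6, 7, 8, 9);
        Set<Integer> sentinels = new HashSet<>(Arrays.asList(2, 5, 8));
        // Three sentinels with three threads, then with only two threads.
        System.out.println(allSentinelsReached(elements, sentinels, 3, 1, 2000));
        System.out.println(allSentinelsReached(elements, sentinels, 2, 1, 200));
    }
}
```

With fewer threads than sentinels, the blocked threads never free up to reach the remaining sentinel, so the call times out. This mirrors the requirement that the test needs at least as many threads as sentinels.
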
| Modifier and Type | Method and Description |
|---|---|
| static <T> void | run(PTransform<PBegin,PCollection<T>> source, Collection<T> sentinels, long nonSentinelSleepMsec) Reads a source and attempts to dynamically rebalance work to bundles each containing a single one of the sentinel values using the Dataflow runner. |
| static <T> void | runWithPipeline(Pipeline p, PTransform<PBegin,PCollection<T>> source, Collection<T> sentinels, long nonSentinelSleepMsec) |

public static <T> void run(PTransform<PBegin,PCollection<T>> source, Collection<T> sentinels, long nonSentinelSleepMsec)
Sentinels should be chosen so that the source's inherent parallelization allows them to be separated. In a simple record-based file format they can be chosen arbitrarily (for example, every record can be a sentinel). In a block-based file format, where parallelization is possible only down to blocks and not down to individual records, sentinels must lie in different blocks. There should not be too many sentinels, however, because the test requires at least as many threads (possibly via autoscaling) as there are sentinels to complete successfully.
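
One way to follow this guidance for a block-based format is sketched below. It assumes, purely for illustration, that records are laid out in fixed-size blocks and that the records are distinct; the class SentinelPickerSketch and the parameters recordsPerBlock and maxSentinels are hypothetical and do not come from the SDK.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper only; this class is not part of the SDK.
class SentinelPickerSketch {

    /**
     * Picks the first record of each block as a sentinel, so that no two sentinels
     * share a block, and stops once maxSentinels have been chosen so that the test
     * does not need more threads than are available.
     */
    static <T> List<T> onePerBlock(List<T> records, int recordsPerBlock, int maxSentinels) {
        if (recordsPerBlock <= 0) {
            throw new IllegalArgumentException("recordsPerBlock must be positive");
        }
        List<T> sentinels = new ArrayList<>();
        for (int i = 0; i < records.size() && sentinels.size() < maxSentinels;
                i += recordsPerBlock) {
            sentinels.add(records.get(i));
        }
        return sentinels;
    }
}
```

The resulting list could then be passed as the sentinels argument. For a record-based format, setting recordsPerBlock to 1 makes every record a candidate, and maxSentinels still bounds the number of threads required.
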
Parameters:
source - a source PTransform producing the PCollection to be split
sentinels - a collection of elements that should be separable in the source
nonSentinelSleepMsec - how long each non-sentinel element should take to process

public static <T> void runWithPipeline(Pipeline p, PTransform<PBegin,PCollection<T>> source, Collection<T> sentinels, long nonSentinelSleepMsec)