za.co.absa.cobrix.spark.cobol.source.index
This class provides methods to rebalance partitions in case we have idle executors.
For instance, assume there were only 4 nodes (n1 ... n4) in the cluster when the files to be processed were stored.
Later on, two new nodes were added (n5 and n6).
When trying to achieve record-level locality, the new nodes might be idle since they don't hold any HDFS blocks related to the files being processed.
This class provides methods to send some of the work to those new nodes.
Although some shuffling will happen, the overall benefit of having more workers should outweigh it.
Distributes the partitions among all available executors.
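The idea of handing work to idle executors can be illustrated with a minimal sketch in plain Scala. It is not the Cobrix implementation: the object `RebalanceSketch`, the method `rebalance`, and the use of integer partition ids and host-name strings are all hypothetical, chosen only to show one way such a redistribution could work. Partitions stay on their current executor until that executor reaches an even share, and the remainder goes to the least loaded executors.

```scala
import scala.collection.mutable

// Hypothetical sketch, not the Cobrix API.
object RebalanceSketch {

  /**
    * @param assignment current (partitionId, executor) pairs, e.g. derived from HDFS block locations
    * @param executors  all executors currently available, including idle ones
    * @return a new (partitionId, executor) sequence in the original partition order
    */
  def rebalance(assignment: Seq[(Int, String)], executors: Seq[String]): Seq[(Int, String)] = {
    val available = executors.distinct
    require(available.nonEmpty, "at least one executor is required")

    // Even share per executor, rounded up.
    val cap = (assignment.size + available.size - 1) / available.size

    // Load counters in executor order; idle executors start at zero.
    val load = mutable.LinkedHashMap(available.map(e => e -> 0): _*)

    // Pass 1: keep a partition where it is (preserving locality) while its
    // executor is known and still below the even share.
    val decisions = assignment.toVector.map { case (partition, executor) =>
      if (load.get(executor).exists(_ < cap)) {
        load(executor) += 1
        (partition, Option(executor))
      } else {
        (partition, Option.empty[String])
      }
    }

    // Pass 2: move the overflow to whichever executor has the least work.
    // This is where locality is given up and shuffling is introduced.
    decisions.map {
      case (partition, Some(executor)) => (partition, executor)
      case (partition, None) =>
        val target = load.minBy(_._2)._1
        load(target) += 1
        (partition, target)
    }
  }
}
```

In the scenario described above, with three partitions on each of n1 to n4 and n5 and n6 added later, calling `RebalanceSketch.rebalance` with all six executors leaves two partitions on every executor: eight partitions keep their locality and four are moved to the new nodes.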