Class SamplePartitioner

  • All Implemented Interfaces:
    Partitioner

    @Internal
    public final class SamplePartitioner
    extends java.lang.Object
    Sample Partitioner

    Samples the collection to generate partitions.

    Uses the average document size to split the collection into average-sized chunks.

    The partitioner samples the collection, then projects and sorts the samples by the partition field. It then takes every samplesPerPartition-th sampled value as a partition boundary.

    • "partition.field": The field to be used for partitioning. Must be a unique field. Defaults to: "_id".
    • "partition.size": The average size (MB) for each partition. Note: The number of documents per partition is derived from the average document size, so partition sizes may not be even. Defaults to: 64.
    • "samples.per.partition": The number of samples to take per partition. Defaults to: 10. The total number of samples taken is calculated as: samples per partition * (count / number of documents per partition).
    Partitions collections using a single field.
    • "partition.field": The field to be used for partitioning. Defaults to: "_id".
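
    The arithmetic described above can be sketched as follows. This is a minimal illustration, not the connector's actual implementation: the class name SamplePartitionSketch, its method names, the rounding up of the sample count, and the choice to skip index 0 when picking boundaries are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class; none of these names come from the connector.
final class SamplePartitionSketch {

    // Documents per partition, from "partition.size" (MB) and the average document size (bytes).
    static long documentsPerPartition(int partitionSizeMb, long avgDocSizeBytes) {
        long partitionSizeBytes = partitionSizeMb * 1024L * 1024L;
        return Math.max(1L, partitionSizeBytes / avgDocSizeBytes);
    }

    // samples per partition * (count / documents per partition).
    // Rounding up is an assumption; the documentation does not state the rounding rule.
    static long totalSamples(int samplesPerPartition, long count, long docsPerPartition) {
        return (long) Math.ceil(samplesPerPartition * (count / (double) docsPerPartition));
    }

    // Sorts the sampled values and keeps every samplesPerPartition-th one as a boundary.
    // Assumption: index 0 is skipped, so boundaries sit at indices k, 2k, 3k, ...
    static <T extends Comparable<T>> List<T> boundaries(List<T> samples, int samplesPerPartition) {
        List<T> sorted = new ArrayList<>(samples);
        Collections.sort(sorted);
        List<T> result = new ArrayList<>();
        for (int i = samplesPerPartition; i < sorted.size(); i += samplesPerPartition) {
            result.add(sorted.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        // Defaults from the documentation: 64 MB partitions, 10 samples per partition.
        long docsPerPartition = documentsPerPartition(64, 1024L); // 1 KiB average document
        long samples = totalSamples(10, 1_000_000L, docsPerPartition);
        System.out.println("documents per partition: " + docsPerPartition);
        System.out.println("total samples: " + samples);
        System.out.println("boundaries: " + boundaries(Arrays.asList(5, 1, 9, 3, 7, 2), 2));
    }
}
```

    Because the boundaries come from a random sample rather than a full scan, the resulting partitions are only approximately equal in size; taking more samples per partition tightens the estimate at the cost of a larger sample.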
    • Field Detail

      • PARTITION_SIZE_MB_CONFIG

        public static final java.lang.String PARTITION_SIZE_MB_CONFIG
        See Also:
        Constant Field Values
      • PARTITION_FIELD_DEFAULT

        public static final java.lang.String PARTITION_FIELD_DEFAULT
        See Also:
        Constant Field Values
      • PARTITION_FIELD_CONFIG

        public static final java.lang.String PARTITION_FIELD_CONFIG
        See Also:
        Constant Field Values
    • Constructor Detail

      • SamplePartitioner

        public SamplePartitioner()
        Constructs an instance.
    • Method Detail

      • generatePartitions

        public java.util.List<MongoInputPartition> generatePartitions(ReadConfig readConfig)
        Description copied from interface: Partitioner
        Generates the partitions for the collection based on the read configuration.
        Parameters:
        readConfig - the read configuration
        Returns:
        the partitions