Class DetectNewPartitionsAction

    • Method Summary

      Modifier and Type: org.apache.beam.sdk.transforms.DoFn.ProcessContinuation
      Method: run(org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker<TimestampRange, com.google.cloud.Timestamp> tracker, org.apache.beam.sdk.transforms.DoFn.OutputReceiver<PartitionMetadata> receiver, org.apache.beam.sdk.transforms.splittabledofn.ManualWatermarkEstimator<org.joda.time.Instant> watermarkEstimator)
      Description: Executes the main logic to schedule new partitions.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • run

        public org.apache.beam.sdk.transforms.DoFn.ProcessContinuation run(org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker<TimestampRange, com.google.cloud.Timestamp> tracker,
                                                                           org.apache.beam.sdk.transforms.DoFn.OutputReceiver<PartitionMetadata> receiver,
                                                                           org.apache.beam.sdk.transforms.splittabledofn.ManualWatermarkEstimator<org.joda.time.Instant> watermarkEstimator)
        Executes the main logic to schedule new partitions. It periodically performs the following procedure:
        1. Fetches the minimum watermark across all unfinished partitions in the metadata tables.
        2. If there are no unfinished partitions, stops; the function is not rescheduled.
        3. Updates the component's watermark to the fetched minimum.
        4. Fetches the read timestamp from the restriction.
        5. Fetches all partitions whose createdAt timestamp is greater than the read timestamp.
        6. Groups the partitions by createdAt timestamp.
        7. Processes the groups in ascending order of createdAt timestamp (oldest first).
        8. For each group, updates the state of its partitions to PartitionMetadata.State.SCHEDULED.
        9. Tries to claim the group's createdAt timestamp within the restriction.
        10. If the claim succeeds, outputs each partition in the group to the next stage and proceeds to the next group. When there are no more groups to process, schedules the function to resume after the configured resume duration.
        11. If the claim fails, stops.
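        The watermark computation and grouping in steps 1-2 and 5-7 can be sketched in plain Java. This is a hypothetical model, not the connector's code: the DetectNewPartitionsSketch class, the Partition record, and both helper methods are illustrative names, and timestamps are simplified to long values instead of com.google.cloud.Timestamp. The metadata-table queries and the state update of step 8 are omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical, Beam-free model of steps 1-2 and 5-7 (not the real implementation). */
public class DetectNewPartitionsSketch {

  /** Hypothetical stand-in for a metadata-table row; timestamps are plain longs. */
  record Partition(String token, long createdAt, long watermark, boolean finished) {}

  /** Steps 1-2: minimum watermark over unfinished partitions; an empty result means "stop". */
  static Optional<Long> minUnfinishedWatermark(List<Partition> partitions) {
    return partitions.stream()
        .filter(p -> !p.finished())
        .map(Partition::watermark)
        .min(Long::compare);
  }

  /** Steps 5-7: partitions created strictly after readTimestamp, grouped by createdAt. */
  static TreeMap<Long, List<String>> groupNewPartitions(List<Partition> partitions, long readTimestamp) {
    // A TreeMap iterates its keys in ascending order, so the oldest group comes first.
    TreeMap<Long, List<String>> groups = new TreeMap<>();
    for (Partition p : partitions) {
      if (p.createdAt() > readTimestamp) {
        groups.computeIfAbsent(p.createdAt(), k -> new ArrayList<>()).add(p.token());
      }
    }
    return groups;
  }

  public static void main(String[] args) {
    List<Partition> partitions = List.of(
        new Partition("a", 10, 50, true),
        new Partition("b", 20, 40, false),
        new Partition("c", 30, 35, false),
        new Partition("d", 30, 60, false));
    System.out.println(minUnfinishedWatermark(partitions)); // Optional[35]
    System.out.println(groupNewPartitions(partitions, 15)); // {20=[b], 30=[c, d]}
  }
}
```

        Using a sorted map keyed by createdAt covers both grouping (step 6) and oldest-first ordering (step 7) in one structure.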
        Parameters:
        tracker - an instance of DetectNewPartitionsRangeTracker
        receiver - a PartitionMetadata DoFn.OutputReceiver
        watermarkEstimator - a ManualWatermarkEstimator of Instant
        Returns:
        DoFn.ProcessContinuation.stop() if there are no more partitions to process, or DoFn.ProcessContinuation.resume() to reschedule the function after the configured interval.
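        The claim loop of steps 9-11 and the resulting return value might look roughly like the following sketch. It assumes a toy tracker that only accepts strictly increasing positions inside a half-open range; ClaimLoopSketch, ToyTracker, and the Continuation enum are hypothetical stand-ins for Beam's RestrictionTracker and DoFn.ProcessContinuation, not their real implementations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, Beam-free model of steps 9-11 and the return value. */
public class ClaimLoopSketch {

  /** Hypothetical stand-in for DoFn.ProcessContinuation.stop() / resume(). */
  enum Continuation { STOP, RESUME }

  /** Toy tracker over [from, to): a claim must be >= the next allowed position and < to. */
  static final class ToyTracker {
    private final long to;
    private long nextAllowed; // smallest position that may still be claimed

    ToyTracker(long from, long to) {
      this.nextAllowed = from;
      this.to = to;
    }

    boolean tryClaim(long position) {
      if (position < nextAllowed || position >= to) {
        return false;
      }
      nextAllowed = position + 1;
      return true;
    }
  }

  /** Claims each group's createdAt in ascending order and emits its partitions on success. */
  static Continuation processGroups(TreeMap<Long, List<String>> groups, ToyTracker tracker, List<String> output) {
    for (Map.Entry<Long, List<String>> group : groups.entrySet()) {
      if (!tracker.tryClaim(group.getKey())) {
        return Continuation.STOP; // step 11: the claim failed
      }
      output.addAll(group.getValue()); // step 10: output every partition in the group
    }
    return Continuation.RESUME; // no more groups: resume later
  }

  public static void main(String[] args) {
    TreeMap<Long, List<String>> groups =
        new TreeMap<>(Map.of(20L, List.of("b"), 30L, List.of("c", "d")));
    List<String> out = new ArrayList<>();
    System.out.println(processGroups(groups, new ToyTracker(0, 100), out) + " " + out); // RESUME [b, c, d]
    List<String> out2 = new ArrayList<>();
    System.out.println(processGroups(groups, new ToyTracker(0, 25), out2) + " " + out2); // STOP [b]
  }
}
```

        With the range [0, 100) both groups are claimed and the sketch returns RESUME; with [0, 25) the claim of createdAt 30 fails, so only partition b is emitted and the sketch returns STOP.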