Class DetectNewPartitionsDoFn

  • All Implemented Interfaces:
    java.io.Serializable, org.apache.beam.sdk.transforms.display.HasDisplayData

    @UnboundedPerElement
    public class DetectNewPartitionsDoFn
    extends org.apache.beam.sdk.transforms.DoFn<PartitionMetadata,PartitionMetadata>
    A SplittableDoFn (SDF) that is responsible for scheduling partitions to be queried. This component periodically scans the partition metadata table for partitions in the PartitionMetadata.State.CREATED state, updates their state to PartitionMetadata.State.SCHEDULED, and outputs them to the next stage in the pipeline.
    See Also:
    Serialized Form
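    The scan-and-schedule cycle described above can be sketched in plain Java. This is a minimal illustration, not the Beam implementation: PartitionSchedulerSketch, its in-memory map, and the string partition tokens are hypothetical stand-ins for the partition metadata table and PartitionMetadata.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the partition metadata table; not the Beam API.
class PartitionSchedulerSketch {
    enum State { CREATED, SCHEDULED }

    // Partition token -> current state. LinkedHashMap keeps insertion order.
    private final Map<String, State> table = new LinkedHashMap<>();

    void addPartition(String token) {
        table.put(token, State.CREATED);
    }

    State stateOf(String token) {
        return table.get(token);
    }

    // One scan: find CREATED partitions, mark them SCHEDULED, and return them as the output.
    List<String> scanAndSchedule() {
        List<String> scheduled = new ArrayList<>();
        for (Map.Entry<String, State> entry : table.entrySet()) {
            if (entry.getValue() == State.CREATED) {
                entry.setValue(State.SCHEDULED);
                scheduled.add(entry.getKey());
            }
        }
        return scheduled;
    }

    public static void main(String[] args) {
        PartitionSchedulerSketch sketch = new PartitionSchedulerSketch();
        sketch.addPartition("partition-a");
        sketch.addPartition("partition-b");
        System.out.println("First scan:  " + sketch.scanAndSchedule());
        System.out.println("Second scan: " + sketch.scanAndSchedule());
    }
}
```

    Because a scheduled partition is no longer in the CREATED state, a second scan over the same table outputs nothing, which is what keeps repeated scans from emitting the same partition twice in this sketch.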
    • Nested Class Summary

      • Nested classes/interfaces inherited from class org.apache.beam.sdk.transforms.DoFn

        org.apache.beam.sdk.transforms.DoFn.AlwaysFetched, org.apache.beam.sdk.transforms.DoFn.BoundedPerElement, org.apache.beam.sdk.transforms.DoFn.BundleFinalizer, org.apache.beam.sdk.transforms.DoFn.Element, org.apache.beam.sdk.transforms.DoFn.FieldAccess, org.apache.beam.sdk.transforms.DoFn.FinishBundle, org.apache.beam.sdk.transforms.DoFn.FinishBundleContext, org.apache.beam.sdk.transforms.DoFn.GetInitialRestriction, org.apache.beam.sdk.transforms.DoFn.GetInitialWatermarkEstimatorState, org.apache.beam.sdk.transforms.DoFn.GetRestrictionCoder, org.apache.beam.sdk.transforms.DoFn.GetSize, org.apache.beam.sdk.transforms.DoFn.GetWatermarkEstimatorStateCoder, org.apache.beam.sdk.transforms.DoFn.Key, org.apache.beam.sdk.transforms.DoFn.MultiOutputReceiver, org.apache.beam.sdk.transforms.DoFn.NewTracker, org.apache.beam.sdk.transforms.DoFn.NewWatermarkEstimator, org.apache.beam.sdk.transforms.DoFn.OnTimer, org.apache.beam.sdk.transforms.DoFn.OnTimerContext, org.apache.beam.sdk.transforms.DoFn.OnTimerFamily, org.apache.beam.sdk.transforms.DoFn.OnWindowExpiration, org.apache.beam.sdk.transforms.DoFn.OnWindowExpirationContext, org.apache.beam.sdk.transforms.DoFn.OutputReceiver<T extends java.lang.Object>, org.apache.beam.sdk.transforms.DoFn.ProcessContext, org.apache.beam.sdk.transforms.DoFn.ProcessContinuation, org.apache.beam.sdk.transforms.DoFn.ProcessElement, org.apache.beam.sdk.transforms.DoFn.RequiresStableInput, org.apache.beam.sdk.transforms.DoFn.RequiresTimeSortedInput, org.apache.beam.sdk.transforms.DoFn.Restriction, org.apache.beam.sdk.transforms.DoFn.Setup, org.apache.beam.sdk.transforms.DoFn.SideInput, org.apache.beam.sdk.transforms.DoFn.SplitRestriction, org.apache.beam.sdk.transforms.DoFn.StartBundle, org.apache.beam.sdk.transforms.DoFn.StartBundleContext, org.apache.beam.sdk.transforms.DoFn.StateId, org.apache.beam.sdk.transforms.DoFn.Teardown, org.apache.beam.sdk.transforms.DoFn.TimerFamily, org.apache.beam.sdk.transforms.DoFn.TimerId, 
org.apache.beam.sdk.transforms.DoFn.Timestamp, org.apache.beam.sdk.transforms.DoFn.TruncateRestriction, org.apache.beam.sdk.transforms.DoFn.UnboundedPerElement, org.apache.beam.sdk.transforms.DoFn.WatermarkEstimatorState, org.apache.beam.sdk.transforms.DoFn.WindowedContext
    • Method Detail

      • getInitialWatermarkEstimatorState

        @GetInitialWatermarkEstimatorState
        public org.joda.time.Instant getInitialWatermarkEstimatorState(@Element
                                                                       PartitionMetadata partition)
      • newWatermarkEstimator

        @NewWatermarkEstimator
        public org.apache.beam.sdk.transforms.splittabledofn.ManualWatermarkEstimator<org.joda.time.Instant> newWatermarkEstimator(@WatermarkEstimatorState
                                                                                                                                   org.joda.time.Instant watermarkEstimatorState)
      • initialRestriction

        @GetInitialRestriction
        public TimestampRange initialRestriction(@Element
                                                 PartitionMetadata partition)
        Uses a TimestampRange with the maximum range, because the number of partitions this component will schedule is not known beforehand.
        Returns:
        the timestamp range for the component
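        The idea of an open-ended restriction can be sketched as follows. This is an illustration only: MaxRangeSketch and its Range class are hypothetical, java.time.Instant stands in for the timestamp type, and the half-open [from, to) convention is an assumption rather than documented TimestampRange behavior.

```java
import java.time.Instant;

// Hypothetical illustration of a "max range" restriction; not the Beam TimestampRange class.
class MaxRangeSketch {
    // Half-open range [from, to); the convention is an assumption made for this sketch.
    static final class Range {
        final Instant from;
        final Instant to;

        Range(Instant from, Instant to) {
            this.from = from;
            this.to = to;
        }

        boolean contains(Instant t) {
            return !t.isBefore(from) && t.isBefore(to);
        }
    }

    // The upper bound is the largest representable instant, so the range never
    // runs out no matter how many partitions are scheduled later.
    static Range unboundedFrom(Instant start) {
        return new Range(start, Instant.MAX);
    }

    public static void main(String[] args) {
        Range range = unboundedFrom(Instant.parse("2024-01-01T00:00:00Z"));
        System.out.println(range.contains(Instant.parse("2999-01-01T00:00:00Z")));
    }
}
```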
      • getSize

        @GetSize
        public double getSize(@Restriction
                              TimestampRange restriction)
      • processElement

        @ProcessElement
        public org.apache.beam.sdk.transforms.DoFn.ProcessContinuation processElement(org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker<TimestampRange,com.google.cloud.Timestamp> tracker,
                                                                                      org.apache.beam.sdk.transforms.DoFn.OutputReceiver<PartitionMetadata> receiver,
                                                                                      org.apache.beam.sdk.transforms.splittabledofn.ManualWatermarkEstimator<org.joda.time.Instant> watermarkEstimator)
        Main processing function of DetectNewPartitionsDoFn. It delegates the work to the DetectNewPartitionsAction class.
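        The delegation mentioned above follows a common pattern: the entry point forwards its collaborators to a separate action object that holds the logic. The sketch below shows only that pattern. DelegationSketch, DetectAction, and Continuation are hypothetical names, and the signatures are simplified; they are not those of DetectNewPartitionsAction or DoFn.ProcessContinuation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a processing method that delegates to an action object.
class DelegationSketch {
    // Simplified stand-in for a "resume later" versus "stop" signal.
    enum Continuation { RESUME, STOP }

    // Hypothetical stand-in for the action class that holds the actual logic.
    interface DetectAction {
        Continuation run(List<String> output);
    }

    private final DetectAction action;

    DelegationSketch(DetectAction action) {
        this.action = action;
    }

    // The entry point does no work itself; it forwards its collaborators to the action.
    Continuation processElement(List<String> output) {
        return action.run(output);
    }

    public static void main(String[] args) {
        DelegationSketch fn = new DelegationSketch(out -> {
            out.add("partition-a");
            return Continuation.RESUME;
        });
        List<String> output = new ArrayList<>();
        System.out.println(fn.processElement(output) + " " + output);
    }
}
```

        Keeping the logic in a separate object means it can be exercised in a unit test with a fake action, without constructing the surrounding pipeline.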
      • setAveragePartitionBytesSize

        public void setAveragePartitionBytesSize(long averagePartitionBytesSize)
        Sets the average partition bytes size to estimate the backlog of this DoFn. Must be called after the initialization of this DoFn.
        Parameters:
        averagePartitionBytesSize - the estimated average size of a partition record used in the backlog bytes calculation (DoFn.GetSize)
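        A rough sketch of how an average record size might feed a backlog estimate is shown below. The formula (average bytes multiplied by the number of pending partitions) and the names BacklogEstimateSketch and estimateBacklogBytes are assumptions made for illustration; this page does not document how getSize actually computes its result.

```java
// Hypothetical illustration of a backlog estimate; not the Beam getSize implementation.
class BacklogEstimateSketch {
    // Stays 0 until the setter is called after construction.
    private long averagePartitionBytesSize;

    void setAveragePartitionBytesSize(long averagePartitionBytesSize) {
        this.averagePartitionBytesSize = averagePartitionBytesSize;
    }

    // Assumed formula: average record size times the number of partitions still pending.
    double estimateBacklogBytes(long pendingPartitions) {
        return (double) averagePartitionBytesSize * pendingPartitions;
    }

    public static void main(String[] args) {
        BacklogEstimateSketch sketch = new BacklogEstimateSketch();
        sketch.setAveragePartitionBytesSize(200L);
        System.out.println(sketch.estimateBacklogBytes(3L));
    }
}
```

        In this sketch the estimate is zero until the setter has been called, which illustrates why the average size needs to be set once the object exists and before the estimate is used.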