Class DetectNewPartitionsDoFn
- java.lang.Object
-
- org.apache.beam.sdk.transforms.DoFn<PartitionMetadata,PartitionMetadata>
-
- org.apache.beam.sdk.io.gcp.spanner.changestreams.dofn.DetectNewPartitionsDoFn
-
- All Implemented Interfaces:
java.io.Serializable, org.apache.beam.sdk.transforms.display.HasDisplayData
@UnboundedPerElement public class DetectNewPartitionsDoFn extends org.apache.beam.sdk.transforms.DoFn<PartitionMetadata,PartitionMetadata>
A SplittableDoFn (SDF) that is responsible for scheduling partitions to be queried. This component periodically scans the partition metadata table for partitions in the PartitionMetadata.State.CREATED state, updates their state to PartitionMetadata.State.SCHEDULED, and outputs them to the next stage in the pipeline.
- See Also:
- Serialized Form
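The scan described above can be pictured with a minimal, dependency-free sketch. It is not the Beam implementation: `PartitionSchedulerSketch`, its in-memory map, and the `scanOnce` method are hypothetical stand-ins for the partition metadata table and the DAO calls, and only the two states named in the description are modeled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for the partition metadata table. */
final class PartitionSchedulerSketch {

  /** Mirrors only the two states named in the class description. */
  enum State { CREATED, SCHEDULED }

  // Partition token -> current state; insertion order stands in for scan order.
  private final Map<String, State> table = new LinkedHashMap<>();

  void add(String token, State state) {
    table.put(token, state);
  }

  State stateOf(String token) {
    return table.get(token);
  }

  /**
   * One scan pass: find CREATED partitions, mark them SCHEDULED,
   * and return their tokens as the "output" for the next stage.
   */
  List<String> scanOnce() {
    List<String> scheduled = new ArrayList<>();
    for (Map.Entry<String, State> entry : table.entrySet()) {
      if (entry.getValue() == State.CREATED) {
        entry.setValue(State.SCHEDULED); // value update only, safe while iterating
        scheduled.add(entry.getKey());
      }
    }
    return scheduled;
  }
}
```

In this sketch a second call to `scanOnce()` returns an empty list until new CREATED entries are added, which mirrors the idea that each partition is handed downstream once.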
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.beam.sdk.transforms.DoFn
org.apache.beam.sdk.transforms.DoFn.AlwaysFetched, org.apache.beam.sdk.transforms.DoFn.BoundedPerElement, org.apache.beam.sdk.transforms.DoFn.BundleFinalizer, org.apache.beam.sdk.transforms.DoFn.Element, org.apache.beam.sdk.transforms.DoFn.FieldAccess, org.apache.beam.sdk.transforms.DoFn.FinishBundle, org.apache.beam.sdk.transforms.DoFn.FinishBundleContext, org.apache.beam.sdk.transforms.DoFn.GetInitialRestriction, org.apache.beam.sdk.transforms.DoFn.GetInitialWatermarkEstimatorState, org.apache.beam.sdk.transforms.DoFn.GetRestrictionCoder, org.apache.beam.sdk.transforms.DoFn.GetSize, org.apache.beam.sdk.transforms.DoFn.GetWatermarkEstimatorStateCoder, org.apache.beam.sdk.transforms.DoFn.Key, org.apache.beam.sdk.transforms.DoFn.MultiOutputReceiver, org.apache.beam.sdk.transforms.DoFn.NewTracker, org.apache.beam.sdk.transforms.DoFn.NewWatermarkEstimator, org.apache.beam.sdk.transforms.DoFn.OnTimer, org.apache.beam.sdk.transforms.DoFn.OnTimerContext, org.apache.beam.sdk.transforms.DoFn.OnTimerFamily, org.apache.beam.sdk.transforms.DoFn.OnWindowExpiration, org.apache.beam.sdk.transforms.DoFn.OnWindowExpirationContext, org.apache.beam.sdk.transforms.DoFn.OutputReceiver<T extends java.lang.Object>, org.apache.beam.sdk.transforms.DoFn.ProcessContext, org.apache.beam.sdk.transforms.DoFn.ProcessContinuation, org.apache.beam.sdk.transforms.DoFn.ProcessElement, org.apache.beam.sdk.transforms.DoFn.RequiresStableInput, org.apache.beam.sdk.transforms.DoFn.RequiresTimeSortedInput, org.apache.beam.sdk.transforms.DoFn.Restriction, org.apache.beam.sdk.transforms.DoFn.Setup, org.apache.beam.sdk.transforms.DoFn.SideInput, org.apache.beam.sdk.transforms.DoFn.SplitRestriction, org.apache.beam.sdk.transforms.DoFn.StartBundle, org.apache.beam.sdk.transforms.DoFn.StartBundleContext, org.apache.beam.sdk.transforms.DoFn.StateId, org.apache.beam.sdk.transforms.DoFn.Teardown, org.apache.beam.sdk.transforms.DoFn.TimerFamily, org.apache.beam.sdk.transforms.DoFn.TimerId, 
org.apache.beam.sdk.transforms.DoFn.Timestamp, org.apache.beam.sdk.transforms.DoFn.TruncateRestriction, org.apache.beam.sdk.transforms.DoFn.UnboundedPerElement, org.apache.beam.sdk.transforms.DoFn.WatermarkEstimatorState, org.apache.beam.sdk.transforms.DoFn.WindowedContext
-
-
Constructor Summary
Constructors
Constructor | Description
DetectNewPartitionsDoFn(DaoFactory daoFactory, MapperFactory mapperFactory, ActionFactory actionFactory, ChangeStreamMetrics metrics) | This class needs a DaoFactory to build DAOs to access the partition metadata tables.
-
Method Summary
Modifier and Type | Method | Description
org.joda.time.Instant | getInitialWatermarkEstimatorState(PartitionMetadata partition) |
double | getSize(TimestampRange restriction) |
TimestampRange | initialRestriction(PartitionMetadata partition) | Uses a TimestampRange with a max range.
DetectNewPartitionsRangeTracker | newTracker(TimestampRange restriction) |
org.apache.beam.sdk.transforms.splittabledofn.ManualWatermarkEstimator<org.joda.time.Instant> | newWatermarkEstimator(org.joda.time.Instant watermarkEstimatorState) |
org.apache.beam.sdk.transforms.DoFn.ProcessContinuation | processElement(org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker<TimestampRange,com.google.cloud.Timestamp> tracker, org.apache.beam.sdk.transforms.DoFn.OutputReceiver<PartitionMetadata> receiver, org.apache.beam.sdk.transforms.splittabledofn.ManualWatermarkEstimator<org.joda.time.Instant> watermarkEstimator) | Main processing function for the DetectNewPartitionsDoFn function.
void | setAveragePartitionBytesSize(long averagePartitionBytesSize) | Sets the average partition bytes size to estimate the backlog of this DoFn.
void | setup() | Obtains the instance of DetectNewPartitionsAction.
-
-
-
Constructor Detail
-
DetectNewPartitionsDoFn
public DetectNewPartitionsDoFn(DaoFactory daoFactory, MapperFactory mapperFactory, ActionFactory actionFactory, ChangeStreamMetrics metrics)
This class needs a DaoFactory to build DAOs to access the partition metadata tables. It uses mappers to transform database rows into the PartitionMetadata model. It builds the delegating action class using the ActionFactory. It emits metrics for the partitions read using the ChangeStreamMetrics. It re-schedules the process element function to be executed according to the default resume interval, as in DEFAULT_RESUME_DURATION (best effort).
- Parameters:
daoFactory - the DaoFactory to construct PartitionMetadataDaos
mapperFactory - the MapperFactory to construct PartitionMetadataMappers
actionFactory - the ActionFactory to construct actions
metrics - the ChangeStreamMetrics to emit partition related metrics
-
-
Method Detail
-
getInitialWatermarkEstimatorState
@GetInitialWatermarkEstimatorState public org.joda.time.Instant getInitialWatermarkEstimatorState(@Element PartitionMetadata partition)
-
newWatermarkEstimator
@NewWatermarkEstimator public org.apache.beam.sdk.transforms.splittabledofn.ManualWatermarkEstimator<org.joda.time.Instant> newWatermarkEstimator(@WatermarkEstimatorState org.joda.time.Instant watermarkEstimatorState)
-
initialRestriction
@GetInitialRestriction public TimestampRange initialRestriction(@Element PartitionMetadata partition)
Uses a TimestampRange with a max range. This is because it does not know beforehand how many partitions it will schedule.
- Returns:
- the timestamp range for the component
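To illustrate why an open-ended range suits a component that cannot know its workload in advance, here is a small sketch in plain Java. Everything in it is an assumption made for illustration: `MaxRangeSketch`, its `Range` and `Tracker` classes, the use of `long` microseconds, and the non-decreasing claim rule are hypothetical and do not reproduce `TimestampRange` or `DetectNewPartitionsRangeTracker`.

```java
/** Hypothetical model of a restriction whose upper bound is "as late as possible". */
final class MaxRangeSketch {

  /** Stand-in for a [from, to) timestamp range, expressed in microseconds. */
  static final class Range {
    final long from;
    final long to;

    Range(long from, long to) {
      if (from > to) {
        throw new IllegalArgumentException("from must not exceed to");
      }
      this.from = from;
      this.to = to;
    }
  }

  /** Builds a range that starts at startMicros and has the maximum possible end. */
  static Range initialRestriction(long startMicros) {
    return new Range(startMicros, Long.MAX_VALUE);
  }

  /** Accepts claims that fall inside the range and do not move backwards. */
  static final class Tracker {
    private final Range range;
    private long lastClaimed;

    Tracker(Range range) {
      this.range = range;
      this.lastClaimed = range.from;
    }

    boolean tryClaim(long position) {
      boolean inRange = position >= range.from && position < range.to;
      if (!inRange || position < lastClaimed) {
        return false;
      }
      lastClaimed = position;
      return true;
    }
  }
}
```

Because the upper bound is the maximum value, any later timestamp can still be claimed, so the sketch never runs out of range no matter how many partitions appear over time.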
-
getSize
@GetSize public double getSize(@Restriction TimestampRange restriction)
-
newTracker
@NewTracker public DetectNewPartitionsRangeTracker newTracker(@Restriction TimestampRange restriction)
-
setup
@Setup public void setup()
Obtains the instance of DetectNewPartitionsAction.
-
processElement
@ProcessElement public org.apache.beam.sdk.transforms.DoFn.ProcessContinuation processElement(org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker<TimestampRange,com.google.cloud.Timestamp> tracker, org.apache.beam.sdk.transforms.DoFn.OutputReceiver<PartitionMetadata> receiver, org.apache.beam.sdk.transforms.splittabledofn.ManualWatermarkEstimator<org.joda.time.Instant> watermarkEstimator)
Main processing function for the DetectNewPartitionsDoFn function. It will delegate to the DetectNewPartitionsAction class.
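The delegation, combined with the constructor's note that processing is re-scheduled at a default resume interval on a best-effort basis, can be sketched without Beam on the classpath. The names below (`ResumeLoopSketch`, `Continuation`, `Action`, `ASSUMED_RESUME_DELAY`) are hypothetical, the one-second delay is a placeholder rather than the real value of DEFAULT_RESUME_DURATION, and the rule "stop when the claim fails, otherwise resume" is an assumption about how such an action typically behaves.

```java
import java.time.Duration;

/** Hypothetical model of a process function that delegates and then asks to be resumed. */
final class ResumeLoopSketch {

  /** Placeholder delay; the actual default resume interval is not stated here. */
  static final Duration ASSUMED_RESUME_DELAY = Duration.ofSeconds(1);

  /** Stand-in for a process continuation: either stop, or resume after a delay. */
  static final class Continuation {
    final boolean resume;
    final Duration delay;

    private Continuation(boolean resume, Duration delay) {
      this.resume = resume;
      this.delay = delay;
    }

    static Continuation stop() {
      return new Continuation(false, Duration.ZERO);
    }

    static Continuation resumeAfter(Duration delay) {
      return new Continuation(true, delay);
    }
  }

  /** Stand-in for the delegate action that does the actual work. */
  interface Action {
    Continuation run(boolean claimSucceeded);
  }

  private final Action action;

  ResumeLoopSketch(Action action) {
    this.action = action;
  }

  /** The process function holds no logic of its own; it forwards to the action. */
  Continuation processElement(boolean claimSucceeded) {
    return action.run(claimSucceeded);
  }

  /** Assumed policy: resume after the delay while claims succeed, otherwise stop. */
  static Action defaultAction() {
    return claimSucceeded ->
        claimSucceeded ? Continuation.resumeAfter(ASSUMED_RESUME_DELAY) : Continuation.stop();
  }
}
```

Keeping the logic in a separate action object, as the documented class does, makes the processing step straightforward to substitute in tests.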
-
setAveragePartitionBytesSize
public void setAveragePartitionBytesSize(long averagePartitionBytesSize)
Sets the average partition bytes size to estimate the backlog of this DoFn. Must be called after the initialization of this DoFn.
- Parameters:
averagePartitionBytesSize - the estimated average size of a partition record used in the backlog bytes calculation (DoFn.GetSize)
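One plausible reading of how an average record size feeds a backlog estimate is a simple product of that average and the number of pending partitions. The sketch below shows only that arithmetic; `BacklogSizeSketch`, `estimateBacklogBytes`, and the `pendingPartitions` argument are hypothetical, and whether the real getSize computes its result this way is an assumption.

```java
/** Hypothetical model of a backlog estimate driven by an average record size. */
final class BacklogSizeSketch {

  // Average bytes per partition record; zero until the setter is called.
  private long averagePartitionBytesSize;

  void setAveragePartitionBytesSize(long averagePartitionBytesSize) {
    this.averagePartitionBytesSize = averagePartitionBytesSize;
  }

  /** Assumed formula: average record size multiplied by the pending partition count. */
  double estimateBacklogBytes(long pendingPartitions) {
    return (double) averagePartitionBytesSize * pendingPartitions;
  }
}
```

In this sketch the estimate stays at zero until the setter has been called.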
-
-