-
Interfaces

org.apache.beam.sdk.testing.StreamingIT
    Tests which use unbounded PCollections should be in the category UsesUnboundedPCollections. Beyond that, it is up to the runner and test configuration to decide whether to run in streaming mode.
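A minimal sketch of the StreamingIT replacement in a JUnit 4 test. It assumes the Beam Java SDK (with its test classes), JUnit 4 and the DirectRunner are on the classpath. The class and method names are hypothetical. TestStream is used only as a convenient source of an unbounded PCollection, which is why the test also carries the UsesTestStream category.

```java
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.testing.NeedsRunner;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.beam.sdk.testing.TestStream;
import org.apache.beam.sdk.testing.UsesTestStream;
import org.apache.beam.sdk.testing.UsesUnboundedPCollections;
import org.apache.beam.sdk.values.PCollection;
import org.junit.Rule;
import org.junit.Test;
import org.junit.experimental.categories.Category;

// Hypothetical test class: it is tagged by what it uses (unbounded PCollections)
// rather than with the deprecated StreamingIT category.
public class UnboundedCategoryExampleTest {

  // TestPipeline is meant to be declared as a JUnit @Rule.
  @Rule public final transient TestPipeline pipeline = TestPipeline.create();

  @Test
  @Category({NeedsRunner.class, UsesTestStream.class, UsesUnboundedPCollections.class})
  public void readsElementsFromUnboundedInput() {
    // TestStream yields an unbounded PCollection of the given elements.
    TestStream<String> events =
        TestStream.create(StringUtf8Coder.of())
            .addElements("a", "b")
            .advanceWatermarkToInfinity();

    PCollection<String> output = pipeline.apply(events);
    PAssert.that(output).containsInAnyOrder("a", "b");

    // Whether this executes in streaming mode is left to the runner and test configuration.
    pipeline.run();
  }
}
```

Tagging by capability lets a runner's test suite include or exclude the test based on whether that runner supports unbounded input.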
-
Classes

org.apache.beam.sdk.coders.Coder.Context
    To implement a coder, do not use any Coder.Context. Implement only those abstract methods which do not accept a Coder.Context, and leave the default implementations for methods accepting a Coder.Context.

org.apache.beam.sdk.io.TextIO.ReadAll
    See TextIO.readAll() for details.

org.apache.beam.sdk.schemas.GetterBasedSchemaProvider
    New implementations should extend the GetterBasedSchemaProviderV2 class, whose methods receive TypeDescriptors instead of ordinary Classes as arguments. This makes it possible to support generic type signatures during schema inference.

org.apache.beam.sdk.transforms.ApproximateUnique
    Consider using ApproximateCountDistinct in the zetasketch extension module, which makes use of the HllCount implementation.
    If ApproximateCountDistinct does not meet your needs, you can use HllCount directly. Direct usage also lets you save intermediate aggregation results into a sketch for later processing.
    For example, to estimate the number of distinct elements in a PCollection<String>:

        PCollection<String> input = ...;
        PCollection<Long> countDistinct =
            input.apply(HllCount.Init.forStrings().globally())
                 .apply(HllCount.Extract.globally());

    For more details about using HllCount and the zetasketch extension module, see https://s.apache.org/hll-in-beam#bookmark=id.v6chsij1ixo7.

org.apache.beam.sdk.transforms.Combine.SimpleCombineFn

org.apache.beam.sdk.transforms.DoFnTester
    Use TestPipeline with the DirectRunner.

org.apache.beam.sdk.transforms.Top.Largest
    Use Top.Natural instead.

org.apache.beam.sdk.transforms.Top.Smallest
    Use Top.Reversed instead.

org.apache.beam.sdk.util.BitSetCoder
    Use BitSetCoder instead.

org.apache.beam.sdk.util.construction.CreatePCollectionViewTranslation
    This should generally be done as part of ParDo translation, or moved into a dedicated runners-core-construction auxiliary class.

org.apache.beam.sdk.util.construction.CreatePCollectionViewTranslation.Registrar
    Runners should move away from translating `CreatePCollectionView` and treat this as part of the translation for a `ParDo` side input.

org.apache.beam.sdk.values.PCollectionViews.IterableViewFn

org.apache.beam.sdk.values.PCollectionViews.ListViewFn

org.apache.beam.sdk.values.PCollectionViews.MapViewFn

org.apache.beam.sdk.values.PCollectionViews.MultimapViewFn

org.apache.beam.sdk.values.PCollectionViews.SingletonViewFn

org.apache.beam.sdk.values.ShardedKey
    Use ShardedKey instead.

org.apache.beam.sdk.values.WindowedValues.ValueOnlyWindowedValueCoder
    Use ParamWindowedValueCoder instead. It is a general-purpose implementation of the same concept, but makes timestamp, windows and pane info configurable.
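The Coder.Context entry above says to implement only the context-free methods. A minimal sketch of that pattern, assuming the Beam Java SDK core is on the classpath: the coder extends CustomCoder and overrides only encode(T, OutputStream) and decode(InputStream). The value type LabeledCount and both class names are hypothetical, chosen for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.beam.sdk.coders.CoderException;
import org.apache.beam.sdk.coders.CustomCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.coders.VarIntCoder;
import org.apache.beam.sdk.util.CoderUtils;

public class CoderWithoutContextExample {

  // Hypothetical value type used only for this illustration.
  public static final class LabeledCount {
    public final String label;
    public final int count;

    public LabeledCount(String label, int count) {
      this.label = label;
      this.count = count;
    }
  }

  // Overrides only the encode/decode methods that take no Coder.Context;
  // the inherited Context-accepting overloads are left at their defaults.
  public static final class LabeledCountCoder extends CustomCoder<LabeledCount> {
    private static final StringUtf8Coder LABEL_CODER = StringUtf8Coder.of();
    private static final VarIntCoder COUNT_CODER = VarIntCoder.of();

    @Override
    public void encode(LabeledCount value, OutputStream outStream) throws IOException {
      // Component coders are also called without a Context; the string is
      // written length-prefixed, so the int that follows can be read back.
      LABEL_CODER.encode(value.label, outStream);
      COUNT_CODER.encode(value.count, outStream);
    }

    @Override
    public LabeledCount decode(InputStream inStream) throws IOException {
      String label = LABEL_CODER.decode(inStream);
      int count = COUNT_CODER.decode(inStream);
      return new LabeledCount(label, count);
    }
  }

  // Encodes a value to bytes with the coder, then decodes it again.
  public static LabeledCount roundTrip(LabeledCount value) throws CoderException {
    LabeledCountCoder coder = new LabeledCountCoder();
    byte[] bytes = CoderUtils.encodeToByteArray(coder, value);
    return CoderUtils.decodeFromByteArray(coder, bytes);
  }

  public static void main(String[] args) throws CoderException {
    LabeledCount decoded = roundTrip(new LabeledCount("clicks", 42));
    System.out.println(decoded.label + " " + decoded.count); // prints "clicks 42"
  }
}
```

Because the field coders are invoked without a context, each field is encoded in a self-delimiting form. As a result, the composite coder behaves the same whether it is used at the outer level or nested inside another coder.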
-
Enums

org.apache.beam.sdk.io.CompressedSource.CompressionMode
    Use Compression instead.

org.apache.beam.sdk.io.FileBasedSink.CompressionType
    Use Compression instead.

org.apache.beam.sdk.io.TextIO.CompressionType
    Use Compression instead.

org.apache.beam.sdk.io.TFRecordIO.CompressionType
    Use Compression instead.

org.apache.beam.sdk.transforms.DoFnTester.CloningBehavior
    Use TestPipeline with the DirectRunner.
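A minimal sketch of moving from the deprecated per-IO CompressionType enums to the shared org.apache.beam.sdk.io.Compression enum, assuming the Beam Java SDK core is on the classpath. The class name, helper methods and file pattern are hypothetical. Only the read transform is constructed, so no files are touched.

```java
import org.apache.beam.sdk.io.Compression;
import org.apache.beam.sdk.io.TextIO;

public class CompressionMigrationExample {

  // States the codec explicitly with the shared Compression enum,
  // where older code would have used TextIO.CompressionType.
  public static TextIO.Read gzipTextRead(String filePattern) {
    return TextIO.read().from(filePattern).withCompression(Compression.GZIP);
  }

  // Compression can also infer a codec from a file name's extension.
  public static Compression codecFor(String fileName) {
    return Compression.detect(fileName);
  }

  public static void main(String[] args) {
    // Hypothetical pattern; the transform is only built here, never run.
    TextIO.Read read = gzipTextRead("/tmp/input/*.txt.gz");
    System.out.println(read != null);
    System.out.println(codecFor("events.txt.gz") == Compression.GZIP);
  }
}
```

The same Compression value can be passed to other file-based IOs, so one enum replaces the separate per-IO types listed above.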