-
Interfaces

Interface / Description

org.apache.beam.sdk.io.AvroIO.RecordFormatter
    Users can achieve the same by providing this transform in a ParDo before using write in AvroIO.write(Class).

org.apache.beam.sdk.testing.StreamingIT
    Tests which use unbounded PCollections should be in the category UsesUnboundedPCollections. Beyond that, it is up to the runner and test configuration to decide whether to run in streaming mode.
-
Classes

Class / Description

org.apache.beam.sdk.coders.Coder.Context
    To implement a coder, do not use any Coder.Context. Implement only those abstract methods which do not accept a Coder.Context, and leave the default implementations for methods accepting a Coder.Context.

org.apache.beam.sdk.io.AvroIO.ParseAll
    See AvroIO.parseAllGenericRecords(SerializableFunction) for details.

org.apache.beam.sdk.io.AvroIO.ReadAll
    See AvroIO.readAll(Class) for details.

org.apache.beam.sdk.io.TextIO.ReadAll
    See TextIO.readAll() for details.

org.apache.beam.sdk.transforms.ApproximateUnique
    Consider using ApproximateCountDistinct in the zetasketch extension module, which makes use of the HllCount implementation.

    If ApproximateCountDistinct does not meet your needs, you can use HllCount directly. Direct usage also lets you save intermediate aggregation results into a sketch for later processing.

    For example, to estimate the number of distinct elements in a PCollection<String>:

        PCollection<String> input = ...;
        PCollection<Long> countDistinct =
            input.apply(HllCount.Init.forStrings().globally()).apply(HllCount.Extract.globally());

    For more details about using HllCount and the zetasketch extension module, see https://s.apache.org/hll-in-beam#bookmark=id.v6chsij1ixo7.

org.apache.beam.sdk.transforms.Combine.SimpleCombineFn

org.apache.beam.sdk.transforms.DoFnTester
    Use TestPipeline with the DirectRunner.

org.apache.beam.sdk.transforms.Reshuffle
    This transform's intended side effects are not portable; it will likely be removed.

org.apache.beam.sdk.transforms.Top.Largest
    Use Top.Natural instead.

org.apache.beam.sdk.transforms.Top.Smallest
    Use Top.Reversed instead.

org.apache.beam.sdk.transforms.windowing.ReshuffleTrigger
    The intended side effect of Reshuffle is not portable; it will likely be removed.

org.apache.beam.sdk.util.BitSetCoder
    Use org.apache.beam.sdk.coders.BitSetCoder instead.

org.apache.beam.sdk.util.WindowedValue.ValueOnlyWindowedValueCoder
    Use ParamWindowedValueCoder instead. It is a general-purpose implementation of the same concept, but makes timestamp, windows and pane info configurable.

org.apache.beam.sdk.values.PCollectionViews.IterableViewFn

org.apache.beam.sdk.values.PCollectionViews.ListViewFn

org.apache.beam.sdk.values.PCollectionViews.MapViewFn

org.apache.beam.sdk.values.PCollectionViews.MultimapViewFn

org.apache.beam.sdk.values.PCollectionViews.SingletonViewFn
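
The Top.Largest and Top.Smallest entries above name Top.Natural and Top.Reversed as their replacements. The following is a minimal plain-Java sketch of the underlying idea, not Beam code. It assumes that a top-N selection keeps the greatest elements under the supplied comparator, so that a natural-order comparator yields the largest elements and a reversed comparator yields the smallest. The class TopSketch and its top method are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopSketch {
    // Hypothetical helper: keeps the `count` greatest elements under `cmp`
    // and returns them greatest-first (again as judged by `cmp`).
    static <T> List<T> top(List<T> input, int count, Comparator<T> cmp) {
        // The heap head is the smallest element currently kept, so polling
        // evicts the weakest candidate whenever the heap grows past `count`.
        PriorityQueue<T> heap = new PriorityQueue<>(cmp);
        for (T element : input) {
            heap.add(element);
            if (heap.size() > count) {
                heap.poll();
            }
        }
        List<T> result = new ArrayList<>(heap);
        result.sort(cmp.reversed());
        return result;
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(5, 1, 9, 3, 7);
        // Natural ordering (the role Top.Natural is assumed to play): largest values.
        System.out.println(top(data, 2, Comparator.<Integer>naturalOrder())); // [9, 7]
        // Reversed ordering (the role Top.Reversed is assumed to play): smallest values.
        System.out.println(top(data, 2, Comparator.<Integer>reverseOrder())); // [1, 3]
    }
}
```

In a pipeline, the corresponding migration would presumably be to pass a Top.Natural instance wherever a Top.Largest instance was used as the comparator, and a Top.Reversed instance in place of Top.Smallest.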
-
Enums

Enum / Description

org.apache.beam.sdk.io.CompressedSource.CompressionMode
    Use Compression instead.

org.apache.beam.sdk.io.FileBasedSink.CompressionType
    Use Compression.

org.apache.beam.sdk.io.TextIO.CompressionType
    Use Compression.

org.apache.beam.sdk.io.TFRecordIO.CompressionType
    Use Compression.

org.apache.beam.sdk.transforms.DoFnTester.CloningBehavior
    Use TestPipeline with the DirectRunner.