public abstract static class AvroIO.Read<T>
extends org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PBegin,org.apache.beam.sdk.values.PCollection<T>>
AvroIO.read(java.lang.Class<T>)
and AvroIO.readGenericRecords(org.apache.avro.Schema)
.Constructor and Description |
---|
Read() |
Modifier and Type | Method and Description |
---|---|
org.apache.beam.sdk.values.PCollection<T> |
expand(org.apache.beam.sdk.values.PBegin input) |
AvroIO.Read<T> |
from(java.lang.String filepattern)
Like
from(ValueProvider) . |
AvroIO.Read<T> |
from(org.apache.beam.sdk.options.ValueProvider<java.lang.String> filepattern)
Reads from the given filename or filepattern.
|
void |
populateDisplayData(org.apache.beam.sdk.transforms.display.DisplayData.Builder builder) |
AvroIO.Read<T> |
watchForNewFiles(org.joda.time.Duration pollInterval,
org.apache.beam.sdk.transforms.Watch.Growth.TerminationCondition<java.lang.String,?> terminationCondition)
Same as
Read#watchForNewFiles(Duration, TerminationCondition, boolean) with matchUpdatedFiles=false . |
AvroIO.Read<T> |
watchForNewFiles(org.joda.time.Duration pollInterval,
org.apache.beam.sdk.transforms.Watch.Growth.TerminationCondition<java.lang.String,?> terminationCondition,
boolean matchUpdatedFiles)
Continuously watches for new files matching the filepattern, polling it at the given
interval, until the given termination condition is reached.
|
AvroIO.Read<T> |
withBeamSchemas(boolean withBeamSchemas)
If set to true, a Beam schema will be inferred from the AVRO schema.
|
AvroIO.Read<T> |
withCoder(org.apache.beam.sdk.coders.Coder<T> coder)
Sets a coder for the result of the read function.
|
AvroIO.Read<T> |
withDatumReaderFactory(AvroSource.DatumReaderFactory<T> readerFactory)
Sets a custom
AvroSource.DatumReaderFactory for reading. |
AvroIO.Read<T> |
withEmptyMatchTreatment(org.apache.beam.sdk.io.fs.EmptyMatchTreatment treatment)
Configures whether or not a filepattern matching no files is allowed.
|
AvroIO.Read<T> |
withHintMatchesManyFiles()
Hints that the filepattern specified in
from(String) matches a very large number of
files. |
AvroIO.Read<T> |
withMatchConfiguration(org.apache.beam.sdk.io.FileIO.MatchConfiguration matchConfiguration)
Sets the
FileIO.MatchConfiguration . |
addAnnotation, compose, compose, getAdditionalInputs, getAnnotations, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, getResourceHints, setDisplayData, setResourceHints, toString, validate, validate
public AvroIO.Read<T> from(org.apache.beam.sdk.options.ValueProvider<java.lang.String> filepattern)
If it is known that the filepattern will match a very large number of files (at least tens
of thousands), use withHintMatchesManyFiles()
for better performance and scalability.
public AvroIO.Read<T> from(java.lang.String filepattern)
from(ValueProvider)
.public AvroIO.Read<T> withMatchConfiguration(org.apache.beam.sdk.io.FileIO.MatchConfiguration matchConfiguration)
FileIO.MatchConfiguration
.public AvroIO.Read<T> withEmptyMatchTreatment(org.apache.beam.sdk.io.fs.EmptyMatchTreatment treatment)
public AvroIO.Read<T> watchForNewFiles(org.joda.time.Duration pollInterval, org.apache.beam.sdk.transforms.Watch.Growth.TerminationCondition<java.lang.String,?> terminationCondition, boolean matchUpdatedFiles)
PCollection
is unbounded. If matchUpdatedFiles
is set, also watches for files with timestamp
change.
This works only in runners supporting splittable DoFn
.
public AvroIO.Read<T> watchForNewFiles(org.joda.time.Duration pollInterval, org.apache.beam.sdk.transforms.Watch.Growth.TerminationCondition<java.lang.String,?> terminationCondition)
Read#watchForNewFiles(Duration, TerminationCondition, boolean)
with matchUpdatedFiles=false
.public AvroIO.Read<T> withHintMatchesManyFiles()
from(String)
matches a very large number of
files.
This hint may cause a runner to execute the transform differently, in a way that improves performance for this case, but it may worsen performance if the filepattern matches only a small number of files (e.g., in a runner that supports dynamic work rebalancing, it will happen less efficiently within individual files).
public AvroIO.Read<T> withBeamSchemas(boolean withBeamSchemas)
public AvroIO.Read<T> withCoder(org.apache.beam.sdk.coders.Coder<T> coder)
public AvroIO.Read<T> withDatumReaderFactory(AvroSource.DatumReaderFactory<T> readerFactory)
AvroSource.DatumReaderFactory
for reading. Pass a AvroDatumFactory
to also use the factory for the default output AvroCoder
public org.apache.beam.sdk.values.PCollection<T> expand(org.apache.beam.sdk.values.PBegin input)
expand
in class org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PBegin,org.apache.beam.sdk.values.PCollection<T>>
public void populateDisplayData(org.apache.beam.sdk.transforms.display.DisplayData.Builder builder)
populateDisplayData
in interface org.apache.beam.sdk.transforms.display.HasDisplayData
populateDisplayData
in class org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PBegin,org.apache.beam.sdk.values.PCollection<T>>