Type Parameters:
T - the type of data emitted

@PublicEvolving
public class FlinkKinesisConsumer<T>
extends org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction<T>
implements org.apache.flink.api.java.typeutils.ResultTypeQueryable<T>, org.apache.flink.streaming.api.checkpoint.CheckpointedFunction
To leverage Flink's checkpointing mechanics for exactly-once stream processing guarantees, the Flink Kinesis consumer is implemented with the AWS Java SDK rather than the officially recommended AWS Kinesis Client Library, which gives it low-level control over the management of stream state. The Flink Kinesis Connector also supports setting the initial starting position of Kinesis streams, namely TRIM_HORIZON and LATEST.
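The constructors take the AWS credentials, region and initial position from a java.util.Properties instance. A minimal sketch of building one follows. It assumes the key strings "aws.region", "aws.credentials.provider" and "flink.stream.initpos" match the connector's AWSConfigConstants.AWS_REGION, AWSConfigConstants.AWS_CREDENTIALS_PROVIDER and ConsumerConfigConstants.STREAM_INITIAL_POSITION constants, and that the provider value "AUTO" selects the default credentials chain; both are assumptions, so reference the constants directly in real code. The class and enum names are hypothetical.

```java
import java.util.Properties;

// Hypothetical helper that builds consumer Properties.
// ASSUMPTION: these key strings mirror AWSConfigConstants.AWS_REGION,
// AWSConfigConstants.AWS_CREDENTIALS_PROVIDER and
// ConsumerConfigConstants.STREAM_INITIAL_POSITION; verify against your
// connector version and use the constants themselves in real code.
final class KinesisConsumerConfigSketch {

    static final String AWS_REGION = "aws.region";
    static final String AWS_CREDENTIALS_PROVIDER = "aws.credentials.provider";
    static final String STREAM_INITIAL_POSITION = "flink.stream.initpos";

    /** The two starting positions named in this document. */
    enum StartPosition { TRIM_HORIZON, LATEST }

    private KinesisConsumerConfigSketch() {}

    static Properties consumerProperties(String region, StartPosition start) {
        Properties props = new Properties();
        props.setProperty(AWS_REGION, region);
        // ASSUMPTION: "AUTO" selects the default AWS credentials provider chain.
        props.setProperty(AWS_CREDENTIALS_PROVIDER, "AUTO");
        // TRIM_HORIZON starts at the oldest retained record; LATEST reads only new records.
        props.setProperty(STREAM_INITIAL_POSITION, start.name());
        return props;
    }
}
```

The resulting object is what the configProps parameter of each constructor expects.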
Kinesis and the Flink consumer support dynamic re-sharding, and shard IDs, while sequential, cannot be assumed to be consecutive. There is no perfect generic default assignment function. The default shard-to-subtask assignment, which is based on hash codes, may result in skew, with some subtasks having many shards assigned and others none.
It is recommended to monitor the shard distribution and adjust the assignment as needed.
A custom assigner implementation can be set via setShardAssigner(KinesisShardAssigner) to optimize the hash function or to use static overrides to limit skew.
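To make the skew concern concrete, the sketch below assigns shard IDs to subtasks by hash modulo, lets a static override map pin individual shards, and counts shards per subtask so the distribution can be monitored. It is a plain-Java stand-in: the class, its methods and the use of String.hashCode() are hypothetical and do not reproduce the connector's default assigner or the KinesisShardAssigner interface.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical model of hash-based shard-to-subtask assignment with static overrides.
// Not the connector's default assigner; the hash input here is just the shard ID string.
final class ShardAssignmentSketch {

    private ShardAssignmentSketch() {}

    /** Hash-modulo assignment; floorMod keeps the index non-negative for negative hash codes. */
    static int hashAssign(String shardId, int numSubtasks) {
        return Math.floorMod(shardId.hashCode(), numSubtasks);
    }

    /** Uses a static override when one exists for the shard, otherwise falls back to hashing. */
    static int assignWithOverrides(String shardId, int numSubtasks, Map<String, Integer> overrides) {
        Integer pinned = overrides.get(shardId);
        return pinned != null ? Math.floorMod(pinned, numSubtasks) : hashAssign(shardId, numSubtasks);
    }

    /** Counts how many shards land on each subtask, for monitoring the distribution. */
    static int[] shardsPerSubtask(List<String> shardIds, int numSubtasks, Map<String, Integer> overrides) {
        int[] counts = new int[numSubtasks];
        for (String id : shardIds) {
            counts[assignWithOverrides(id, numSubtasks, overrides)]++;
        }
        return counts;
    }

    /** Skew as the difference between the busiest and idlest subtask; 0 means perfectly even. */
    static int skew(int[] counts) {
        return Arrays.stream(counts).max().orElse(0) - Arrays.stream(counts).min().orElse(0);
    }
}
```

Pinning every shard through the override map trades automatic balancing for predictability, which is the kind of static override the paragraph above mentions.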
In order for the consumer to emit watermarks, a timestamp assigner needs to be set via setPeriodicWatermarkAssigner(AssignerWithPeriodicWatermarks) and the auto watermark emit interval configured via ExecutionConfig.setAutoWatermarkInterval(long).
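A periodic assigner typically tracks the largest event timestamp seen and, each time it is polled at the auto watermark interval, reports that maximum minus a fixed out-of-orderness bound. The sketch below shows only that logic in plain Java under hypothetical names; in a real job it would live in an AssignerWithPeriodicWatermarks implementation (Flink also ships BoundedOutOfOrdernessTimestampExtractor for this pattern).

```java
// Plain-Java model of bounded-out-of-orderness watermarking.
// Hypothetical names; this does not implement Flink's AssignerWithPeriodicWatermarks.
final class BoundedLatenessWatermarkSketch {
    private final long maxOutOfOrdernessMillis;
    private long maxTimestampSeen;

    BoundedLatenessWatermarkSketch(long maxOutOfOrdernessMillis) {
        if (maxOutOfOrdernessMillis < 0) {
            throw new IllegalArgumentException("bound must be non-negative");
        }
        this.maxOutOfOrdernessMillis = maxOutOfOrdernessMillis;
        // Chosen so that the first watermark is exactly Long.MIN_VALUE, with no underflow.
        this.maxTimestampSeen = Long.MIN_VALUE + maxOutOfOrdernessMillis;
    }

    /** Per-record hook: remembers the largest timestamp and returns the record's own timestamp. */
    long extractTimestamp(long eventTimestampMillis) {
        maxTimestampSeen = Math.max(maxTimestampSeen, eventTimestampMillis);
        return eventTimestampMillis;
    }

    /** Periodic hook: in Flink, polled at the interval set via setAutoWatermarkInterval. */
    long currentWatermark() {
        return maxTimestampSeen - maxOutOfOrdernessMillis;
    }
}
```

Because only the maximum timestamp feeds the watermark, a late record never moves it backwards.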
Watermarks can only advance when all shards of a subtask continuously deliver records. To prevent an inactive or closed shard from blocking watermark progress, configure the idle timeout via the configuration property ConsumerConfigConstants.SHARD_IDLE_INTERVAL_MILLIS. By default, shards are not considered idle, and watermark calculation waits for newer records to arrive from all shards.
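The effect of the idle timeout can be pictured as taking the minimum watermark over a subtask's shards while skipping any shard that has not delivered a record within the timeout. The following is a simplified, hypothetical model of that rule, not the connector's implementation; a non-positive timeout disables idleness, mirroring the default behavior described above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of idle-aware watermark calculation for one subtask.
// Not the connector's implementation of SHARD_IDLE_INTERVAL_MILLIS.
final class IdleAwareWatermarkSketch {
    private static final class ShardState {
        long watermark;
        long lastRecordAtMillis;
    }

    // A value <= 0 disables idleness: every shard always counts toward the minimum.
    private final long idleTimeoutMillis;
    private final Map<String, ShardState> shards = new HashMap<>();

    IdleAwareWatermarkSketch(long idleTimeoutMillis) {
        this.idleTimeoutMillis = idleTimeoutMillis;
    }

    /** Records a new per-shard watermark and the wall-clock time the record arrived. */
    void onRecord(String shardId, long shardWatermark, long nowMillis) {
        ShardState s = shards.computeIfAbsent(shardId, k -> new ShardState());
        s.watermark = shardWatermark;
        s.lastRecordAtMillis = nowMillis;
    }

    /** Minimum over non-idle shards; Long.MIN_VALUE means there is nothing to emit yet. */
    long subtaskWatermark(long nowMillis) {
        long min = Long.MAX_VALUE;
        boolean anyActive = false;
        for (ShardState s : shards.values()) {
            boolean idle = idleTimeoutMillis > 0 && nowMillis - s.lastRecordAtMillis > idleTimeoutMillis;
            if (!idle) {
                min = Math.min(min, s.watermark);
                anyActive = true;
            }
        }
        return anyActive ? min : Long.MIN_VALUE;
    }
}
```

With idleness disabled, a single silent shard holds the subtask watermark at its last value indefinitely, which is exactly the blocking the timeout is meant to avoid.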
Note that re-sharding the Kinesis stream while an application that relies on Kinesis records for watermarking is running can cause events to be incorrectly treated as late. Whether this happens depends on how shards are assigned to subtasks, and it applies regardless of whether watermarks are generated in the source or in a downstream operator.

Constructor | Description |
---|---|
FlinkKinesisConsumer(List<String> streams, KinesisDeserializationSchema<T> deserializer, Properties configProps) | Creates a new Flink Kinesis Consumer. |
FlinkKinesisConsumer(String stream, org.apache.flink.api.common.serialization.DeserializationSchema<T> deserializer, Properties configProps) | Creates a new Flink Kinesis Consumer. |
FlinkKinesisConsumer(String stream, KinesisDeserializationSchema<T> deserializer, Properties configProps) | Creates a new Flink Kinesis Consumer. |
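The constructors differ in the deserializer type: per the parameter descriptions, a DeserializationSchema converts record bytes "without key", while a KinesisDeserializationSchema is keyed. The sketch below uses a hypothetical interface whose parameter list is modeled on KinesisDeserializationSchema.deserialize (record bytes, partition key, sequence number, approximate arrival timestamp, stream name, shard ID); treat that signature as an assumption and check it against the connector's Javadoc. TaggedEvent and TaggedEventDeserializer are likewise hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical interface; ASSUMPTION: its parameters mirror KinesisDeserializationSchema.deserialize.
interface KeyedRecordDeserializer<T> {
    T deserialize(byte[] recordValue, String partitionKey, String seqNum,
                  long approxArrivalTimestamp, String stream, String shardId) throws IOException;
}

/** Example record type carrying the payload plus Kinesis metadata. */
final class TaggedEvent {
    final String payload;
    final String partitionKey;
    final String shardId;

    TaggedEvent(String payload, String partitionKey, String shardId) {
        this.payload = payload;
        this.partitionKey = partitionKey;
        this.shardId = shardId;
    }
}

/** Decodes UTF-8 payloads and keeps the partition key and shard ID that a keyless schema would not see. */
final class TaggedEventDeserializer implements KeyedRecordDeserializer<TaggedEvent> {
    @Override
    public TaggedEvent deserialize(byte[] recordValue, String partitionKey, String seqNum,
                                   long approxArrivalTimestamp, String stream, String shardId) {
        return new TaggedEvent(new String(recordValue, StandardCharsets.UTF_8), partitionKey, shardId);
    }
}
```

If only the payload matters, a plain DeserializationSchema with the single-stream constructor is the simpler choice.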

Modifier and Type | Method | Description |
---|---|---|
void | cancel() | |
void | close() | |
protected KinesisDataFetcher<T> | createFetcher(List<String> streams, org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext<T> sourceContext, org.apache.flink.api.common.functions.RuntimeContext runtimeContext, Properties configProps, KinesisDeserializationSchema<T> deserializationSchema) | This method is exposed for tests that need to mock the KinesisDataFetcher in the consumer. |
org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks<T> | getPeriodicWatermarkAssigner() | |
org.apache.flink.api.common.typeinfo.TypeInformation<T> | getProducedType() | |
KinesisShardAssigner | getShardAssigner() | |
WatermarkTracker | getWatermarkTracker() | |
void | initializeState(org.apache.flink.runtime.state.FunctionInitializationContext context) | |
void | run(org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext<T> sourceContext) | |
void | setPeriodicWatermarkAssigner(org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks<T> periodicWatermarkAssigner) | Set the assigner that will extract the timestamp from T and calculate the watermark. |
void | setShardAssigner(KinesisShardAssigner shardAssigner) | Provide a custom assigner to influence how shards are distributed over subtasks. |
void | setWatermarkTracker(WatermarkTracker watermarkTracker) | Set the global watermark tracker. |
void | snapshotState(org.apache.flink.runtime.state.FunctionSnapshotContext context) | |

public FlinkKinesisConsumer(String stream, org.apache.flink.api.common.serialization.DeserializationSchema<T> deserializer, Properties configProps)
Creates a new Flink Kinesis Consumer.
The AWS credentials to be used, the AWS region of the Kinesis streams, and the initial position to start streaming from are configured with a Properties instance.
Parameters:
stream - The single AWS Kinesis stream to read from.
deserializer - The deserializer used to convert raw bytes of Kinesis records to Java objects (without key).
configProps - The properties used to configure AWS credentials, AWS region, and initial starting position.

public FlinkKinesisConsumer(String stream, KinesisDeserializationSchema<T> deserializer, Properties configProps)
Creates a new Flink Kinesis Consumer.
The AWS credentials to be used, the AWS region of the Kinesis streams, and the initial position to start streaming from are configured with a Properties instance.
Parameters:
stream - The single AWS Kinesis stream to read from.
deserializer - The keyed deserializer used to convert raw bytes of Kinesis records to Java objects.
configProps - The properties used to configure AWS credentials, AWS region, and initial starting position.

public FlinkKinesisConsumer(List<String> streams, KinesisDeserializationSchema<T> deserializer, Properties configProps)
Creates a new Flink Kinesis Consumer.
The AWS credentials to be used, the AWS region of the Kinesis streams, and the initial position to start streaming from are configured with a Properties instance.
Parameters:
streams - The AWS Kinesis streams to read from.
deserializer - The keyed deserializer used to convert raw bytes of Kinesis records to Java objects.
configProps - The properties used to configure AWS credentials, AWS region, and initial starting position.

public KinesisShardAssigner getShardAssigner()

public void setShardAssigner(KinesisShardAssigner shardAssigner)
Provide a custom assigner to influence how shards are distributed over subtasks.
Parameters:
shardAssigner - shard assigner

public org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks<T> getPeriodicWatermarkAssigner()

public void setPeriodicWatermarkAssigner(org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks<T> periodicWatermarkAssigner)
Set the assigner that will extract the timestamp from T and calculate the watermark.
Parameters:
periodicWatermarkAssigner - periodic watermark assigner

public WatermarkTracker getWatermarkTracker()

public void setWatermarkTracker(WatermarkTracker watermarkTracker)
Set the global watermark tracker.
Parameters:
watermarkTracker - the global watermark tracker

public void run(org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext<T> sourceContext) throws Exception

public void cancel()
Specified by:
cancel in interface org.apache.flink.streaming.api.functions.source.SourceFunction<T>

public void close() throws Exception
Specified by:
close in interface org.apache.flink.api.common.functions.RichFunction
Overrides:
close in class org.apache.flink.api.common.functions.AbstractRichFunction
Throws:
Exception

public org.apache.flink.api.common.typeinfo.TypeInformation<T> getProducedType()
Specified by:
getProducedType in interface org.apache.flink.api.java.typeutils.ResultTypeQueryable<T>

public void initializeState(org.apache.flink.runtime.state.FunctionInitializationContext context) throws Exception
Specified by:
initializeState in interface org.apache.flink.streaming.api.checkpoint.CheckpointedFunction
Throws:
Exception

public void snapshotState(org.apache.flink.runtime.state.FunctionSnapshotContext context) throws Exception
Specified by:
snapshotState in interface org.apache.flink.streaming.api.checkpoint.CheckpointedFunction
Throws:
Exception
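Checkpointing is what gives the consumer its exactly-once guarantee. Conceptually, the state involved is the last sequence number emitted per shard: snapshotState persists a copy, and initializeState restores it so reading resumes just after the checkpointed position. The class below models only that bookkeeping in plain Java; its name, methods and map-based layout are hypothetical, not the connector's actual state format.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of per-shard offset state for a checkpointed source.
// Not the connector's state layout; only illustrates snapshot and restore.
final class ShardOffsetStateSketch {
    private final Map<String, String> lastSequenceNumberByShard = new HashMap<>();

    /** Records that the record with this sequence number was emitted for the shard. */
    void onRecordEmitted(String shardId, String sequenceNumber) {
        lastSequenceNumberByShard.put(shardId, sequenceNumber);
    }

    /** Analogous to snapshotState: an immutable copy, so later reads cannot alter the checkpoint. */
    Map<String, String> snapshot() {
        return Collections.unmodifiableMap(new HashMap<>(lastSequenceNumberByShard));
    }

    /** Analogous to initializeState on recovery: discards in-memory progress made after the checkpoint. */
    void restore(Map<String, String> checkpointed) {
        lastSequenceNumberByShard.clear();
        lastSequenceNumberByShard.putAll(checkpointed);
    }

    /** Sequence number to resume after, or null if the shard has no checkpointed offset. */
    String resumeAfter(String shardId) {
        return lastSequenceNumberByShard.get(shardId);
    }
}
```

Records emitted after the checkpoint are read again on recovery; because downstream state is rolled back to the same checkpoint, their effect on state is counted once.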

protected KinesisDataFetcher<T> createFetcher(List<String> streams, org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext<T> sourceContext, org.apache.flink.api.common.functions.RuntimeContext runtimeContext, Properties configProps, KinesisDeserializationSchema<T> deserializationSchema)
This method is exposed for tests that need to mock the KinesisDataFetcher in the consumer.
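createFetcher is described as a hook for tests that mock the KinesisDataFetcher. The general pattern, a protected factory method that a test subclass overrides, is sketched below with hypothetical names (Fetcher, SourceWithFactory, SourceUnderTest) and no Flink dependencies; it is not the consumer's real code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a fetcher that returns batches of records.
interface Fetcher {
    List<String> fetchBatch();
}

class SourceWithFactory {
    /** Production code would build a real network-backed fetcher here. */
    protected Fetcher createFetcher(List<String> streams) {
        throw new UnsupportedOperationException("real fetcher not available in this sketch");
    }

    /** Drives one fetch through whatever fetcher the factory method returns. */
    List<String> runOnce(List<String> streams) {
        return new ArrayList<>(createFetcher(streams).fetchBatch());
    }
}

/** A test subclass overrides the factory method to inject canned records. */
class SourceUnderTest extends SourceWithFactory {
    @Override
    protected Fetcher createFetcher(List<String> streams) {
        return () -> List.of("record-1", "record-2");
    }
}
```

Keeping the factory method protected limits the seam to subclasses, so production callers cannot swap the fetcher by accident.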
Copyright © 2014–2021 The Apache Software Foundation. All rights reserved.