Used to write log files that represent batch commit points in structured streaming.
An abstract class for compactible metadata logs.
Class for collecting event time stats with an accumulator.
Accumulator that collects stats on event time in a batch.
Used to mark a column as containing the event time for a given record.
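In the public API this marking is applied with Dataset.withWatermark; a minimal sketch (the rate source and durations are illustrative):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, window}

val spark = SparkSession.builder().master("local[2]").appName("watermark").getOrCreate()

// The rate source emits (timestamp, value) rows; any source with a timestamp
// column would do here.
val events = spark.readStream.format("rate").option("rowsPerSecond", "5").load()

// Marks "timestamp" as the event-time column and tolerates 10 minutes of late data.
val counts = events
  .withWatermark("timestamp", "10 minutes")
  .groupBy(window(col("timestamp"), "5 minutes"))
  .count()
```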
User specified options for file streams.
A sink that writes out results to parquet files.
A special log for FileStreamSink.
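Writing with the parquet file sink from the public API requires a checkpoint location; a sketch, assuming an active SparkSession `spark` (paths are placeholders and the rate source stands in for any streaming input):

```scala
val events = spark.readStream.format("rate").load()

val query = events.writeStream
  .format("parquet")
  .option("path", "/tmp/stream-out")
  .option("checkpointLocation", "/tmp/stream-checkpoint")
  .start()
```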
A very simple source that reads files from the given directory as they appear.
Offset for the FileStreamSource.
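From the public API the file source is created with readStream; a sketch (format, schema, and path are illustrative):

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

// The file source needs a user-supplied schema and picks up new files as
// they appear in the directory.
val schema = new StructType().add("name", StringType).add("age", IntegerType)
val people = spark.readStream
  .format("json")
  .schema(schema)
  .load("/tmp/people-input")
```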
Physical operator for executing FlatMapGroupsWithState.
A Sink that forwards all data to a ForeachWriter according to the contract defined by ForeachWriter.
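A minimal sketch of that contract from the public API (the println bodies are illustrative):

```scala
import org.apache.spark.sql.{ForeachWriter, Row}

// The contract: open() runs once per partition per batch; if it returns true,
// process() is called for every row and close() is always called afterwards.
val events = spark.readStream.format("rate").load()

val query = events.writeStream
  .foreach(new ForeachWriter[Row] {
    def open(partitionId: Long, version: Long): Boolean = true
    def process(value: Row): Unit = println(value)
    def close(errorOrNull: Throwable): Unit = ()
  })
  .start()
```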
A MetadataLog implementation based on HDFS.
A variant of QueryExecution that allows the given LogicalPlan to be executed incrementally.
A simple offset for sources that produce a single linear stream of data.
A FileCommitProtocol that tracks the list of valid files in a manifest file, used in structured streaming.
Used to query the data that has been written into a MemorySink.
A sink that stores the results in memory.
A Source that produces values stored in memory as they are added by the user.
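MemoryStream and the "memory" sink are mainly used in Spark's own tests; a sketch of the round trip (internal API, subject to change across versions):

```scala
import org.apache.spark.sql.execution.streaming.MemoryStream

implicit val sqlCtx = spark.sqlContext
import spark.implicits._

// MemoryStream is a Source backed by an in-memory buffer; the "memory" sink
// stores results in a queryable in-memory table.
val input = MemoryStream[Int]
input.addData(1, 2, 3)

val query = input.toDF().writeStream
  .format("memory")
  .queryName("numbers")
  .start()

query.processAllAvailable()
spark.table("numbers").show()
```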
A general MetadataLog that supports storing a metadata object for each batch, querying the latest batch id, and retrieving or purging the metadata of past batches.
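For reference, a sketch of the trait's rough shape in the Spark 2.x source (signatures may differ across versions):

```scala
trait MetadataLog[T] {
  /** Store metadata for a batch; returns false if the batch was already logged. */
  def add(batchId: Long, metadata: T): Boolean
  /** The metadata for a given batch, if it exists. */
  def get(batchId: Long): Option[T]
  /** Metadata for batches in the given (inclusive, optional) id range. */
  def get(startId: Option[Long], endId: Option[Long]): Array[(Long, T)]
  /** The id and metadata of the latest batch, if any. */
  def getLatest(): Option[(Long, T)]
  /** Remove all metadata for batches older than the threshold. */
  def purge(thresholdBatchId: Long): Unit
}
```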
A FileIndex that generates the list of files to process by reading them from the metadata log files generated by the FileStreamSink.
Serves metrics from an org.apache.spark.sql.streaming.StreamingQuery to Codahale/DropWizard metrics.
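This reporting is off by default and is enabled with a configuration flag (a one-line sketch, assuming an active SparkSession `spark`):

```scala
// Publishes per-query metrics (input rate, processing rate, latency) through
// Spark's DropWizard-based MetricsSystem.
spark.conf.set("spark.sql.streaming.metricsEnabled", "true")
```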
An offset is a monotonically increasing metric used to track progress in the computation of a stream.
An ordered collection of offsets, used to track the progress of processing data from one or more Sources that are present in a streaming query.
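Custom sources define their own Offset subclass; the one hard requirement is a stable JSON representation (a hypothetical example, CounterOffset is not a real Spark class):

```scala
import org.apache.spark.sql.execution.streaming.Offset

// Hypothetical offset for a source that tracks a single increasing counter.
case class CounterOffset(n: Long) extends Offset {
  override val json: String = n.toString
}
```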
This class is used to log offsets to persistent files in HDFS.
Contains metadata associated with an OffsetSeq.
A trigger executor that runs a single batch only, then terminates.
Used to identify the state store for a given operator.
A trigger executor that runs a batch every intervalMs milliseconds.
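From the public API this executor is selected with a processing-time trigger (Spark 2.2+; values are illustrative):

```scala
import org.apache.spark.sql.streaming.Trigger

val events = spark.readStream.format("rate").load()

// Runs a batch every 10 seconds; if a batch overruns the interval, the next
// one starts as soon as the previous finishes.
val query = events.writeStream
  .format("console")
  .trigger(Trigger.ProcessingTime("10 seconds"))
  .start()
```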
Responsible for continually reporting statistics about the amount of data processed as well as latency for a streaming query.
A source that generates incrementing long values with timestamps.
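A sketch of creating it via the public "rate" format (the rate option is illustrative):

```scala
// Each row carries a `timestamp` and a monotonically increasing `value`.
val rates = spark.readStream
  .format("rate")
  .option("rowsPerSecond", "10")
  .load()
```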
Used when loading a JSON serialized offset from external storage.
An interface for systems that can collect the results of a streaming query.
The status of a file written out by FileStreamSink.
A source of continually arriving data for a streaming query.
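The Sink and Source interfaces above are small; the following sketch mirrors their rough shape in the Spark 2.x source (signatures may differ across versions):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.execution.streaming.Offset
import org.apache.spark.sql.types.StructType

trait Source {
  /** The schema of the data produced by this source. */
  def schema: StructType
  /** The highest offset available, or None if no data has arrived yet. */
  def getOffset: Option[Offset]
  /** The data between `start` (exclusive, None = beginning) and `end` (inclusive). */
  def getBatch(start: Option[Offset], end: Offset): DataFrame
  /** Informs the source that data up to `end` has been processed and may be discarded. */
  def commit(end: Offset): Unit = {}
  /** Release any resources held by the source. */
  def stop(): Unit
}

trait Sink {
  /** Adds a batch of data; must be idempotent when retried with the same batchId. */
  def addBatch(batchId: Long, data: DataFrame): Unit
}
```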
States for StreamExecution's lifecycle.
An operator that reads from a StateStore.
For each input tuple, the key is calculated and the value from the StateStore is added to the stream (in addition to the input tuple) if present.
For each input tuple, the key is calculated and the tuple is put into the StateStore.
An operator that writes to a StateStore.
An operator that reads or writes state from the StateStore.
Manages the execution of a streaming Spark SQL query that is occurring in a separate thread.
A special thread to run the stream query.
Contains metadata associated with a StreamingQuery.
A helper class that looks like a Map[Source, Offset].
Physical operator for executing streaming Deduplicate.
Used to link a streaming Source of data into an org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.
A bus to forward events to StreamingQueryListeners.
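Listeners are registered through the StreamingQueryManager; a minimal sketch (the println bodies are illustrative):

```scala
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

spark.streams.addListener(new StreamingQueryListener {
  override def onQueryStarted(event: QueryStartedEvent): Unit =
    println(s"query started: ${event.id}")
  override def onQueryProgress(event: QueryProgressEvent): Unit =
    println(event.progress.json)
  override def onQueryTerminated(event: QueryTerminatedEvent): Unit =
    println(s"query terminated: ${event.id}")
})
```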
Wraps the non-serializable StreamExecution to make the query serializable, as it is easy for it to get captured with normal usage.
Used to link a streaming DataSource into an org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.
A dummy physical plan for StreamingRelation to support org.apache.spark.sql.Dataset.explain.
A source that reads text lines through a TCP socket, designed only for tutorials and debugging.
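A typical tutorial usage (run `nc -lk 9999` in a shell first; host and port are illustrative):

```scala
// The socket source yields one string column, "value"; it offers no fault
// tolerance and should not be used in production.
val lines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", "9999")
  .load()
```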
An operator that supports event-time watermarks.
A Trigger that processes only one batch of data in a streaming query and then terminates the query.
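From the public API this is Trigger.Once (Spark 2.2+); a sketch:

```scala
import org.apache.spark.sql.streaming.Trigger

// Processes whatever input is available as one batch, then stops the query.
val events = spark.readStream.format("rate").load()
val query = events.writeStream
  .format("console")
  .trigger(Trigger.Once())
  .start()
```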