Class

org.apache.spark.sql.execution.streaming

StreamExecution

class StreamExecution extends StreamingQuery with ProgressReporter with Logging

Manages the execution of a streaming Spark SQL query that is occurring in a separate thread. Unlike a standard query, a streaming query executes repeatedly each time new data arrives at any Source present in the query plan. Whenever new data arrives, a QueryExecution is created and the results are committed transactionally to the given Sink.
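The repeat-on-new-data loop described above can be modeled in a few lines of plain Scala. This is a simplified, single-threaded sketch and not Spark's implementation. `CounterSource`, `RecordingSink`, `MiniStream` and `runOnce` are hypothetical stand-ins for a Source, a Sink and one trigger of the micro-batch thread, and offsets are modeled as plain record counts.

```scala
import scala.collection.mutable

// Hypothetical source: offsets are record counts in an append-only buffer.
final class CounterSource {
  private val data = mutable.ArrayBuffer.empty[String]
  def add(xs: String*): Unit = data ++= xs
  // Latest available offset, or None if nothing has arrived yet.
  def getOffset: Option[Long] = if (data.isEmpty) None else Some(data.size.toLong)
  // Records added after offset `start`, up to and including offset `end`.
  def getBatch(start: Option[Long], end: Long): Seq[String] =
    data.slice(start.getOrElse(0L).toInt, end.toInt).toList
}

// Hypothetical sink: ignores a batch id it has already committed, so a replayed batch is harmless.
final class RecordingSink {
  val committed = mutable.LinkedHashMap.empty[Long, Seq[String]]
  def addBatch(batchId: Long, rows: Seq[String]): Unit =
    if (!committed.contains(batchId)) committed(batchId) = rows
}

final class MiniStream(source: CounterSource, sink: RecordingSink) {
  private var committedOffset: Option[Long] = None
  private var batchId: Long = 0L

  // One trigger: run a batch only if the source has moved past the committed offset.
  def runOnce(): Boolean = source.getOffset match {
    case Some(available) if !committedOffset.contains(available) =>
      sink.addBatch(batchId, source.getBatch(committedOffset, available))
      committedOffset = Some(available)
      batchId += 1
      true
    case _ => false // no new data, so nothing is executed
  }
}
```

Keying the sink's commit on the batch id is what makes it safe to re-send a batch after a failure in this model.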

Linear Supertypes
ProgressReporter, Logging, StreamingQuery, AnyRef, Any

Instance Constructors

  1. new StreamExecution(sparkSession: SparkSession, name: String, checkpointRoot: String, analyzedPlan: LogicalPlan, sink: Sink, trigger: Trigger, triggerClock: Clock, outputMode: OutputMode, deleteCheckpointOnStop: Boolean)

    deleteCheckpointOnStop

    whether to delete the checkpoint if the query is stopped without errors

Type Members

  1. case class ExecutionStats(inputRows: Map[Source, Long], stateOperators: Seq[StateOperatorProgress], eventTimeStats: Map[String, String]) extends Product with Serializable

    Definition Classes
    ProgressReporter

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. var availableOffsets: StreamProgress

    Tracks the offsets that are available to be processed but have not yet been committed to the sink. Only the scheduler thread should modify this field, and only in atomic steps. Other threads should make a shallow copy if they are going to access this field more than once, since the field's value may change at any time.

    Definition Classes
    StreamExecution → ProgressReporter
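    The "shallow copy" advice can be illustrated with an immutable map held in a `@volatile var`. A reader that needs the value more than once reads the field into a local `val` a single time and works from that snapshot. `OffsetTracker`, its `String` source keys and `Long` offsets are hypothetical simplifications of StreamProgress.

```scala
// Hypothetical stand-in for availableOffsets; StreamProgress is modeled as an immutable Map.
final class OffsetTracker {
  @volatile var availableOffsets: Map[String, Long] = Map.empty

  // Called only by the scheduler thread; each assignment replaces the whole map in one step.
  def advance(source: String, offset: Long): Unit =
    availableOffsets = availableOffsets + (source -> offset)

  // Reader on another thread: read the field once, then use that snapshot consistently.
  def describe(): String = {
    val snapshot = availableOffsets
    val total = snapshot.values.sum
    s"${snapshot.size} sources, offsets sum to $total"
  }
}
```

Because the map itself is immutable, copying the reference is enough; the size and the sum are guaranteed to come from the same version of the map.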
  6. def awaitInitialization(timeoutMs: Long): Unit

    Await until all fields of the query have been initialized.

  7. def awaitTermination(timeoutMs: Long): Boolean

    Waits for the termination of this query, either by query.stop() or by an exception. If the query has terminated with an exception, then the exception will be thrown. Otherwise, it returns whether or not the query terminated within timeoutMs milliseconds.

    If the query has terminated, then all subsequent calls to this method will either return true immediately (if the query was terminated by stop()), or throw the exception immediately (if the query has terminated with exception).

    Definition Classes
    StreamExecution → StreamingQuery
    Since

    2.0.0

    Exceptions thrown

    StreamingQueryException if the query has terminated with an exception
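    A minimal model of the contract above, built on java.util.concurrent.CountDownLatch. `TerminationTracker`, `stopCleanly` and `fail` are hypothetical names, and the model reproduces only the documented behavior: `false` on timeout, `true` after a clean stop, and a recorded failure rethrown on every call. Spark throws StreamingQueryException; the sketch uses RuntimeException to stay dependency-free.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}

final class TerminationTracker {
  private val terminated = new CountDownLatch(1)
  @volatile private var failure: Option[RuntimeException] = None

  def stopCleanly(): Unit = terminated.countDown()

  def fail(e: RuntimeException): Unit = {
    failure = Some(e) // record the cause before releasing any waiters
    terminated.countDown()
  }

  // Mirrors awaitTermination(timeoutMs): false on timeout, true after a clean stop, throws on failure.
  def awaitTermination(timeoutMs: Long): Boolean = {
    val done = terminated.await(timeoutMs, TimeUnit.MILLISECONDS)
    failure.foreach(e => throw e)
    done
  }
}
```

Against a real query the same method supports a polling loop such as `while (!query.awaitTermination(10000)) println(query.status)`.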

  8. def awaitTermination(): Unit

    Waits for the termination of this query, either by query.stop() or by an exception. If the query has terminated with an exception, then the exception will be thrown.

    If the query has terminated, then all subsequent calls to this method will either return immediately (if the query was terminated by stop()), or throw the exception immediately (if the query has terminated with exception).

    Definition Classes
    StreamExecution → StreamingQuery
    Since

    2.0.0

    Exceptions thrown

    StreamingQueryException if the query has terminated with an exception.

  9. val batchCommitLog: BatchCommitLog

    A log that records the IDs of completed batches. It is used to check whether a batch was fully processed and its output committed to the sink, in which case the batch does not need to be processed again. For instance, it is used during restart to help identify which batch to run next.
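    A toy, in-memory version of the check described above. `InMemoryCommitLog` and `runIfNeeded` are hypothetical (the real log is stored under the checkpoint directory). In this model `add` returns `false` when the batch id is already present, so a batch is never recorded or re-run twice.

```scala
import scala.collection.mutable

// Hypothetical in-memory commit log: ids of batches whose output reached the sink.
final class InMemoryCommitLog {
  private val done = mutable.SortedSet.empty[Long]

  // Returns false if the batch id was already recorded.
  def add(batchId: Long): Boolean = done.add(batchId)

  def isCommitted(batchId: Long): Boolean = done.contains(batchId)

  def getLatest: Option[Long] = done.lastOption

  // Runs `body` only if the batch has not completed yet, then records it.
  def runIfNeeded(batchId: Long)(body: => Unit): Boolean =
    if (isCommitted(batchId)) false
    else { body; add(batchId) }
}
```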

  10. val checkpointRoot: String

  11. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  12. var committedOffsets: StreamProgress

    Tracks how much data we have processed and committed to the sink or state store from each input source. Only the scheduler thread should modify this field, and only in atomic steps. Other threads should make a shallow copy if they are going to access this field more than once, since the field's value may change at any time.

    Definition Classes
    StreamExecution → ProgressReporter
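    Comparing committedOffsets with availableOffsets is a natural way to decide whether another batch is needed: there is new data if some source's available offset differs from its committed offset, or if nothing has been committed for that source yet. The sketch below models both fields as `Map[String, Long]`; `NewDataCheck.hasNewData` is a hypothetical helper, not a member of this class.

```scala
object NewDataCheck {
  // New data exists if any source has no committed offset or a committed offset that differs.
  def hasNewData(available: Map[String, Long], committed: Map[String, Long]): Boolean =
    available.exists { case (source, offset) => !committed.get(source).contains(offset) }
}
```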
  13. var currentBatchId: Long

    The current batchId or -1 if execution has not yet been initialized.

    Attributes
    protected
    Definition Classes
    StreamExecution → ProgressReporter
  14. var currentStatus: StreamingQueryStatus

    Attributes
    protected
    Definition Classes
    ProgressReporter
  15. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  16. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  17. def exception: Option[StreamingQueryException]

    Returns the StreamingQueryException if the query was terminated by an exception.

    Definition Classes
    StreamExecution → StreamingQuery
  18. def explain(): Unit

    Prints the physical plan to the console for debugging purposes.

    Definition Classes
    StreamExecution → StreamingQuery
    Since

    2.0.0

  19. def explain(extended: Boolean): Unit

    Prints the physical plan to the console for debugging purposes.

    extended

    whether to do extended explain or not

    Definition Classes
    StreamExecution → StreamingQuery
    Since

    2.0.0

  20. def explainInternal(extended: Boolean): String

    Exposed for tests.

  21. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  22. def finishTrigger(hasNewData: Boolean): Unit

    Finalizes the query progress and adds it to the list of recent status updates.

    Attributes
    protected
    Definition Classes
    ProgressReporter
  23. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  24. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  25. val id: UUID

    Returns the unique id of this query that persists across restarts from checkpoint data. That is, this id is generated when a query is started for the first time, and will be the same every time it is restarted from checkpoint data. Also see runId.

    Definition Classes
    StreamExecution → ProgressReporter → StreamingQuery
    Since

    2.1.0

  26. def initializeLogIfNecessary(isInterpreter: Boolean): Unit

    Attributes
    protected
    Definition Classes
    Logging
  27. def isActive: Boolean

    Whether the query is currently active or not

    Definition Classes
    StreamExecution → StreamingQuery
  28. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  29. def isTraceEnabled(): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  30. var lastExecution: IncrementalExecution

    Definition Classes
    StreamExecution → ProgressReporter
  31. def lastProgress: StreamingQueryProgress

    Returns the most recent query progress update or null if there were no progress updates.

    Definition Classes
    ProgressReporter
  32. def log: Logger

    Attributes
    protected
    Definition Classes
    Logging
  33. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  34. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  35. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  36. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  37. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  38. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  39. def logName: String

    Attributes
    protected
    Definition Classes
    Logging
  40. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  41. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  42. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  43. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  44. lazy val logicalPlan: LogicalPlan

    Definition Classes
    StreamExecution → ProgressReporter
  45. val microBatchThread: StreamExecutionThread

    The thread that runs the micro-batches of this stream. Note that this thread must be an org.apache.spark.util.UninterruptibleThread to work around KAFKA-1894: interrupting a running KafkaConsumer may cause an endless loop.

  46. val name: String

    Returns the user-specified name of the query, or null if not specified. This name can be specified in the org.apache.spark.sql.streaming.DataStreamWriter as dataframe.writeStream.queryName("query").start(). This name, if set, must be unique across all active queries.

    Definition Classes
    StreamExecution → ProgressReporter → StreamingQuery
    Since

    2.0.0
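    The uniqueness rule applies only among active queries, and an unnamed query (name `null`) never conflicts. The registry below is a hypothetical model of that rule, not Spark's code; `ActiveQueries`, its methods and the choice of IllegalArgumentException are assumptions made for the sketch.

```scala
import java.util.UUID
import scala.collection.mutable

// Hypothetical registry of active queries: runId -> user-specified name (may be null).
final class ActiveQueries {
  private val active = mutable.Map.empty[UUID, String]

  def start(name: String): UUID = synchronized {
    // A null name means "not specified" and is never checked for uniqueness.
    if (name != null && active.values.exists(_ == name))
      throw new IllegalArgumentException(s"A query named '$name' is already active")
    val runId = UUID.randomUUID()
    active(runId) = name
    runId
  }

  // Stopping a query frees its name for reuse.
  def stop(runId: UUID): Unit = synchronized { active.remove(runId) }
}
```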

  47. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  48. var newData: Map[Source, DataFrame]

    Holds the most recent input data for each source.

    Attributes
    protected
    Definition Classes
    StreamExecution → ProgressReporter
  49. final def notify(): Unit

    Permalink
    Definition Classes
    AnyRef
  50. final def notifyAll(): Unit

    Permalink
    Definition Classes
    AnyRef
  51. val offsetLog: OffsetSeqLog

    A write-ahead log that records the offsets that are present in each batch. To ensure that a given batch always consists of the same data, offsets are written to this log *before* any processing is done. Thus, the Nth record in this log indicates data that is currently being processed, and the (N-1)th entry indicates which offsets have been durably committed to the sink.
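    Read together with batchCommitLog, this log lets a restarted query decide where to resume. The function below sketches that decision under simplified assumptions: it looks only at the latest batch id in each log, and `Recovery`, `resumeFrom` and the `Resume` cases are hypothetical names. If the latest logged batch N is also committed, the next batch is N + 1; otherwise batch N is rerun with the offsets already logged for it, so it sees the same data.

```scala
object Recovery {
  sealed trait Resume
  case object FreshStart extends Resume                    // nothing logged yet: start at batch 0
  final case class RerunBatch(batchId: Long) extends Resume // logged but not committed
  final case class NextBatch(batchId: Long) extends Resume  // latest logged batch is committed

  // Hypothetical model of the restart decision; only the latest entry of each log is consulted.
  def resumeFrom(latestOffsetBatch: Option[Long], latestCommitBatch: Option[Long]): Resume =
    latestOffsetBatch match {
      case None => FreshStart
      case Some(n) if latestCommitBatch.contains(n) => NextBatch(n + 1)
      case Some(n) => RerunBatch(n) // reuse the logged offsets so the rerun reads the same data
    }
}
```

Writing the offsets before processing is what makes the `RerunBatch` case deterministic: the batch boundaries were fixed before any output was produced.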

  52. var offsetSeqMetadata: OffsetSeqMetadata

    Metadata associated with the offset seq of a batch in the query.

    Attributes
    protected
    Definition Classes
    StreamExecution → ProgressReporter
  53. val outputMode: OutputMode

  54. def postEvent(event: Event): Unit

    Attributes
    protected
    Definition Classes
    StreamExecution → ProgressReporter
  55. def processAllAvailable(): Unit

    Blocks until all available data in the source has been processed and committed to the sink. This method is intended for testing. Note that in the case of continually arriving data, this method may block forever. Additionally, this method is only guaranteed to block until data that was synchronously appended to an org.apache.spark.sql.execution.streaming.Source before the invocation has been processed (i.e. getOffset must immediately reflect the addition).

    Definition Classes
    StreamExecution → StreamingQuery
    Since

    2.0.0
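    The blocking behavior can be modeled with a monitor: the caller captures the latest offset visible when it is invoked, then waits until the committed offset reaches it. Data appended afterwards is not awaited, which is why the source must reflect an addition before the call. `ProgressMonitor` and its methods are hypothetical, and the sketch uses plain `wait`/`notifyAll` to stay dependency-free.

```scala
final class ProgressMonitor {
  private var available = 0L
  private var committed = 0L

  // Synchronous append: visible to processAllAvailable as soon as it returns.
  def append(n: Long): Unit = synchronized { available += n }

  // Called by the (simulated) micro-batch thread after a batch is committed.
  def commit(upTo: Long): Unit = synchronized {
    committed = math.max(committed, upTo)
    notifyAll()
  }

  def committedOffset: Long = synchronized { committed }

  // Waits only for data visible at call time; later appends are not awaited.
  def processAllAvailable(): Unit = synchronized {
    val target = available
    while (committed < target) wait()
  }
}
```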

  56. def recentProgress: Array[StreamingQueryProgress]

    Returns an array containing the most recent query progress updates.

    Definition Classes
    ProgressReporter
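    Both recentProgress and lastProgress can be served from one bounded buffer of updates. In the model below, `ProgressBuffer` is hypothetical and a `String` stands in for StreamingQueryProgress. The capacity is a constructor parameter here; in Spark the number of retained updates is configurable through `spark.sql.streaming.numRecentProgressUpdates`.

```scala
import scala.collection.mutable

// Hypothetical bounded buffer of progress updates; String stands in for StreamingQueryProgress.
final class ProgressBuffer(capacity: Int) {
  private val updates = mutable.Queue.empty[String]

  def record(progress: String): Unit = synchronized {
    updates.enqueue(progress)
    while (updates.size > capacity) updates.dequeue() // drop the oldest updates first
  }

  // Oldest first.
  def recentProgress: Array[String] = synchronized { updates.toArray }

  // Most recent update, or null when nothing has been reported yet.
  def lastProgress: String = synchronized { updates.lastOption.orNull }
}
```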
  57. def reportTimeTaken[T](triggerDetailKey: String)(body: ⇒ T): T

    Records the duration of running body for the next query progress update.

    Attributes
    protected
    Definition Classes
    ProgressReporter
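    The curried signature with a by-name `body` lets a caller wrap any block and still receive its result. The sketch below has the same shape but is stand-alone: `TimingRecorder` and its `timings` map are hypothetical, and where the real method feeds the next progress update, the sketch simply stores elapsed milliseconds per key. The key `"getBatch"` in the test is only an example.

```scala
import scala.collection.mutable

final class TimingRecorder {
  val timings = mutable.Map.empty[String, Long] // key -> elapsed milliseconds

  // Evaluates body once, records how long it took (even if it throws), and returns its value.
  def reportTimeTaken[T](triggerDetailKey: String)(body: => T): T = {
    val start = System.nanoTime()
    try body
    finally {
      val elapsedMs = (System.nanoTime() - start) / 1000000L
      timings(triggerDetailKey) = elapsedMs
    }
  }
}
```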
  58. val runId: UUID

    Returns the unique id of this run of the query. That is, every start or restart of a query generates a unique runId. Therefore, every time a query is restarted from checkpoint, it will have the same id but a different runId.

    Definition Classes
    StreamExecution → ProgressReporter → StreamingQuery
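    The difference between id and runId comes down to where each value lives: id is read from the checkpoint when one exists and generated only on the first start, while runId is generated on every start. In the sketch, `QueryIds` and `startRun` are hypothetical and the checkpoint is an in-memory map keyed by location; no files are touched.

```scala
import java.util.UUID
import scala.collection.mutable

object QueryIds {
  // Hypothetical checkpoint store: checkpoint location -> persisted query id.
  val checkpoints = mutable.Map.empty[String, UUID]

  // Returns (id, runId) for one start of the query.
  def startRun(checkpointLocation: String): (UUID, UUID) = {
    val id = checkpoints.getOrElseUpdate(checkpointLocation, UUID.randomUUID()) // stable across restarts
    val runId = UUID.randomUUID() // fresh for every start
    (id, runId)
  }
}
```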
  59. val sink: Sink

    Definition Classes
    StreamExecution → ProgressReporter
  60. var sources: Seq[Source]

    All stream sources present in the query plan. This is set when the logical plan is generated.

    Attributes
    protected
    Definition Classes
    StreamExecution → ProgressReporter
  61. val sparkSession: SparkSession

    Returns the SparkSession associated with this query.

    Definition Classes
    StreamExecution → ProgressReporter → StreamingQuery
    Since

    2.0.0

  62. def start(): Unit

    Starts the execution. This returns only after the thread has started and QueryStartedEvent has been posted to all the listeners.

  63. def startTrigger(): Unit

    Begins recording statistics about query progress for a given trigger.

    Attributes
    protected
    Definition Classes
    ProgressReporter
  64. def status: StreamingQueryStatus

    Returns the current status of the query.

    Definition Classes
    ProgressReporter
  65. def stop(): Unit

    Signals to the thread executing micro-batches that it should stop running after the next batch. This method blocks until the thread stops running.

    Definition Classes
    StreamExecution → StreamingQuery
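    The start/stop handshake described for start() and stop() can be sketched with a latch and a volatile flag: start returns only after the worker thread signals that it is running, and stop sets the flag and then joins the thread, so it blocks until the thread exits. `LoopRunner` is hypothetical; unlike the real query it posts no listener events, and it checks the flag between iterations of a stand-in loop rather than after a real batch.

```scala
import java.util.concurrent.CountDownLatch
import java.util.concurrent.atomic.AtomicLong

final class LoopRunner {
  @volatile private var stopRequested = false
  private val started = new CountDownLatch(1)
  val batches = new AtomicLong(0L)

  private val thread = new Thread(new Runnable {
    def run(): Unit = {
      started.countDown() // lets start() return once the thread is running
      while (!stopRequested) {
        batches.incrementAndGet() // stand-in for running one micro-batch
        Thread.sleep(1)
      }
    }
  })

  def start(): Unit = { thread.start(); started.await() }

  // Signal the loop, then block until it notices the flag and the thread exits.
  def stop(): Unit = { stopRequested = true; thread.join() }

  def isActive: Boolean = thread.isAlive()
}
```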
  66. val streamMetadata: StreamMetadata

    Metadata associated with the whole query

    Attributes
    protected
  67. lazy val streamMetrics: MetricsReporter

    Used to report metrics through Codahale (Dropwizard) Metrics. It uses id rather than runId, so metrics are easier to track across restarts.

  68. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  69. def toString(): String

    Definition Classes
    StreamExecution → AnyRef → Any
  70. val trigger: Trigger

  71. val triggerClock: Clock

    Definition Classes
    StreamExecution → ProgressReporter
  72. def updateStatusMessage(message: String): Unit

    Updates the message returned in status.

    Attributes
    protected
    Definition Classes
    ProgressReporter
  73. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  74. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  75. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
