Package org.apache.spark.streaming.flume.sink


package sink


Type Members

  1. class EventBatch extends SpecificRecordBase with SpecificRecord

  2. trait SparkFlumeProtocol extends AnyRef

  3. class SparkSink extends AbstractSink with Logging with Configurable


    A sink that uses Avro RPC to run a server that can be polled by Spark's FlumePollingInputDStream. This sink has the following configuration parameters:

    hostname - The hostname to bind to. Default: 0.0.0.0
    port - The port to bind to. (No default - mandatory)
    timeout - Time in seconds after which a transaction is rolled back, if an ACK is not received from Spark within that time
    threads - Number of threads to use to receive requests from Spark (Default: 10)

    This sink is unlike other Flume sinks in that it does not push data; instead, the process method in this sink simply blocks the SinkRunner the first time it is called. This sink starts up an Avro IPC server that uses the SparkFlumeProtocol.

    Each time a getEventBatch call comes in, the sink creates a transaction and reads events from the channel. When enough events have been read, the events are sent to the Spark receiver, the calling thread is blocked, and a reference to it is saved off.

    When the ack for that batch is received, the thread which created the transaction is retrieved, and it commits the transaction with the channel from the same thread it was originally created in (since Flume transactions are thread-local). If a nack is received instead, the sink rolls back the transaction. If no ack is received within the specified timeout, the transaction is also rolled back. If an ack arrives after that, it is simply ignored and the events are re-sent. (Sketches of a Flume agent configuration for this sink and of polling it from a Spark application follow the type member list below.)

  4. class SparkSinkEvent extends SpecificRecordBase with SpecificRecord

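As a reference point for the configuration parameters listed above, here is a minimal sketch of how a SparkSink might be declared in a Flume agent's properties file. The agent name (agent), sink name (spark), channel name (memoryChannel), and port value are placeholders, and the timeout/threads lines simply restate the defaults; this is an illustrative sketch, not a complete agent configuration.

    # Declare the pull-based sink that Spark will poll (placeholder names).
    agent.sinks = spark
    agent.sinks.spark.type = org.apache.spark.streaming.flume.sink.SparkSink
    agent.sinks.spark.hostname = 0.0.0.0
    agent.sinks.spark.port = 9999
    # Optional: seconds to wait for an ACK before rolling back a transaction.
    agent.sinks.spark.timeout = 60
    # Optional: number of threads handling requests from Spark (default 10).
    agent.sinks.spark.threads = 10
    # The sink drains events from this channel inside a transaction.
    agent.sinks.spark.channel = memoryChannel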

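On the Spark side, the getEventBatch/ack/nack exchange described above is driven by the polling receiver created via FlumeUtils.createPollingStream in the companion org.apache.spark.streaming.flume package. The following is a minimal Scala sketch, assuming the sink above is reachable at sink-host:9999 and that events carry UTF-8 text bodies; host, port, and application name are placeholders.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.flume.FlumeUtils

    object SparkSinkPollingExample {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("SparkSinkPollingExample")
        val ssc = new StreamingContext(conf, Seconds(10))

        // Poll the SparkSink running inside the Flume agent at sink-host:9999.
        // Each poll issues a getEventBatch call against the sink's Avro IPC
        // server and acks (or nacks) the batch, which commits or rolls back
        // the channel transaction as described above.
        val events = FlumeUtils.createPollingStream(ssc, "sink-host", 9999)

        // Decode the Avro event bodies as UTF-8 strings and print a sample.
        events.map(e => new String(e.event.getBody.array(), "UTF-8")).print()

        ssc.start()
        ssc.awaitTermination()
      }
    }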