Package org.apache.spark.examples.streaming

package streaming

Type Members

  1. class CustomReceiver extends Receiver[String] with Logging
  2. class JavaCustomReceiver extends Receiver[String]
  3. final class JavaDirectKafkaWordCount extends AnyRef
  4. final class JavaFlumeEventCount extends AnyRef
  5. final class JavaNetworkWordCount extends AnyRef
  6. final class JavaQueueStream extends AnyRef
  7. class JavaRecord extends Serializable
  8. final class JavaRecoverableNetworkWordCount extends AnyRef
  9. final class JavaSqlNetworkWordCount extends AnyRef
  10. class JavaStatefulNetworkWordCount extends AnyRef
  11. case class Record(word: String) extends Product with Serializable

    Case class for converting an RDD to a DataFrame.

Value Members

  1. object CustomReceiver extends Serializable

    Custom Receiver that receives data over a socket. Received bytes are interpreted as text and '\n' delimited lines are considered as records. They are then counted and printed.

    To run this on your local machine, you first need to run a Netcat server:
      $ nc -lk 9999
    and then run the example:
      $ bin/run-example org.apache.spark.examples.streaming.CustomReceiver localhost 9999
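
    A minimal sketch of what such a receiver looks like, assuming the standard Receiver contract (class and thread names here are illustrative, not the example's actual source):

      import java.io.{BufferedReader, InputStreamReader}
      import java.net.Socket
      import java.nio.charset.StandardCharsets

      import org.apache.spark.storage.StorageLevel
      import org.apache.spark.streaming.receiver.Receiver

      class SocketLineReceiver(host: String, port: Int)
        extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

        // onStart must not block, so read from the socket on a background thread.
        def onStart(): Unit = {
          new Thread("Socket Receiver") {
            override def run(): Unit = receive()
          }.start()
        }

        // Nothing to clean up: the reading thread checks isStopped and exits on its own.
        def onStop(): Unit = {}

        private def receive(): Unit = {
          try {
            val socket = new Socket(host, port)
            val reader = new BufferedReader(
              new InputStreamReader(socket.getInputStream, StandardCharsets.UTF_8))
            var line = reader.readLine()
            while (!isStopped && line != null) {
              store(line)                 // hand each '\n' delimited record to Spark
              line = reader.readLine()
            }
            reader.close()
            socket.close()
            restart("Trying to connect again")
          } catch {
            case e: java.io.IOException => restart("Error receiving data", e)
          }
        }
      }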

  2. object DirectKafkaWordCount

    Consumes messages from one or more topics in Kafka and does word count.

    Usage: DirectKafkaWordCount <brokers> <topics>
      <brokers> is a list of one or more Kafka brokers
      <topics> is a list of one or more Kafka topics to consume from

    Example:
      $ bin/run-example streaming.DirectKafkaWordCount broker1-host:port,broker2-host:port \
          topic1,topic2
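
    A sketch of the direct-stream pattern this example demonstrates, assuming the spark-streaming-kafka-0-8 API (argument handling and the 2-second batch interval are assumptions):

      import kafka.serializer.StringDecoder

      import org.apache.spark.SparkConf
      import org.apache.spark.streaming.{Seconds, StreamingContext}
      import org.apache.spark.streaming.kafka.KafkaUtils

      object DirectKafkaWordCountSketch {
        def main(args: Array[String]): Unit = {
          val Array(brokers, topics) = args

          val sparkConf = new SparkConf().setAppName("DirectKafkaWordCount")
          val ssc = new StreamingContext(sparkConf, Seconds(2))

          // Direct stream: no receiver; each batch reads its Kafka offset range directly.
          val topicsSet = topics.split(",").toSet
          val kafkaParams = Map[String, String]("metadata.broker.list" -> brokers)
          val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
            ssc, kafkaParams, topicsSet)

          // Word count over the message values.
          messages.map(_._2)
            .flatMap(_.split(" "))
            .map(word => (word, 1L))
            .reduceByKey(_ + _)
            .print()

          ssc.start()
          ssc.awaitTermination()
        }
      }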

  3. object DroppedWordsCounter

    Use this singleton to get or register an Accumulator.
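
    The get-or-register pattern behind such a singleton looks roughly like this (a sketch; the accumulator name is an assumption):

      import org.apache.spark.SparkContext
      import org.apache.spark.util.LongAccumulator

      object DroppedWordsCounterSketch {
        @volatile private var instance: LongAccumulator = null

        // Double-checked locking: register the accumulator once per driver,
        // including after a driver restart from checkpoint.
        def getInstance(sc: SparkContext): LongAccumulator = {
          if (instance == null) {
            synchronized {
              if (instance == null) {
                instance = sc.longAccumulator("DroppedWordsCounter")
              }
            }
          }
          instance
        }
      }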

  4. object FlumeEventCount

    Produces a count of events received from Flume.

    This should be used in conjunction with an AvroSink in Flume. It will start an Avro server at the requested host:port address and listen for requests. Your Flume AvroSink should be pointed to this address.

    Usage: FlumeEventCount <host> <port>
      <host> is the host on which the Flume receiver will be started - a receiver creates a server and listens for Flume events.
      <port> is the port the Flume receiver will listen on.

    To run this example:
      $ bin/run-example org.apache.spark.examples.streaming.FlumeEventCount <host> <port>
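
    A sketch of the push-based Flume pattern, assuming the spark-streaming-flume module (batch interval and app name are assumptions):

      import org.apache.spark.SparkConf
      import org.apache.spark.streaming.{Milliseconds, StreamingContext}
      import org.apache.spark.streaming.flume.FlumeUtils

      object FlumeEventCountSketch {
        def main(args: Array[String]): Unit = {
          val Array(host, port) = args

          val sparkConf = new SparkConf().setAppName("FlumeEventCount")
          val ssc = new StreamingContext(sparkConf, Milliseconds(2000))

          // Starts an Avro server at host:port; point the Flume AvroSink here.
          val stream = FlumeUtils.createStream(ssc, host, port.toInt)

          // Print a per-batch event count.
          stream.count().map(cnt => "Received " + cnt + " flume events.").print()

          ssc.start()
          ssc.awaitTermination()
        }
      }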

  5. object FlumePollingEventCount

    Produces a count of events received from Flume.

    This should be used in conjunction with the Spark Sink running in a Flume agent. See the Spark Streaming programming guide for more details.

    Usage: FlumePollingEventCount <host> <port>
      <host> is the host on which the Spark Sink is running.
      <port> is the port at which the Spark Sink is listening.

    To run this example:
      $ bin/run-example org.apache.spark.examples.streaming.FlumePollingEventCount [host] [port]
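
    The only structural difference from the push-based sketch above is the stream constructor: here Spark polls the Spark Sink rather than hosting an Avro server. A fragment, assuming the same ssc, host, and port as the previous sketch:

      import org.apache.spark.streaming.flume.FlumeUtils

      // Pull-based: the receiver polls the Spark Sink running inside the Flume agent.
      val stream = FlumeUtils.createPollingStream(ssc, host, port.toInt)
      stream.count().map(cnt => "Received " + cnt + " flume events.").print()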

  6. object HdfsWordCount

    Counts words in new text files created in the given directory.

    Usage: HdfsWordCount <directory>
      <directory> is the directory that Spark Streaming will use to find and read new text files.

    To run this on your local machine on directory localdir, run this example:
      $ bin/run-example \
          org.apache.spark.examples.streaming.HdfsWordCount localdir

    Then create a text file in localdir and the words in the file will get counted.
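
    A sketch of the file-stream pattern, assuming a 2-second batch interval (the interval and names are assumptions):

      import org.apache.spark.SparkConf
      import org.apache.spark.streaming.{Seconds, StreamingContext}

      object HdfsWordCountSketch {
        def main(args: Array[String]): Unit = {
          val sparkConf = new SparkConf().setAppName("HdfsWordCount")
          val ssc = new StreamingContext(sparkConf, Seconds(2))

          // Only files created in the directory after the stream starts are read.
          val lines = ssc.textFileStream(args(0))
          lines.flatMap(_.split(" ")).map(w => (w, 1)).reduceByKey(_ + _).print()

          ssc.start()
          ssc.awaitTermination()
        }
      }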

  7. object NetworkWordCount

    Counts words in UTF8 encoded, '\n' delimited text received from the network every second.

    Usage: NetworkWordCount <hostname> <port>
      <hostname> and <port> describe the TCP server that Spark Streaming would connect to receive data.

    To run this on your local machine, you first need to run a Netcat server:
      $ nc -lk 9999
    and then run the example:
      $ bin/run-example org.apache.spark.examples.streaming.NetworkWordCount localhost 9999
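
    A sketch of the classic socket word count; the 1-second batch interval follows the description above, everything else is an assumption:

      import org.apache.spark.SparkConf
      import org.apache.spark.storage.StorageLevel
      import org.apache.spark.streaming.{Seconds, StreamingContext}

      object NetworkWordCountSketch {
        def main(args: Array[String]): Unit = {
          val sparkConf = new SparkConf().setAppName("NetworkWordCount")
          val ssc = new StreamingContext(sparkConf, Seconds(1))

          // Each '\n' delimited line received on the socket becomes one record.
          val lines = ssc.socketTextStream(args(0), args(1).toInt, StorageLevel.MEMORY_AND_DISK_SER)
          lines.flatMap(_.split(" ")).map(w => (w, 1)).reduceByKey(_ + _).print()

          ssc.start()
          ssc.awaitTermination()
        }
      }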

  8. object QueueStream

  9. object RawNetworkGrep

    Receives text from multiple rawNetworkStreams and counts how many '\n' delimited lines have the word 'the' in them. This is useful for benchmarking purposes. This will only work with spark.streaming.util.RawTextSender running on all worker nodes and with Spark using Kryo serialization (set Java property "spark.serializer" to "org.apache.spark.serializer.KryoSerializer").

    Usage: RawNetworkGrep <numStreams> <host> <port> <batchMillis>
      <numStreams> is the number of rawNetworkStreams, which should be the same as the number of worker nodes in the cluster
      <host> is "localhost"
      <port> is the port on which RawTextSender is running on the worker nodes
      <batchMillis> is the Spark Streaming batch duration in milliseconds
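
    A sketch of the union-of-raw-streams pattern; the Kryo setting mirrors the description, the rest is assumed:

      import org.apache.spark.SparkConf
      import org.apache.spark.storage.StorageLevel
      import org.apache.spark.streaming.{Milliseconds, StreamingContext}

      object RawNetworkGrepSketch {
        def main(args: Array[String]): Unit = {
          val Array(numStreams, host, port, batchMillis) = args

          val sparkConf = new SparkConf()
            .setAppName("RawNetworkGrep")
            .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
          val ssc = new StreamingContext(sparkConf, Milliseconds(batchMillis.toLong))

          // One raw socket stream per worker node, merged into a single DStream.
          val streams = (1 to numStreams.toInt).map { _ =>
            ssc.rawSocketStream[String](host, port.toInt, StorageLevel.MEMORY_ONLY_SER)
          }
          ssc.union(streams)
            .filter(_.contains("the"))
            .count()
            .foreachRDD(rdd => println("Grep count: " + rdd.collect().mkString))

          ssc.start()
          ssc.awaitTermination()
        }
      }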

  10. object RecoverableNetworkWordCount

    Counts words in text encoded with UTF8 received from the network every second. This example also shows how to use lazily instantiated singleton instances for Accumulator and Broadcast so that they can be registered on driver failures.

    Usage: RecoverableNetworkWordCount <hostname> <port> <checkpoint-directory> <output-file>
      <hostname> and <port> describe the TCP server that Spark Streaming would connect to receive data.
      <checkpoint-directory> is a directory in an HDFS-compatible file system to which checkpoint data will be written.
      <output-file> is the file to which the word counts will be appended.

    <checkpoint-directory> and <output-file> must be absolute paths.

    To run this on your local machine, you first need to run a Netcat server:

      $ nc -lk 9999

    and run the example as:

      $ ./bin/run-example org.apache.spark.examples.streaming.RecoverableNetworkWordCount \
          localhost 9999 ~/checkpoint/ ~/out

    If the directory ~/checkpoint/ does not exist (e.g. when running for the first time), it will create a new StreamingContext (and will print "Creating new context" to the console). Otherwise, if checkpoint data exists in ~/checkpoint/, it will create the StreamingContext from the checkpoint data.

    Refer to the online documentation for more details.
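
    A sketch of the checkpoint-recovery pattern this example is built around: the creating function runs only when no checkpoint exists, and on restart the whole DStream graph is rebuilt from the checkpoint directory instead (the details here are assumptions):

      import org.apache.spark.SparkConf
      import org.apache.spark.streaming.{Seconds, StreamingContext}

      object RecoverableSketch {
        def createContext(host: String, port: Int, checkpointDir: String): StreamingContext = {
          println("Creating new context")
          val sparkConf = new SparkConf().setAppName("RecoverableNetworkWordCount")
          val ssc = new StreamingContext(sparkConf, Seconds(1))
          ssc.checkpoint(checkpointDir)

          val lines = ssc.socketTextStream(host, port)
          lines.flatMap(_.split(" ")).map(w => (w, 1)).reduceByKey(_ + _).print()
          ssc
        }

        def main(args: Array[String]): Unit = {
          // <output-file> handling is omitted in this sketch.
          val Array(host, port, checkpointDir, _) = args

          // Recover from checkpoint if present, otherwise build a fresh context.
          val ssc = StreamingContext.getOrCreate(
            checkpointDir, () => createContext(host, port.toInt, checkpointDir))
          ssc.start()
          ssc.awaitTermination()
        }
      }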

  11. object SparkSessionSingleton

    Lazily instantiated singleton instance of SparkSession.
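
    A sketch of the lazy-singleton pattern, close to what such an object typically contains (the config handling is an assumption):

      import org.apache.spark.SparkConf
      import org.apache.spark.sql.SparkSession

      object SparkSessionSingletonSketch {
        @transient private var instance: SparkSession = _

        // Create the SparkSession on first use (e.g. inside foreachRDD) and reuse it.
        def getInstance(sparkConf: SparkConf): SparkSession = {
          if (instance == null) {
            instance = SparkSession.builder.config(sparkConf).getOrCreate()
          }
          instance
        }
      }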

  12. object SqlNetworkWordCount

    Use DataFrames and SQL to count words in UTF8 encoded, '\n' delimited text received from the network every second.

    Usage: SqlNetworkWordCount <hostname> <port>
      <hostname> and <port> describe the TCP server that Spark Streaming would connect to receive data.

    To run this on your local machine, you first need to run a Netcat server:
      $ nc -lk 9999
    and then run the example:
      $ bin/run-example org.apache.spark.examples.streaming.SqlNetworkWordCount localhost 9999
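
    A sketch of the per-batch DataFrame pattern, tying together the Record case class and SparkSessionSingleton listed on this page; words is assumed to be a DStream[String] built as in NetworkWordCount:

      import org.apache.spark.rdd.RDD
      import org.apache.spark.streaming.Time

      words.foreachRDD { (rdd: RDD[String], time: Time) =>
        val spark = SparkSessionSingleton.getInstance(rdd.sparkContext.getConf)
        import spark.implicits._

        // Convert RDD[String] to a DataFrame via the Record case class.
        val wordsDataFrame = rdd.map(w => Record(w)).toDF()
        wordsDataFrame.createOrReplaceTempView("words")

        // Count words with SQL over the per-batch view.
        val wordCountsDataFrame =
          spark.sql("select word, count(*) as total from words group by word")
        println(s"========= $time =========")
        wordCountsDataFrame.show()
      }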

  13. object StatefulNetworkWordCount

    Counts words cumulatively in UTF8 encoded, '\n' delimited text received from the network every second, starting with an initial value of word count.

    Usage: StatefulNetworkWordCount <hostname> <port>
      <hostname> and <port> describe the TCP server that Spark Streaming would connect to receive data.

    To run this on your local machine, you first need to run a Netcat server:
      $ nc -lk 9999
    and then run the example:
      $ bin/run-example org.apache.spark.examples.streaming.StatefulNetworkWordCount localhost 9999
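
    A sketch of cumulative counting with mapWithState; checkpointing must be enabled for state, ssc is a StreamingContext, and wordDstream is assumed to be a DStream[(String, Int)] of (word, 1) pairs:

      import org.apache.spark.streaming.{State, StateSpec}

      // Seed the state so counting starts from an initial value of word count.
      val initialRDD = ssc.sparkContext.parallelize(List(("hello", 1), ("world", 1)))

      val mappingFunc = (word: String, one: Option[Int], state: State[Int]) => {
        val sum = one.getOrElse(0) + state.getOption.getOrElse(0)
        state.update(sum)                 // carry the running total across batches
        (word, sum)
      }

      val stateDstream = wordDstream.mapWithState(
        StateSpec.function(mappingFunc).initialState(initialRDD))
      stateDstream.print()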

  14. object StreamingExamples extends Logging

    Utility functions for Spark Streaming examples.

  15. object WordBlacklist

    Use this singleton to get or register a Broadcast variable.
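
    The same get-or-register pattern as DroppedWordsCounter, applied to a Broadcast variable (the blacklisted words here are placeholders):

      import org.apache.spark.SparkContext
      import org.apache.spark.broadcast.Broadcast

      object WordBlacklistSketch {
        @volatile private var instance: Broadcast[Seq[String]] = null

        def getInstance(sc: SparkContext): Broadcast[Seq[String]] = {
          if (instance == null) {
            synchronized {
              if (instance == null) {
                instance = sc.broadcast(Seq("a", "b", "c"))
              }
            }
          }
          instance
        }
      }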

  16. package clickstream

