How-To Apache Spark Streaming with Scala Part 1 – Supergloo

Todd McGrath, March 10th, 2016 (last updated: March 10th, 2016)


Let’s start with Apache Spark Streaming by building up our confidence in small steps. These small steps will create the forward momentum needed when learning new skills. The quickest way to gain confidence and momentum with a new software development skill is to run code that works without error.

In this post, we’re going to set up and run Apache Spark Streaming with Scala code. Then we will be confident taking the next step to Part 2 of learning Apache Spark Streaming.

Before we begin, I assume you already have a high-level understanding of Apache Spark Streaming. If not, there is a quick two-minute read on Spark Streaming in the Learning Apache Spark Summary book.

Overview

Spark comes with some great examples and convenient scripts for running Streaming code. Let’s make sure you can run these examples. In case it helps, I made a screencast of me running through these steps; it is linked below.

Running the NetworkWordCount example out-of-the-box

Open a shell or command prompt on Windows and go to your Spark root directory.
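Start the Spark Master: sbin/start-master.sh **
Start a Worker: sbin/start-slave.sh spark://todd-mcgraths-macbook-pro.local:7077 (replace the host name with your own master URL, which is shown in the master’s log and in the master web UI, by default at http://localhost:8080)
Start netcat on port 9999: nc -lk 9999 (Windows users: Ncat from https://nmap.org/ncat/ should be an equivalent. Let me know in the page comments if this works well on Windows)
Run the network word count using the handy run-example script: bin/run-example streaming.NetworkWordCount localhost 9999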

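** Windows users, please adjust accordingly. The sbin launch scripts are shell scripts and, according to the Spark standalone documentation, do not support Windows. Start the master and worker by hand instead: bin\spark-class.cmd org.apache.spark.deploy.master.Master, then bin\spark-class.cmd org.apache.spark.deploy.worker.Worker followed by your master URL.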
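If you are wondering what the example actually computes, here is a minimal sketch in plain Scala with no Spark involved. It applies the same split-and-count idea to one hard-coded batch of lines. The names WordCountSketch and countWords are my own, made up for illustration; they are not part of Spark.

```scala
// Plain-Scala sketch of the per-batch word count idea (no Spark required).
// WordCountSketch and countWords are hypothetical names used for illustration only.
object WordCountSketch {

  // Split every line on single spaces, then count how often each word occurs.
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }

  def main(args: Array[String]): Unit = {
    // Pretend these two lines were typed into netcat within one batch interval.
    val batch = Seq("hello world", "hello spark")

    // Prints (hello,2), (spark,1) and (world,1), one per line.
    countWords(batch).toSeq.sortBy(_._1).foreach(println)
  }
}
```

The streaming example does this kind of count for each small batch of lines it receives from the socket and prints the result for every batch, so the counts you see cover only what you typed during that interval.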

Here’s a screencast of me running these steps:

Making and Running Our Own NetworkWordCount

Ok, that’s good. We’ve succeeded in running the Scala Spark Streaming NetworkWordCount example, but what about running our own Spark Streaming program in Scala? Let’s take another step towards that goal. In this step, we’re going to set up our own Scala/SBT project, then compile, package and deploy a modified NetworkWordCount. Again, I made a screencast of the following steps; it is linked below.

Choose or create a new directory for a new Spark Streaming Scala project.
Make the directories SBT expects: src/main/scala
Create a Scala object code file called NetworkWordCount.scala in the src/main/scala directory
Copy-and-paste the NetworkWordCount.scala code from the Spark examples directory into the file created in the previous step
Remove or comment out the package declaration and the StreamingExamples references
Change the AppName to "MyNetworkWordCount"
Create a build.sbt file (source code below)
Run sbt compile to smoke test, then sbt package to build the JAR used in the next step
Deploy: ~/Development/spark-1.5.1-bin-hadoop2.4/bin/spark-submit --class "NetworkWordCount" --master spark://todd-mcgraths-macbook-pro.local:7077 target/scala-2.11/streaming-example_2.11-1.0.jar localhost 9999 (adjust the Spark path and master URL for your machine)
Start netcat on port 9999: nc -lk 9999 and start typing
Check things out in the Spark UI
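For reference, here is roughly what NetworkWordCount.scala might look like once the package line and the StreamingExamples call are commented out and the app name is changed. This is a sketch reconstructed from the example that ships with Spark 1.5.x, so treat the copy in your own Spark examples directory as the authoritative one and expect small differences.

```scala
// Sketch of the modified NetworkWordCount.scala, assuming the Spark 1.5.x example as the starting point.
// package org.apache.spark.examples.streaming   // commented out: the file now lives in the default package

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

object NetworkWordCount {
  def main(args: Array[String]): Unit = {
    if (args.length < 2) {
      System.err.println("Usage: NetworkWordCount <hostname> <port>")
      System.exit(1)
    }

    // StreamingExamples.setStreamingLogLevels()   // commented out: helper only exists in the examples package

    // App name changed so the job is easy to spot in the Spark UI.
    val sparkConf = new SparkConf().setAppName("MyNetworkWordCount")

    // Process incoming data in batches of one second.
    val ssc = new StreamingContext(sparkConf, Seconds(1))

    // Read lines of text from the host and port given on the command line (netcat in this post).
    val lines = ssc.socketTextStream(args(0), args(1).toInt, StorageLevel.MEMORY_AND_DISK_SER)

    // Split each line into words and count the words within each batch.
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(word => (word, 1)).reduceByKey(_ + _)
    wordCounts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Because the object sits in the default package, the class name passed to spark-submit is simply NetworkWordCount, which matches the deploy command above.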

build.sbt source

name := "streaming-example"
 
version := "1.0"
 
scalaVersion := "2.11.4"
 
libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core" % "1.5.1",
    "org.apache.spark" %% "spark-streaming" % "1.5.1"
)

If you watched the video, note that the project name has been corrected here to “streaming-example”; the video shows “steaming-example”.

Spark Streaming With Scala Part 1 Conclusion

At this point, I hope this post has done what I intended. Were you successful in running both Spark Streaming examples in Scala? Will you be more confident when we continue to explore Spark Streaming in Part 2? I hope so. If you have any questions, feel free to add comments below.

Reference:

How-To Apache Spark Streaming with Scala Part 1 – Supergloo from our JCG partner Todd McGrath at the Supergloo blog.