
Tag Archives: Spark

Recommendation System Using Spark ML Akka and Cassandra

Building a recommendation system with Spark is a simple task. Spark’s machine learning library already does all the hard work for us. In this study I will show you how to build a scalable application for Big Data using the following technologies: Scala, Spark with Machine Learning, Akka with Actors, and Cassandra. A recommendation system is an information filtering mechanism that attempts to ...

Read More »

The Kubernetes Spark operator in OpenShift Origin (Part 1)

This series is about the Kubernetes Spark operator by Radanalytics.io on OpenShift Origin. It is an open source operator to manage Apache Spark clusters and applications. To deploy the operator on OpenShift Origin for the first time, you need to clone its GitHub repository: git clone https://github.com/radanalyticsio/spark-operator.git Then log in to the cluster using the OpenShift command-line tool, oc: oc login -u <username>:<password> ...

Read More »

Sparklens: a tool for Spark applications optimization

Sparklens is a profiling tool for Spark with a built-in Spark Scheduler simulator: it makes it easier to understand the scalability limits of Spark applications. It helps in understanding how efficiently a given Spark application uses the compute resources provided to it. It has been implemented and is maintained at Qubole. It is open source (Apache License 2.0) and ...

Read More »

Native microservices with SparkJava and Graal

Microservices written with SparkJava are just plain Java code using a standard Java library. No annotation magic, just code. The advantage of this simple style of programming is that it is, well, simple. It’s so simple that the Graal native compiler just compiles it without blinking, something which is currently very difficult with more complex frameworks like Spring, for example. ...

Read More »

Spark Run local design pattern

Many Spark applications have now become legacy applications that are very hard to enhance, test, and run locally. Spark has very good testing support, but many Spark applications are still not testable. I will share one common error that appears when you try to run some old Spark applications: Exception in thread "main" org.apache.spark.SparkException: A master URL must be ...

Read More »

Performance Tuning of an Apache Kafka/Spark Streaming System

Real-world case study in the telecom industry. Debugging a real-life distributed application can be a pretty daunting task. Most common Google searches don’t turn out to be very useful, at least at first. In this blog post, I will give a fairly detailed account of how we managed to accelerate an Apache Kafka/Spark Streaming/Apache Ignite application by almost 10x and ...

Read More »

Building Apache Zeppelin for MapR using Spark under YARN

Apache Zeppelin is a web-based notebook that enables interactive data analytics. You can make beautiful data-driven, interactive and collaborative documents with Spark SQL, Scala, Hive, Flink, Kylin and more. Zeppelin enables rapid development of Spark and Hadoop workflows with simple, easy visualizations. The code from Zeppelin can be used in the Zeppelin notebooks or compiled and packaged into complete applications. ...

Read More »

Java Micro Frameworks: The New Trend You Can’t Ignore


What are Java micro frameworks and why should you use them? Every language has tradeoffs. With Java, the tradeoff for being a safe, rigorously tested, backwards compatible language is making some sacrifices around agility and streamlining. There’s undeniably some verbosity and bloat; however, the JVM is hugely appealing as a backend if you really want to dive into things or ...

Read More »