Search Results for: spark

Where is Apache Spark heading?

I watched (the COVID-19-era version of “attended”) the latest Spark Summit, and in one of the keynotes Reynold Xin from Databricks presented the following two images comparing Spark usage on their platform in 2013 vs. 2020: While Databricks’ platform is, of course, not the whole Spark community, I would wager that they have enough users to represent the overall trend. Incidentally, ...

Recommendation System Using Spark ML Akka and Cassandra

Building a recommendation system with Spark is a simple task. Spark’s machine learning library already does all the hard work for us. In this study I will show you how to build a scalable application for Big Data using the following technologies: Scala Language, Spark with Machine Learning, Akka with Actors, and Cassandra. A recommendation system is an information filtering mechanism that attempts to ...

The Kubernetes Spark operator in OpenShift Origin (Part 1)

This series is about the Kubernetes Spark operator by Radanalytics.io on OpenShift Origin. It is an Open Source operator to manage Apache Spark clusters and applications. In order to deploy the operator on OpenShift Origin, the first time you need to clone the GitHub repository for it: git clone https://github.com/radanalyticsio/spark-operator.git Then log in to the cluster using the OpenShift command-line oc: oc login -u <username>:<password> ...

Sparklens: a tool for Spark applications optimization

Sparklens is a profiling tool for Spark with a built-in Spark Scheduler simulator: it makes it easier to understand the scalability limits of Spark applications. It helps in understanding how efficiently a given Spark application is using the compute resources provided to it. It has been implemented and is maintained at Qubole. It is Open Source (Apache License 2.0) and ...

Native microservices with SparkJava and Graal

Microservices written with SparkJava are just plain Java code using a standard Java library. No annotation magic, just code. The advantage of this simple style of programming is that it is, well, simple. It’s so simple that the Graal native compiler just compiles it without blinking, something which is currently very difficult with more complex frameworks like Spring, for example. ...

Spark Run local design pattern

Many Spark applications have now become legacy applications that are very hard to enhance, test and run locally. Spark has very good testing support, but many Spark applications are still not testable. I will share one common error that appears when you try to run some old Spark applications. Exception in thread "main" org.apache.spark.SparkException: A master URL must be ...

Exploring the Spline Data Tracker and Visualization tool for Apache Spark (Part 2)

In part 1 we learned how to test data lineage info collection with Spline from a Spark shell. The same can be done in any Scala or Java Spark application. The same dependencies used for the Spark shell need to be registered in your build tool of choice (Maven, Gradle or sbt): groupId: za.co.absa.spline, artifactId: spline-core, version: 0.3.5; groupId: za.co.absa.spline, artifactId: spline-persistence-mongo ...

Exploring the Spline Data Tracker and Visualization tool for Apache Spark (Part 1)

One interesting and promising Open Source project that caught my attention lately is Spline, a data lineage tracking and visualization tool for Apache Spark, maintained at Absa. This project consists of two parts: a Scala library that works on the drivers and captures data lineages by analyzing the Spark execution plans, and a web application that provides a UI to visualize them. ...
