
Tag Archives: Scala

Recommendation System Using Spark ML Akka and Cassandra

Building a recommendation system with Spark is a simple task. Spark’s machine learning library already does all the hard work for us. In this study I will show you how to build a scalable application for Big Data using the following technologies: the Scala language, Spark with Machine Learning, Akka with Actors, and Cassandra. A recommendation system is an information filtering mechanism that attempts to ...


Kotlin vs Scala: which is right for you?

Kotlin or Scala? Scala or Kotlin? The two contenders for the crown of the JVM Kingdoms and the title of “Better Java” each bring something unique to the fight. But which should be the next ruler of your code? Java is old. Not that there’s anything wrong with being old. Sometimes it can be a good thing. Math is old ...


The beautiful simplicity of Apache Ranger plugin

If you are here, you already know what Apache Ranger is. It is the most popular, if not the only, way to manage security in the Hadoop framework. It integrates with Active Directory, Kerberos and various others for authentication, but I believe its most interesting feature is its authorization support. Being part of the Hadoop ecosystem, one would not be surprised ...


Sparklens: a tool for Spark applications optimization

Sparklens is a profiling tool for Spark with a built-in Spark Scheduler simulator: it makes it easier to understand the scalability limits of Spark applications. It helps in understanding how efficiently a given Spark application is using the compute resources provided to it. It has been implemented and is maintained at Qubole. It is Open Source (Apache License 2.0) and ...


Exploring the Spline Data Tracker and Visualization tool for Apache Spark (Part 2)

In part 1 we learned how to test data lineage info collection with Spline from a Spark shell. The same can be done in any Scala or Java Spark application. The same dependencies used for the Spark shell need to be registered in your build tool of choice (Maven, Gradle or sbt): groupId: za.co.absa.spline, artifactId: spline-core, version: 0.3.5; groupId: za.co.absa.spline, artifactId: spline-persistence-mongo ...


Exploring the Spline Data Tracker and Visualization tool for Apache Spark (Part 1)

One interesting and promising Open Source project that caught my attention lately is Spline, a data lineage tracking and visualization tool for Apache Spark, maintained at Absa. This project consists of two parts: a Scala library that runs on the drivers and captures the data lineages by analyzing the Spark execution plans, and a web application that provides a UI to visualize them. ...
