Author Archives: Carol Mcdonald

Monitoring Real-Time Uber Data Using Spark Machine Learning, Streaming, and the Kafka API (Part 2)

This post is the second part in a series where we will build a real-time example for analysis and monitoring of Uber car GPS trip data. If you have not already read the first part of this series, you should read that first. The first post discussed creating a machine learning model using Apache Spark’s K-means algorithm to cluster Uber data based ...

Predicting Breast Cancer Using Apache Spark Machine Learning Logistic Regression

In this blog post, I’ll help you get started using Apache Spark’s spark.ml Logistic Regression for predicting cancer malignancy. The goal of Spark’s spark.ml library is to provide a set of APIs on top of DataFrames that help users create and tune machine learning workflows or pipelines. Using spark.ml with DataFrames improves performance through intelligent optimizations. Classification is a family of ...

How to Get Started with Spark Streaming and MapR Streams Using the Kafka API

This post will help you get started using Apache Spark Streaming for consuming and publishing messages with MapR Streams and the Kafka API. Spark Streaming is an extension of the core Spark API that enables continuous data stream processing. MapR Streams is a distributed messaging system for streaming event data at scale. MapR Streams enables producers and consumers to exchange events in real time via ...

Fast, Scalable, Streaming Applications with MapR Streams, Spark Streaming, and MapR-DB

Many of the systems we want to monitor produce a stream of events. Examples include event data from web or mobile applications, sensors, or medical devices. Real-time analysis examples include website monitoring, network monitoring, fraud detection, web clicks, advertising, and Internet of Things sensors. Batch processing can give great insights into things that happened in the past, but it ...

How to Get Started Using Apache Spark GraphX with Scala

This post will help you get started using Apache Spark GraphX with Scala on the MapR Sandbox. GraphX is the Apache Spark component for graph-parallel computations, built upon a branch of mathematics called graph theory. It ...

Apache Spark Machine Learning Tutorial

Decision trees are widely used for the machine learning tasks of classification and regression. In this blog post, I’ll help you get started using Apache Spark’s MLlib machine learning decision trees for classification. Overview of ML Algorithms ...

MapReduce Design Patterns Implemented in Apache Spark

This post is the first in a series that discusses some design patterns from the book MapReduce Design Patterns and shows how these patterns can be implemented in Apache Spark. When writing MapReduce or Spark programs, it is useful to think about the data flows needed to perform a job. Even if Pig, Hive, Apache Drill and Spark DataFrames make it ...
