Big Data
-
Software Development

Using Apache Spark SQL to Explore S&P 500, and Oil Stock Prices
This post will use Apache Spark SQL and DataFrames to query, compare and explore S&P 500, Exxon and Anadarko Petroleum…
-
Software Development

How Apache Kafka and MapR Streams Handle Topic Partitions
Streaming data can be used as a long-term auditable history when you choose a messaging system with persistence, but is…
-
Software Development

The Changing Economics of Big Data
Perhaps you’re old enough to remember when the library was the place we went to learn. We foraged through card…
-
Software Development

Real Time Credit Card Fraud Detection with Apache Spark and Event Streaming
In this post we are going to discuss building a real-time solution for credit card fraud detection. There are…
-
Software Development

Distributed Deep Learning with Caffe Using a MapR Cluster
We have experimented with CaffeOnSpark on a 5-node MapR 5.1 cluster running Spark 1.5.2 and will share our experience, difficulties,…
-
Software Development

Evolution of Big Data Storage: How to Support Real-time Analytics at Scale
Organizations embracing big data are ready to put data to work, including looking for ways to effectively analyze data from…
-
Software Development

Spark Streaming and Twitter Sentiment Analysis
This blog post is the result of my efforts to show a coworker how to get the insights he…
-
Software Development

Key Steps for Removing the Hive Metastore Password from the Hive Configuration
In a typical Hive installation with metadata in a MySQL configuration, a password is configured in a configuration file in…
-
Software Development

Spark Data Source API: Extending Our Spark SQL Query Engine
In my last post, Apache Spark as a Distributed SQL Engine, we explained how we could use SQL to query…