
Tag Archives: Big Data

Distributed Deep Learning with Caffe Using a MapR Cluster

We experimented with CaffeOnSpark on a 5-node MapR 5.1 cluster running Spark 1.5.2, and in this blog post we share our experience, difficulties, and solutions. Deep Learning and Caffe: Deep learning has been getting a lot of attention recently, with AlphaGo beating one of the world's top players at a game that was thought so complicated as to be out of reach of ...


Spark Streaming and Twitter Sentiment Analysis

This blog post is the result of my efforts to show a coworker how to get the insights he needed by using the streaming capabilities and concise API of Apache Spark. In this blog post, you’ll learn how to do some simple yet very interesting analytics that will help you solve real problems by analyzing specific areas of a ...


Key Steps for Removing the Hive Metastore Password from the Hive Configuration

In a typical Hive installation with metadata in a MySQL configuration, a password is configured in a configuration file in clear text. This presents a few risks: 1) Unauthorized access could destroy/modify Hive metadata and disrupt workflows. A malicious user could alter Hive permissions or damage metadata. 2) This password permits hiveserver2-thrift-MySQL communication. To avoid this problem, you should use ...


Spark Data Source API: Extending Our Spark SQL Query Engine

In my last post, Apache Spark as a Distributed SQL Engine, we explained how we could use SQL to query our data stored within Hadoop. Our engine is capable of reading CSV files from a distributed file system, auto-discovering the schema from the files, and exposing them as tables through the Hive metastore. All this was done to ...


Achieving Sub-Second SQL JOINs and Building a Data Warehouse Using Spark, Cassandra, and FiloDB

Evan loves to design, build, and improve bleeding edge distributed data and backend systems using the latest in open source technologies. He is the creator of the FiloDB open-source distributed analytical database, as well as the Spark Job Server. He has led the design and implementation of multiple big data platforms based on Storm, Spark, Kafka, Cassandra, and Scala/Akka, including ...


The Method Behind March Madness

There are 150 quintillion (i.e., two steps beyond a trillion) permutations to consider when completing your NCAA bracket. Some of us don’t have time to review them all; if you are likewise short on time, you can let MapR do the heavy lifting for you and get your personalized bracket from the Crystal B-Ball! In this post, we describe the methodology ...


Getting Started with MapR Streams

MapR Streams is a new distributed messaging system for streaming event data at scale, and it’s integrated into the MapR converged platform. MapR Streams uses the Apache Kafka API, so if you’re already familiar with Kafka, you’ll find it particularly easy to get started with MapR Streams. Although MapR Streams generally uses the Apache Kafka programming model, there are a ...
