Tag Archives: Big Data

Spark Streaming and Twitter Sentiment Analysis

This blog post is the result of my efforts to show a coworker how to get the insights he needed by using the streaming capabilities and concise API of Apache Spark. In this blog post, you’ll learn how to do some simple yet very interesting analytics that will help you solve real problems by analyzing specific areas of a ...

Key Steps for Removing the Hive Metastore Password from the Hive Configuration

In a typical Hive installation with metadata stored in MySQL, the metastore password is kept in clear text in a configuration file. This presents a few risks: 1) Unauthorized access could destroy or modify Hive metadata and disrupt workflows. A malicious user could alter Hive permissions or damage metadata. 2) This password permits hiveserver2-thrift-MySQL communication. To avoid this problem, you should use ...

Spark Data Source API: Extending Our Spark SQL Query Engine

In my last post, Apache Spark as a Distributed SQL Engine, we explained how we could use SQL to query our data stored within Hadoop. Our engine is capable of reading CSV files from a distributed file system, auto-discovering the schema from the files, and exposing them as tables through the Hive metastore. All this was done to ...

Achieving Sub-Second SQL JOINs and Building a Data Warehouse Using Spark, Cassandra, and FiloDB

Evan loves to design, build, and improve bleeding-edge distributed data and backend systems using the latest in open source technologies. He is the creator of the FiloDB open-source distributed analytical database, as well as the Spark Job Server. He has led the design and implementation of multiple big data platforms based on Storm, Spark, Kafka, Cassandra, and Scala/Akka, including ...

The Method Behind March Madness

There are 150 quintillion (a quintillion is a million trillion) permutations to consider when completing your NCAA bracket. Some of us don’t have time to review them all; if you are likewise short on time, you can let MapR do the heavy lifting for you and get your personalized bracket from the Crystal B-Ball! In this post, we describe the methodology ...

Getting Started with MapR Streams

MapR Streams is a new distributed messaging system for streaming event data at scale, and it’s integrated into the MapR converged platform. MapR Streams uses the Apache Kafka API, so if you’re already familiar with Kafka, you’ll find it particularly easy to get started with MapR Streams. Although MapR Streams generally uses the Apache Kafka programming model, there are a ...

Apache Flink GA – Planning for the Future

The distributed computation world has seen a massive shift in the last decade. Apache Hadoop showed up on the scene and brought with it new ways to handle distributed computation at scale. It wasn’t the easiest to work with, and the APIs were far from perfect, but they worked. People tried using this platform as the proverbial hammer to build ...

Gartner 2016 Magic Quadrant for Data Warehouse and Data Management Solutions for Analytics

We are excited to share with you that Gartner has named MapR a Visionary in the Gartner 2016 Magic Quadrant for Data Warehouse and Data Management Solutions for Analytics. Gartner evaluated 21 software vendors on 15 criteria for the quadrant. The MapR Converged Data Platform enables customers to leverage a real-time, reliable analytics platform for global data-driven applications. MAGIC QUADRANT ...

The most important thing to know in Cassandra data modeling: The primary key

Patrick McFadin, Chief Evangelist for Apache Cassandra, DataStax. Patrick is regarded as one of the foremost experts on Apache Cassandra and data modeling techniques. As the Chief Evangelist for Apache Cassandra and a consultant for DataStax, he has helped build some of the largest deployments in the world. Prior to DataStax, he was Chief Architect at Hobsons, an education services company. ...

Decentralized Analytics for a Complex World

In 2015, General Stan McChrystal published Team of Teams: New Rules of Engagement for a Complex World. It was the culmination of his experience in adapting to a world that had changed faster than the organization he was responsible for leading. When he assumed command of the Joint Special Operations Task Force in 2003, he recognized that their typical ...
