Tag Archives: Big Data

Spark Data Source API: Extending Our Spark SQL Query Engine

In my last post, Apache Spark as a Distributed SQL Engine, I explained how to use SQL to query data stored in Hadoop. Our engine can read CSV files from a distributed file system, automatically discover the schema from the files, and expose them as tables through the Hive metastore. All this was done to ...

Achieving Sub Second SQL JOINs and building a data warehouse using Spark, Cassandra, and FiloDB

Evan loves to design, build, and improve bleeding-edge distributed data and backend systems using the latest open source technologies. He is the creator of the FiloDB open-source distributed analytical database, as well as the Spark Job Server. He has led the design and implementation of multiple big data platforms based on Storm, Spark, Kafka, Cassandra, and Scala/Akka, including ...

The Method Behind March Madness

There are roughly 150 quintillion (a quintillion is 10^18, or a million trillion) permutations to consider when completing your NCAA bracket. Some of us don’t have time to review them all; if you are likewise short on time, you can let MapR do the heavy lifting for you and get your personalized bracket from the Crystal B-Ball! In this post, we describe the methodology ...

Getting Started with MapR Streams

MapR Streams is a new distributed messaging system for streaming event data at scale, and it’s integrated into the MapR converged platform. MapR Streams uses the Apache Kafka API, so if you’re already familiar with Kafka, you’ll find it particularly easy to get started with MapR Streams. Although MapR Streams generally uses the Apache Kafka programming model, there are a ...

Apache Flink GA – Planning for the Future

The distributed computation world has seen a massive shift in the last decade. Apache Hadoop showed up on the scene and brought with it new ways to handle distributed computation at scale. It wasn’t the easiest to work with, and the APIs were far from perfect, but they worked. People tried using this platform as the proverbial hammer to build ...

Gartner 2016 Magic Quadrant for Data Warehouse and Data Management Solutions for Analytics

We are excited to share with you that Gartner has named MapR a Visionary in the Gartner 2016 Magic Quadrant for Data Warehouse and Data Management Solutions for Analytics. Gartner evaluated 21 software vendors on 15 criteria for the quadrant. The MapR Converged Data Platform enables customers to leverage a real-time, reliable analytics platform for global data-driven applications. MAGIC QUADRANT ...

The most important thing to know in Cassandra data modeling: The primary key

Patrick McFadin, Chief Evangelist for Apache Cassandra, DataStax. Patrick is regarded as one of the foremost experts on Apache Cassandra and data modeling techniques. As the Chief Evangelist for Apache Cassandra and a consultant for DataStax, he has helped build some of the largest deployments in the world. Before DataStax, he was Chief Architect at Hobsons, an education services company. ...

Decentralized Analytics for a Complex World

In 2015, General Stan McChrystal published Team of Teams: New Rules of Engagement for a Complex World. It was the culmination of his experience in adapting to a world that had changed faster than the organization he was responsible for leading. When he assumed command of the Joint Special Operations Task Force in 2003, he recognized that their typical ...

Cassandra: The Foundation Big Data Building Block

As Chief Technology Officer and co-founder at Instaclustr, Ben sets the technical direction for the company, identifying new features and capabilities. Ben is located in our Redwood City office, and he was recognized as an Apache Cassandra MVP at the Cassandra Summit in 2015. Ben is active in the community, often speaking at local meetups and presenting at related conferences. ...

The Essential Guide to Streaming-first Processing with Apache Flink

Editor’s note: This is a post by Apache Flink PMC members Fabian Hueske and Kostas Tzoumas. Fabian and Kostas are also co-founders of data Artisans. A very large part of today’s data processing is done on data that is continuously produced, e.g., data from user activity logs, web logs, machines, sensors, and database transactions. Until now, data streaming technology was lacking in several ...
