Tag Archives: Big Data

Top 10 Big Data Trends in 2016 for Financial Services

2015 was a groundbreaking year for banking and financial markets firms, as they continued to learn how big data can help transform their processes and organizations. Now, with an eye towards what lies ahead for 2016, we see that financial services organizations are still at various stages of their activity with big data in terms of how they’re changing their ...

MapReduce Design Patterns Implemented in Apache Spark

This blog is the first in a series that discusses some design patterns from the book MapReduce Design Patterns and shows how these patterns can be implemented in Apache Spark(R). When writing MapReduce or Spark programs, it is useful to think about the data flows needed to perform a job. Even if Pig, Hive, Apache Drill and Spark DataFrames make it ...

Introduction to Apache Spark with Examples and Use Cases

I first heard of Spark in late 2013 when I became interested in Scala, the language in which Spark is written. Some time later, I did a fun data science project trying to predict survival on the Titanic. This turned out to be a great way to get further introduced to Spark concepts and programming. I highly recommend it for ...

Changing the Game When it Comes to Auditing in Big Data – Part 2

In my previous blog post we enabled auditing at the various levels of your MapR cluster. In this follow-up post we will analyse the audit logs using Apache Drill to start answering questions about: unauthorized cluster changes and data access; complying with regulatory frameworks and legislation; data usage heatmaps on cold, warm and hot data; data access analytics and ...

Changing the Game When it Comes to Auditing in Big Data – Part 1

With the recent release of MapR version 5.0, MapR customers got yet another powerful feature at no additional licensing cost: auditing! In this two-part blog post, I’ll describe various use cases for auditing as well as instructions for how to deploy these cases in your MapR environment. The auditing features in MapR let you log audit records of cluster administration ...

Distributed Stream and Graph Processing with Apache Flink

Apache Flink is a top-level Apache project that unifies distributed stream and batch processing. At the core of Apache Flink is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams. On August 27, the Bay Area Apache Flink Meetup had another event hosted by MapR. This time, the main topics ...

Architecting Data Intensive Applications – Part 1

Software applications can, in essence, be divided into two types: compute-intensive applications and data-intensive applications. And then there are applications that fall somewhere between these two extremes. Today I will be talking about how to define the high-level architecture for applications that are focused on leveraging the data of the enterprise in order to ...

In-memory Data Model and Persistence for Big Data


ORM frameworks help developers interact with relational databases. There are many excellent ORM frameworks for relational databases, such as Hibernate and Apache OpenJPA. Nowadays, big data is emerging, and more and more people develop applications that run on big data. Different kinds of NoSQL databases have been developed to store such ...
