Tag Archives: Big Data

Changing the Game When it Comes to Auditing in Big Data – Part 2

In my previous blog post we enabled auditing at the various levels of your MapR cluster. In this follow-up post we will analyse the audit logs using Apache Drill to start addressing topics like: unauthorized cluster changes and data access; complying with regulatory frameworks and legislation; data usage heatmaps on cold, warm and hot data; data access analytics and ...

Changing the Game When it Comes to Auditing in Big Data – Part 1

With MapR version 5.0 being released recently, MapR customers got yet another powerful feature at no additional licensing cost: Auditing! In this two-part blog post, I’ll describe various use cases for auditing as well as instructions for how to deploy them in your MapR environment. The auditing features in MapR let you log audit records of cluster administration ...

Distributed Stream and Graph Processing with Apache Flink

Apache Flink is a top-level Apache project that unifies distributed stream and batch processing. At the core of Apache Flink is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams. On August 27, the Bay Area Apache Flink Meetup had another event hosted by MapR. This time, the main topics ...

Architecting Data Intensive Applications – Part 1

Introduction: Every software application can, in essence, be classified as one of two types: Compute Intensive Applications and Data Intensive Applications. And then there are applications that fall somewhere between these two extremes. Today I will be talking about how to define the high-level architecture for applications that are focused on leveraging the data of the enterprise in order to ...

In-memory Data Model and Persistence for Big Data

ORM frameworks help developers when they want to interact with relational databases. There are many excellent ORM frameworks for relational databases, such as Hibernate and Apache OpenJPA. Nowadays, big data is emerging and more and more people develop applications that run on big data. Different kinds of NoSQL databases have been developed to store such ...

Big Data Skills Spectrum

Big Data has been hyped for many years. I’ve seen a few “Big Data” projects start in the past with a lot of fanfare and promise. The promise has always been that “we will start getting a holistic picture of our departmental silos and gain numerous insights from our data that will help us get ahead of our competitors”. The ...

What are the 5 Trends for Testing in the Era of Big Data?

In today’s world of data explosion, big data applications and their implementations are growing dramatically. As data is at the heart of any big data application, it is important to understand the characteristics of big data. The three defining characteristics of big data are ‘Volume’, ‘Velocity’ and ‘Variety’. This data comes in different formats from multiple channels. All ...

Running PageRank Hadoop job on AWS Elastic MapReduce

In a previous post I described an example of performing a PageRank calculation, which is part of the Mining Massive Datasets course, with Apache Hadoop. In that post I took an existing Hadoop job in Java and modified it somewhat (added unit tests and made the file paths configurable by a parameter). This post shows how to use this job on ...

Calculate PageRanks with Apache Hadoop

Currently I am following the Coursera training ‘Mining Massive Datasets’. I have been interested in MapReduce and Apache Hadoop for some time, and with this course I hope to get more insight into when and how MapReduce can help to fix some real-world business problems (I described another way to do so here). This Coursera course is mainly focussing ...

Even Doctors Will Be Data Scientists

We all know how it works. You walk into a doctor’s office complaining about some pain in your leg or otherwise. They take your temperature, get you on the scale, check your blood pressure, and perhaps even get out the rubber hammer. These measurements are simply snapshots at one particular instant in time and may be subject to error. This ...
