List/Grid Tag Archives: Apache Hadoop

Hadoop + Amazon EC2 – An updated tutorial
There is an old tutorial placed at Hadoop’s wiki page: http://wiki.apache.org/hadoop/AmazonEC2, but recently I had to follow this tutorial and I noticed that it doesn’t ...

Testing Hadoop Programs with MRUnit
This post will take a slight detour from implementing the patterns found in Data-Intensive Processing with MapReduce to discuss something equally important, testing. I was inspired ...

Distributed Apache Flume Setup With an HDFS Sink
I have recently spent a few days getting up to speed with Flume, Cloudera‘s distributed log offering. If you haven’t seen this and deal with lots of logs, you are definitely missing ...

MapReduce: Working Through Data-Intensive Text Processing – Local Aggregation Part II
This post continues with the series on implementing algorithms found in the Data Intensive Processing with MapReduce book. Part one can be found here. In the previous post, we discussed ...

MapReduce: Working Through Data-Intensive Text Processing
It has been a while since I last posted, as I’ve been busy with some of the classes offered by Coursera. There are some very interesting offerings and is worth a look. Some time ago, ...

Apache Bigtop – Installing Hive, HBase and Pig
In the previous post we learnt how easy it was to install Hadoop with Apache Bigtop! We know its not just Hadoop and there are sub-projects around the table! So, lets have a look at ...

Apache Bigtop – Installing Hadoop
Ah!! The name is everywhere, carried with the wind. Apache Hadoop!! The BIG DATA crunching platform! We all know how alien it can be at start too! Phew!!Its my personal experience, ...

A SMALL cross-section of BIG Data
Big data is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time. Big ...

The problems in Hadoop – When does it fail to deliver?
Hadoop is a great piece of software. It is not original but that certainly does not take away its glory. It builds on parallel processing, a concept that’s been around for decades. ...

Big Data analytics with Hive and iReport
Each J.J. Abrams’ TV series Person of Interest episode starts with the following narration from Mr. Finch one of the leading characters: “You are being watched. The government ...


