Apache Hadoop
-
Enterprise Java

Hadoop Hangover: Launch a hadoop cluster CDH4 using Apache Whirr
This post is about how to launch a CDH4 MRv1 or CDH4 YARN cluster on EC2 instances. It’s said that you…
-
Enterprise Java

MapReduce Algorithms – Secondary Sorting
We continue with our series on implementing MapReduce algorithms found in the Data-Intensive Text Processing with MapReduce book. Other posts in…
-
Enterprise Java

MapReduce Algorithms – Order Inversion
This post is another segment in the series presenting MapReduce algorithms as found in the Data-Intensive Text Processing with MapReduce…
-
Enterprise Java

Calculating A Co-Occurrence Matrix with Hadoop
This post continues with our series of implementing the MapReduce algorithms found in the Data-Intensive Text Processing with MapReduce book.…
-
DevOps

Hadoop Single Node Set Up
With this post I am hoping to share the procedure for setting up Apache Hadoop on a single node. Hadoop is…
-
Enterprise Java

Hadoop + Amazon EC2 – An updated tutorial
There is an old tutorial placed at Hadoop’s wiki page: http://wiki.apache.org/hadoop/AmazonEC2, but recently I had to follow this tutorial and…
-
Enterprise Java

Testing Hadoop Programs with MRUnit
This post will take a slight detour from implementing the patterns found in Data-Intensive Text Processing with MapReduce to discuss something…
-
DevOps

Distributed Apache Flume Setup With an HDFS Sink
I have recently spent a few days getting up to speed with Flume, Cloudera’s distributed log offering. If you haven’t…
-
Enterprise Java

MapReduce: Working Through Data-Intensive Text Processing – Local Aggregation Part II
This post continues with the series on implementing algorithms found in the Data-Intensive Text Processing with MapReduce book. Part one…


