Home » Apache Hadoop » Page 7

Tag Archives: Apache Hadoop

MapReduce: Working Through Data-Intensive Text Processing

It has been a while since I last posted, as I’ve been busy with some of the classes offered by Coursera. There are some very interesting offerings and is worth a look. Some time ago, I purchased Data-Intensive Processing with MapReduce by Jimmy Lin and Chris Dyer. The book presents several key MapReduce algorithms, but in pseudo code format. My ...

Read More »

Apache Bigtop – Installing Hive, HBase and Pig

In the previous post we learnt how easy it was to install Hadoop with Apache Bigtop! We know its not just Hadoop and there are sub-projects around the table! So, lets have a look at how to install Hive, Hbase and Pig in this post. Before rowing your boat… Please follow the previous post and get ready with Hadoop installed! ...

Read More »

Apache Bigtop – Installing Hadoop

Ah!! The name is everywhere, carried with the wind. Apache Hadoop!! The BIG DATA crunching platform! We all know how alien it can be at start too! Phew!! :o Its my personal experience, nearly 11 months before, I was trying to install HBase, I faced few issues! The problem was version compatibility. Ex: “HBase some x.version” with “Hadoop some y.version”. ...

Read More »

A SMALL cross-section of BIG Data

Big data is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time. Big data sizes are a constantly moving target currently ranging from a few dozen terabytes to many petabytes of data in a single data set. IDC estimated the ...

Read More »

The problems in Hadoop – When does it fail to deliver?

Hadoop is a great piece of software. It is not original but that certainly does not take away its glory. It builds on parallel processing, a concept that’s been around for decades. Although conceptually unoriginal, Hadoop shows the power of being free and open (as in beer!) and most of all shows about what usability is all about. It succeeded ...

Read More »

Big Data analytics with Hive and iReport

Each J.J. Abrams’ TV series Person of Interest episode starts with the following narration from Mr. Finch one of the leading characters: “You are being watched. The government has a secret system–a machine that spies on you every hour of every day. I know because…I built it.” Of course us technical people know better. It would take a huge team ...

Read More »

Hadoop Modes Explained – Standalone, Pseudo Distributed, Distributed

After Understanding What is Hadoop Lets start Hadoop on Single Machine: This post contains instructions for Hadoop installation on ubuntu. This is a quick step by step tutorial of Hadoop installation. Here you will get all the commands and their description required to install Hadoop in Standalone mode (single node cluster),  Hadoop in Pseudo distributed mode (single node cluster) and  Hadoop in distributed ...

Read More »

Hadoop: A Soft Introduction

What is Hadoop: Hadoop is a framework written in Java for running applications on large clusters of commodity hardware and incorporates features similar to those of the Google File System and of MapReduce. HDFSis a highly fault-tolerant distributed file system and like Hadoop designed to be deployed on low-cost hardware. It provides high throughput access to application data and is ...

Read More »