Tag Archives: Apache Hadoop

How Hadoop Works? HDFS case study

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect ...

Ganglia configuration for a small Hadoop cluster and some troubleshooting

Ganglia is an open-source, scalable and distributed monitoring system for large clusters. It collects, aggregates and provides time-series views of tens of machine-related metrics such as CPU, memory, storage and network usage. You can see Ganglia in action at the UC Berkeley Grid. Ganglia is also a popular solution for monitoring Hadoop and HBase clusters, since Hadoop (and HBase) have built-in support ...

Hadoop Books Giveaway – Roundup

Fellow geeks, our giveaway of Packt Publishing’s books on Apache Hadoop has ended. You may find the original post for the competition here. The Prize Winners: the 6 lucky winners who will receive the book prizes are (names as they appeared in their emails): Hadoop Real-World Solutions Cookbook: Sellamuthu, Rudra Moorthy Josep Ventura Argerich Hadoop Beginner’s Guide: Bhakti Rajdev Manuel ...

Spring meets Apache Hadoop

SpringSource has just announced the first GA release of Spring for Apache Hadoop. The goal of this project is to simplify the development of Hadoop-based applications. You may download the project here and check out the Maven artifacts here. Spring for Apache Hadoop was born to resolve the issue of poorly constructed Hadoop applications, which usually consist of ...

MapReduce Algorithms – Secondary Sorting

We continue with our series on implementing the MapReduce algorithms found in the Data-Intensive Text Processing with MapReduce book. Other posts in this series: Working Through Data-Intensive Text Processing with MapReduce; Working Through Data-Intensive Text Processing with MapReduce – Local Aggregation Part II; Calculating A Co-Occurrence Matrix with Hadoop; MapReduce Algorithms – Order Inversion. This post covers the pattern ...
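
The excerpt stops before the details, so what follows is only a generic illustration of the technique named in the title, written as a single-JVM Java (16+) sketch rather than against the Hadoop API. It assumes the usual recipe: a composite key made of the natural key plus the field to sort by, a partitioner that looks only at the natural key, a sort order over both fields, and grouping by the natural key so one reduce call sees its values already ordered. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical single-JVM simulation of secondary sorting with a composite key. */
final class SecondarySortSketch {

    /** Composite key: the natural key plus the value we want sorted within each group. */
    record CompositeKey(String naturalKey, int secondary) {}

    /** Sort order: natural key first, then the secondary field (ascending). */
    static final Comparator<CompositeKey> SORT =
            Comparator.comparing(CompositeKey::naturalKey).thenComparingInt(CompositeKey::secondary);

    /** Partitioner: hashes only the natural key, so a whole group lands on one reducer. */
    static int partition(CompositeKey key, int numReducers) {
        return (key.naturalKey().hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    /**
     * Simulates partition, sort and group for one reducer: returns, per natural key,
     * the secondary values in the order that reducer would receive them.
     */
    static Map<String, List<Integer>> reducerInput(List<CompositeKey> mapOutput, int reducer, int numReducers) {
        List<CompositeKey> mine = new ArrayList<>();
        for (CompositeKey k : mapOutput) {
            if (partition(k, numReducers) == reducer) mine.add(k);
        }
        mine.sort(SORT);
        // Grouping step: keys sharing a natural key form a single reduce call,
        // and their secondary values are already in sorted order.
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (CompositeKey k : mine) {
            groups.computeIfAbsent(k.naturalKey(), n -> new ArrayList<>()).add(k.secondary());
        }
        return groups;
    }
}
```

The point of the design is that the framework's sort does the ordering work, so the reducer never has to buffer and sort a group's values itself.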

MapReduce Algorithms – Order Inversion

This post is another segment in the series presenting MapReduce algorithms as found in the Data-Intensive Text Processing with MapReduce book. Previous installments are Local Aggregation, Local Aggregation Part II and Creating a Co-Occurrence Matrix. This time we will discuss the order inversion pattern. The order inversion pattern exploits the sorting phase of MapReduce to push data needed for calculations to ...
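
To make the idea concrete, here is a minimal sketch in plain Java (16+, no Hadoop dependency) that simulates the map, sort and reduce phases in memory. It assumes the relative-frequency calculation this pattern is usually applied to, f(b | a) = count(a, b) / count(a, *), and treats any two words on the same line as co-occurring; the class, method and marker names are hypothetical. The mapper emits a special marginal key `(a, *)` next to every pair `(a, b)`, and the key ordering puts `*` first, so the reducer already knows the total when the individual pairs arrive.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory simulation of the order inversion pattern (no Hadoop involved). */
final class OrderInversionSketch {

    /** Special marker; the ordering below sorts it ahead of every real word. */
    static final String STAR = "*";

    /** Composite key (left, right); right is either a neighbor word or STAR. */
    record Pair(String left, String right) implements Comparable<Pair> {
        @Override
        public int compareTo(Pair o) {
            int c = left.compareTo(o.left);
            if (c != 0) return c;
            if (right.equals(o.right)) return 0;
            if (right.equals(STAR)) return -1;   // marginal key sorts first
            if (o.right.equals(STAR)) return 1;
            return right.compareTo(o.right);
        }
    }

    /**
     * Map phase plus sort: for every ordered pair of distinct positions on a line,
     * count (word, neighbor) and also (word, *), so the marginal equals the number
     * of pairs starting with that word. The TreeMap stands in for the framework's sort.
     */
    static TreeMap<Pair, Integer> mapAndSort(List<String> lines) {
        TreeMap<Pair, Integer> sorted = new TreeMap<>();
        for (String line : lines) {
            String[] words = line.trim().split("\\s+");
            for (int i = 0; i < words.length; i++) {
                for (int j = 0; j < words.length; j++) {
                    if (i == j) continue;
                    sorted.merge(new Pair(words[i], words[j]), 1, Integer::sum);
                    sorted.merge(new Pair(words[i], STAR), 1, Integer::sum);
                }
            }
        }
        return sorted;
    }

    /**
     * Reduce phase: because (a, *) arrives before any (a, b), the reducer keeps the
     * marginal in one variable and emits count(a, b) / count(a, *) without buffering.
     */
    static Map<Pair, Double> relativeFrequencies(TreeMap<Pair, Integer> sorted) {
        Map<Pair, Double> result = new LinkedHashMap<>();
        int marginal = 0;
        for (Map.Entry<Pair, Integer> e : sorted.entrySet()) {
            Pair key = e.getKey();
            if (key.right().equals(STAR)) {
                marginal = e.getValue();          // total for this left word
            } else {
                result.put(key, e.getValue() / (double) marginal);
            }
        }
        return result;
    }
}
```

In a real job the pair key would also need a partitioner that hashes only the left word, so that `(a, *)` and every `(a, b)` reach the same reducer; this simulation sidesteps that with a single sorted map acting as one reducer. It also assumes the text never contains a literal `*` token.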

Calculating A Co-Occurrence Matrix with Hadoop

This post continues our series on implementing the MapReduce algorithms found in the Data-Intensive Text Processing with MapReduce book. This time we will be creating a word co-occurrence matrix from a corpus of text. Previous posts in this series are: Working Through Data-Intensive Text Processing with MapReduce; Working Through Data-Intensive Text Processing with MapReduce – Local Aggregation Part II ...
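
As a rough illustration of how such a matrix can be built, the following plain Java (16+) sketch simulates the "stripes" style of computation discussed in that book: each map call emits, for one word occurrence, a small map (stripe) of neighbor counts, and the reducer sums the stripes for a word element-wise into one row of the matrix. It assumes that two words co-occur when they are at most `window` positions apart on the same line; the window definition, class and method names are hypothetical, not taken from the original post.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory sketch of the "stripes" approach to a word co-occurrence matrix. */
final class CooccurrenceStripes {

    /** One emitted record: a word and its stripe of neighbor counts. */
    record Emit(String word, Map<String, Integer> stripe) {}

    /** Map phase: one stripe per word occurrence, counting neighbors within the window. */
    static List<Emit> map(String line, int window) {
        String[] words = line.trim().split("\\s+");
        List<Emit> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            if (words[i].isEmpty()) continue;    // blank line yields a single empty token
            Map<String, Integer> stripe = new HashMap<>();
            int from = Math.max(0, i - window);
            int to = Math.min(words.length - 1, i + window);
            for (int j = from; j <= to; j++) {
                if (j != i) stripe.merge(words[j], 1, Integer::sum);
            }
            out.add(new Emit(words[i], stripe));
        }
        return out;
    }

    /** Shuffle plus reduce: group stripes by word and sum them element-wise. */
    static Map<String, Map<String, Integer>> reduce(List<Emit> emitted) {
        Map<String, Map<String, Integer>> matrix = new TreeMap<>();
        for (Emit e : emitted) {
            Map<String, Integer> row = matrix.computeIfAbsent(e.word(), k -> new TreeMap<>());
            e.stripe().forEach((neighbor, count) -> row.merge(neighbor, count, Integer::sum));
        }
        return matrix;
    }

    /** Driver: run the map phase over every line, then reduce. */
    static Map<String, Map<String, Integer>> matrix(List<String> lines, int window) {
        List<Emit> all = new ArrayList<>();
        for (String line : lines) all.addAll(map(line, window));
        return reduce(all);
    }
}
```

For example, `matrix(List.of("a b a c"), 1)` gives the row for `a` as `{b=2, c=1}`. Compared with emitting one record per word pair, stripes produce fewer but larger intermediate records; the trade-off is that each stripe must fit in memory.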

Hadoop Single Node Set Up

With this post I hope to share the procedure for setting up Apache Hadoop on a single node. Hadoop is used to process big data sets and is typically deployed on low-cost commodity hardware. It is a map-reduce framework that distributes segments of a job among the nodes in a cluster for execution. Though we will not see the ...
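
The map-reduce model itself can be illustrated without any cluster. The following single-JVM Java sketch (plain standard library, not the Hadoop API; class and method names are hypothetical) runs the classic word-count job: each input split is mapped to `(word, 1)` records, the records are grouped and sorted by key as the framework does before the reduce phase, and a reducer sums each group. On a real cluster the splits would be handled by map tasks running on different nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical single-JVM walk-through of map, shuffle/sort and reduce for word count. */
final class WordCountSketch {

    /** Map phase: turn one input split (here, a line) into (word, 1) records. */
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        // \W+ splits on anything that is not an ASCII letter, digit or underscore.
        for (String token : split.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) out.add(Map.entry(token, 1));
        }
        return out;
    }

    /** Shuffle phase: group intermediate values by key, with keys kept in sorted order. */
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> records) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> r : records) {
            groups.computeIfAbsent(r.getKey(), k -> new ArrayList<>()).add(r.getValue());
        }
        return groups;
    }

    /** Reduce phase: sum all values seen for one key. */
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    /** Driver: map every split, shuffle the intermediate records, then reduce each group. */
    static Map<String, Integer> run(List<String> splits) {
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (String s : splits) intermediate.addAll(map(s));
        Map<String, Integer> result = new TreeMap<>();
        shuffle(intermediate).forEach((k, vs) -> result.put(k, reduce(k, vs)));
        return result;
    }
}
```

For instance, `run(List.of("the cat", "The dog, the end"))` counts `the` three times and every other word once. Because each split is mapped independently, the map calls are the part that parallelizes across nodes.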

Hadoop + Amazon EC2 – An updated tutorial

There is an old tutorial on Hadoop’s wiki page: http://wiki.apache.org/hadoop/AmazonEC2, but when I recently had to follow it, I noticed that it doesn’t cover some newer Amazon functionality. To follow this tutorial, you should already be familiar with the basics of Hadoop; a very useful ‘how to start’ tutorial can be found at Hadoop’s homepage: http://hadoop.apache.org/. ...