Tag Archives: Apache Hadoop

Setting up Apache Hadoop Multi – Node Cluster

We share our experience installing Apache Hadoop on Linux-based machines (multi-node). We will also share our troubleshooting experience and update this post in the future. User creation and other configuration steps: we start by adding a dedicated Hadoop system user on each machine in the cluster. $ sudo addgroup hadoop $ sudo adduser --ingroup hadoop ...

Running Map-Reduce Job in Apache Hadoop (Multinode Cluster)

This post describes how to run a MapReduce job on a multi-node Apache Hadoop cluster. To set up the cluster itself, see Setting up Apache Hadoop Multi – Node Cluster. For the setup, Hadoop has to be configured as follows on each machine: add the following property to conf/mapred-site.xml on all the nodes ...

Hadoop setup on single node and multi node

This post describes Hadoop setup on a single node and on multiple nodes, covering the environment setup and configuration in detail. First, download the following software (RPM): the Java JDK RPM and the Apache Hadoop 0.20.204.0 RPM. A) Single-node Hadoop setup 1) Install the JDK on a Red Hat or CentOS 5+ system. $ ./jdk-6u26-linux-x64-rpm.bin.sh Java is ...

How Hadoop Works? HDFS case study

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect ...

Ganglia configuration for a small Hadoop cluster and some troubleshooting

Ganglia is an open-source, scalable and distributed monitoring system for large clusters. It collects, aggregates and provides time-series views of tens of machine-related metrics such as CPU, memory, storage, network usage. You can see Ganglia in action at UC Berkeley Grid. Ganglia is also a popular solution for monitoring Hadoop and HBase clusters, since Hadoop (and HBase) has built-in support ...

Hadoop Books Giveaway – Roundup

Fellow geeks, our giveaway of Packt Publishing’s books on Apache Hadoop has ended. You may find the original post for the competition here. The Prize Winners: the 6 lucky winners who will receive the book prizes are (names as they appeared in their emails): Hadoop Real-World Solutions Cookbook: Sellamuthu, Rudra Moorthy Josep Ventura Argerich Hadoop Beginner’s Guide: Bhakti Rajdev Manuel ...

Spring meets Apache Hadoop

SpringSource has just announced the first GA release of Spring for Apache Hadoop. The goal of this project is to simplify the development of Hadoop-based applications. You may download the project here and check out the Maven artifacts here. Spring for Apache Hadoop was created to address the problem of poorly constructed Hadoop applications, which usually consist of ...

MapReduce Algorithms – Secondary Sorting

We continue our series on implementing the MapReduce algorithms found in the book Data-Intensive Text Processing with MapReduce. Other posts in this series: Working Through Data-Intensive Text Processing with MapReduce; Working Through Data-Intensive Text Processing with MapReduce – Local Aggregation Part II; Calculating A Co-Occurrence Matrix with Hadoop; MapReduce Algorithms – Order Inversion. This post covers the pattern ...

MapReduce Algorithms – Order Inversion

This post is another segment in the series presenting MapReduce algorithms as found in the book Data-Intensive Text Processing with MapReduce. Previous installments are Local Aggregation, Local Aggregation Part II and Creating a Co-Occurrence Matrix. This time we will discuss the order inversion pattern. The order inversion pattern exploits the sorting phase of MapReduce to push data needed for calculations to ...
