
Tag Archives: Apache Hadoop

MapReduce Algorithms – Understanding Data Joins Part 1

In this post we continue our series on implementing the algorithms found in the book Data-Intensive Text Processing with MapReduce, this time discussing data joins. While we are going to discuss the techniques for joining data in Hadoop and provide sample code, in most cases you probably won’t be writing code to perform joins yourself. Instead, joining data is ...
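As a hedged illustration of one join technique described in that book, the reduce-side (repartition) join, the Python sketch below simulates the map, shuffle and reduce phases in memory. Each mapper tags its records with their source ("L" or "R"), the shuffle groups records by join key, and the reducer emits every left/right pairing per key, which yields an inner join. The customers/orders sample data and all function names are hypothetical; a real Hadoop job would use the Java MapReduce API or Hadoop Streaming rather than this in-process simulation.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical sample inputs: (join_key, value) records from two sources.
customers = [("c1", "Alice"), ("c2", "Bob"), ("c3", "Carol")]
orders = [("c1", "order-100"), ("c1", "order-101"), ("c3", "order-102")]


def map_phase(records, tag):
    """Emit (join_key, (source_tag, value)) so the reducer knows each record's origin."""
    for key, value in records:
        yield key, (tag, value)


def shuffle(pairs):
    """Group mapper output by key and sort by key, mimicking Hadoop's shuffle-and-sort."""
    groups = defaultdict(list)
    for key, tagged in pairs:
        groups[key].append(tagged)
    return dict(sorted(groups.items()))


def reduce_phase(key, tagged_values):
    """Inner join for one key: pair every left value with every right value."""
    left = [value for tag, value in tagged_values if tag == "L"]
    right = [value for tag, value in tagged_values if tag == "R"]
    for l_value in left:
        for r_value in right:
            yield key, l_value, r_value


def reduce_side_join(left_records, right_records):
    """Run the simulated map -> shuffle -> reduce pipeline and collect joined rows."""
    mapped = chain(map_phase(left_records, "L"), map_phase(right_records, "R"))
    joined = []
    for key, tagged in shuffle(mapped).items():
        joined.extend(reduce_phase(key, tagged))
    return joined


if __name__ == "__main__":
    for row in reduce_side_join(customers, orders):
        print(row)
```

Keys with no match on the other side (c2 here) produce no output, which is what makes this an inner join. Buffering all values for a key in the reducer is the simplest approach, but it can become expensive when a single key has very many records.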


Distributed System Development Considerations

There are a number of factors to take into account when developing distributed software systems. If you don’t even know what I am talking about, let me give you some insight and examples of what distributed systems are. Overview: a distributed system is one in which multiple physical hardware devices interact with separate and discrete users and collaborate together through these ...


Setting up Apache Hadoop Multi – Node Cluster

We are sharing our experience of installing Apache Hadoop on Linux-based machines (multi-node). We will also share our troubleshooting experience and will update this post in the future. User creation and other configuration steps: we start by adding a dedicated Hadoop system user on each node in the cluster.
    $ sudo addgroup hadoop
    $ sudo adduser --ingroup hadoop ...


Running Map-Reduce Job in Apache Hadoop (Multinode Cluster)

Here we describe the process of running a MapReduce job on an Apache Hadoop multi-node cluster. To set up Apache Hadoop on a multi-node cluster, see Setting up Apache Hadoop Multi – Node Cluster. For this setup, we have to configure Hadoop with the following on each machine: add the following property to conf/mapred-site.xml on all the nodes ...


Hadoop setup on single node and multi node

We will describe Hadoop setup on a single node and on multiple nodes, covering the Hadoop environment setup and configuration in detail. First, you need to download the following software (RPMs): the Java JDK RPM and the Apache Hadoop 0.20.204.0 RPM.
A) Single-node Hadoop setup
1) Install the JDK on a Red Hat or CentOS 5+ system:
    $ ./jdk-6u26-linux-x64-rpm.bin.sh
Java is ...


How Hadoop Works? HDFS case study

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect ...
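The excerpt stops before explaining how HDFS copes with machine failures. As a rough illustration (not code from the post or from Hadoop itself), the Python sketch below splits a file into fixed-size blocks and places each block on several distinct DataNodes, so losing one machine does not lose data. The 64 MB block size and replication factor of 3 are assumptions matching the Hadoop 0.20/1.x defaults for `dfs.block.size` and `dfs.replication`; the round-robin placement is a deliberate simplification of HDFS's rack-aware policy, and the node names and function names are hypothetical.

```python
import math

# Assumptions: 64 MB blocks and 3 replicas (Hadoop 0.20/1.x defaults).
# Node names and function names are hypothetical, for illustration only.
BLOCK_SIZE = 64 * 1024 * 1024
REPLICATION = 3


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the size in bytes of each block; only the last block may be shorter."""
    if file_size == 0:
        return []
    num_blocks = math.ceil(file_size / block_size)
    sizes = [block_size] * (num_blocks - 1)
    sizes.append(file_size - block_size * (num_blocks - 1))
    return sizes


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Assign each block to `replication` distinct DataNodes, round-robin.

    Simplification: real HDFS uses a rack-aware placement policy.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of DataNodes")
    return {
        block: [datanodes[(block + r) % len(datanodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_blocks(placement, failed_nodes):
    """Blocks that still have at least one replica on a live node."""
    return [
        block for block, nodes in placement.items()
        if any(node not in failed_nodes for node in nodes)
    ]


if __name__ == "__main__":
    MB = 1024 * 1024
    blocks = split_into_blocks(300 * MB)
    print([size // MB for size in blocks])  # [64, 64, 64, 64, 44]
    placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
    print(readable_blocks(placement, {"dn1"}))
```

With three replicas on distinct nodes, any single DataNode failure leaves every block readable. In this four-node layout, losing dn1, dn2 and dn3 together would make blocks 0 and 4 unavailable, which is why real deployments also spread replicas across racks.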


Ganglia configuration for a small Hadoop cluster and some troubleshooting

Ganglia is an open-source, scalable, distributed monitoring system for large clusters. It collects, aggregates and provides time-series views of tens of machine-related metrics such as CPU, memory, storage and network usage. You can see Ganglia in action on the UC Berkeley Grid. Ganglia is also a popular solution for monitoring Hadoop and HBase clusters, since both Hadoop and HBase have built-in support ...
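To make "collects, aggregates and provides time-series views" a little more concrete, here is a toy Python sketch of the aggregation step: per-host samples of one metric are rolled up into a cluster-wide series of (timestamp, sum, mean, host count). It only illustrates the idea and is not how Ganglia's gmond/gmetad daemons are implemented; the sample data, metric names and function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-host samples: (timestamp_seconds, host, metric, value).
SAMPLES = [
    (0, "node1", "cpu_user", 20.0),
    (0, "node2", "cpu_user", 40.0),
    (10, "node1", "cpu_user", 30.0),
    (10, "node2", "cpu_user", 50.0),
    (0, "node1", "mem_free_mb", 1024.0),
]


def cluster_series(samples, metric):
    """Roll one metric up across hosts: [(timestamp, sum, mean, host_count), ...]."""
    values_by_time = defaultdict(list)
    for timestamp, _host, name, value in samples:
        if name == metric:
            values_by_time[timestamp].append(value)
    # Sort by timestamp so the result reads as a time series.
    return [
        (timestamp, sum(values), mean(values), len(values))
        for timestamp, values in sorted(values_by_time.items())
    ]


if __name__ == "__main__":
    for point in cluster_series(SAMPLES, "cpu_user"):
        print(point)
```

Keeping both the sum and the host count lets a dashboard show either a cluster total (for example, total free memory) or a per-host average from the same rolled-up series.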


Hadoop Books Giveaway – Roundup

Fellow geeks, our giveaway of Packt Publishing’s books on Apache Hadoop has ended. You may find the original post for the competition here. The Prize Winners: the 6 lucky winners who will receive the book prizes are (names as they appeared in their emails): Hadoop Real-World Solutions Cookbook: Sellamuthu, Rudra Moorthy; Josep Ventura Argerich. Hadoop Beginner’s Guide: Bhakti Rajdev; Manuel ...


Spring meets Apache Hadoop


SpringSource has just announced the first GA release of Spring for Apache Hadoop. The goal of this project is to simplify the development of Hadoop-based applications. You may download the project here and check out the Maven artifacts here. Spring for Apache Hadoop was created to address the problem of poorly constructed Hadoop applications, which usually consist of ...
