Tag Archives: Apache Hadoop

Introducing Hadoop Development Tools

A few days ago, Apache Hadoop Development Tools (HDT) was released. The project aims to bring plugins to Eclipse that simplify development on the Hadoop platform. This post provides an overview of a few great features of HDT. Single Endpoint: The project can act as a single endpoint for your HDFS, ZooKeeper and MR cluster. You can connect to your HDFS/ZooKeeper instance ...

Graph Degree Distributions using R over Hadoop

There are two common types of graph engines. One type is focused on providing real-time, traversal-based algorithms over linked-list graphs represented on a single server. Such engines are typically called graph databases, and vendors include Neo4j, OrientDB, DEX, and InfiniteGraph. The other type of graph engine is focused on batch-processing using vertex-centric message passing within a graph represented ...

Understanding the World using Tables and Graphs

Organizations use data to drive their decision making, enhance their product features, and increase the efficiency of their everyday operations. Data by itself is not useful. However, with data analysis, patterns such as trends, clusters, and predictions can be distilled. The way in which data is analyzed is predicated on the way in which data is structured. ...

ElasticSearch-Hadoop: Indexing product views count and customer top search query from Hadoop to ElasticSearch

This post covers how to use ElasticSearch-Hadoop to read data from a Hadoop system and index it in ElasticSearch. The functionality it covers is indexing the product view count and the top search query per customer over the last n days. The analyzed data can then be used on a website to display a customer's recently viewed products, product view counts, and top search query string. ...

Apache Hadoop 2.4.0

The Apache community has voted to release Apache Hadoop 2.4.0, so the new release is now available and includes important improvements to both HDFS and MapReduce. The main HDFS improvement concerns NameNodes: multiple independent NameNodes and namespaces are now used that do not require coordination with each other. DataNodes are ...

Hadoop MapReduce Concepts

What do you mean by Map-Reduce programming? MapReduce is a programming model designed for processing large volumes of data in parallel by dividing the work into a set of independent tasks. The MapReduce programming model is inspired by functional languages and targets data-intensive computations. The input data format is application-specific, and is specified by the user. The output is a ...

MapReduce Algorithms – Understanding Data Joins Part II

It’s been a while since I last posted, and like the last time I took a big break, I was taking some classes on Coursera. This time it was Functional Programming Principles in Scala and Principles of Reactive Programming. I found both of them to be great courses and would recommend taking either one if you have the time. In this post ...

Coordination and service discovery with Apache Zookeeper

Service-oriented design has proven to be a successful solution for a huge variety of distributed systems. When used properly, it has a lot of benefits. But as the number of services grows, it becomes more difficult to understand what is deployed and where. And because we are building reliable and highly available systems, there is yet another question to ask: how many instances ...

Configuring Hadoop with Guava MapSplitters

In this post we are going to provide a new twist on passing configuration parameters to a Hadoop Mapper via the Context object. Typically, we set configuration parameters as key/value pairs on the Context object when starting a map-reduce job. Then in the Mapper we use the key(s) to retrieve the value(s) to use for our configuration needs. The twist ...
