Tag Archives: MapReduce

Is there a future for Map/Reduce?

Google’s Jeffrey Dean and Sanjay Ghemawat filed the patent request and published the map/reduce paper 10 years ago (2004). According to Wikipedia, Doug Cutting and Mike Cafarella created Hadoop, with its own implementation of Map/Reduce, one year later at Yahoo. Both implementations were built for the same purpose: batch indexing of the web. Back then, the web began its “web 2.0” transition, ...

Hadoop MapReduce Concepts

What do you mean by Map-Reduce programming? MapReduce is a programming model designed for processing large volumes of data in parallel by dividing the work into a set of independent tasks. The MapReduce programming model is inspired by functional languages and targets data-intensive computations. The input data format is application-specific, and is specified by the user. The output is a ...
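
The model described in this excerpt can be sketched in a few lines of plain Python. This is only an illustration of the map, shuffle, and reduce steps on a single machine, not Hadoop's API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and the word-count task is an assumed example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group intermediate values by key, the step a framework performs
    between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence. Each map call depends only on its
    own line, which is what would allow the work to be split across tasks."""
    intermediate = []
    for line in lines:
        intermediate.extend(map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["the quick brown fox", "the lazy dog", "The end"])
print(counts)
```

Because every call to `map_phase` is independent of the others, and every call to `reduce_phase` only sees the values for its own key, both phases could in principle run as separate parallel tasks.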

Can MapReduce solve planning problems?

To solve a planning or optimization problem, some solvers tend to scale out poorly: as the problem gains more variables and more constraints, they use a lot more RAM and CPU power. They can hit hardware memory limits at a few thousand variables and a few million constraint matches. One way their users typically work around such hardware limits is ...

Apache Spark is now a top-level project

The Apache Software Foundation (ASF) happily announced that Apache Spark has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying the project’s stability. Apache Spark is an Open Source cluster computing framework for fast and flexible large-scale data analysis. Spark has been the talk of the Big Data town for a while, and 2014 was predicted to ...

MapReduce Algorithms – Understanding Data Joins Part II

It’s been a while since I last posted, and like the last time I took a big break, I was taking some classes on Coursera. This time it was Functional Programming Principles in Scala and Principles of Reactive Programming. I found both of them to be great courses and would recommend taking either one if you have the time. In this post ...

Run your Hadoop MapReduce job on Amazon EMR

A while ago I posted how to set up an EMR cluster using the CLI. In this post I will show how to set up the cluster using the Java SDK for AWS. In my opinion, the best way to show how to do this with the Java AWS SDK is to show the complete example, so let’s start. Set ...

Writing a Hadoop MapReduce task in Java

Although the Hadoop framework itself is written in Java, MapReduce jobs can be written in many different languages. In this post I show how to create a MapReduce job in Java based on a Maven project like any other Java project. Prepare the example input: Let’s start with a fictional business case. ...

MapReduce Algorithms – Understanding Data Joins Part 1

In this post we continue with our series of implementing the algorithms found in the Data-Intensive Text Processing with MapReduce book, this time discussing data joins. While we are going to discuss the techniques for joining data in Hadoop and provide sample code, in most cases you probably won’t be writing code to perform joins yourself. Instead, joining data is ...

What is Big Data – Theory to Implementation

What is Big Data, you may ask, and more importantly, why is it the latest trend in nearly every business domain? Is it just hype, or is it here to stay? As a matter of fact, “Big Data” is a pretty straightforward term: it’s just what it says, a very large data set. How large? The exact answer is ...

MapReduce Algorithms – Secondary Sorting

We continue with our series on implementing the MapReduce algorithms found in the Data-Intensive Text Processing with MapReduce book. Other posts in this series: Working Through Data-Intensive Text Processing with MapReduce; Working Through Data-Intensive Text Processing with MapReduce – Local Aggregation Part II; Calculating A Co-Occurrence Matrix with Hadoop; MapReduce Algorithms – Order Inversion. This post covers the pattern ...
