Tag Archives: MapReduce

What is Big Data – Theory to Implementation

What is Big Data, you may ask, and more importantly, why is it the latest trend in nearly every business domain? Is it just hype, or is it here to stay? As a matter of fact, “Big Data” is a pretty straightforward term – it’s just what it says – a very large data set. How large? The exact answer is ...

MapReduce Algorithms – Secondary Sorting

We continue our series on implementing the MapReduce algorithms found in the book Data-Intensive Text Processing with MapReduce. Other posts in this series: Working Through Data-Intensive Text Processing with MapReduce; Working Through Data-Intensive Text Processing with MapReduce – Local Aggregation Part II; Calculating A Co-Occurrence Matrix with Hadoop; MapReduce Algorithms – Order Inversion. This post covers the pattern ...

Couchbase 101: Create views (MapReduce) from your Java application

When you are developing a new application with Couchbase 2.0, you sometimes need to create views dynamically from your code. For example, you may need this when you are installing your application or writing tests, or when you are building a framework and want to dynamically create views to query data. This post shows how to ...

MapReduce Algorithms – Order Inversion

This post is another segment in the series presenting MapReduce algorithms as found in the Data-Intensive Text Processing with MapReduce book. Previous installments are Local Aggregation, Local Aggregation Part II and Creating a Co-Occurrence Matrix. This time we will discuss the order inversion pattern. The order inversion pattern exploits the sorting phase of MapReduce to push data needed for calculations to ...

Calculating A Co-Occurrence Matrix with Hadoop

This post continues our series on implementing the MapReduce algorithms found in the Data-Intensive Text Processing with MapReduce book. This time we will be creating a word co-occurrence matrix from a corpus of text. Previous posts in this series are: Working Through Data-Intensive Text Processing with MapReduce; Working Through Data-Intensive Text Processing with MapReduce – Local Aggregation Part II ...

MapReduce: Working Through Data-Intensive Text Processing

It has been a while since I last posted, as I’ve been busy with some of the classes offered by Coursera. There are some very interesting offerings, and they are worth a look. Some time ago, I purchased Data-Intensive Text Processing with MapReduce by Jimmy Lin and Chris Dyer. The book presents several key MapReduce algorithms, but in pseudocode format. My ...

Processing 10 million messages with Akka

Akka Actors promise concurrency. What better way to test that than to see how much time it takes to process 10 million messages using commodity hardware and software, without any low-level tuning? I wrote the entire 10-million-message processing program in Java, and the overall results astonished me. When I ran the program on my iMac machine with an Intel ...

MapReduce Questions and Answers Part 2

4 Inverted Indexing for Text Retrieval. The chapter contains a lot of details about integer encoding and compression. Since these topics are not directly about MapReduce, I made no questions about them. 4.4 Inverted Indexing: Revised Implementation. Explain the inverted index retrieval algorithm. You may assume that each document fits into memory. Assume also that there is a huge ...

MapReduce Questions and Answers Part 1

With all the hype and buzz surrounding NoSQL, I decided to have a look at it. I quickly found that there is not one NoSQL I could learn. Rather, there are various solutions with different purposes and trade-offs. Those solutions tend to have one thing in common: processing of data in NoSQL storage is usually done using ...
