MapReduce with MongoDB

MapReduce is a software framework introduced by Google in 2004 to support distributed computing on large data sets on clusters of computers. You can read more about MapReduce here.

MongoDB is an open source, document-oriented NoSQL database system written in C++. You can read more about MongoDB here.

1. Installing MongoDB.

Follow the instructions in the official MongoDB documentation, available here. In my case, I followed the instructions for OS X and, apart from one Xcode issue, everything worked fine.
I installed MongoDB with MacPorts using sudo port install mongodb. The issue I faced was my Xcode version: I had installed Xcode while on OS X Leopard and had not updated it after moving to Lion. Once I updated Xcode, MacPorts installed MongoDB with no issues. Another hint: sometimes Xcode does not install properly when you install it directly from the App Store. In that case, get Xcode from the App Store, then open Launchpad, find Install Xcode, and run the installer from there.

2. Running MongoDB

Starting MongoDB is simple.
Just type mongod in a terminal or command console.
By default this starts the MongoDB server on port 27017 and uses the /data/db/ directory to store data. Yes, that is the directory you created while following the installation instructions in step 1.
If you want to change these defaults, you can do so when starting the server:
mongod --port [your_port] --dbpath [your_db_file_path]
Make sure that your_db_file_path exists, and that it is empty when you start the server for the first time.
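The flags in the command above can also be assembled programmatically, for example when a test harness needs its own throwaway server. Below is a minimal sketch (Java 11 or later), assuming the mongod binary is on the PATH; the class MongodLauncher and its methods are hypothetical helpers, not part of any MongoDB API. It creates the data directory first, because mongod refuses to start when the --dbpath directory does not exist.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not a MongoDB API: builds and optionally runs
// "mongod --port [port] --dbpath [dbPath]". Assumes mongod is on the PATH.
public class MongodLauncher {

    // Builds the same command line as: mongod --port [port] --dbpath [dbPath]
    static List<String> buildCommand(int port, Path dbPath) {
        return List.of("mongod", "--port", Integer.toString(port),
                       "--dbpath", dbPath.toString());
    }

    // Creates the data directory if needed, then starts mongod as a child process
    static Process start(int port, Path dbPath) throws IOException {
        Files.createDirectories(dbPath); // mongod will not create --dbpath itself
        return new ProcessBuilder(buildCommand(port, dbPath))
                .inheritIO() // show mongod's log output in this console
                .start();
    }

    public static void main(String[] args) throws IOException {
        Path dbPath = Path.of(args.length > 0 ? args[0] : "/tmp/mongo-data");
        // Print the command that would be run on port 4000
        System.out.println(String.join(" ", buildCommand(4000, dbPath)));
        // start(4000, dbPath); // uncomment to actually launch the server
    }
}
```

Separating buildCommand from start keeps the command-line construction testable without a MongoDB installation.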

3. Starting MongoDB shell

We can now start the MongoDB shell, connect it to our MongoDB server, and run commands from there.
To connect to a MongoDB server running on the same machine on the default port, just type mongo at the command line. If the server is running on a different machine or a different port, use the following:

mongo [ip_address]:[port]
e.g.: mongo localhost:4000

4. Let’s create a database first.

In the MongoDB shell, type the following:

> use library

The above is supposed to create a database called ‘library’.

Now, to check whether your database has been created, type the following, which lists all databases:

> show dbs;
You will notice that the database you just created is not listed. The reason is that MongoDB creates databases on demand: a database is only created when we add something to it.

5. Inserting data into MongoDB.

Let’s first create two books with the following commands.

> book1 = {name : "Understanding JAVA", pages : 100}
> book2 = {name : "Understanding JSON", pages : 200}

Now, let’s insert these two books into a collection called books.

> db.books.save(book1)
> db.books.save(book2)

These two statements create a collection called books in the library database. The following statement lists the two books we just saved.

> db.books.find();

{ "_id" : ObjectId("4f365b1ed6d9d6de7c7ae4b1"), "name" : "Understanding JAVA", "pages" : 100 }
{ "_id" : ObjectId("4f365b28d6d9d6de7c7ae4b2"), "name" : "Understanding JSON", "pages" : 200 }

Let’s add a few more records.

> book = {name : "Understanding XML", pages : 300}
> db.books.save(book)
> book = {name : "Understanding Web Services", pages : 400}
> db.books.save(book)
> book = {name : "Understanding Axis2", pages : 150}
> db.books.save(book)

6. Writing the Map function

Let’s process the books collection to find how many books have fewer than 250 pages and how many have 250 pages or more.

> var map = function() {
    var category;
    if ( this.pages >= 250 )
      category = "Big Books";
    else
      category = "Small Books";
    emit(category, {name: this.name});
  };

The map function runs once for each book and emits a (category, {name: ...}) pair. MongoDB then groups the emitted values by key, so conceptually the data handed to the reduce function looks like this:

"Big Books"   : [{name: "Understanding XML"}, {name: "Understanding Web Services"}]
"Small Books" : [{name: "Understanding JAVA"}, {name: "Understanding JSON"}, {name: "Understanding Axis2"}]
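To make the grouping step concrete, here is a minimal in-memory sketch in plain Java (Java 16 or later, no MongoDB involved) that mimics what the server does: apply the map logic once per book, then collect the emitted values by key. The Book record and the mapPhase and shuffle methods are illustrative names, not MongoDB APIs, and for brevity the sketch emits the book name as a plain string instead of a {name: ...} document.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative in-memory model of the map and grouping steps; not MongoDB code
public class MapPhaseSketch {

    // Hypothetical stand-in for a document in the books collection
    record Book(String name, int pages) {}

    // Same rule as the JavaScript map function: emit(category, name)
    static List<Map.Entry<String, String>> mapPhase(List<Book> books) {
        List<Map.Entry<String, String>> emitted = new ArrayList<>();
        for (Book b : books) {
            String category = b.pages() >= 250 ? "Big Books" : "Small Books";
            emitted.add(new SimpleEntry<>(category, b.name()));
        }
        return emitted;
    }

    // Groups emitted values by key, as MongoDB does before calling reduce
    static Map<String, List<String>> shuffle(List<Map.Entry<String, String>> emitted) {
        Map<String, List<String>> grouped = new TreeMap<>(); // sorted keys for stable output
        for (Map.Entry<String, String> e : emitted) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Book> books = List.of(
            new Book("Understanding JAVA", 100),
            new Book("Understanding JSON", 200),
            new Book("Understanding XML", 300),
            new Book("Understanding Web Services", 400),
            new Book("Understanding Axis2", 150));
        shuffle(mapPhase(books)).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Running main prints one line per category, with the same members as the grouping shown above.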

7. Writing the Reduce function.

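> var reduce = function(key, values) {
    var sum = 0;
    values.forEach(function(doc) {
      // A value emitted by map is {name: ...} and counts as one book.
      // MongoDB may also pass in an earlier reduce result {books: n}
      // (a "re-reduce"), which already holds a partial count.
      sum += (doc.books !== undefined) ? doc.books : 1;
    });
    return {books: sum};
  };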
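MongoDB does not promise to call reduce exactly once per key: it may reduce a key's values in batches and then call reduce again on the earlier results, so a reduce function must accept its own output as input. The plain-Java sketch below (hypothetical class and method names, no MongoDB involved) contrasts a reduce that simply counts its input values with one that adds up partial counts, using a two-batch split of the three Small Books chosen for illustration.

```java
import java.util.List;

// Illustrative model of why reduce must tolerate re-reduce; not MongoDB code
public class ReReduceSketch {

    // Treats every input value as exactly one book.
    // Correct only if reduce is called once, on the raw emitted values.
    static int countValues(List<Integer> values) {
        return values.size();
    }

    // Treats every input value as a count: a raw emitted value is 1,
    // and an earlier reduce result carries its partial total.
    static int sumCounts(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Three "Small Books", each emitted as a count of 1, reduced in two batches
        List<Integer> batch1 = List.of(1, 1);
        List<Integer> batch2 = List.of(1);

        int naive = countValues(List.of(countValues(batch1), countValues(batch2)));
        int safe = sumCounts(List.of(sumCounts(batch1), sumCounts(batch2)));

        System.out.println("counting values: " + naive); // 2 (wrong)
        System.out.println("summing counts:  " + safe);  // 3 (correct)
    }
}
```

This is also why the MongoDB documentation asks that reduce return a value of the same shape as the values emitted by map: the result of one reduce call can become an input to the next.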

8. Running MapReduce against the books collection.

> var count  = db.books.mapReduce(map, reduce, {out: "book_results"});
> db[count.result].find()

{ "_id" : "Big Books", "value" : { "books" : 2 } }
{ "_id" : "Small Books", "value" : { "books" : 3 } } 

The output above shows that we have 2 Big Books and 3 Small Books.

Everything done above in the MongoDB shell can also be done in Java. The following client does the same using the MongoDB Java driver. You can download the required driver jar here.

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MapReduceCommand;
import com.mongodb.MapReduceOutput;
import com.mongodb.Mongo;

public class MongoClient {

 public static void main(String[] args) {

  Mongo mongo = null;

  try {
   // Connect to the server started in step 2 and select the library database
   mongo = new Mongo("localhost", 27017);
   DB db = mongo.getDB("library");

   DBCollection books = db.getCollection("books");

   // Insert the same five books used in the shell examples.
   // If you already ran the shell steps, these inserts add duplicates
   // and the counts below will double.
   BasicDBObject book = new BasicDBObject();
   book.put("name", "Understanding JAVA");
   book.put("pages", 100);
   books.insert(book);

   book = new BasicDBObject();
   book.put("name", "Understanding JSON");
   book.put("pages", 200);
   books.insert(book);

   book = new BasicDBObject();
   book.put("name", "Understanding XML");
   book.put("pages", 300);
   books.insert(book);

   book = new BasicDBObject();
   book.put("name", "Understanding Web Services");
   book.put("pages", 400);
   books.insert(book);

   book = new BasicDBObject();
   book.put("name", "Understanding Axis2");
   book.put("pages", 150);
   books.insert(book);

   // Same map function as in the shell, sent to the server as JavaScript source
   String map = "function() { " +
                "var category; " +
                "if ( this.pages >= 250 ) " +
                "category = 'Big Books'; " +
                "else " +
                "category = 'Small Books'; " +
                "emit(category, {name: this.name});}";

   // Counts books per category; a value that is an earlier reduce
   // result {books: n} (re-reduce) contributes its partial count n
   String reduce = "function(key, values) { " +
                   "var sum = 0; " +
                   "values.forEach(function(doc) { " +
                   "sum += (doc.books !== undefined) ? doc.books : 1; " +
                   "}); " +
                   "return {books: sum};} ";

   // INLINE returns the results directly instead of writing an output
   // collection, so the output collection name is null; the final null
   // query means every document in the collection is processed
   MapReduceCommand cmd = new MapReduceCommand(books, map, reduce,
     null, MapReduceCommand.OutputType.INLINE, null);

   MapReduceOutput out = books.mapReduce(cmd);

   // Print one result document per category
   for (DBObject o : out.results()) {
    System.out.println(o.toString());
   }
  } catch (Exception e) {
   e.printStackTrace();
  } finally {
   // Release the driver's connections
   if (mongo != null) {
    mongo.close();
   }
  }
 }
}

Reference: MapReduce with MongoDB from our JCG partner Prabath Siriwardena at the Facile Login blog.

Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd | Terms of Use