About Vlad Mihalcea

Vlad Mihalcea is a software architect passionate about software integration, high scalability and concurrency challenges.

MongoDB 2.6 is out

Introduction

MongoDB is evolving rapidly. The 2.2 version introduced the aggregation framework as an alternative to the Map-Reduce query model. Generating aggregated reports is a recurrent requirement for enterprise systems and MongoDB shines in this regard. If you’re new to it you might want to check this aggregation framework introduction or the performance tuning and the data modelling guides.

Let’s reuse the data model I first introduced while demonstrating the blazing fast MongoDB insert capabilities:
 

{
        "_id" : ObjectId("5298a5a03b3f4220588fe57c"),
        "created_on" : ISODate("2012-04-22T01:09:53Z"),
        "value" : 0.1647851116706831
}
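
To reproduce the examples that follow, documents of this shape could be generated roughly as shown below. This is only a sketch: the randomDocument helper, the batch size and the 2012 date range are assumptions made for illustration, not the original insert script.

```javascript
// Hypothetical helper (not from the original post): builds one document
// shaped like the sample above. MongoDB adds the _id field on insert.
function randomDocument(minDate, maxDate) {
	var span = maxDate.getTime() - minDate.getTime();
	return {
		created_on: new Date(minDate.getTime() + Math.floor(Math.random() * span)),
		value: Math.random()
	};
}

// Assumed date range, chosen only so that the January 2012 queries match something.
var rangeStart = new Date(Date.UTC(2012, 0, 1));
var rangeEnd = new Date(Date.UTC(2013, 0, 1));

var sampleDocs = [];
for (var i = 0; i < 1000; i++) {
	sampleDocs.push(randomDocument(rangeStart, rangeEnd));
}

// In the mongo shell the batch could then be stored with:
// db.randomData.insert(sampleDocs);
```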

MongoDB 2.6 Aggregation enhancements

In the 2.4 version, if I run the following aggregation query:

db.randomData.aggregate( [ 
{ 
	$match: { 
		"created_on" : { 
			$gte : new Date(Date.UTC(2012, 0, 1)), 
			$lte : new Date(Date.UTC(2012, 0, 10)) 
		} 
	} 
},  
{ 
	$group: {
		_id : {
			"minute" : {
				$minute : "$created_on"
			} 
		},  
		"values": { 
			$addToSet: "$value" 
		} 
	} 
}]);

I hit the 16MB aggregation result limitation:

{
	"errmsg" : "exception: aggregation result exceeds maximum document size (16MB)",
	"code" : 16389,
	"ok" : 0
}

MongoDB documents are limited to 16MB, and prior to the 2.6 version, the aggregation result was returned as a single BSON document. In 2.6, the aggregate shell helper returns a cursor instead.

Running the same query on 2.6 yields the following result:

db.randomData.aggregate( [ 
{ 
	$match: { 
		"created_on" : { 
			$gte : new Date(Date.UTC(2012, 0, 1)), 
			$lte : new Date(Date.UTC(2012, 0, 10)) 
		} 
	} 
},  
{ 
	$group: {
		_id : {
			"minute" : {
				$minute : "$created_on"
			} 
		},  
		"values": { 
			$addToSet: "$value" 
		} 
	} 
}])
.objsLeftInBatch();
14

I called the cursor's objsLeftInBatch method to confirm that the aggregation now returns a cursor, so the 16MB limit no longer applies to the overall result. Each document the cursor returns is still a regular BSON document, and therefore still limited to 16MB, but that is far more manageable than a limit on the entire result.
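
To make clear what each of those result documents contains, the $match and $group stages can be mirrored in plain JavaScript. This is a sketch of the logic only, not how MongoDB implements it, and the groupValuesByMinute name and the sample documents are made up for illustration.

```javascript
// Plain JavaScript sketch of the $match + $group stages above; not MongoDB code.
function groupValuesByMinute(docs, from, to) {
	var groups = {};
	docs.forEach(function (doc) {
		// $match: keep only documents inside the inclusive date range
		if (doc.created_on < from || doc.created_on > to) {
			return;
		}
		// $minute: minute of the hour (0-59), evaluated in UTC
		var minute = doc.created_on.getUTCMinutes();
		var values = groups[minute] || (groups[minute] = []);
		// $addToSet: store each distinct value once
		if (values.indexOf(doc.value) === -1) {
			values.push(doc.value);
		}
	});
	// One result document per minute, so at most 60 documents in total
	return Object.keys(groups).map(function (minute) {
		return { _id: { minute: Number(minute) }, values: groups[minute] };
	});
}

// Made-up sample documents; the last one falls outside the date range.
var result = groupValuesByMinute([
	{ created_on: new Date(Date.UTC(2012, 0, 1, 10, 5, 0)), value: 0.1 },
	{ created_on: new Date(Date.UTC(2012, 0, 2, 23, 5, 30)), value: 0.2 },
	{ created_on: new Date(Date.UTC(2012, 0, 3, 8, 6, 0)), value: 0.2 },
	{ created_on: new Date(Date.UTC(2012, 0, 3, 9, 5, 0)), value: 0.1 },
	{ created_on: new Date(Date.UTC(2012, 1, 1, 9, 5, 0)), value: 0.9 }
], new Date(Date.UTC(2012, 0, 1)), new Date(Date.UTC(2012, 0, 10)));
```

Because the grouping key is the minute of the hour, the result never holds more than 60 documents, but each values array grows with the number of matching entries, which is why a single result document can still approach the 16MB limit.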

The 2.6 version also addresses the aggregation memory restrictions. A full collection scan such as:

db.randomData.aggregate( [   
{ 
	$group: {
		_id : {
			"minute" : {
				$minute : "$created_on"
			} 
		},  
		"values": { 
			$addToSet: "$value" 
		} 
	} 
}])
.objsLeftInBatch();

can end up with the following error:

{
	"errmsg" : "exception: Exceeded memory limit for $group, but didn't allow external sort. Pass allowDiskUse:true to opt in.",
	"code" : 16945,
	"ok" : 0
}

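Each pipeline stage is limited to 100MB of RAM. With the allowDiskUse option, a stage that exceeds this limit, such as a large $group or $sort, can write temporary files to disk instead: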

db.randomData.aggregate( [   
{ 
	$group: {
		_id : {
			"minute" : {
				$minute : "$created_on"
			} 
		},  
		"values": { 
			$addToSet: "$value" 
		} 
	} 
}]
, 
{ 
	allowDiskUse : true 
})
.objsLeftInBatch();

The 2.6 version allows us to save the aggregation result to a different collection using the newly added $out stage, which must be the last stage of the pipeline:

db.randomData.aggregate( [ 
{ 
	$match: { 
		"created_on" : { 
			$gte : new Date(Date.UTC(2012, 0, 1)), 
			$lte : new Date(Date.UTC(2012, 0, 10)) 
		} 
	} 
},  
{ 
	$group: {
		_id : {
			"minute" : {
				$minute : "$created_on"
			} 
		},  
		"values": { 
			$addToSet: "$value" 
		} 
	} 
},
{ 
	$out : "randomAggregates" 
}
]);
db.randomAggregates.count();
60

New expression operators such as $let and $map have been added, and $cond now also accepts a more readable if/then/else document form.

The next example appends AM or PM to the time portion of each entry's created_on timestamp:

var dataSet = db.randomData.aggregate( [ 
{ 
	$match: { 
		"created_on" : { 
			$gte : new Date(Date.UTC(2012, 0, 1)), 
			$lte : new Date(Date.UTC(2012, 0, 2)) 
		} 
	} 
},  
{ 
	$project: { 
		"clock" : { 
			$let: {
				vars: {
					"hour": { 
						$substr: ["$created_on", 11, -1]
					},
					"am_pm": {
						$cond: {
							if: { $lt: [ { $hour : "$created_on" }, 12 ] },
							then: "AM",
							else: "PM"
						}
					}
				},
				in: { $concat: [ "$$hour", " ", "$$am_pm"] }				
			}			
		}   
	} 
}, 
{
	$limit : 10
}
]);
dataSet.forEach(function(document)  {
	printjson(document);
});

Resulting in:

"clock" : "16:07:14 PM"
"clock" : "22:14:42 PM"
"clock" : "21:46:12 PM"
"clock" : "03:35:00 AM"
"clock" : "04:14:20 AM"
"clock" : "03:41:39 AM"
"clock" : "17:08:35 PM"
"clock" : "18:44:02 PM"
"clock" : "19:36:07 PM"
"clock" : "07:37:55 AM"

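The same projection logic can be written in plain JavaScript, which may help when reading the nested $let expression. This is a sketch only: the clock function is a name chosen here for illustration and is not part of MongoDB.

```javascript
// Plain JavaScript equivalent of the $let projection above (a sketch, not MongoDB code).
function clock(createdOn) {
	// Mirrors $substr: keep the HH:MM:SS part that follows the "T".
	// toISOString() also appends milliseconds and "Z", so the slice stops at index 19.
	var time = createdOn.toISOString().substring(11, 19);
	// Mirrors $cond: hours below 12 (UTC) are AM, everything else is PM
	var amPm = createdOn.getUTCHours() < 12 ? "AM" : "PM";
	// Mirrors $concat: join the two variables with a space
	return time + " " + amPm;
}

var afternoon = clock(new Date(Date.UTC(2012, 0, 1, 16, 7, 14))); // "16:07:14 PM"
var night = clock(new Date(Date.UTC(2012, 0, 1, 3, 35, 0)));      // "03:35:00 AM"
```

As in the shell output, the hour stays in 24-hour form and only the AM or PM suffix is added.
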
Conclusion

MongoDB 2.6 comes with many other enhancements, such as bulk operations and index intersection. MongoDB is constantly evolving, offering a viable alternative for document-based storage. At this rate of development, it’s no wonder it was named 2013 database of the year.

Reference: MongoDB 2.6 is out from our JCG partner Vlad Mihalcea at Vlad Mihalcea’s Blog.