About Vlad Mihalcea

Vlad Mihalcea is a software architect passionate about software integration, high scalability and concurrency challenges.

MongoDB Facts: Lightning speed aggregation

In my previous post, I demonstrated how fast you can insert 50 millions time-event entries with MongoDB. This time we will make use of all that data to fuel our aggregation tests.

This is how one time-event entry looks like:
 
 
 
 
 
 

{
        "_id" : ObjectId("529a2a988cccdb538932d31f"),
        "created_on" : ISODate("2012-05-02T06:08:47.835Z"),
        "value" : 0.9270193106494844
}

Beside the default primary key “_id” index, we also created one for the “created_on” field, so these are all our indexes:

[
        {
                "v" : 1,
                "key" : {
                        "_id" : 1
                },
                "ns" : "random.randomData",
                "name" : "_id_"
        },
        {
                "v" : 1,
                "key" : {
                        "created_on" : 1
                },
                "ns" : "random.randomData",
                "name" : "created_on_1"
        }
]

Now let’s use all 50 million entries to build a daily report, counting how many events were generated in a day, including the minimum, the maximum and the average value for that particular day.

This is how our script looks like:

var start = new Date();
var dataSet = db.randomData.aggregate([
	{
		$group: {
				"_id": { 
					"year" : {
						$year : "$created_on"
					}, 
					"dayOfYear" : {
						$dayOfYear : "$created_on"
					}
				}, 
				"count": { 
					$sum: 1 
				}, 
				"avg": { 
					$avg: "$value" 
				}, 
				"min": { 
					$min: "$value" 
				}, 
				"max": { 
					$max: "$value" 
				}		
			}
	},
	{
		$sort: {
			"_id.year" : 1, 
			"_id.dayOfYear" : 1
		} 	
	}
]);
if(dataSet.result != null && dataSet.result.length > 0) {
	print("Aggregated:" + dataSet.result.length + " days.");	
	db.dailyReport.insert(dataSet.result);
}
var end = new Date();
print("Aggregation took:" + (end.getTime() - start.getTime())/1000 + "s");

After aggregating all data, the results are saved to a new dailyReport collection. Let’s run the script and see what we get:

D:\wrk\vladmihalcea\mongodb-facts\aggregator\timeseries>mongo random aggregate_daily_report.js
MongoDB shell version: 2.4.6
connecting to: random
Aggregated:367 days.
Aggregation took:129.052s

So, in 129 seconds we managed to build our report for all this data. Let’s check the new collection and see our daily reports.

{
        "_id" : {
                "year" : 2012,
                "dayOfYear" : 1
        },
        "count" : 137244,
        "avg" : 0.5009360724400802,
        "min" : 0.0000013632234185934067,
        "max" : 0.9999953350052238
}
{
        "_id" : {
                "year" : 2012,
                "dayOfYear" : 2
        },
        "count" : 136224,
        "avg" : 0.49982110975583033,
        "min" : 0.0000023238826543092728,
        "max" : 0.9999841095414013
}

Since we generated our time-event associated values using Math.random(), the average, minimum and maximum values are what we were expecting anyway. What is really interesting is how fast MongoDB managed to mass all this data at a rate of 387440 documents per second.

Being exited about this result, let’s now check how fast we can randomly select a one-hour report. We first match one-hour span of entries, then we group and sort, to finally display the results to the Mongo shell.

var minDate = new Date(2012, 0, 1, 0, 0, 0, 0);
var maxDate = new Date(2013, 0, 1, 0, 0, 0, 0);
var delta = maxDate.getTime() - minDate.getTime();
var fromDate = new Date(minDate.getTime() + Math.random() * delta);
fromDate.setHours(0, 0, 0, 0);
var toDate = new Date(fromDate.getTime() + 60 * 60 * 1000);

print("Aggregating from " + fromDate + " to " + toDate);

var start = new Date();

var dataSet = db.randomData.aggregate([
	{
		$match: {
			"created_on" : {
				$gte: fromDate, 
				$lt : toDate	
			}
		}
	},
	{
		$group: {
				"_id": { 
					"year" : {
						$year : "$created_on"
					}, 
					"dayOfYear" : {
						$dayOfYear : "$created_on"
					},
					"hour" : {
						$hour : "$created_on"
					}
				}, 
				"count": { 
					$sum: 1 
				}, 
				"avg": { 
					$avg: "$value" 
				}, 
				"min": { 
					$min: "$value" 
				}, 
				"max": { 
					$max: "$value" 
				}		
			}
	},
	{
		$sort: {
			"_id.year" : 1, 
			"_id.dayOfYear" : 1,
			"_id.hour" : 1
		} 	
	}
]);
if(dataSet.result != null && dataSet.result.length > 0) {
	dataSet.result.forEach(function(document)  {
		printjson(document);
	});
}
var end = new Date();
print("Aggregation took:" + (end.getTime() - start.getTime())/1000 + "s");

Running this script we get the following result:

D:\wrk\vladmihalcea\mongodb-facts\aggregator\timeseries>mongo random aggregate_hour_report.js
MongoDB shell version: 2.4.6
connecting to: random
Aggregating from Mon Jul 16 2012 00:00:00 GMT+0300 (GTB Daylight Time) to Mon Jul 16 2012 01:00:00 GMT+0300 (GTB Daylight Time)
{
        "_id" : {
                "year" : 2012,
                "dayOfYear" : 197,
                "hour" : 21
        },
        "count" : 5808,
        "avg" : 0.5015344015735451,
        "min" : 0.00005716201849281788,
        "max" : 0.9998941225931048
}
Aggregation took:0.209s

This is so fast that I don’t even have to pre-calculate the hour-based reports, meaning I could easily generate it on demand, at run-time.

MongoDB aggregation framework is extremely useful and its performances can’t go unnoticed. What I showed you were only simple examples, that didn’t require any extra optimization, aiming to demonstrate the out-of-the-box performance of MongoDB.

 

Do you want to know how to develop your skillset to become a Java Rockstar?

Subscribe to our newsletter to start Rocking right now!

To get you started we give you two of our best selling eBooks for FREE!

JPA Mini Book

Learn how to leverage the power of JPA in order to create robust and flexible Java applications. With this Mini Book, you will get introduced to JPA and smoothly transition to more advanced concepts.

JVM Troubleshooting Guide

The Java virtual machine is really the foundation of any Java EE platform. Learn how to master it with this advanced guide!

Given email address is already subscribed, thank you!
Oops. Something went wrong. Please try again later.
Please provide a valid email address.
Thank you, your sign-up request was successful! Please check your e-mail inbox.
Please complete the CAPTCHA.
Please fill in the required fields.

Leave a Reply


− two = 2



Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd | Terms of Use | Privacy Policy | Contact
All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries.
Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
Do you want to know how to develop your skillset and become a ...
Java Rockstar?

Subscribe to our newsletter to start Rocking right now!

To get you started we give you two of our best selling eBooks for FREE!

Get ready to Rock!
You can download the complementary eBooks using the links below:
Close