While starting out with MongoDB is super easy, there are a few things you should keep in mind as you move from a development environment into a production one. No one wants to get paged at 3am because a customer can’t complete an order on your awesome e-commerce site, all because your database isn’t responding fast enough or, worse, is down.
Planning for a production deployment with MongoDB isn’t rocket science, but I must warn you, it’ll cost money, especially if your application actually gets used a lot, which is every developer’s dream. As with any database, you need to plan for high availability, and you’ll want the maximum performance you can get for your money in a production environment.
First and foremost, Mongo likes memory; that is, frequently accessed data is stored directly in memory, and writes are also held in memory until they are flushed to disk. It’s imperative that you provide enough memory for Mongo to hold its working set of data; otherwise, Mongo will have to go to disk for what should be fast lookups via indexed data. This is sloooooow. Therefore, a good rule of thumb is to plan to run your Mongo instances with as much memory as you can afford.
You can get an idea of your working data set by running mongostat. This is a handy command line utility that’ll give you a second-by-second view into what Mongo is up to. One particular metric you’ll see is resident memory (labeled as res), which will give you a good idea of how much memory Mongo’s using at any given moment. If this number is at or near the memory you have available on a given machine, then Mongo is probably having to go to disk, which is going to be a lot slower.
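As a rough illustration of that check, the sketch below converts a res reading into bytes and compares it with a machine's physical memory. It assumes the reading carries a K/M/G suffix (as in 1.5G); the exact format differs between mongostat versions. The function names and the 90% cutoff are hypothetical and only there for illustration.

```python
# Sketch: compare a mongostat "res" reading with physical RAM.
# Assumption: the reading looks like "512M" or "1.5G"; adjust the parser
# if your mongostat version prints something else.

UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_size(text):
    """Convert a size string such as '1.5G' into a number of bytes."""
    text = text.strip().upper()
    if text and text[-1] in UNITS:
        return int(float(text[:-1]) * UNITS[text[-1]])
    return int(float(text))  # no suffix: treat the value as bytes


def memory_pressure(res, physical_ram, threshold=0.9):
    """Return True when resident memory is at or above `threshold` of RAM.

    The 0.9 default is an arbitrary illustrative cutoff, not a MongoDB rule.
    """
    return parse_size(res) >= threshold * parse_size(physical_ram)


if __name__ == "__main__":
    print(memory_pressure("7.4G", "8G"))  # resident memory close to RAM
    print(memory_pressure("2.1G", "8G"))  # plenty of headroom
```

A reading that stays close to physical RAM is only a hint that the working set may not fit; it is worth confirming against actual disk activity before buying more memory.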
Not all data can be stored in memory; every document in Mongo is eventually written to disk. And as always, I/O is slow compared to working with memory. This is why, for example, writes in Mongo can be so fast: drivers allow you to, essentially, fire and forget, and the actual write to disk is done later, asynchronously. Reads can also incur an I/O penalty when something requested isn’t in working memory.
Thus, for high performance reads and writes, pay attention to the underlying disks. A key metric here is IOPS, or input/output operations per second. Mongo will be extremely happy, for example, in an SSD environment, provided you can afford it. Just take a look at various IOPS comparisons between SSDs and traditional spinning disks: super fast RPM disks can achieve IOPS in the 200 range, while typical SSDs are attaining wild numbers, orders of magnitude higher (in the hundreds of thousands of IOPS). It’s crazy how fast SSDs are compared to traditional hard drives.
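To put those figures in perspective, here is a back-of-the-envelope calculation. It uses the ballpark numbers from the paragraph above (about 200 IOPS for a fast spinning disk, 100,000 IOPS for an SSD) and a hypothetical workload of 50,000 random reads that all miss memory. Real throughput depends on the hardware and the access pattern, so treat this as arithmetic, not a benchmark.

```python
def seconds_to_serve(operations, iops):
    """Seconds needed to complete `operations` random I/Os at a given IOPS."""
    return operations / iops


reads = 50_000  # hypothetical number of reads that have to hit the disk

hdd_seconds = seconds_to_serve(reads, 200)      # fast spinning disk
ssd_seconds = seconds_to_serve(reads, 100_000)  # typical SSD ballpark

print(f"spinning disk: {hdd_seconds:.1f} s")  # spinning disk: 250.0 s
print(f"SSD: {ssd_seconds:.1f} s")            # SSD: 0.5 s
```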
RAM is still faster than SSDs, so you’ll still want to understand your working set of data and ensure you have plenty of memory to contain it.
Finally, for maximum availability, you really should be using Mongo’s replica sets. Setting up a cluster of Mongo instances is so incredibly easy that there really isn’t a good reason not to do it. The benefits of doing so are manifold, including:
- data redundancy
- high availability via automated failover
- disaster recovery
Plus, running a replica set makes maintenance so much easier, as you can take nodes offline and bring them back online without an interruption of service. And you can run nodes in a replica set on commodity hardware (don’t forget about my points regarding memory and I/O though).
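To make the maintenance point a bit more concrete, here is a minimal sketch of what a replica set looks like as data. It builds a configuration document in the general shape that rs.initiate() accepts and works out how many members can be down while a primary can still be elected, which requires a majority of voting members. The host names, the set name, and the helper functions are hypothetical.

```python
def replica_set_config(name, hosts):
    """Build a replica set config document: a set name plus numbered members."""
    return {
        "_id": name,
        "members": [{"_id": i, "host": host} for i, host in enumerate(hosts)],
    }


def majority(voting_members):
    """Votes required to elect a primary."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members):
    """Members that can be offline while a primary can still be elected."""
    return voting_members - majority(voting_members)


# Hypothetical three-node set; replace the hosts with your own machines.
config = replica_set_config(
    "rs0",
    ["mongo1.example.net:27017", "mongo2.example.net:27017", "mongo3.example.net:27017"],
)

members = len(config["members"])
print(majority(members))         # 2
print(fault_tolerance(members))  # 1
```

With three members you can take one node down for maintenance and still have a primary, which is why three is a common starting size.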
Accordingly, when looking to move Mongo into a production environment, you need to consider memory, I/O performance, and replica sets. Running a high-performance, highly available Mongo replica set, not surprisingly, will cost you. If you’re looking for options for running Mongo in a production environment, I can’t recommend the team at MongoHQ enough.
I’m a huge fan of Mongo. Check out some of the articles, videos, and podcasts that I’ve done, which focus on Mongo, including:
- Java development 2.0: MongoDB: A NoSQL datastore with (all the right) RDBMS moves
- Video demo: An introduction to MongoDB
- Eliot Horowitz on MongoDB
- 10gen’s Steve Francia talks MongoDB
Reference: MongoDB From the Trenches: Prudent Production Planning from our JCG partner Andrew Glover at The Disco Blog.