MongoDB From the Trenches: Prudent Production Planning

While starting out with MongoDB is super easy, there are a few things you should keep in mind as you move from a development environment into a production one. No one wants to get paged at 3am because a customer can’t complete an order on your awesome e-commerce site, all because your database isn’t responding fast enough or, worse, is down.

Planning for a production deployment with MongoDB isn’t rocket science, but I must warn you, it’ll cost money, especially if your application actually gets used a lot, which is every developer’s dream. As with any database, you need to plan for high availability, and you’ll want the most performance you can get for your money in a production environment.

First and foremost, Mongo likes memory: frequently accessed data is kept directly in memory, and writes are also held in memory until they are flushed to disk. It’s imperative that you provide enough memory for Mongo to hold your working set, that is, the data and indexes your application touches regularly; otherwise, Mongo will have to go to disk for what should be fast lookups via indexed data. This is sloooooow. Therefore, a good rule of thumb is to run your Mongo instances with as much memory as you can afford.

You can get an idea of your working set by running mongostat, a handy command-line utility that gives you a second-by-second view into what Mongo is up to. One particular metric you’ll see is resident memory (labeled res), which gives you a good idea of how much memory Mongo is using at any given moment. If this number approaches what you have available on a given machine, then Mongo is likely having to go to disk, which is going to be a lot slower.
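If you’d rather check this from code than eyeball it, here is a minimal Python sketch. It assumes you already have the document returned by Mongo’s serverStatus command (with PyMongo, something like client.admin.command("serverStatus")), whose mem.resident field reports resident memory in megabytes. The function name, the sample numbers, and the 80% warning threshold are my own illustration, not anything Mongo prescribes.

```python
def resident_memory_report(server_status, physical_ram_mb, warn_ratio=0.8):
    """Compare mongod's resident memory with the machine's physical RAM.

    server_status is a dict shaped like the output of the serverStatus
    command; only server_status["mem"]["resident"] (megabytes) is read.
    warn_ratio is an arbitrary threshold chosen for this example.
    """
    resident_mb = server_status["mem"]["resident"]
    ratio = resident_mb / physical_ram_mb
    return {
        "resident_mb": resident_mb,
        "physical_ram_mb": physical_ram_mb,
        "ratio": round(ratio, 2),
        "memory_pressure": ratio >= warn_ratio,
    }


# Hypothetical sample: 15 GB resident on a 16 GB machine.
sample_status = {"mem": {"resident": 15360, "virtual": 30000}}
report = resident_memory_report(sample_status, physical_ram_mb=16384)
print(report)
```

A report with memory_pressure set to True is only a hint that the working set may no longer fit; treat it as a prompt to dig deeper rather than a verdict.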

Not all data can be stored in memory; every document in Mongo is eventually written to disk. And, as always, I/O is slow compared to working with memory. This is why, for example, writes in Mongo can be so fast: drivers allow you to, essentially, fire and forget, and the actual write to disk is done later, asynchronously. Reads can also incur an I/O penalty when something requested isn’t in working memory.
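The fire-and-forget trade-off is controlled by what Mongo calls the write concern. As a rough sketch, the helper below builds a connection string using the standard w and journal URI options: w=0 asks for no acknowledgement at all (fire and forget), while w=1 with journal=true waits until the write has reached the on-disk journal. The host name, database name, and function name are placeholders I made up.

```python
from urllib.parse import urlencode


def mongo_uri(host, database, w, journal=None):
    """Build a MongoDB connection string with write concern options."""
    options = {"w": w}
    if journal is not None:
        # URI booleans are written as lowercase true/false.
        options["journal"] = "true" if journal else "false"
    return f"mongodb://{host}/{database}?{urlencode(options)}"


# Fastest, least safe: the driver does not wait for any acknowledgement.
fire_and_forget = mongo_uri("db1.example.com:27017", "shop", w=0)

# Slower, safer: wait for the primary to acknowledge and journal the write.
acknowledged = mongo_uri("db1.example.com:27017", "shop", w=1, journal=True)

print(fire_and_forget)
print(acknowledged)
```

Current drivers default to acknowledged writes, so fire and forget is something you opt into when losing the occasional write is an acceptable price for speed.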

Thus, for high-performance reads and writes, pay attention to the underlying disks. A key metric here is IOPS, or input/output operations per second. Mongo will be extremely happy, for example, in an SSD environment, provided you can afford it. Just take a look at various IOPS comparisons between SSDs and traditional spinning disks: even fast, high-RPM disks achieve IOPS in the 200 range, while typical SSDs attain wild numbers, orders of magnitude higher (in the hundreds of thousands of IOPS). It’s crazy how fast SSDs are compared to traditional hard drives.
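To put those IOPS numbers in perspective, here is some back-of-the-envelope arithmetic in Python. It uses the ballpark figures above (200 IOPS for a spinning disk, 100,000 for an SSD) and a made-up workload of one million reads that miss memory. It deliberately ignores caching, queue depth, and sequential access, so read it as an illustration of scale, not a benchmark.

```python
def seconds_for_random_reads(read_count, iops):
    """Time to serve read_count random disk reads at a given IOPS rate."""
    return read_count / iops


reads = 1_000_000  # hypothetical number of reads that miss memory
spinning_disk = seconds_for_random_reads(reads, iops=200)
ssd = seconds_for_random_reads(reads, iops=100_000)

print(f"spinning disk: {spinning_disk:.0f} s")  # 5000 s, well over an hour
print(f"SSD: {ssd:.0f} s")                      # 10 s
print(f"speedup: {spinning_disk / ssd:.0f}x")   # 500x
```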

RAM is still faster than SSDs, so you’ll still want to understand your working set of data and ensure you have plenty of memory to contain it.
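One rough way to reason about whether your working set fits is to start from the dataSize and indexSize figures (in bytes) that Mongo’s dbStats command reports. The sketch below assumes a simple heuristic of my own: all indexes plus some "hot" fraction of the data should fit in RAM with a bit of headroom left over. The hot fraction, the headroom, and the sample sizes are guesses for illustration; your application’s access patterns determine the real numbers.

```python
GIB = 2 ** 30


def estimate_working_set_bytes(db_stats, hot_data_fraction):
    """Heuristic: every index plus the frequently accessed share of the data."""
    return db_stats["indexSize"] + hot_data_fraction * db_stats["dataSize"]


def fits_in_ram(working_set_bytes, ram_bytes, headroom=0.25):
    """True if the working set fits after reserving headroom for the OS."""
    return working_set_bytes <= ram_bytes * (1 - headroom)


# Hypothetical dbStats-style figures: 40 GiB of data, 4 GiB of indexes.
sample_stats = {"dataSize": 40 * GIB, "indexSize": 4 * GIB}
working_set = estimate_working_set_bytes(sample_stats, hot_data_fraction=0.25)

print(working_set / GIB)                    # 14.0
print(fits_in_ram(working_set, 16 * GIB))   # False: 14 GiB > 12 GiB usable
print(fits_in_ram(working_set, 32 * GIB))   # True: 14 GiB <= 24 GiB usable
```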

Finally, for maximum availability, you really should be using Mongo’s replica sets. Setting up a cluster of Mongo instances is so incredibly easy that there really isn’t a good reason not to do it. The benefits of doing so are manifold, including:

  • data redundancy
  • high availability via automated failover
  • disaster recovery
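To give a feel for how little configuration a replica set needs, here is a small Python sketch that builds the configuration document Mongo expects when a set is initiated (the same shape you would pass to rs.initiate() in the shell), along with a connection string that lists every member. The set name rs0 and the host names are placeholders of mine.

```python
def replica_set_config(name, hosts):
    """Build the configuration document used to initiate a replica set."""
    return {
        "_id": name,
        "members": [{"_id": i, "host": host} for i, host in enumerate(hosts)],
    }


def replica_set_uri(name, hosts):
    """Connection string listing all members so drivers can follow failover."""
    return f"mongodb://{','.join(hosts)}/?replicaSet={name}"


# Hypothetical three-node set.
hosts = [
    "mongo1.example.com:27017",
    "mongo2.example.com:27017",
    "mongo3.example.com:27017",
]
config = replica_set_config("rs0", hosts)
uri = replica_set_uri("rs0", hosts)

print(config)
print(uri)
```

Listing every member in the connection string is what lets a driver find the new primary after an automated failover.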

Plus, running a replica set makes maintenance so much easier, as you can bring nodes offline and online without an interruption of service. And you can run nodes in a replica set on commodity hardware (don’t forget about my points regarding memory and I/O, though).
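How many nodes can you take offline at once? A replica set can only elect a primary while a majority of its voting members can see each other, so the arithmetic is simple. The sketch below works it out for a few set sizes; the function names are mine.

```python
def majority(voting_members):
    """Votes needed to elect a primary."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members):
    """Members that can be offline while a primary can still be elected."""
    return voting_members - majority(voting_members)


for n in (3, 4, 5):
    print(f"{n} members: majority {majority(n)}, can lose {fault_tolerance(n)}")
```

Note that a fourth member adds no extra tolerance over three, which is why odd-sized sets are the usual choice.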

Accordingly, when looking to move Mongo into a production environment, you need to consider memory, I/O performance, and replica sets. Running a high-performance, highly available Mongo replica set, not surprisingly, will cost you. If you’re looking for options for running Mongo in a production environment, I can’t recommend the team at MongoHQ enough.

I’m a huge fan of Mongo. Check out some of the articles, videos, and podcasts that I’ve done, which focus on Mongo.

Reference: MongoDB From the Trenches: Prudent Production Planning from our JCG partner Andrew Glover at The Disco Blog.


Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd
