NoSQL is a set of database technologies built to handle massive amounts of data or specific data structures foreign to relational databases. However, the choice to use a NoSQL database is often based on hype, or a wrong assumption that relational databases cannot perform as well as a NoSQL database. Operational cost is often overlooked by engineers when it comes to selecting a database. At Wix engineering, we’ve found that in most cases we don’t need a NoSQL database, and that MySQL is a great NoSQL database if it’s used appropriately.
When building a scalable system, we found that an important factor is using proven technology so that we know how to recover fast if there’s a failure. For example, you can use the latest and greatest NoSQL database, which works well in theory, but when you have production problems, how long does it take to resume normal activity? Pre-existing knowledge and experience with the system and its workings—as well as being able to Google for answers—is critical for swift mitigation. Relational databases have been around for over 40 years, and there is a vast industry knowledge of how to use and maintain them. This is one reason we usually default to using a MySQL database instead of a NoSQL database, unless NoSQL is a significantly better solution to the problem—for example, if we need a document store, or to handle high data volume that MySQL cannot handle.
However, using MySQL in a large-scale system may have performance challenges. To get great performance from MySQL, we employ a few usage patterns. One of these is avoiding database-level transactions. Transactions require that the database maintains locks, which has an adverse effect on performance.
Instead, we use logical application-level transactions, thus reducing the load and extracting high performance from the database. For example, let’s think about an invoicing schema. If there’s an invoice with multiple line items, instead of writing all the line items in a single transaction, we simply write line by line without any transaction. Once all the lines are written to the database, we write a header record, which has pointers to the line items’ IDs. This way, if something fails while writing the individual lines to the database, and the header record was not written, then the whole transaction fails. A possible downside is that there may be orphan rows in the database. We don’t see it as a significant issue though, as storage is cheap and these rows can be purged later if more space is needed.
Here are some of our other usage patterns to get great performance from MySQL:
Do not have queries with joins; only query by primary key or index.
Do not use sequential primary keys (auto-increment) because they introduce locks. Instead, use client-generated keys, such as GUIDs. Also, when you have master-master replication, auto-increment causes conflicts, so you will have to create key ranges for each instance.
Any field that is not indexed has no right to exist. Instead, we fold such fields into a single text field (JSON is a good choice).
We often use MySQL simply as a key-value store. We store a JSON object in one of the columns, which allows us to extend the schema without making database schema changes. Accessing MySQL by primary key is extremely fast, and we get submillisecond read time by primary key, which is excellent for most use cases. So we found that MySQL is a great NoSQL that’s ACID compliant.
In terms of database size, we found that a single MySQL instance can work perfectly well with hundreds of millions of records. Most of our use cases do not have more than several hundred million records in a single instance.
One big advantage to using relational databases as opposed to NoSQL is that you don’t need to deal with the eventually consistent nature displayed by most NoSQL databases. Our developers all know relational databases very well, and it makes their lives easy.
Don’t get me wrong, there is a place for NoSQL; relational databases have their limits—single host size and strict data structures. Operational cost is often overlooked by engineers in favor of the cool new thing. If the two options are viable, we believe we need to really consider what it takes to maintain it in production and decide accordingly.