1. What are NoSql Databases?
NoSql (Not only SQL) databases are non-relational databases that scale horizontally, store semi-structured or unstructured data and have flexible schemas. They support multiple data models, such as key-value, document, column-family, graph and in-memory, for managing and accessing data. NoSql databases are well suited to modern applications that handle large data and request volumes and need high scalability, low latency, high performance and flexible data models to provide a great customer experience.
In this article we’ll introduce three NoSql databases, Cassandra, MongoDB and Redis, and discuss when to use each of them to achieve better performance.
2. Features of NoSql databases
2.1 Multiple Models Support
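NoSql databases support several data models, including key-value, document, column-family and graph, which makes them well suited to persisting, managing and accessing semi-structured and unstructured data.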
2.2 Open Source
Most NoSql databases are open source, and most cloud providers offer them as managed services that handle auto scaling, patch updates and similar tasks behind the scenes.
2.3 Horizontal Scalability
NoSql databases scale horizontally: you add more servers to the cluster, and clusters can span multiple geographic locations (regions). Relational databases, by contrast, are traditionally scaled vertically, by adding CPU, memory or storage to a single server.
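Horizontal scaling only works if the data itself is partitioned across servers. A common way to do this is consistent hashing, the idea behind Cassandra's token ring. The Python sketch below is illustrative only: `HashRing`, the node names, and the use of md5 with 64 virtual nodes per server are assumptions, not any database's actual implementation. It shows the property that makes adding servers cheap: when a new node joins, only the keys that now belong to it move.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    # md5 only spreads keys evenly here; it is not used for security
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with `vnodes` tokens per physical node."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self.tokens = []  # sorted list of (token, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            bisect.insort(self.tokens, (stable_hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # First token at or after hash(key), wrapping around the end of the ring
        idx = bisect.bisect_left(self.tokens, (stable_hash(key),))
        return self.tokens[idx % len(self.tokens)][1]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {key: ring.node_for(key) for key in keys}

ring.add_node("node-d")
moved = [key for key in keys if ring.node_for(key) != before[key]]
print(len(moved))  # typically around a quarter of the keys, all now owned by node-d
```

With naive `hash(key) % number_of_nodes` placement, going from three to four nodes would remap most keys; consistent hashing limits the movement to roughly the new node's share.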
2.4 Low Latency
Because data is replicated to multiple nodes in the cluster, requests can be served by a nearby replica, which keeps latency low. Replication does, however, force a tradeoff between latency and consistency, and that tradeoff matters for web and mobile applications: regardless of the replication method used, waiting for more replicas to confirm a write or answer a read gives stronger consistency at the cost of higher latency.
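One way to see the consistency/latency tradeoff is quorum replication, which Cassandra exposes through consistency levels such as ONE and QUORUM. With N replicas, a write waits for W acknowledgements and a read queries R replicas; if R + W > N, every read set overlaps the latest write set. The Python sketch below is a simplified model under stated assumptions: `ReplicatedRegister` is hypothetical, versions are a plain counter standing in for write timestamps, and reads deliberately contact the most out-of-date replicas to show the worst case.

```python
from typing import List, Optional, Tuple


class ReplicatedRegister:
    """Hypothetical model of one value stored on n replicas as (version, value) pairs."""

    def __init__(self, n: int):
        self.replicas: List[Tuple[int, Optional[str]]] = [(0, None)] * n
        self.version = 0  # stands in for Cassandra's per-write timestamps

    def write(self, value: str, w: int) -> None:
        # The write is acknowledged once w replicas have it; the others lag behind
        self.version += 1
        for i in range(w):
            self.replicas[i] = (self.version, value)

    def read(self, r: int) -> Optional[str]:
        # Worst case: query the r replicas least likely to have the latest write (r >= 1)
        contacted = self.replicas[-r:]
        return max(contacted, key=lambda rep: rep[0])[1]


reg = ReplicatedRegister(n=3)
reg.write("v1", w=2)
print(reg.read(r=2))  # v1: R + W = 4 > N, so the read overlaps the write
reg.write("v2", w=1)
print(reg.read(r=2))  # v1 (stale): R + W = 3, which is not greater than N
print(reg.read(r=3))  # v2: R + W = 4 > N
```

Raising R or W buys consistency, but each request then waits for more replicas, which is exactly the latency cost described above.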
2.5 Flexible Schema
NoSql databases support flexible data models, many with eventual consistency, and are inherently schema-free. This makes them suitable for storing semi-structured and unstructured data efficiently.
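To illustrate what "schema-free" means in practice, the sketch below keeps documents of different shapes in one collection and queries them. Plain Python dictionaries stand in for documents, and the `find` helper is hypothetical, not a database API.

```python
# Hypothetical "products" collection: plain dicts stand in for documents
products = [
    {"_id": 1, "name": "Laptop", "price": 999.0, "specs": {"ram_gb": 16}},
    {"_id": 2, "name": "E-book", "price": 9.99, "format": "EPUB"},
    {"_id": 3, "name": "Gift card", "price": 50.0, "tags": ["gift", "digital"]},
]


def find(collection, **criteria):
    """Return documents whose fields equal every criterion; a missing field never matches."""
    return [
        doc for doc in collection
        if all(field in doc and doc[field] == value for field, value in criteria.items())
    ]


# A new field needs no ALTER TABLE or migration: just store a document that has it
products.append({"_id": 4, "name": "Headphones", "price": 79.0, "wireless": True})

print([doc["name"] for doc in find(products, format="EPUB")])  # ['E-book']
print([doc["_id"] for doc in find(products, wireless=True)])   # [4]
```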
2.6 Highly Performant, Available and Fault Tolerant
In NoSql databases, data is replicated to multiple nodes in the cluster and also to nodes in clusters in other regions. This makes NoSql databases highly available and fault tolerant. NoSql databases are also optimized for their data models (document, key-value, column-family, graph, etc.) and for the access patterns those models serve, which gives them high performance.
3. Apache Cassandra
Apache Cassandra is an open-source, distributed, horizontally scalable, highly available and fault-tolerant wide-column (column-family) NoSql database written in Java. All nodes in a Cassandra cluster are peers; there is no master-slave architecture. This makes Cassandra highly available and fault tolerant, with no single point of failure. Cassandra clusters scale horizontally and can be distributed across multiple data centers.
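The peer-to-peer design can be sketched as a token ring: each key hashes to a token, and copies are placed on the owning node and then on the next nodes clockwise until the replication factor is met, similar to Cassandra's SimpleStrategy. This is an illustration under assumptions: `token`, `replicas_for` and the node names are hypothetical, md5 replaces Cassandra's default Murmur3 partitioner, and each node gets a single token rather than many virtual nodes.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # Stand-in partitioner: Cassandra defaults to Murmur3; md5 keeps this sketch stdlib-only
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def replicas_for(key, ring, replication_factor):
    """Return the distinct nodes holding `key`: the token owner, then the next nodes clockwise."""
    start = bisect.bisect_left(ring, (token(key),))
    owners = []
    for i in range(len(ring)):
        node = ring[(start + i) % len(ring)][1]
        if node not in owners:  # skip repeats in case a node owns several tokens
            owners.append(node)
        if len(owners) == replication_factor:
            break
    return owners


nodes = ["node-1", "node-2", "node-3", "node-4", "node-5"]
ring = sorted((token(name), name) for name in nodes)  # one token per node, for simplicity

owners = replicas_for("user:42", ring, replication_factor=3)
print(owners)

# No master: any surviving replica can still serve the key if one node goes down
down = owners[0]
print([node for node in owners if node != down])
```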
Writes are very fast in Cassandra because it does not read existing data before writing. Each write is first appended to the commit log for durability and stored in an in-memory memtable; memtables are later flushed to immutable SSTable files on disk.
When using Cassandra, design your data model around your queries: first determine the queries your application needs, then model your tables to serve them.
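Cassandra's write path can be modeled as a small log-structured store: append to a commit log, update an in-memory memtable, and periodically flush the memtable to an immutable, sorted SSTable. The `LsmStore` class below is a hypothetical toy that leaves out compaction, tombstones, write timestamps and real files; it only shows why a write never has to look up existing data first, and why reads check the newest data before older SSTables.

```python
import bisect


class LsmStore:
    """Hypothetical toy write path: commit log -> memtable -> immutable sorted SSTables."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []  # append-only durability record (a file on disk in a real system)
        self.memtable = {}    # in-memory, latest value per key
        self.sstables = []    # flushed, immutable, sorted tables (newest last)
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        # No read-before-write: append to the log, update memory, done
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Persist the memtable as a sorted, immutable table and start a fresh one
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log.clear()  # in this toy, flushed data no longer needs log replay

    def read(self, key):
        # Newest data wins: check the memtable, then SSTables from newest to oldest
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            keys = [k for k, _ in table]
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return table[i][1]
        return None


store = LsmStore(memtable_limit=2)
store.write("user:1", "Ada")
store.write("user:2", "Linus")   # memtable full: flushed to the first SSTable
store.write("user:1", "Ada L.")  # the newer value lives in the memtable
print(store.read("user:1"), store.read("user:2"), len(store.sstables))  # Ada L. Linus 1
```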
4. MongoDB
MongoDB is an open-source, cross-platform, document-oriented, highly available, scalable and flexible NoSql database written in C++. It stores documents in collections and provides high availability through replica sets.
MongoDB stores JSON-like documents (encoded as BSON) that can have varied structures. Because it is schema-less, you do not need to define a document structure before inserting documents. Data is accessed with the MongoDB Query Language (MQL), and MongoDB offers powerful aggregation operators through its expressive aggregation pipeline framework.
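The aggregation pipeline processes documents through a sequence of stages such as `$match`, `$group` and `$sort`. The pipeline in the comment below uses real MongoDB stage syntax; the Python functions beneath it are a hypothetical, stdlib-only emulation of what those three stages compute over made-up sample orders, so the example runs without a MongoDB server.

```python
from collections import defaultdict

# Made-up sample documents from a hypothetical "orders" collection
orders = [
    {"customer": "alice", "status": "shipped", "amount": 30},
    {"customer": "bob", "status": "pending", "amount": 15},
    {"customer": "alice", "status": "shipped", "amount": 12},
    {"customer": "carol", "status": "shipped", "amount": 40},
]

# The equivalent MongoDB pipeline, as passed to db.orders.aggregate(...):
# [
#   {"$match": {"status": "shipped"}},
#   {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}},
#   {"$sort": {"total": -1}},
# ]


def match(docs, criteria):
    # $match: keep documents whose fields equal every criterion
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


def group_sum(docs, key_field, sum_field):
    # $group with $sum: one output document per distinct key
    totals = defaultdict(int)
    for d in docs:
        totals[d[key_field]] += d[sum_field]
    return [{"_id": k, "total": v} for k, v in totals.items()]


def sort_desc(docs, field):
    # $sort with -1: descending order
    return sorted(docs, key=lambda d: d[field], reverse=True)


result = sort_desc(group_sum(match(orders, {"status": "shipped"}), "customer", "amount"), "total")
print(result)  # [{'_id': 'alice', 'total': 42}, {'_id': 'carol', 'total': 40}]
```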
5. Redis (Remote Dictionary Server)
Redis is an open-source, scalable data store that can be used as a database, a cache and a message broker. It is written in ANSI C. Redis keeps its data in memory, which makes it very fast, and it can persist its state to disk so that it can recover after a Redis node restarts.
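Redis-style persistence can be approximated by periodically snapshotting the in-memory data set to disk and reloading it on startup, which is the idea behind Redis's RDB snapshots (Redis also offers an append-only file, AOF, for finer-grained durability). The `SnapshotKV` class below is a hypothetical sketch that uses JSON instead of Redis's binary RDB format.

```python
import json
import os
import tempfile


class SnapshotKV:
    """Hypothetical in-memory store with point-in-time snapshots to a JSON file."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        if os.path.exists(path):  # recover state after a restart
            with open(path) as f:
                self.data = json.load(f)

    def set(self, key, value):
        self.data[key] = value  # served from memory: no disk I/O per write

    def get(self, key):
        return self.data.get(key)

    def save(self):
        # Write to a temp file, then atomically rename, so a crash never leaves a half-written snapshot
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(self.path) or ".")
        with os.fdopen(fd, "w") as f:
            json.dump(self.data, f)
        os.replace(tmp, self.path)


path = os.path.join(tempfile.mkdtemp(), "dump.json")
kv = SnapshotKV(path)
kv.set("session:42", "alice")
kv.save()

restarted = SnapshotKV(path)  # simulates a node restart
print(restarted.get("session:42"))  # alice
```

Anything written after the last `save()` is lost if the process dies, which is the durability gap that AOF persistence is meant to narrow.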
6. Cassandra vs MongoDB vs Redis
- Cassandra stores data in a column-family structure, whereas MongoDB stores data as JSON-like (BSON) documents.
- In Cassandra, secondary indexes are generally not recommended because they degrade performance. In MongoDB, indexes are recommended so that queries do not have to scan every document in a collection to find the requested ones.
- Cassandra is a great choice for high write throughput, but if your application needs very high read concurrency, use MongoDB.
- Cassandra has no master node, since all nodes are peers, whereas in a MongoDB replica set a single primary node accepts all writes.
- Cassandra eventually replicates written data to the number of nodes specified by the replication factor, both within the cluster and on clusters in other regions. MongoDB replication requires setting up a replica set: secondary members copy data from the primary, and if the primary goes down, a secondary is automatically elected as the new primary. In MongoDB, writes are committed on the primary first and then replicated to the secondaries.
- All three databases let you set a TTL (Time to Live) so that records are evicted automatically once it expires: on writes in Cassandra (USING TTL), per key in Redis (EXPIRE) and through TTL indexes in MongoDB.
- Redis is a key-value data store and is very efficient to use as a cache for improving application performance.
- Scaling Cassandra and MongoDB is much simpler than scaling Redis.
- On a single Redis instance, the data set cannot exceed the system's total memory (RAM plus swap space). MongoDB has no intrinsic limit on database size.
- Cassandra, MongoDB and Redis databases can be clustered for high availability, backup and for increasing the overall size of the datastore.
- If your application needs aggregation, use MongoDB. If it needs temporary key-value storage, use Redis. If it needs easily scalable, high-write-throughput, wide-column storage, use Cassandra.
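TTL-based expiry, mentioned in the comparison above, can be sketched as a store that records an expiry time per key and treats expired keys as missing. In the real databases it is configured with, for example, Redis `SET key value EX 30`, Cassandra `INSERT ... USING TTL 30` and MongoDB TTL indexes (`expireAfterSeconds`). The `TTLStore` class below is a hypothetical stdlib-only illustration: it evicts lazily when a key is read (Redis additionally expires keys in the background) and takes an injectable clock so it can be tested without sleeping.

```python
import time


class TTLStore:
    """Hypothetical key-value store whose entries can expire after a TTL."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._data = {}     # key -> (value, expires_at or None)

    def set(self, key, value, ttl_seconds=None):
        expires_at = None if ttl_seconds is None else self.clock() + ttl_seconds
        self._data[key] = (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self.clock() >= expires_at:
            del self._data[key]  # lazy eviction on access
            return None
        return value


now = [0.0]  # fake clock for the example
store = TTLStore(clock=lambda: now[0])
store.set("otp:alice", "493817", ttl_seconds=30)
store.set("config", "v1")  # no TTL: never expires
now[0] = 31.0
print(store.get("otp:alice"), store.get("config"))  # None v1
```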
7. When to use which NoSql database?
The following use cases show where each NoSql database fits best and gives better performance.
Choose Cassandra for the following use cases:
- Linear scalability, high availability and fault tolerance
- Multi data center deployment
- Very high write throughput with comparatively few reads
- A responsive reporting system on top of the stored data
- Real time data analytics
- Your application does not need ACID properties from the database
- Your application needs integration with Hadoop, HBase, Spark
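Cassandra's query-first data modeling usually means one table per query, keyed so that each query reads a single partition. The sketch below assumes a hypothetical time-series query, "latest readings for one sensor on one day"; the CQL in the comment shows how such a table might be declared, and the `WideRowTable` class is a stdlib-only Python emulation of a partition key with a descending clustering key, not a Cassandra driver API.

```python
import bisect
from collections import defaultdict

# Query to serve (hypothetical): latest readings for one sensor on one day.
# A CQL table designed around that query might be declared as:
#   CREATE TABLE readings_by_sensor_day (
#       sensor_id text, reading_date date, reading_time timestamp, reading double,
#       PRIMARY KEY ((sensor_id, reading_date), reading_time)
#   ) WITH CLUSTERING ORDER BY (reading_time DESC);


class WideRowTable:
    """Emulates partition key (sensor_id, reading_date) with rows sorted newest first."""

    def __init__(self):
        self.partitions = defaultdict(list)  # (sensor_id, reading_date) -> [(-time, reading)]

    def insert(self, sensor_id, reading_date, reading_time, reading):
        # Negating the time makes ascending list order match descending clustering order
        bisect.insort(self.partitions[(sensor_id, reading_date)], (-reading_time, reading))

    def latest(self, sensor_id, reading_date, limit):
        # One partition lookup plus a slice: no scan over other sensors or days
        rows = self.partitions.get((sensor_id, reading_date), [])
        return [(-neg_time, reading) for neg_time, reading in rows[:limit]]


table = WideRowTable()
for reading_time, reading in [(100, 21.5), (160, 22.1), (130, 21.8)]:
    table.insert("sensor-7", "2024-05-01", reading_time, reading)
table.insert("sensor-9", "2024-05-01", 150, 19.0)

print(table.latest("sensor-7", "2024-05-01", limit=2))  # [(160, 22.1), (130, 21.8)]
```

If the application also needed, say, "all sensors above a threshold", the usual Cassandra approach would be a second table keyed for that query, written alongside this one, rather than a secondary index.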
Choose MongoDB for the following use cases:
- Scalability on the fly
- Document based storage
- Very high read concurrency
- Caching for real time analytics
- Content management
- Large write payloads, i.e. large documents (up to MongoDB's 16 MB document size limit)
- Very useful in rapid prototyping
- Storing large texts, videos, images and other media files (files larger than 16 MB are stored with GridFS)
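MongoDB documents are capped at 16 MB, so large media files are normally stored with GridFS, which splits a file into chunk documents (255 KiB each by default) that reference the file's id and a sequence number `n`. Drivers ship a GridFS API (for example PyMongo's `gridfs` module); the hypothetical `to_chunks` and `from_chunks` functions below only illustrate the chunking idea and omit the separate `files` metadata document that GridFS also stores.

```python
CHUNK_SIZE = 255 * 1024  # GridFS default chunk size (255 KiB)


def to_chunks(file_id, data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split a file into ordered chunk documents, loosely shaped like GridFS chunks."""
    return [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]


def from_chunks(chunks):
    # Chunks may come back in any order; sort by sequence number n before joining
    return b"".join(chunk["data"] for chunk in sorted(chunks, key=lambda c: c["n"]))


video = bytes(600 * 1024)  # 600 KiB of placeholder bytes standing in for a media file
chunks = to_chunks("video-1", video)
print(len(chunks))  # 3
print(from_chunks(chunks) == video)  # True
```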
Choose Redis for the following use cases:
- Caching large payloads to improve the application's performance
- A cache that is persisted to disk and must be recovered after a restart
- Key-value pairs store
- Need very high performance
- Temporary data storage such as user sessions
- Lightweight messaging with its Pub/Sub model
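Redis Pub/Sub is fire-and-forget: `PUBLISH` delivers a message to the clients subscribed at that moment and returns how many received it, and nothing is stored for later subscribers. The `PubSub` class below is a hypothetical in-process sketch of those semantics, not a Redis client.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class PubSub:
    """Hypothetical in-process broker mimicking fire-and-forget publish/subscribe."""

    def __init__(self):
        self.subscribers: Dict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[str], None]) -> None:
        self.subscribers[channel].append(callback)

    def publish(self, channel: str, message: str) -> int:
        # Deliver only to current subscribers; nothing is kept for later ones
        receivers = self.subscribers.get(channel, [])
        for callback in receivers:
            callback(message)
        return len(receivers)  # like Redis PUBLISH, report how many subscribers received it


bus = PubSub()
inbox = []
print(bus.publish("orders", "order-1 created"))  # 0: no subscribers yet, the message is dropped
bus.subscribe("orders", inbox.append)
print(bus.publish("orders", "order-2 created"))  # 1
print(inbox)  # ['order-2 created']
```

If messages must survive until a consumer reads them, Redis lists or Streams are the usual alternative to Pub/Sub.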
8. Cassandra vs Mongo vs Redis DB – Summary
Understanding the differences between NoSql databases is critical for choosing the right database for your application. Pick a NoSql database based on your application's use cases. NoSql databases are generally not a good choice if your data has many relations and needs ACID properties. To improve application performance, use Redis as a cache, since it stores data in memory. Use MongoDB for content management and document storage needs. Use Cassandra for highly available, wide-column storage. Choosing the right database has a direct impact on your application's performance.