NoSQL – A Quick Guide

Ketan Parmar, June 24th, 2014 (Last Updated: June 23rd, 2014)


NoSQL is a buzzword nowadays among developers and software professionals.

1. What is NoSQL?

A NoSQL database, also called Not Only SQL, is an approach to data management and database design that’s useful for very large sets of distributed data.

2. Where to use NoSQL?

Use NoSQL when a project has unstructured big data that requires real-time or offline analysis, or for web/mobile applications, e.g. a social network app or an analytics app.

3. Advantages and Disadvantages of NoSQL DB

Advantages of NoSQL

Elastic scaling
Big Data
Economics
Flexible data models

Disadvantages of NoSQL

Maturity
Support
Analytics and business intelligence
Administration
Expertise

4. Categories of NoSQL

Column
Document
Key-value
Graph
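To make the four categories more concrete, here is a minimal sketch that models the same small data set in each style using plain Python structures. It does not use any real database; the record contents and the names `key_value_store`, `document_store`, `column_store`, `graph_store` and `followed_by` are hypothetical and chosen only for illustration.

```python
import json

# Key-value: an opaque value looked up by a single key.
key_value_store = {"user:1": json.dumps({"name": "Alice", "city": "Pune"})}

# Document: nested, schema-free records; fields may differ per document.
document_store = {
    "users": [
        {"_id": 1, "name": "Alice", "city": "Pune", "tags": ["admin"]},
        {"_id": 2, "name": "Bob"},  # no city or tags, which is allowed
    ]
}

# Column (wide-column): row key -> column family -> column -> value.
column_store = {
    "user:1": {
        "profile": {"name": "Alice", "city": "Pune"},
        "stats": {"logins": 42},
    }
}

# Graph: nodes plus labeled edges between them.
graph_store = {
    "nodes": {1: {"name": "Alice"}, 2: {"name": "Bob"}},
    "edges": [(1, "FOLLOWS", 2)],
}


def followed_by(graph, node_id):
    """Return the names of the nodes that node_id follows."""
    return [
        graph["nodes"][target]["name"]
        for source, label, target in graph["edges"]
        if source == node_id and label == "FOLLOWS"
    ]
```

The point of the sketch is the shape of the data: a key-value store only knows the key, a document store can look inside the record, a column store groups values into column families per row, and a graph store treats relationships as first-class data.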

5. How many NoSQL databases are available in the market?

More than 110 different NoSQL databases (open source and proprietary) are available in the market.

6. If all NoSQL databases fall under the above categories, what is the purpose of having so many NoSQL databases?

Every NoSQL database has some special features and functionality that make it different. Based on the project requirements, one can choose a suitable NoSQL database.

7. Can I use multiple NoSQL databases in my project / application?

Yes.

8. List of popular NoSQL databases with usage

Redis: For rapidly changing data (should fit mostly in memory), e.g. to store real-time stock prices, analytics, leaderboards and communication. Also a replacement for memcached.
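The leaderboard use case can be sketched without a running server. The class below is a pure-Python stand-in that imitates the idea behind a Redis sorted set (a score per member, read back in rank order). It is not the Redis API; the `Leaderboard` class, its method names, and the alphabetical tie-breaking rule are assumptions made for this illustration. In Redis itself the corresponding operations would be sorted-set commands such as ZINCRBY and ZREVRANGE.

```python
class Leaderboard:
    """In-memory stand-in for a score-ordered set of players."""

    def __init__(self):
        self._scores = {}  # player name -> accumulated score

    def add_score(self, player, points):
        """Add points to a player's total and return the new total."""
        self._scores[player] = self._scores.get(player, 0) + points
        return self._scores[player]

    def top(self, n):
        """Return the n highest (player, score) pairs, best first.

        Ties are broken alphabetically by player name (an assumption
        of this sketch).
        """
        ranked = sorted(
            self._scores.items(),
            key=lambda item: (-item[1], item[0]),
        )
        return ranked[:n]


board = Leaderboard()
board.add_score("ann", 10)
board.add_score("bob", 25)
board.add_score("ann", 20)
top_two = board.top(2)
```

Because every update touches a single in-memory entry, this access pattern suits data that changes rapidly and fits mostly in memory, which is the situation described above.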

MongoDB: When you need dynamic queries, defined indexes, map/reduce and good performance on a big DB, e.g. for most things that you would do with MySQL, but where having predefined columns really holds you back.

Cassandra: When you need to store data so huge that it doesn’t fit on one server, but still want a friendly, familiar interface to it, and when you don’t need real-time analysis or other operations, e.g. web analytics, transaction logging, data collection from huge sensor arrays.

Riak: If you need very good single-site scalability, availability and fault tolerance, but you’re ready to pay for multi-site replication, e.g. point-of-sale data collection, factory control systems, and places where even seconds of downtime hurt. Could also be used as a well-update-able web server.

CouchDB: For accumulating, occasionally changing data on which pre-defined queries are to be run, and for places where versioning is important, e.g. CRM and CMS systems. Master-master replication is an especially interesting feature, allowing easy multi-site deployments.

HBase: Hadoop is probably still the best way to run Map/Reduce jobs on huge datasets, and HBase is best if you already use the Hadoop/HDFS stack, e.g. search engines, analysing log data, and any place where scanning huge, two-dimensional, join-less tables is a requirement.

Accumulo: If you need to restrict access at the cell level. Use cases are the same as for HBase, since it’s basically a replacement, e.g. search engines.

Hypertable: If you need a better HBase. Use cases are the same as for HBase, since it’s basically a replacement, e.g. search engines.

Neo4j: For graph-style, rich or complex, interconnected data. Neo4j is quite different from the others in this sense. E.g. for searching routes in social relations, public transport links, road maps, or network topologies.
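Route searching over interconnected data can be illustrated with a breadth-first search over an adjacency list. This is a plain-Python sketch of the kind of traversal a graph database performs natively; it is not Neo4j code, and the function `shortest_route` and the sample `links` data are hypothetical.

```python
from collections import deque


def shortest_route(links, start, goal):
    """Return the route with the fewest hops from start to goal, or None.

    links maps each node to the list of nodes directly reachable from it.
    """
    if start == goal:
        return [start]
    previous = {start: None}  # node -> node it was first reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in links.get(node, ()):
            if neighbor in previous:
                continue  # already reached by a route at least as short
            previous[neighbor] = node
            if neighbor == goal:
                # Walk backwards from the goal to rebuild the route.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical one-way transport links between stops.
links = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
route = shortest_route(links, "A", "E")
```

With relational tables, each extra hop typically means another join; a graph model keeps the links next to the nodes, so a traversal like this only follows the edges it actually needs.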

ElasticSearch: When you have objects with (flexible) fields, and you need “advanced search” functionality, e.g. a dating service that handles age difference, geographic location, tastes and dislikes, etc., or a leaderboard system that depends on many variables. You can replace your Solr with ElasticSearch.

Couchbase: Any application where low-latency data access, high concurrency support and high availability are requirements, for example low-latency use cases like ad targeting or highly concurrent web apps like online gaming (e.g. Zynga).

Reference:

NoSQL – A Quick Guide from our JCG partner Ketan Parmar at the KP Bird blog.