A NoSQL (originally referring to “non SQL” or “non relational”) database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. Such databases have existed since the late 1960s, but did not obtain the “NoSQL” moniker until a surge of popularity in the early twenty-first century, triggered by the needs of Web 2.0 companies. NoSQL databases are increasingly used in big data and real-time web applications. NoSQL systems are also sometimes called “Not only SQL” to emphasize that they may support SQL-like query languages, or sit alongside SQL database in a polyglot persistence architecture.
Motivations for this approach include: simplicity of design, simpler “horizontal” scaling to clusters of machines (which is a problem for relational databases), and finer control over availability. The data structures used by NoSQL databases (e.g. key-value, wide column, graph, or document) are different from those used by default in relational databases, making some operations faster in NoSQL. The particular suitability of a given NoSQL database depends on the problem it must solve. Sometimes the data structures used by NoSQL databases are also viewed as “more flexible” than relational database tables.
Many NoSQL stores compromise consistency (in the sense of the CAP theorem) in favor of availability, partition tolerance, and speed. Barriers to the greater adoption of NoSQL stores include the use of low-level query languages (instead of SQL, for instance the lack of ability to perform ad-hoc joins across tables), lack of standardized interfaces, and huge previous investments in existing relational databases. Most NoSQL stores lack true ACID transactions, although a few databases have made them central to their designs.
Instead, most NoSQL databases offer a concept of “eventual consistency” in which database changes are propagated to all nodes “eventually” (typically within milliseconds) so queries for data might not return updated data immediately or might result in reading data that is not accurate, a problem known as stale reads. Additionally, some NoSQL systems may exhibit lost writes and other forms of data loss. Some NoSQL systems provide concepts such as write-ahead logging to avoid data loss. For distributed transaction processing across multiple databases, data consistency is an even bigger challenge that is difficult for both NoSQL and relational databases. Even current relational databases “do not allow referential integrity constraints to span databases.” There are few systems that maintain both ACID transactions and X/Open XA standards for distributed transaction processing.
If you wish to know the difference between NoSQL and SQL first, check out our NoSQL vs. SQL: Choosing a Data Management Solution.
Types of NoSQL databases
NoSQL Tutorials – Column data stores
A column (data store) of a distributed data store is a NoSQL object of the lowest level in a key-space. It is a tuple (a key–value pair) consisting of three elements:
- Unique name: Used to reference the column
- Value: The content of the column. It can have different types, like
- Timestamp: The system timestamp used to determine the valid content.
A column is used as a store for the value and has a timestamp that is used to differentiate the valid content from stale ones. According to the CAP theorem, distributed data stores cannot guarantee consistency, as availability and partition tolerance are more important issues. Therefore, the data store or the application programmer will use the timestamp to find out which of the stored values in the backup nodes are up-to-date.
Some data stores, like Riak, may use the more sophisticated vector clock instead of the timestamp to resolve stale information.
In relational databases, a column is a part of a relational table that can be seen in each row of the table. This is not the case in distributed data stores, where the concept of a table only vaguely exists. A column can be part of a Column-family that resembles at most a relational row, but it may appear in one row and not in the others. Also, the number of columns may change from row to row, and new updates to the data store model may also modify the column number. So, all the work of keeping up with changes relies on the application programmer.
Some examples of Column NoSQL databases
Cassandra is a Distributed Database Management System that can handle large amounts of data with data replication across multiple data-centres so that there is no single point of failure. It uses
CQL as its query language which has syntax quite similar to its homonym
Hbase is the NoSql database available in the Hadoop Ecosystem. Like rest of the Hadoop Ecosystem Hbase is also open-source and is used when the database capabilities are needed to store a lot of big data on top of HDFS. It is written in Java and is based on Google’s BigTable which means it is distributed in nature and also provides fault-tolerant capabilities.
NoSQL Tutorials – Document data stores
A document-oriented database, or document store, is a computer program designed for storing, retrieving and managing document-oriented information, also known as semi-structured data.
The central concept of a document store is the notion of a “document”. While each document-oriented database implementation differs on the details of this definition, in general, they all assume that documents encapsulate and encode data (or information) in some standard formats or encodings. Encodings in use include XML, YAML, and JSON as well as binary forms like BSON. Documents are addressed in the database via a unique key that represents that document. One of the other defining characteristics of a document-oriented database is that in addition to the key lookup performed by a key-value store, the database also offers an API or query language that retrieves documents based on their contents.
Different implementations offer different ways of organizing and/or grouping documents:
- Non-visible metadata
- Directory hierarchies
Compared to relational databases, for example, collections could be considered analogous to tables and documents analogous to records. But they are different: every record in a table has the same sequence of fields, while documents in a collection may have fields that are completely different.
Some examples of Document NoSQL databases
MongoDB is a cross-platform document-oriented database system and it is free and open source software. Classified as a NoSQL database, MongoDB eschews the traditional table-based relational database structure in favour of JSON-like documents with dynamic schemas (MongoDB calls the format BSON), making the integration of data in certain types of applications easier and faster.
Couchbase is a highly scalable, Document based NoSQL Database. Document based NoSQL databases work on map-like concept of
KEY-VALUE pairs. The key being uniquely identifiable property like a String, path etc and the value being the Document that is to be saved. Another example of Document based NoSQL is MongoDB. In one of our previous examples, we have already demonstrated how we can connect and manage Spring Data with MongoDB.
NoSQL Tutorials – Key-Value data stores
A key-value database, or key-value store, is a data storage paradigm designed for storing, retrieving, and managing associative arrays, a data structure more commonly known today as a dictionary or hash table. Dictionaries contain a collection of objects, or records, which in turn have many different fields within them, each containing data. These records are stored and retrieved using a key that uniquely identifies the record, and is used to quickly find the data within the database.
Key-value databases work in a very different fashion from the better known relational databases (RDB). RDBs pre-define the data structure in the database as a series of tables containing fields with well defined data types. Exposing the data types to the database program allows it to apply a number of optimizations. In contrast, key-value systems treat the data as a single opaque collection, which may have different fields for every record. This offers considerable flexibility and more closely follows modern concepts like object-oriented programming. Because optional values are not represented by placeholders or input parameters, as in most RDBs, key-value databases often use far less memory to store the same database, which can lead to large performance gains in certain workloads.
Key-value stores use the associative array (also known as a map or dictionary) as their fundamental data model. In this model, data is represented as a collection of key-value pairs, such that each possible key appears at most once in the collection. The key-value model is one of the simplest non-trivial data models, and richer data models are often implemented as an extension of it. The key-value model can be extended to a discretely ordered model that maintains keys in lexicographic order. This extension is computationally powerful, in that it can efficiently retrieve selective key ranges.
Some examples of Key–Value NoSQL databases
Amazon DynamoDB is a fully managed proprietary NoSQL database services that is offered by Amazon.com as part of the Amazon Web Services portfolio.
Redis is an open-source, networked, in-memory, key-value data store with optional durability, written in ANSI C. According to the monthly ranking by DB-Engines.com, Redis is the most popular key-value store. Its name means REmote DIctionary Server.
NoSQL Tutorials – Graph data stores
In computing, a graph database (GDB) is a database that uses graph structures for semantic queries with nodes, edges and properties to represent and store data. A key concept of the system is the graph (or edge or relationship), which directly relates data items in the store a collection of nodes of data and edges representing the relationships between the nodes. The relationships allow data in the store to be linked together directly, and in many cases retrieved with one operation. Graph databases hold the relationships between data as a priority. Querying relationships within a graph database is fast because they are perpetually stored within the database itself. Relationships can be intuitively visualized using graph databases, making it useful for heavily inter-connected data.
Graph databases differ from graph compute engines. Graph databases are technologies that are translations of the relational OLTP databases. On the other hand, graph compute engines are utilized in OLAP for bulk analysis. Graph databases have attracted considerable attention in the 2000s, due to the successes of major technology corporations in using proprietary graph databases, and the introduction of open-source graph databases.
This kind of database is designed for data whose relations are well represented as a graph consisting of elements interconnected with a finite number of relations between them. The type of data could be social relations, public transport links, road maps, network topologies, etc.
Some examples of Graph NoSQL databases
Neo4j is an open source, graph based NoSQL database developed in Java and Scala. Like traditional relational Databases, Neo4J offers support to ACID properties. The graph based databases find their uses in use cases where the focus is strongly on inter-relationship between the entities of the domain like match-making,social networks, routing.