Is NoSQL just a controversial buzzword? Could you imagine if the term ‘Object Oriented’ didn’t exist and instead architectures based on concepts such as encapsulation, polymorphism and inheritance were referred to as ‘NoProcedural’? Could you imagine if .NET was called ‘NoJava’? Or if Leinster was called ‘NoMunster’?
Well, controversial name aside, a good way to appreciate the hype about NoSQL is to consider scalability – the classical non-functional architectural concern. In a classical OLTP architecture, when load increases and your JVM is under pressure, you need to scale. You have two choices:
- vertical scaling – adding more CPU power to your JVM
- horizontal scaling – adding more JVMs (usually on more boxes)
It’s generally not a problem scaling the business tier horizontally. Follow the J2EE / JEE specs and, unless you’ve done something crazy, your business tier will scale. Just add more JVMs and load balance between them. However, while the business tier may be straightforward, the persistence tier ain’t so easy. Let’s say you are using a classical relational database (such as MySQL, SQL Server, DB2 or Oracle) for your persistence: you can’t just add database machines like you can add JVMs. Why not? Compare doing SQL joins when the tables are on the same machine with doing them when the tables are on different machines! Imagine trying to maintain ACID characteristics for your transactions when your database is split across various machines. Now think about trying to do all that on 5 machines, 50, 500, 5000 machines. The more machines, the harder it gets.
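To make the join problem a bit more concrete, here is a minimal sketch of what happens once rows are spread across machines. Everything in it is an assumption made for illustration: the three in-memory "nodes", the modulo sharding rule, and the helper names (`node_for`, `insert_activity`, `join_user_activities`) are hypothetical and not how any particular database does it.

```python
from collections import defaultdict

NUM_NODES = 3  # hypothetical cluster size, chosen only for illustration


def node_for(key: int) -> int:
    """Pick the machine that owns a row: simple modulo sharding on the key."""
    return key % NUM_NODES


# Each node holds its own slice of the Users and Activities tables.
nodes = [{"users": {}, "activities": defaultdict(list)} for _ in range(NUM_NODES)]


def insert_user(user_id: int, name: str) -> None:
    nodes[node_for(user_id)]["users"][user_id] = name


def insert_activity(activity_id: int, user_id: int, action: str) -> None:
    # Sharded by the activity's own key, so it may land on a different
    # machine from the user it belongs to.
    nodes[node_for(activity_id)]["activities"][user_id].append(action)


def join_user_activities(user_id: int):
    """Emulate 'users JOIN activities' for one user.

    Returns (name, actions, number of nodes that held matching rows).
    """
    name = nodes[node_for(user_id)]["users"][user_id]
    actions, nodes_with_rows = [], 0
    for node in nodes:  # every node has to be asked
        found = node["activities"].get(user_id, [])
        if found:
            nodes_with_rows += 1
            actions.extend(found)
    return name, actions, nodes_with_rows


insert_user(1, "alice")
insert_activity(10, 1, "login")
insert_activity(11, 1, "search")
insert_activity(12, 1, "logout")

name, actions, nodes_with_rows = join_user_activities(1)
print(name, sorted(actions), nodes_with_rows)
```

On one machine the join is a local lookup. Here a single user's activities end up scattered over every node, so the "join" becomes a round of network calls, and keeping all of those writes inside one ACID transaction would need coordination between the machines as well.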
The leading relational databases will scale horizontally, but only so far. To get around this, an architect will usually consider:
- Scaling vertically – putting the database on the best hardware that can be afforded
- Partitioning out legacy data, thus reducing things like the size of indexes. This will boost performance and ease the pressure to scale
- Reducing the pressure on the database by caching more in the business tier
- Paying a DBA a lot of money!
But what if you run out of all possible database optimization options and you have to scale horizontally? Not just to a few machines but to a few hundred, if not thousands. This is where NoSQL architectures become relevant.
With a NoSQL database there is no strict schema. In many NoSQL designs, everything is effectively collapsed into one very fat table – a bit like an old school flat file, but where each row stores a huge amount of data. So, instead of having a table for Users and a table for Activities (representing a User’s activities), you put all the User information together in one fat row. This means there are no joins across tables. It also means there is a lot of data redundancy, which means more storage space is required. In addition, more computational power will be needed for writes, since the same piece of data may have to be updated in several places. But because data that is used together is located in the very same place – within the same row – there are no complex joins, and hence it is easier to scale. The computational requirement for reads is also lower, so reads can go faster.
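As a rough illustration of the "one fat row" idea, the sketch below folds a Users table and an Activities table into a single record per user. The sample data, the field names and the helpers `build_fat_rows` and `read_user` are all made up for the example; a real NoSQL store would have its own API and storage format.

```python
# Relational-style layout: two tables linked by user_id.
users = {
    1: {"name": "Alice"},
    2: {"name": "Bob"},
}
activities = [
    {"user_id": 1, "action": "login"},
    {"user_id": 1, "action": "search"},
    {"user_id": 2, "action": "login"},
]


def build_fat_rows(users, activities):
    """Fold each user's activities into that user's own row."""
    rows = {
        user_id: {"name": user["name"], "activities": []}
        for user_id, user in users.items()
    }
    for activity in activities:
        rows[activity["user_id"]]["activities"].append(activity["action"])
    return rows


fat_rows = build_fat_rows(users, activities)


def read_user(user_id):
    """A read is one key lookup: no join, no second table."""
    return fat_rows[user_id]


print(read_user(1))
```

Because everything about a user lives under one key, the whole row can sit on one machine, and reading it never has to touch another. The price is paid elsewhere: anything shared between users would have to be copied into each row and kept up to date on every write.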
Another advantage of NoSQL databases is derived from the freedom that comes with not being tied to a strict schema. You know that headache where a change to a data model can cause big problems? Well, since there is no strict schema with NoSQL, this problem largely goes away. This makes the architecture more flexible and more extensible.
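Here is a small sketch of what that freedom can look like in practice, using a plain Python dictionary as a stand-in for a schemaless store. The `put` and `twitter_handle` helpers and the `twitter` field are hypothetical, invented only for this example.

```python
store = {}  # stand-in for a schemaless key-value / document store


def put(key, document):
    """Save a document as-is; no table definition is consulted."""
    store[key] = dict(document)


# An "old" record, written before anyone thought of a twitter field.
put("user:1", {"name": "Alice"})

# A "new" record with an extra field: no ALTER TABLE, no migration.
put("user:2", {"name": "Bob", "twitter": "@bob"})


def twitter_handle(key):
    """Readers have to cope with the field being absent on older rows."""
    return store[key].get("twitter")


print(twitter_handle("user:1"), twitter_handle("user:2"))
```

The data model change costs nothing at write time, but the responsibility moves into the application code, which now has to tolerate rows of different shapes.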
Right now, it’s fair to say NoSQL is only relevant in a minority of architectures. But could this be another case of technical innovation driving business innovation, as we have seen with smart phones? There wasn’t a need for smart phones, but the technical innovation provided business opportunities. I think the same could happen with NoSQL architectures.
Take a step back from Computer Science and just think Science. Science used to be hypothesis centric; now it is becoming more and more data centric. CERN, genome sequencing, climate change analysis – all involve tonnes and tonnes of data. Surely NoSQL architectures allied with data-processing technologies such as MapReduce / Hadoop will open up new ways to do Science?
So, any disadvantages with NoSQL architectures? Well, it’s still an immature technology. Indexing and security models are just not as sophisticated as they are with classical relational databases. And because most of it is coming from the open source community, the support is not as good as it is for relational databases. So don’t throw out your SQL just yet!