About Vlad Mihalcea

Vlad Mihalcea is a software architect passionate about software integration, high scalability and concurrency challenges.

NoSQL is not just about BigData

There is so much debate on the SQL vs NoSQL subject, and probably this is our natural way of understanding and learning what’s the best way of storing data. After publishing the small experiment on MongoDB aggregating framework, I was challenged by the JOOQ team to match my results against Oracle. Matching MongoDB and Oracle is simply honoring Mongo, as Oracle is probably the best SQL DB engine. Being a simple experiment, it’s dangerous to draw any conclusion, since I was only testing the out-of-the-box Mongo performance, without taking advantage of any optimization. I will leave the optimization part as a subject of a future post, and dedicate this post to why NoSQL is a great tool in the Architect’s toolbox.

I really like how JOOQ defends SQL, and I’m constantly learning a lot from their blog and JOOQ documentation. SQL is one of the best way of modelling data, and most of my projects requirements call for a relational data model. I would always recommend Oracle vs any other free RDBMS, since it’s safer to go with the best available tool, but not all of our clients want to spend their money on Oracle licenses, and therefore we have to build their products on top of MySQL or PostgreSQL (not a big issue so far).

But we’ve been successfully using NoSQL even for SmallData. I believe in polyglot persistence since it’s practical and cost-effective. So here are some use-cases where MongoDB was the right tool for the right job:

1. Web resource monitoring

One of our projects requires processing media resources from various providers, and since we have to download them, we want to know which of those resource providers are slowing down our workflows. Therefore we thought of recording timing events in a capped collection of only 100000 documents, and since it has a fixed size we don’t have to worry of running out of disk-space or implementing a deletion mechanism. The timing events are processed asynchronously and thanks to the MongoDB aggregation framework we can calculate the latest average host time response and export the results via JMX to our centralized management application. It was so easy to design and implement it, and it works like a charm.

2. Persistent cache

Some media resources go through a complex processing pipeline and we can always benefit from caching results for previously computed data. From out Java objects to the MongoDB documents there is no ORM involved, and this simplifies the design/implementation of this caching solution.

3. Time series graphs

This is one of my favorite example. In this rather simple project we had to display some time series in a very user-friendly UI graph, and we rapidly implemented everything in JavaScript. The MongoDB stored the time events and some asynchronous batch processing tasks were pre-calculating the time series. The middle-ware was implemented on top of Node.js and the client-server communication was using WebSockets on top of socket.io. From the DB to the UI graph there was no transformation required, since everything was JSON based.

Now, quoting the aforementioned JOOQ post:

50M is not Big Data

CERN has Big Data. Google does. Facebook does. You don’t. 50M is not “Big Data”. It’s just your average database table.

CERN has indeed BigData and big funds as well, being backed by the European Community, so they can afford any solution they want to choose from. Google has BigData and it inspired Hadoop which makes possible products like Hunk from which may benefit even small companies with huge logs (to be analyzed). Facebook does have BigData and it still uses MySQL, which is a solid proof that SQL can scale.

It’s up to you to decide if your product data fits into a relation data model or a document/graph/wide-column store paradigm.

Reference: NoSQL is not just about BigData from our JCG partner Vlad Mihalcea at the Vlad Mihalcea’s Blog blog.
Related Whitepaper:

Big Data Basics

An Introduction to Big Data and How It Is Changing Business

Amazingly, 90% of the data in the world today has been created only in the last two years. With the increase of mobile devices, social media networks, and the sharing of digital photos and videos, we are continuing to grow the world's data at an astounding pace. However, big data is more than just the data itself. It is a combination of factors that require a new way of collecting, analyzing, visualizing, and sharing data. These factors are forcing software companies to re-think the ways that they manage and offer their data, from new insights to completely new revenue streams.

Get it Now!  

Leave a Reply

− two = 4

Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd | Terms of Use | Privacy Policy
All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries.
Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.

Sign up for our Newsletter

20,709 insiders are already enjoying weekly updates and complimentary whitepapers! Join them now to gain exclusive access to the latest news in the Java world, as well as insights about Android, Scala, Groovy and other related technologies.

As an extra bonus, by joining you will get our brand new e-books, published by Java Code Geeks and their JCG partners for your reading pleasure! Enter your info and stay on top of things,

  • Fresh trends
  • Cases and examples
  • Research and insights
  • Two complimentary e-books