Apache Cassandra is a high-performance database system used by an ever-growing number of enterprises for which scalability is of major importance. Netflix, eBay, Reddit and many other companies have adopted Cassandra in their systems. Facebook, where Cassandra was originally developed, played a crucial part in making it an open-source Apache top-level project in the first place.
Cassandra is gaining more and more attention because it frequently outperforms established players in the segment of highly scalable databases, such as MongoDB. Version 2.0 introduced a number of improvements, including atomicity guarantees for batches of prepared statements, lightweight transactions and triggers. Development is now steaming through version 2.1, which adds User Defined Functions and indexes on collections.
To improve performance further, Cassandra is incorporating more advanced techniques, such as cardinality estimation with the HyperLogLog algorithm, using the implementation from AddThis. HyperLogLog summarizes huge data streams by estimating how many distinct values they contain while using only a small, fixed amount of memory. Cassandra developers were able to apply this method to the compaction of large data files, reducing the memory footprint of the CommitLog by 85% and consequently improving overall write performance by 50%.
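To make the idea of cardinality estimation more concrete, here is a minimal HyperLogLog sketch in Python. It is an illustration of the general technique only, not Cassandra's or AddThis's implementation; the class name, the SHA-1 based 64-bit hash and the default precision `p=10` are assumptions chosen for readability. Each item is hashed, the first `p` bits of the hash pick one of `2^p` small registers, and each register remembers the longest run of leading zeros (plus one) it has seen. The estimate is derived from the harmonic mean of `2^-register`, so memory stays fixed at `2^p` counters no matter how large the stream grows.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog cardinality estimator (illustrative sketch)."""

    def __init__(self, p: int = 10) -> None:
        self.p = p                   # number of hash bits used to pick a register
        self.m = 1 << p              # number of registers (2^p)
        self.registers = [0] * self.m
        # Bias-correction constant from the HyperLogLog paper, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    @staticmethod
    def _hash64(item) -> int:
        # Assumption: SHA-1 truncated to 64 bits stands in for a fast hash.
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item) -> None:
        x = self._hash64(item)
        idx = x >> (64 - self.p)                # top p bits choose the register
        w = x & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit within those 64 - p bits.
        rank = (64 - self.p) - w.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self) -> int:
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return int(round(estimate))


if __name__ == "__main__":
    hll = HyperLogLog(p=10)
    for i in range(100_000):
        # 50,000 distinct keys, each added twice; duplicates do not change the estimate.
        hll.add(f"partition-key-{i % 50_000}")
    print(hll.count())  # roughly 50,000, within a few percent
```

With `p=10` the sketch keeps only 1,024 small counters, and the standard error of the estimate is about `1.04 / sqrt(m)`, roughly 3%. Raising `p` trades more memory for a more accurate count, which is the same kind of trade-off that makes the technique attractive for a database handling very large data files.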
To get up and running with Cassandra, you can read Running Cassandra in a Multi-node Cluster. You can also check out other articles, such as Crawling the Web with Cassandra and Nutch, which shows how to use Cassandra as the storage engine behind another application (in this case Nutch) to handle massive amounts of Internet data, and Practical NoSQL experiences with Apache Cassandra, which offers a close-up view of and insights into the experience of using Cassandra.