Apache Cassandra is a high-performance database system used by an ever-growing number of enterprises for whom scalability is of major importance. For instance, Netflix, eBay, Reddit and many other companies are adopting Cassandra in their systems, not to mention that Facebook played a crucial part in making it an open source Top Level Project in the first place.
Cassandra is gaining more and more attention as it consistently outperforms other “serious players” in the segment of highly scalable databases, such as MongoDB. A large number of improvements, such as guarantees on atomic prepared statement batches, lightweight transactions and triggers, were implemented in version 2.0. Cassandra is now steaming through version 2.1, adding user-defined types and indexes on collections.
To further improve performance, many advanced techniques are being incorporated, such as cardinality estimation using the HyperLogLog algorithm as implemented by AddThis. HyperLogLog summarizes huge data streams by estimating the number of distinct elements they contain while using only a small, fixed amount of memory. Cassandra developers were able to use this method for compacting large data files, reducing the memory footprint of CommitLog by 85% and consequently improving overall write performance by 50%.
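To give a feel for how HyperLogLog-style cardinality estimation works, here is a minimal Python sketch. It is not the AddThis library or Cassandra's own code: the class name `HyperLogLog`, the parameter `p`, and the choice of SHA-1 as the hash are assumptions made purely for illustration, and the sketch applies only the basic small-range (linear counting) correction.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog sketch with 2**p registers (hypothetical, not Cassandra's code)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p                      # number of registers
        self.registers = [0] * self.m
        # Bias-correction constant, valid for m >= 128
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # Derive a 64-bit hash; SHA-1 is an assumption chosen for determinism, not speed
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                     # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)        # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self):
        # Harmonic mean of 2**register values, scaled by alpha * m**2
        z = sum(2.0 ** -r for r in self.registers)
        raw = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting
            return self.m * math.log(self.m / zeros)
        return raw


if __name__ == "__main__":
    hll = HyperLogLog(p=12)
    for i in range(50000):
        hll.add(f"key-{i}")
        hll.add(f"key-{i}")   # duplicates do not change the registers
    print(round(hll.estimate()))
```

With `p=12` the sketch keeps 4096 small registers no matter how many items are added, and the theoretical standard error is about 1.04 / sqrt(4096), or roughly 1.6%. That trade of a small, bounded error for a fixed memory budget is what makes this kind of estimator attractive for summarizing very large data sets.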
To get up and running with Cassandra, you can read Running Cassandra in a Multi-node Cluster. You can also check out other articles, such as Crawling the Web with Cassandra and Nutch, which shows how to use Cassandra as the storage engine behind another application (in this case Nutch) to handle massive amounts of Internet data, and Practical NoSQL experiences with Apache Cassandra, which offers a close look at what using Cassandra is like in practice.