Apache Mahout is a “scalable machine learning library” which, among others, contains implementations of various single-node and distributed recommendation algorithms. In my last blog post I described how to implement an on-line recommender system processing data on a single node. What if the data is too large to fit into memory (>100M preference data points)? Then we have no choice, ...
Read More »Home »
Creating an on-line recommender system with Apache Mahout
Recently we’ve been implementing a recommender system for Yap.TV: you can see it in action after installing the app and going to the “Just for you” tab. We’re using Apache Mahout as the base for doing recommendations. Mahout is a “scalable machine learning library” and contains both local and distributed implementations of user- and item- based recommenders using collaborative filtering ...
Read More »What is Big Data – Theory to Implementation
What is Big Data? You may ask; and more importantly why it is the latest trend in nearly every business domain? Is it just a hype or its here to stay? As a matter of fact “Big Data” is a pretty straightforward term – its just what its says – a very large data-set. How large? The exact answer is ...
Read More »Mahout and Scalding for poker collusion detection
When I’ve been reading a very bright book on Mahout, Mahout In Action (which is a great hands-in intro to machine learning, as well), one of the examples has caught my attention. Authors of the book where using well-known K-means clusterization algorithm for finding similar players on stackoverflow.com, where the criterion of similarity was the set of the authors of ...
Read More »Apache Mahout: Build a spam filter server
Something quite interesting has happened with Lucene. It started as a library, then its developers began adding new projects based on it. They developed another open source project that would add crawling features (among others features) to Lucene. Nutch is in fact a full featured web serach engine that anyone can use or modify. Inspired in some famous papers from ...
Read More »Apache Mahout: Getting started
Recently I have got an interesting problem to solve: how to classify text from different sources using automation? Some time ago I read about a project which does this as well as many other text analysis stuff – Apache Mahout. Though it’s not a very mature one (current version is 0.4), it’s very powerful and scalable. Build on top of ...
Read More »