Home » Tag Archives: Apache Mahout

Tag Archives: Apache Mahout

Amazon Elastic Map Reduce to compute recommendations with Apache Mahout

Apache Mahout is a “scalable machine learning library” which, among others, contains implementations of various single-node and distributed recommendation algorithms. In my last blog post I described how to implement an on-line recommender system processing data on a single node. What if the data is too large to fit into memory (>100M preference data points)? Then we have no choice, ...

Read More »

Creating an on-line recommender system with Apache Mahout

Recently we’ve been implementing a recommender system for Yap.TV: you can see it in action after installing the app and going to the “Just for you” tab. We’re using Apache Mahout as the base for doing recommendations. Mahout is a “scalable machine learning library” and contains both local and distributed implementations of user- and item- based recommenders using collaborative filtering ...

Read More »

What is Big Data – Theory to Implementation

What is Big Data? You may ask; and more importantly why it is the latest trend in nearly every business domain? Is it just a hype or its here to stay? As a matter of fact “Big Data” is a pretty straightforward term – its just what its says – a very large data-set. How large? The exact answer is ...

Read More »

Mahout and Scalding for poker collusion detection

When I’ve been reading a very bright book on Mahout, Mahout In Action (which is a great hands-in intro to machine learning, as well), one of the examples has caught my attention. Authors of the book where using well-known K-means clusterization algorithm for finding similar players on stackoverflow.com, where the criterion of similarity was the set of the authors of ...

Read More »

Apache Mahout: Build a spam filter server

Something quite interesting has happened with Lucene. It started as a library, then its developers began adding new projects based on it. They developed another open source project that would add crawling features (among others features) to Lucene. Nutch is in fact a full featured web serach engine that anyone can use or modify. Inspired in some famous papers from ...

Read More »

Apache Mahout: Getting started

Recently I have got an interesting problem to solve: how to classify text from different sources using automation? Some time ago I read about a project which does this as well as many other text analysis stuff – Apache Mahout. Though it’s not a very mature one (current version is 0.4), it’s very powerful and scalable. Build on top of ...

Read More »

Want to take your Java skills to the next level?

Grab our programming books for FREE!

Here are some of the eBooks you will get:

  • Spring Interview QnA
  • Multithreading & Concurrency QnA
  • JPA Minibook
  • JVM Troubleshooting Guide
  • Advanced Java
  • Java Interview QnA
  • Java Design Patterns