Home » Java » Enterprise Java » Creating an on-line recommender system with Apache Mahout

About Adam Warski

Adam Warski
Adam is one of the co-founders of SoftwareMill, a company specialising in delivering customised software solutions. He is also involved in open-source projects, as a founder, lead developer or contributor to: Hibernate Envers, a Hibernate core module, which provides entity versioning/auditing capabilities; ElasticMQ, an SQS-compatible messaging server written in Scala; Veripacks, a tool to specify and verify inter-package dependencies, and others.

Creating an on-line recommender system with Apache Mahout

Recently we’ve been implementing a recommender system for Yap.TV: you can see it in action after installing the app and going to the “Just for you” tab. We’re using Apache Mahout as the base for doing recommendations. Mahout is a “scalable machine learning library” and contains both local and distributed implementations of user- and item- based recommenders using collaborative filtering algorithms.

screen568x568

For now we’ll focus on the local, single-machine implementation. It should work well if you have up to 10s of millions of preference values. Above that, you should probably consider the Hadoop-based implementation, as the data simply won’t fit into memory.

Writing a basic recommender with Mahout is quite simple; As Mahout is very configurable, usually there are different implementations to choose from; I’ll just describe what I think are “good starting points”.

Basics

First you need a file with the input data. The format is quite simple: either comma-separated (user id, item id) pairs or (user id, item id, preference value) triples. This expresses what you already know: what users like which items, and optionally how much (e.g. on a 1-5 scale). The ids must be integers, the preference value is treated as a float.

Let’s first create a user-based recommender: that is a recommender, which when asked for recommendations for user A, first looks up “similar” users to A, and then tries to find best items, which these similar users have rated, but A hasn’t. To do that, we need to create 4 components:

  • data model: this will use the file
  • user similarity: a measure which given two users, will return a number representing how similar they are
  • neighborhood: for finding the neighborhood of a given user
  • recommender: which takes these pieces together to produce recommendations

For unary input data (where users either like items or we don’t know), a good starting point is:

val dataModel = new FileDataModel(file)
val userSimilarity = new LogLikelihoodSimilarity(dataModel)
val neighborhood = new NearestNUserNeighborhood(25, userSimilarity, dataModel)
val recommender = new GenericBooleanPrefUserBasedRecommender(dataModel, neighborhood, userSimilarity)

If we have preference values (triples in the input data):

val dataModel = new FileDataModel(file)
val userSimilarity = new PearsonCorrelationSimilarity(dataModel)
val neighborhood = new NearestNUserNeighborhood(25, userSimilarity, dataModel)
val recommender = new GenericUserBasedRecommender(dataModel, neighborhood, userSimilarity)

Now we are ready to get some recommendations; this is as simple as:

// Gets 10 recommendations
val result = recommender.recommend(userId, 10)

// We get back a list of item-estimated preference value, 
// sorted from the highest score
result.foreach(r => println(r.getItemID() + ": " + r.getValue()))

On-line

What about the on-line aspect? The above will work great for existing users; what about new users which register in the service? For sure we want to provide some reasonable recommendations for them as well. Creating a recommender instance is expensive (for sure takes longer than a “normal” network request), so we can’t just create a new recommender each time.

Luckily Mahout has a possibility of adding temporary users to a data model. The general setup then is:

  • periodically re-create the whole recommender using current data (e.g. each day or each hour – depending on how long it takes)
  • when doing a recommendation, check if the user exists in the system
  • if yes, do the recommendation as always
  • if no, create a temporary user, fill in the preferences, and do the recommendation

The first part (periodically re-creating the recommender) may be actually quite tricky if you are limited on memory: when creating the new recommender, you need to hold two copies of the data in memory (to still be able to server requests from the old one). But as that doesn’t really have anything to do with recommendations, I won’t go into details here.

As for the temporary users, we can wrap our data model with a PlusAnonymousConcurrentUserDataModel instance. This class allows to obtain a temporary user id; the id must be later released so that it can be re-used (there’s a limited number of such ids). After obtaining the id, we have to fill in the preferences, and then we can proceed with the recommendation as always:

val dataModel = new PlusAnonymousConcurrentUserDataModel(
    new FileDataModel(file),
    100)

val recommender: org.apache.mahout.cf.taste.recommender.Recommender = ...

// we are assuming a unary model: we only know which items a user likes
def recommendFor(userId: Long, userPreferences: List[Long]) = {
  if (userExistsInDataModel(userId)) {
    recommendForExistingUser(userId)
  } else {
    recommendForNewUser(userPreferences)
  }
}

def recommendForNewUser(userPreferences: List[Long]) = {
  val tempUserId = dataModel.takeAvailableUser()

  try {
    // filling in a Mahout data structure with the user's preferences
    val tempPrefs = new BooleanUserPreferenceArray(userPreferences.size)
    tempPrefs.setUserID(0, tempUserId)
    userPreferences.zipWithIndex.foreach { case (preference, idx) => 
      tempPrefs.setItemID(idx, preference) 
    }
    dataModel.setTempPrefs(tempPrefs, tempUserId)

    recommendForExistingUser(tempUserId)
  } finally {
    dataModel.releaseUser(tempUserId)
  }
}

def recommendForExistingUser(userId: Long) = {
  recommender.recommend(userId, 10)
}

Incorporating business logic

It often happens that we want to boost the score of selected items because of some business rules. In our use-case, for example if a show has a new episode, we want to give it a higher score. That’s possible using the IDRescorer interface for Mahout. An instance of a rescorer is provided when invoking Recommender.recommend. For example:

val rescorer = new IDRescorer {
  def rescore(id: Long, originalScore: Double) = {
    if (showIsNew(id)) {
      originalScore * 1.2 
    } else {
      originalScore
    }
  }

  def isFiltered(id: Long) = false
}

// Gets 10 recommendations
val result = recommender.recommend(userId, 10, rescorer)

Summary

Mahout is a great basis for creating recommenders. It’s very configurable and provides many extension points. There’s still quite a lof of work in picking the right configuration parameter values, setting up rescoring and evaluating the recommendation results, but the algorithms are solid, so there’s one thing less to worry about.

There’s also a very good book, Mahout in Action, which covers recommender systems and other components of Mahout. It’s based on version 0.5 (current one is 0.8), but the code examples mostly work and the main logic of the project is the same.
 

Do you want to know how to develop your skillset to become a Java Rockstar?
Subscribe to our newsletter to start Rocking right now!
To get you started we give you our best selling eBooks for FREE!
1. JPA Mini Book
2. JVM Troubleshooting Guide
3. JUnit Tutorial for Unit Testing
4. Java Annotations Tutorial
5. Java Interview Questions
6. Spring Interview Questions
7. Android UI Design
and many more ....
Email address:

Leave a Reply

Be the First to Comment!

avatar
  Subscribe  
Notify of
Want to take your Java skills to the next level?
Grab our programming books for FREE!
Here are some of the eBooks you will get:
  • Spring Interview QnA
  • Multithreading & Concurrency QnA
  • JPA Minibook
  • JVM Troubleshooting Guide
  • Advanced Java
  • Java Interview QnA
  • Java Design Patterns
The Spring Framework Cookbook
  • Learn the best practices of the Spring Framework
  • Build simple, portable, fast and flexible JVM-based systems and applications
  • Explore specific projects like Boot and Batch
The Spring Data Programming Cookbook
  • Learn how to use data access technologies and cloud-based data services
  • Set up the environment and create a basic project
  • Learn how to handle the various modules (e.g. JPA, MongoDB, Redis etc.)
The Selenium Programming Cookbook
  • Kick-start your own projects using this testing framework for web applications
  • Learn JUnit integration and Standalone Server functionality
  • Find out the most popular Interview Questions about the Selenium Framework
The Mockito Programming Cookbook
  • Kick-start your own web projects using this open source testing framework
  • Write simple test cases using the Mockito Framework
  • Learn how to integrate with JUnit, Maven and other frameworks
The JUnit Programming Cookbook
  • Learn basic usage and configuration of JUnit
  • Create multithreaded tests
  • Learn how to integrate with other testing frameworks
The JSF 2.0 Programming Cookbook
  • Build component-based user interfaces for web applications
  • Set up the environment and create a basic project
  • Learn Internationalization and Facelets Templates
The Amazon S3 Tutorial
  • Develop your own Amazon S3 based applications
  • Learn API usage and pricing
  • Get your own projects up and running in minimum time
Java Design Patterns
  • Learn how Design Patterns are implemented and utilized in Java
  • Understand the reasons why patterns are so important
  • Learn when and how to apply each one of them
Java Concurrency Essentials
  • Dive into the magic of concurrency
  • Learn about concepts like atomicity, synchronization and thread safety
  • Learn about testing concurrent applications
The IntelliJ IDEA Handbook
  • Kick-start your own programming projects using IntelliJ IDEA
  • Learn how to setup and install plugins
  • Create UIs with this Java integrated development environment
The Git Tutorial
  • Learn why Git differs from other version control systems
  • Explore Git's usage and best practises
  • Learn branching strategies
The Eclipse IDE Handbook
  • Explore the most widely used Java IDE
  • Learn how to setup and install plugins
  • Built your own projects up and running in minimum time
The Docker Containerization Cookbook
  • Explore the world’s leading software containerization platform
  • Learn how to wrap a piece of software in a complete filesystem
  • Learn how to use DNS and various commands
Developing Modern Applications With Scala
  • Develop modern Scala applications
  • Build SBT and reactive applications
  • Learn about testing and database access
The Apache Tomcat Cookbook
  • Explore Apache Tomcat open-source web server
  • Learn about installation, configuration, logging and clustering
  • Kick-start your own web projects using Apache Tomcat
The Apache Maven Cookbook
  • Explore the Apache Maven build automation tool
  • Learn about Maven's project structure and configuration
  • Learn about Maven's dependency management and plug-ins
The Apache Hadoop Cookbook
  • Explore the Apache Hadoop open-source software framework
  • Learn distributed caching and streaming
  • Kick-start your own web projects using Apache Hadoop
The Android Programming Cookbook
  • Explore the Android mobile operating system
  • Learn about services and page views
  • Learn about Google Maps and Bluetooth functionality
The Elasticsearch Tutorial
  • Explore the Elasticsearch search engine
  • Develop your own Elasticsearch based applications
  • Learn operations, Java API Integration and reporting
Amazon DynamoDB Tutorial
  • Develop your own Amazon DynamoDB based applications
  • Learn DynamoDB Concepts and Best Practices
  • Get your own projects up and running in minimum time
Java NIO Programming Cookbook
  • Learn features for intensive I/O operations
  • Follow a series of tutorials on Java NIO examples
  • Get knowledge on Java Nio Socket and Asynchronous Channels
JBoss Drools Cookbook
  • Explore Drools business rule management system
  • Follow a series of tutorials on Drools examples
  • Get knowledge on business rules for a shopping domain model
Vaadin Programming Cookbook
  • Explore Vaadin web framework for rich Internet applications
  • Learn the Architecture and Best Practices
  • Get knowledge on Data Binding and Custom Components
Groovy Programming Cookbook
  • Explore Apache Groovy object-oriented programming language
  • Create sample applications and explore interview questions
  • Get knowledge on Callback functionality and various widgets
GWT Programming Cookbook
  • Explore the open source Google Web Toolkit
  • Create sample applications and explore interview questions
  • Create and maintain complex JavaScript front-end applications in Java
Do you want to know how to develop your skillset to become a Java Rockstar?
Subscribe to our newsletter to start Rocking right now!
To get you started we give you our best selling eBooks for FREE!
1. JPA Mini Book
2. JVM Troubleshooting Guide
3. JUnit Tutorial for Unit Testing
4. Java Annotations Tutorial
5. Java Interview Questions
and many more ....
Email address:
Do you want to know how to develop your skillset to become a Java Rockstar?
Subscribe to our newsletter to start Rocking right now!
To get you started we give you our best selling eBooks for FREE!
1. JPA Mini Book
2. JVM Troubleshooting Guide
3. JUnit Tutorial for Unit Testing
4. Java Annotations Tutorial
5. Java Interview Questions
and many more ....
Email address:
Do you want to know how to develop your skillset to become a Java Rockstar?
Subscribe to our newsletter to start Rocking right now!
To get you started we give you our best selling eBooks for FREE!
1. JPA Mini Book
2. JVM Troubleshooting Guide
3. JUnit Tutorial for Unit Testing
4. Java Annotations Tutorial
5. Java Interview Questions
6. Spring Interview Questions
7. Android UI Design
and many more ....
Email address:
Want to be a DynamoDB Master ?
Subscribe to our newsletter and download the Amazon DynamoDB Tutorial right now!
In order to help you master this Amazon NoSQL database service, we have compiled a kick-ass guide with all the major DynamoDB features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Programming Interview Coming Up?
Subscribe to our newsletter and download the Ultimate Multithreading and Concurrency interview questions and answers collection right now!
In order to get you prepared for your next Programming Interview, we have compiled a huge list of relevant Questions and their respective Answers. Besides studying them online you may download the eBook in PDF format!
Email address:
Java Interview Coming Up?
Subscribe to our newsletter and download the Ultimate Spring interview questions and answers collection right now!
In order to get you prepared for your next Java Interview, we have compiled a huge list of relevant Questions and their respective Answers. Besides studying them online you may download the eBook in PDF format!
Email address:
Java Interview Coming Up?
Subscribe to our newsletter and download the Ultimate Java interview questions and answers collection right now!
In order to get you prepared for your next Java Interview, we have compiled a huge list of relevant Questions and their respective Answers. Besides studying them online you may download the eBook in PDF format!
Email address:
Want to be a Java NIO Master ?
Subscribe to our newsletter and download the Java NIO Programming Cookbook right now!
In order to help you master Java NIO Library, we have compiled a kick-ass guide with all the major Java NIO features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to be a Drools Master ?
Subscribe to our newsletter and download the JBoss Drools Cookbook right now!
In order to help you master Drools Business Rule Management System, we have compiled a kick-ass guide with all the major Drools features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to be an iText Master ?
Subscribe to our newsletter and download the iText Tutorial right now!
In order to help you master iText Library, we have compiled a kick-ass guide with all the major iText features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to be an Elasticsearch Master ?
Subscribe to our newsletter and download the Elasticsearch Tutorial right now!
In order to help you master Elasticsearch search engine, we have compiled a kick-ass guide with all the major Elasticsearch features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to be a Scala Master ?
Subscribe to our newsletter and download the Scala Cookbook right now!
In order to help you master Scala, we have compiled a kick-ass guide with all the basic concepts! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to be a JUnit Master ?
Subscribe to our newsletter and download the JUnit Programming Cookbook right now!
In order to help you master unit testing with JUnit, we have compiled a kick-ass guide with all the major JUnit features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to master Amazon Web Services ?
Subscribe to our newsletter and download the Amazon S3 Tutorial right now!
In order to help you master the leading Web Services platform, we have compiled a kick-ass guide with all its major features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to master Spring Framework ?
Subscribe to our newsletter and download the Spring Framework Cookbook right now!
In order to help you master the leading and innovative Java framework, we have compiled a kick-ass guide with all its major features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to master Eclipse IDE ?
Subscribe to our newsletter and download the Eclipse IDE Handbook right now!
In order to help you master Eclipse, we have compiled a kick-ass guide with all the basic features of the popular IDE! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to master IntelliJ IDEA ?
Subscribe to our newsletter and download the IntelliJ IDEA Handbook right now!
In order to help you master IntelliJ IDEA, we have compiled a kick-ass guide with all the basic features of the popular IDE! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to master Docker ?
Subscribe to our newsletter and download the Docker Containerization Cookbook right now!
In order to help you master Docker, we have compiled a kick-ass guide with all the basic concepts of the Docker container system! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to create a kick-ass Android App ?
Subscribe to our newsletter and download the Android Programming Cookbook right now!
With this book, you will delve into the fundamentals of Android programming. You will understand user input, views and layouts. Furthermore, you will learn how to communicate over Bluetooth and also leverage Google Maps into your application!
Email address:
Want to be a GIT Master ?
Subscribe to our newsletter and download the GIT Tutorial eBook right now!
In order to help you master GIT, we have compiled a kick-ass guide with all the basic concepts of the GIT version control system! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to be a Hadoop Master ?
Subscribe to our newsletter and download the Apache Hadoop Cookbook right now!
In order to help you master Apache Hadoop, we have compiled a kick-ass guide with all the basic concepts of a Hadoop cluster! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to master Spring Data ?
Subscribe to our newsletter and download the Spring Data Ultimate Guide right now!
In order to help you master Spring Data, we have compiled a kick-ass guide with all the major features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to create a kick-ass Android App ?
Subscribe to our newsletter and download the Android UI Design mini-book right now!
With this book, you will delve into the fundamentals of Android UI design. You will understand user input, views and layouts, as well as adapters and fragments. Furthermore, you will learn how to add multimedia to an app and also leverage themes and styles!
Email address:
Want to be a Java 8 Ninja ?
Subscribe to our newsletter and download the Java 8 Features Ultimate Guide right now!
In order to get you up to speed with the major Java 8 release, we have compiled a kick-ass guide with all the new features and goodies! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to master Java Annotations ?
Subscribe to our newsletter and download the Java Annotations Ultimate Guide right now!
In order to help you master the topic of Annotations, we have compiled a kick-ass guide with all the major features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to be a JUnit Master ?
Subscribe to our newsletter and download the JUnit Ultimate Guide right now!
In order to help you master unit testing with JUnit, we have compiled a kick-ass guide with all the major JUnit features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to master Java Abstraction ?
Subscribe to our newsletter and download the Abstraction in Java Ultimate Guide right now!
In order to help you master the topic of Abstraction, we have compiled a kick-ass guide with all the major features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to master Java Reflection ?
Subscribe to our newsletter and download the Java Reflection Ultimate Guide right now!
In order to help you master the topic of Reflection, we have compiled a kick-ass guide with all the major features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to be a JMeter Master ?
Subscribe to our newsletter and download the JMeter Ultimate Guide right now!
In order to help you master load testing with JMeter, we have compiled a kick-ass guide with all the major JMeter features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to be a Servlets Master ?
Subscribe to our newsletter and download the Java Servlet Ultimate Guide right now!
In order to help you master programming with Java Servlets, we have compiled a kick-ass guide with all the major servlet API uses and showcases! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to be a JAXB Master ?
Subscribe to our newsletter and download the JAXB Ultimate Guide right now!
In order to help you master XML Binding with JAXB, we have compiled a kick-ass guide with all the major JAXB features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to be a JDBC Master ?
Subscribe to our newsletter and download the JDBC Ultimate Guide right now!
In order to help you master database programming with JDBC, we have compiled a kick-ass guide with all the major JDBC features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to be a JPA Master ?
Subscribe to our newsletter and download the JPA Ultimate Guide right now!
In order to help you master programming with JPA, we have compiled a kick-ass guide with all the major JPA features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to be a Hibernate Master ?
Subscribe to our newsletter and download the Hibernate Ultimate Guide right now!
In order to help you master JPA and database programming with Hibernate, we have compiled a kick-ass guide with all the major Hibernate features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to be a JSF Master ?
Subscribe to our newsletter and download the JSF 2.0 Programming Cookbook right now!
In order to get you prepared for your JSF development needs, we have compiled numerous recipes to help you kick-start your projects. Besides reading them online you may download the eBook in PDF format!
Email address:
Want to be a Java Master ?
Subscribe to our newsletter and download the Advanced Java Guide right now!
In order to help you master the Java programming language, we have compiled a kick-ass guide with all the must-know advanced Java features! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to be a Java Master ?
Subscribe to our newsletter and download the Java Design Patterns right now!
In order to help you master the Java programming language, we have compiled a kick-ass guide with all the must-know Design Patterns for Java! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to be a Hadoop Master ?
Subscribe to our newsletter and download the Hadoop Tutorial right now!
In order to help you master Apache Hadoop, we have compiled a kick-ass guide with all the basic concepts of a Hadoop cluster! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to be an Elastic Beanstalk Master ?
Subscribe to our newsletter and download the Amazon Elastic Beanstalk Tutorial right now!
In order to help you master AWS Elastic Beanstalk, we have compiled a kick-ass guide with all the major Elastic Beanstalk features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Amazon Elastic Beanstalk Tutorial
  • Develop your own Amazon Elastic Beanstalk based applications
  • Learn Java Integration and Command Line Interfacing
  • Get your own projects up and running in minimum time
Want to take your Java skills to the next level?
Grab our programming books for FREE!
Here are some of the eBooks you will get:
  • Spring Interview QnA
  • Multithreading & Concurrency QnA
  • JPA Minibook
  • JVM Troubleshooting Guide
  • Advanced Java
  • Java Interview QnA
  • Java Design Patterns
Insiders are already enjoying weekly updates and complimentary whitepapers!
Join them now to gain exclusive access to the latest news in the Java world, as well as insights about Android, Scala, Groovy and other related technologies.
Email address:
Want to be an ActiveMQ Master ?
Subscribe to our newsletter and download the Apache ActiveMQ Cookbook right now!
In order to help you master Apache ActiveMQ JMS, we have compiled a kick-ass guide with all the major ActiveMQ features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Apache ActiveMQ Cookbook
  • Explore Apache ActiveMQ Best Practices
  • Learn ActiveMQ Load Balancing and File Transfer
  • Develop your own Apache ActiveMQ projects