Twitter4j and Esper: Tracking user sentiments on Twitter

For newcomers to Complex Event Processing and the Twitter API, I hope this serves as a short tutorial and helps you get off the ground quickly.

Managing big data and mining useful information from it is one of the hottest topics in technology right now. The explosive growth of semi-structured data flowing from social networks like Twitter, Facebook and LinkedIn is making technologies like Hadoop and Cassandra part of every technology conversation. So as not to fall behind the competition, customer-centric organizations are actively creating social strategies.
What can a company get out of data feeds from social networks? Think location-based services, targeted advertisements and algorithmic equity trading, for starters. IDC Insights has some informative blogs on the relationship between big data and business analytics. Big data in itself will be meaningless unless the right analytic tools are available to sift through it, explains Barb Darrow in her blog post on gigaom.com.

Companies often listen in on social feeds to learn about customers’ interests or their perception of products. They also try to identify “influencers”, the users with the most connections in a social graph, so they can make better offers to such individuals and get better mileage out of their marketing. Companies involved in equity trading want to know which publicly traded companies are discussed on Twitter and what users’ sentiments about them are.

From big companies like IBM to smaller start-ups, everyone is racing to make the most of the opportunities in big data management and analytics. Much documentation about big data, like the IBM ebook ‘Big Data Platform’, is freely available on the web. However, a lot of it covers theory only. Jouko Ahvenainen, in reply to Barb Darrow’s post above, makes a good point: “many people who talk about the opportunity of big data are on too general level, talk about better customer understanding, better sales, etc. In reality you must be very specific, what you utilize and how”.

It does sound reasonable, doesn’t it? So I set out to investigate this a bit further by prototyping an idea, which is the only good way I know. If I could do it, anybody could. The code is remarkably simple, but that is exactly the point. Writing a CEP framework yourself is quite complex, but using one is not. In the same way, Twitter makes it really easy to get to the information through its REST API.

Big Data – http://www.bigdatabytes.com/managing-big-data-starts-here/
Complex Event Processing (CEP), which I blogged about previously, is a critical component of the big data framework. Along with CEP, frameworks such as Hadoop are used to compile, parse and make sense of the 24×7 stream of data from social networks. Twitter’s streaming API and CEP can be used together to capture the happiness levels of Twitter users. The code I present below listens to live tweets and generates a ‘happy’ event every time “lol” is found in the text of a tweet. CEP is used to capture the happy events, and an alert is raised every time the count of happy events exceeds a predetermined number within a predetermined time period. The assumption that a user is happy every time he or she uses “lol” is very simplistic, but it helps get the point across. In practice, gauging users’ sentiment is not that easy because it involves natural language analysis. Consider the example below, which highlights the complexities of analyzing natural language.

iPhone has never been good.

iPhone has never been so good.

As you can see, adding just one word to the sentence completely changes its meaning. For this reason, natural language processing is considered one of the toughest problems in computer science. You can learn natural language processing from free online lectures offered by Stanford University, starting with the first lecture on natural language analysis by Christopher Manning. But in my opinion, the pervasive use of abbreviations in social media, and in modern lingo in general, is making the task a little easier. Abbreviations like “lol” and “AFAIK” project their meaning fairly accurately: “lol” signals “funny”, and “AFAIK” may indicate that the user is “unsure” of himself or herself.
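To make the abbreviation idea concrete, here is a minimal sketch in plain Java of a lookup from abbreviation to sentiment. The class name `AbbreviationSentiment` and its `detect` method are hypothetical helpers invented for illustration. They are not part of Twitter4j or Esper, and the two-entry table only mirrors the examples above. The sketch matches whole words case-insensitively, so “LOL!” counts while “lollipop” does not.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper for illustration; not part of Twitter4j or Esper.
final class AbbreviationSentiment {
    // Illustrative table only: the two entries are the examples from the text.
    private static final Map<String, String> SENTIMENTS = new LinkedHashMap<>();
    static {
        SENTIMENTS.put("lol", "funny");
        SENTIMENTS.put("afaik", "unsure");
    }

    /** Returns the sentiment of the first known abbreviation in the text, if any. */
    static Optional<String> detect(String text) {
        // Split on anything that is not a letter or digit, so punctuation and case
        // do not matter, and words that merely contain "lol" are not matched.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            String sentiment = SENTIMENTS.get(token);
            if (sentiment != null) {
                return Optional.of(sentiment);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(detect("LOL, that was great"));        // Optional[funny]
        System.out.println(detect("AFAIK the store opens at 9")); // Optional[unsure]
        System.out.println(detect("I bought a lollipop"));        // Optional.empty
    }
}
```

A table like this is still a crude heuristic: it ignores sarcasm and context, which is exactly where real natural language processing comes in.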

The code presented below uses the Twitter4j API to listen to the live Twitter feed and Esper CEP to listen to events and alert us when a threshold is met. You can download Twitter4j binaries or source from http://twitter4j.org/en/index.html and Esper from http://esper.codehaus.org/ . Before you execute the code, make sure to create a Twitter account if you don’t have one, and read Twitter’s guidelines and concepts for its streaming API. Authentication with just a username and password is currently allowed by Twitter, but it is going to be phased out in favor of OAuth in the near future. Also, pay close attention to the ‘Access and Rate Limit’ section.

The code below uses the streaming API in one thread. Please do not use another thread at the same time, to avoid hitting the rate limit. Hitting rate limits consistently can result in Twitter blacklisting your Twitter ID. It is also important to note that the streaming API does not send every tweet our way. Twitter typically samples the data, sending about 1 out of every 10 tweets. This is not a problem for us, as long as we are interested in patterns in the data and not in any specific tweet. Twitter offers a paid service for businesses that need streaming data with no rate limits. The following diagram shows the components and the processing of data.

Diagram. Charts & DB not yet implemented in the code
package com.sybase.simple;

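// Event bean; Esper reads its properties through the getters.
public class HappyMessage {
 private String user;
 // Every event counts as 1, so sum(ctr) in the EPL statement is the event count.
 private final int ctr = 1;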
 public String getUser() {
  return user;
 }
 public void setUser(String user) {
  this.user = user;
 }
 public int getCtr() {
  return ctr;
 }
} 

Listing 1. Standard Java bean representing a happy event.

package com.sybase.simple;

import com.espertech.esper.client.EventBean;
import com.espertech.esper.client.UpdateListener;

public class HappyEventListener implements UpdateListener{
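 // Esper calls this whenever the statement produces output.
 public void update(EventBean[] newEvents, EventBean[] oldEvents) {
  try {
   if (newEvents == null || newEvents.length == 0) {
    return;
   }
   EventBean event = newEvents[0];
   // The unaliased select expression is looked up by its text, "sum(ctr)".
   System.out.println("exceeded the count, actual " + event.get("sum(ctr)"));
  } catch (Exception e) {
   e.printStackTrace();
  }
 }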
}

Listing 2. Esper listener is defined.

package com.sybase.simple;

import java.io.IOException;

import twitter4j.Status;
import twitter4j.StatusDeletionNotice;
import twitter4j.StatusListener;
import twitter4j.TwitterException;
import twitter4j.TwitterStream;
import twitter4j.TwitterStreamFactory;
import twitter4j.conf.Configuration;
import twitter4j.conf.ConfigurationBuilder;

import com.espertech.esper.client.EPServiceProvider;
import com.espertech.esper.client.EPServiceProviderManager;
import com.espertech.esper.client.EPStatement;

public class TwitterTest {
 static EPServiceProvider epService;

 public static void main(String[] args) throws TwitterException, IOException {

  // Creating and registering the CEP listener

  com.espertech.esper.client.Configuration config1 = new com.espertech.esper.client.Configuration();
  config1.addEventType("HappyMessage", HappyMessage.class.getName());
  epService = EPServiceProviderManager.getDefaultProvider(config1);
  // Sum ctr over a 10-second time window and report only when the sum exceeds 2.
  String expression = "select user, sum(ctr) from com.sybase.simple.HappyMessage.win:time(10 seconds) having sum(ctr) > 2";

  EPStatement statement = epService.getEPAdministrator().createEPL(
    expression);
  HappyEventListener happyListener = new HappyEventListener();
  statement.addListener(happyListener);

  ConfigurationBuilder cb = new ConfigurationBuilder();
  cb.setDebugEnabled(true);
  //simple http form based authentication, you can use oAuth if you have one, check Twitter4j documentation
  cb.setUser("your Twitter user name here");
  cb.setPassword("Your Twitter password here");

  // creating the twitter listener

  Configuration cfg = cb.build();
  TwitterStream twitterStream = new TwitterStreamFactory(cfg)
    .getInstance();
  StatusListener listener = new StatusListener() {
   public void onStatus(Status status) {

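    // contains() also matches "lol" at the very start of the tweet,
    // which the previous indexOf(...) > 0 check skipped.
    if (status.getText().contains("lol")) {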
     System.out.println("********* lol found *************");
     raiseEvent(epService, status.getUser().getScreenName(),
       status);
    }
   }

   public void onDeletionNotice(
     StatusDeletionNotice statusDeletionNotice) {
    System.out.println("Got a status deletion notice id:"
      + statusDeletionNotice.getStatusId());
   }

   public void onTrackLimitationNotice(int numberOfLimitedStatuses) {
    System.out.println("Got track limitation notice:"
      + numberOfLimitedStatuses);
   }

   public void onScrubGeo(long userId, long upToStatusId) {
    System.out.println("Got scrub_geo event userId:" + userId
      + " upToStatusId:" + upToStatusId);
   }

   public void onException(Exception ex) {
    ex.printStackTrace();
   }
  };
  twitterStream.addListener(listener);

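  // Start consuming the sampled public stream; this call returns immediately
  // and the listener is invoked on a background thread.
  twitterStream.sample();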

 }

 private static void raiseEvent(EPServiceProvider epService, String name,
   Status status) {
  HappyMessage msg = new HappyMessage();
  msg.setUser(status.getUser().getScreenName());
  epService.getEPRuntime().sendEvent(msg);
 }

}

Listing 3.

The Twitter4j listener is created. This listener and the CEP listener start listening. Every Twitter post is checked for ‘lol’. Every time ‘lol’ is found, a happy event is generated. The CEP listener raises an alert every time the total count of ‘lol’ exceeds 2 in the last 10 seconds.
The code establishes a long-running thread to get Twitter feeds. You will see output on the console every time the threshold is met. Please remember to terminate the program; it doesn’t terminate on its own.
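If the EPL statement feels like a black box, the rule it expresses (count the events seen in the last 10 seconds and alert when the count exceeds 2) can be approximated in a few lines of plain Java. This is only a sketch of the idea under my own assumptions: `SlidingWindowCounter` is a hypothetical class, timestamps are passed in explicitly so the behavior is easy to follow, and Esper’s real evaluation (timers, events leaving the window, output to listeners) is more involved.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration of a time-window threshold; Esper does this internally.
final class SlidingWindowCounter {
    private final long windowMillis;
    private final int threshold;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    SlidingWindowCounter(long windowMillis, int threshold) {
        this.windowMillis = windowMillis;
        this.threshold = threshold;
    }

    /** Records one event and reports whether the count in the window exceeds the threshold. */
    boolean record(long nowMillis) {
        timestamps.addLast(nowMillis);
        // Drop events that are at least one full window older than the current one.
        while (!timestamps.isEmpty() && nowMillis - timestamps.peekFirst() >= windowMillis) {
            timestamps.removeFirst();
        }
        return timestamps.size() > threshold;
    }

    public static void main(String[] args) {
        SlidingWindowCounter counter = new SlidingWindowCounter(10_000L, 2);
        System.out.println(counter.record(0L));      // false: 1 event in the window
        System.out.println(counter.record(1_000L));  // false: 2 events
        System.out.println(counter.record(2_000L));  // true: 3 events, more than 2
        System.out.println(counter.record(13_000L)); // false: the earlier events expired
    }
}
```

The point of using a CEP engine is that you get this windowing, plus grouping, joins and pattern matching, from a one-line statement instead of hand-written bookkeeping.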

Now that you have this basic functionality working, you can extend the prototype in a number of ways. You can handle additional data feeds (from sources other than Twitter) and use Esper to correlate data from the two feeds. For visually appealing output, you can feed the results to a charting library. For example, every time Esper identifies an event, the data point is used to render a point on a line graph. If you track the ‘happy event’ this way, the graph will essentially show the ever-changing level of happiness of Twitter users over time.
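As a starting point for the charting idea, the sketch below groups happy events into fixed time buckets, producing one (time, count) point per bucket that a line chart could plot. `HappinessSeries` is a hypothetical class of my own and is not tied to any particular charting library. The bucket size and the use of millisecond timestamps are assumptions made for the example.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: turns individual happy events into chartable data points.
final class HappinessSeries {
    private final long bucketMillis;
    // Bucket start time -> number of happy events, kept sorted by time.
    private final TreeMap<Long, Integer> counts = new TreeMap<>();

    HappinessSeries(long bucketMillis) {
        this.bucketMillis = bucketMillis;
    }

    /** Adds one happy event to the bucket that contains its timestamp. */
    void record(long timestampMillis) {
        long bucketStart = Math.floorDiv(timestampMillis, bucketMillis) * bucketMillis;
        counts.merge(bucketStart, 1, Integer::sum);
    }

    /** Read-only view of the points, ordered by bucket start time. */
    Map<Long, Integer> points() {
        return Collections.unmodifiableMap(counts);
    }

    public static void main(String[] args) {
        HappinessSeries series = new HappinessSeries(60_000L); // one point per minute
        series.record(0L);
        series.record(59_999L);
        series.record(60_000L);
        // Two events fall in the first minute and one in the second.
        series.points().forEach((start, count) ->
                System.out.println(start + " ms -> " + count + " happy events"));
    }
}
```

In the prototype, `record` could be called from the Esper listener or from `raiseEvent`, and `points()` handed to whichever charting library you choose.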

Please use the comment section for your feedback, +1 to share, and let me know if you would like to see more posts on this subject.

Reference: Tracking user sentiments on Twitter with Twitter4j and Esper from our JCG partner Mahesh Gadgil at the Simple yet Practical blog.
