About Tugdual Grall

Tugdual "tug" is Technical Evangelist at Couchbase. He is passionate about development and software architecture using Java platform but also others one. Tug He is also co-founder of the Nantes Java User Group (NantesJUG) in France and developer of the Resultri site hosted on GAE.

Couchbase : Create a large dataset using Twitter and Java

An easy way to create large dataset when playing/demonstrating Couchbase -or any other NoSQL engine- is to inject Twitter feed into your database.

For this small application I am using:

In this example I am using Java to inject Tweets into Couchbase, you can obviously use another langage if you want to.

The sources of this project are available on my Github repository Twitter Injector for Couchbase you can also download the Binary version here, and execute the application from the command line, see Run The Application paragraph. Do not forget to create your Twitter oAuth keys (see next paragraph)

Create oAuth Keys

The first thing to do to be able to use the Twitter API is to create a set of keys. If you want to learn more about all these keys/tokens take a look to the oAuth protocol : http://oauth.net/

1. Log in into the Twitter Development Portal : https://dev.twitter.com/

2. Create a new Application

Click on the ‘Create an App’ link or go into the ‘User Menu > My Applications > Create a new application’

3. Enter the Application Details information

4. Click ‘Create Your Twitter Application’ button

Your application’s OAuth settings are now available :

5- Go down on the Application Settings page and click on the ‘Create My Access Token’ button

You have now all the necessary information to create your application:

  • Consumer key
  • Consumer secret
  • Access token
  • Access token secret

These keys will be uses in the twitter4j.properties file when running the Java application from the command line see

Create the Java Application

The following code is the main code of the application:

package com.couchbase.demo;

import com.couchbase.client.CouchbaseClient;
import org.json.JSONException;
import org.json.JSONObject;
import twitter4j.*;
import twitter4j.json.DataObjectFactory;

import java.io.InputStream;
import java.net.URI;

import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class TwitterInjector {

    public final static String COUCHBASE_URIS = "couchbase.uri.list";
    public final static String COUCHBASE_BUCKET = "couchbase.bucket";
    public final static String COUCHBASE_PASSWORD = "couchbase.password";

    private List<URI> couchbaseServerUris = new ArrayList<URI>();
    private String couchbaseBucket = "default";
    private String couchbasePassword = "";

    public static void main(String[] args) {
        TwitterInjector twitterInjector = new TwitterInjector();
        twitterInjector.setUp();
        twitterInjector.injectTweets();
    }

    private void setUp() {
        try {
            Properties prop = new Properties();
            InputStream in = TwitterInjector.class.getClassLoader().getResourceAsStream("twitter4j.properties");
            if (in == null) {
                throw new Exception("File twitter4j.properties not found");
            }
            prop.load(in);
            in.close();

            if (prop.containsKey(COUCHBASE_URIS)) {
                String[] uriStrings =  prop.getProperty(COUCHBASE_URIS).split(",");
                for (int i=0; i<uriStrings.length; i++) {
                    couchbaseServerUris.add( new URI( uriStrings[i] ) );
                }

            } else {
                couchbaseServerUris.add( new URI("http://127.0.0.1:8091/pools") );
            }

            if (prop.containsKey(COUCHBASE_BUCKET)) {
                couchbaseBucket = prop.getProperty(COUCHBASE_BUCKET);
            }

            if (prop.containsKey(COUCHBASE_PASSWORD)) {
                couchbasePassword = prop.getProperty(COUCHBASE_PASSWORD);

            }

        } catch (Exception e) {
            System.out.println( e.getMessage() );
            System.exit(0);
        }

    }

    private void injectTweets() {
        TwitterStream twitterStream = new TwitterStreamFactory().getInstance();
        try {
            final CouchbaseClient cbClient = new CouchbaseClient( couchbaseServerUris , couchbaseBucket , couchbasePassword );
            System.out.println("Send data to : "+  couchbaseServerUris +"/"+ couchbaseBucket );
            StatusListener listener = new StatusListener() {

                @Override
                public void onStatus(Status status) {
                    String twitterMessage = DataObjectFactory.getRawJSON(status);

                    // extract the id_str from the JSON document
                    // see : https://dev.twitter.com/docs/twitter-ids-json-and-snowflake
                    try {
                        JSONObject statusAsJson = new JSONObject(twitterMessage);
                        String idStr = statusAsJson.getString("id_str");
                        cbClient.add( idStr ,0, twitterMessage  );
                        System.out.print(".");
                    } catch (JSONException e) {
                        e.printStackTrace(); 
                    }
                }

                @Override
                public void onDeletionNotice(StatusDeletionNotice statusDeletionNotice) {
                }

                @Override
                public void onTrackLimitationNotice(int numberOfLimitedStatuses) {
                }

                @Override
                public void onScrubGeo(long userId, long upToStatusId) {
                }

                @Override
                public void onException(Exception ex) {
                    ex.printStackTrace();
                }
            };

        twitterStream.addListener(listener);
        twitterStream.sample();

        } catch (Exception e) {
            e.printStackTrace();  
        }
    }

}

Some basic explanation:

  • The setUp() method simply reads the twitter4j.properties file from the classpath to build the Couchbase connection string.
  • The injectTweets opens the Couchbase connection -line 76- and calls the TwitterStream API.
  • A Listener is created and will receive all the onStatus(Status status) from Twitter. The most important method is onStatus() that receive the message and save it into Couchbase.
  • One interesting thing : since Couchbase is a JSON Document database it allows your to just take the JSON String and save it directly.
    cbClient.add(idStr,0 ,twitterMessage);

 
Packaging

To be able to execute the application directly from the Jar file, I am using the assembly plugin with the following informations from the
pom.xml :

... 
<archive>
  <manifest>
   <mainclass>com.couchbase.demo.TwitterInjector</mainclass>
  </manifest>
  <manifestentries>
   <class-path>.</class-path>
  </manifestentries>
</archive>
...

Some information:

  • The mainClass entry allows you to set which class to execute when running java -jar command.
  • The Class-Path entry allows you to set the current directory as part of the classpath where the program will search for the twitter4j.properties file.
  • The assembly file is also configure to include all the dependencies (Twitter4J, Couchbase client SDK, …)

If you do want to build it from the sources, simply run :

mvn clean package

This will create the following Jar file . /target/CouchbaseTwitterInjector.jar

Run the Java Application

Before running the application you must create a twitter4j.properties file with the following information :

twitter4j.jsonStoreEnabled=true

oauth.consumerKey=[YOUR CONSUMER KEY]
oauth.consumerSecret=[YOUR CONSUMER SECRET KEY]
oauth.accessToken=[YOUR ACCESS TOKEN]
oauth.accessTokenSecret=[YOUR ACCESS TOKEN SECRET]

couchbase.uri.list=http://127.0.0.1:8091/pools
couchbase.bucket=default
couchbase.password=

Save the properties file and from the same location run:

jar -jar [path-to-jar]/CouchbaseTwitterInjector.jar

This will inject Tweets into your Couchbase Server. Enjoy !
 

Reference: Couchbase : Create a large dataset using Twitter and Java from our JCG partner Tugdual Grall at the Tug’s Blog blog.

Related Whitepaper:

Functional Programming in Java: Harnessing the Power of Java 8 Lambda Expressions

Get ready to program in a whole new way!

Functional Programming in Java will help you quickly get on top of the new, essential Java 8 language features and the functional style that will change and improve your code. This short, targeted book will help you make the paradigm shift from the old imperative way to a less error-prone, more elegant, and concise coding style that’s also a breeze to parallelize. You’ll explore the syntax and semantics of lambda expressions, method and constructor references, and functional interfaces. You’ll design and write applications better using the new standards in Java 8 and the JDK.

Get it Now!  

Leave a Reply


nine + 7 =



Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd | Terms of Use | Privacy Policy
All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries.
Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.

Sign up for our Newsletter

20,709 insiders are already enjoying weekly updates and complimentary whitepapers! Join them now to gain exclusive access to the latest news in the Java world, as well as insights about Android, Scala, Groovy and other related technologies.

As an extra bonus, by joining you will get our brand new e-books, published by Java Code Geeks and their JCG partners for your reading pleasure! Enter your info and stay on top of things,

  • Fresh trends
  • Cases and examples
  • Research and insights
  • Two complimentary e-books