Integrating MapR With Ruby: Getting started with MapR-DB and MapR Streams on JRuby

MapR Streams and MapR-DB are both very exciting developments in the MapR Converged Data Platform. In this blog post, I’m going to show you how to get Ruby code to natively interact with MapR-DB and MapR Streams. I am a Ruby developer, and existing Ruby clients/libraries for HBase and Kafka just weren’t working properly with the MapR equivalents. So I set out to figure out a way to get Ruby code to talk natively to both of these MapR technologies. I decided to explore porting the Java examples to JRuby, and it worked quite well. There are several Java examples included in this post so I can also walk you through how to port the Java code to JRuby (a fully-threaded Java implementation of Ruby). Let’s get started!

The source code for this post can be found at https://github.com/rvictory/MapR-JRuby-Demos and is heavily influenced by https://github.com/mapr-demos/maprdb-ojai-101 and https://github.com/mapr-demos/mapr-streams-sample-programs

The Basics – Loading the MapR Client Libraries

JRuby tries its best to keep Ruby conventions while allowing you to interface with Java code and libraries. However, in order for it to “find” the Java classes, you need to load their JARs much like you would a Gem or another Ruby source file. In order to get everything loaded, I took a very heavy-handed approach (as far as I can tell, it doesn’t impact performance). Basically, I prepared my scripts to be able to reference any MapR JAR with the following lines at the top of every Ruby script:

include Java

Dir["/opt/mapr/lib/*.jar"].each { |jar| require jar }

This will obviously only work if MapR is installed in /opt/mapr; however, it’s easy enough to change if you’re not running in that default configuration.
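
If MapR lives somewhere other than /opt/mapr, one option is to make the path configurable instead of editing every script. The sketch below is only an illustration: the `mapr_jars` helper and the use of a `MAPR_HOME` environment variable are assumptions made for this example, not something the original scripts do. It also skips JAR loading entirely when the script is not running under JRuby.

```ruby
# Hypothetical helper: list every JAR under <mapr_home>/lib.
# MAPR_HOME is an assumed environment variable name; the default
# matches the /opt/mapr path used in the snippet above.
def mapr_jars(mapr_home = ENV.fetch("MAPR_HOME", "/opt/mapr"))
  Dir[File.join(mapr_home, "lib", "*.jar")].sort
end

# Only JRuby can require JAR files, so do nothing on other Ruby engines.
if RUBY_ENGINE == "jruby"
  require "java"
  jars = mapr_jars
  warn "No MapR JARs found; check the MapR install path" if jars.empty?
  jars.each { |jar| require jar }
end
```

Sorting the list just keeps the load order predictable from run to run.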

Using MapR-DB With JRuby

Now that we have referenced the appropriate JARs, it’s a matter of porting the code over from Java to Ruby. For the most part, this is a seamless task: simply take the constructs used in Java and make them “Ruby.” For example, this snippet from the MapR-DB demo:

Table table;
if (!MapRDB.tableExists(tableName)) {
    table = MapRDB.createTable(tableName); // Create the table if not already present
} else {
    table = MapRDB.getTable(tableName); // Get the table
}

Becomes:

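java_import 'com.mapr.db.MapRDB' # expose the Java class as a Ruby constant

table = nil
if !MapRDB.tableExists(tableName)
    table = MapRDB.createTable(tableName) # create the table if not already present
else
    table = MapRDB.getTable(tableName) # get the existing table
end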

One thing to note is that constructors are invoked the way you’d probably expect. This Java code:

Test test = new Test("arguments");

Becomes the following Ruby code:

test = Test.new("arguments")

Also, Java generics aren’t a “thing” in JRuby (Ruby is dynamically typed, so there are no type parameters to declare), so in most cases (at least the cases I’ve found), you can simply ignore them. There may be scenarios where JRuby can’t figure out the right Java type on its own. I haven’t run into this yet, but I’m sure there’s a way to address it.

As far as working with MapR-DB, we can simply use the same API that you would use in Java (http://maprdocs.mapr.com/apidocs/maprdb_json/51/com/mapr/db/MapRDB.html). To create a new document, we can do it one of two ways:

java_import 'org.ojai.types.ODate' # OJAI date type used for the "dob" field

document = MapRDB.newDocument()
      .set("_id", "jdoe")
      .set("first_name", "John")
      .set("last_name", "Doe")
      .set("dob", ODate.parse("1970-06-23"))
table.insertOrReplace(document)

Or we can use the JSON format instead of the API:

document = MapRDB.newDocument('{"_id" : "test", "first_name" : "John", "last_name" : "Doe", "dob" : "1970-06-23"}')
table.insertOrReplace(document)
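
If the data already lives in a Ruby Hash, hand-writing the JSON string gets error-prone quickly. A small sketch of generating it with Ruby’s standard `json` library instead follows. The `document_json` helper is a hypothetical name introduced for this example, and the MapR-DB calls are left as comments because they only work on JRuby with the MapR JARs loaded.

```ruby
require "json"

# Hypothetical helper: build the JSON text for a document from a Ruby Hash,
# putting the "_id" key first.
def document_json(id, fields = {})
  JSON.generate({ "_id" => id }.merge(fields))
end

json = document_json("jdoe", {
  "first_name" => "John",
  "last_name"  => "Doe",
  "dob"        => "1970-06-23" # a plain JSON string here, not an ODate
})
puts json

# On JRuby, with `table` opened as shown earlier:
#   table.insertOrReplace(MapRDB.newDocument(json))
```

Note that, as in the JSON example above, the date of birth is passed as an ordinary string, so it should not be assumed to behave like the ODate value from the first approach.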

That’s pretty much it for the basics of MapR-DB using JRuby. Check out the source code file “maprdb.rb” for a full example. One thing to mention is that, by default, newly created tables are stored in the current user’s directory in MapR-FS. If you are running as a user other than mapr, make sure that you create the directory /users/<username>/ in MapR-FS for your user and that your user has permission to read and write to that directory.

Using MapR Streams With JRuby

Now that we’ve laid the foundation for porting the Java code to Ruby, the Streams code is just as simple as the MapR-DB code. The only difference is that the original Java Streams example uses “Properties” files to store the configuration, so we just drop these files into the same directory as the Ruby script, and they’ll get picked up by the code.

Other than that slight change (using the props files), the code is a pretty straightforward port from the Java example. Check out the actual code on my GitHub page to see the producer and the consumer. Make sure that you create the stream (and have the appropriate permissions for the user who is executing the code) using the following MapR command:

maprcli stream create -path /sample-stream

And create the topic using the following command:

maprcli stream topic create -path /sample-stream -topic fast-messages

To allow all users to interact with the stream, use the following command:

maprcli stream edit -path /sample-stream -produceperm p -consumeperm p -topicperm p

And that’s it. The sample code should run just fine. The producer creates 100 messages and writes them to the stream, and the consumer picks up any messages it sees. The rest of the Streams operations work as described in the documentation for the Java API. I recommend moving JSON messages on the stream, because that allows any consumer, in any language, to work with the data.
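
To make the JSON recommendation concrete, here is a minimal sketch of encoding a payload before handing it to the producer and decoding it again on the consumer side. The helper names and the message fields (`type`, `seq`) are made up for illustration and are not taken from the sample code. The producer and consumer calls themselves are omitted, since they need JRuby and the MapR JARs.

```ruby
require "json"

# Hypothetical helpers; the "type" and "seq" fields are illustrative only.
def encode_message(seq)
  JSON.generate({ "type" => "test", "seq" => seq })
end

def decode_message(raw)
  JSON.parse(raw)
end

# Producer side: build 100 JSON payloads, one per message to send.
payloads = (0...100).map { |i| encode_message(i) }

# Consumer side: parse a record's value back into a Ruby Hash.
last = decode_message(payloads.last)
puts "#{payloads.size} messages, last seq = #{last["seq"]}"
```

Because each value is just a JSON string, a consumer written in another language only needs a JSON parser to read it.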

What’s Next

Now you know how to get Ruby to interact with MapR-DB and MapR Streams. I’m very excited about the prospects of using the Converged Data Platform in future projects; the possibilities are truly endless. Going forward, I intend to wrap the Streams and MapR-DB APIs in a more “Ruby-esque” package and provide it as a Gem for general consumption. I also intend to write a MapR-DB query API that mimics the query syntax used by MongoDB (JSON-based queries, not API-built ones). The Gem’s source will be posted to my GitHub when it’s complete. Feel free to reach out to me if you have any issues, questions, or complaints.

Thanks for reading!

Ryan Victory

Ryan Victory is a Senior Information Security Engineer at Comerica Bank specializing in "Hunting" - using big data technology to find threats that traditional security controls fail to detect (the "unknown unknowns"). Ryan applies deep knowledge of Information Security principles along with networking expertise and programming skills to create models and tools that analyze large amounts of security log data to identify threats. Ryan loves exploring new technologies, especially in the data field, and currently uses Spark and Hive on MapR to analyze security data, exploiting Spark's parallel processing power and its machine learning to help identify security threats.