Local installation of standalone HBase and Apache Storm simple cluster

We mainly use Apache Storm for stream processing and Apache HBase as our NoSQL wide-column database.

Apache Cassandra is a great NoSQL database too, but we mostly prefer HBase because it ships with the Cloudera distribution and because, in CAP theorem terms, it favors consistency, whereas Cassandra by default leans toward availability.

HBase runs on top of HDFS, but it can easily be installed in standalone mode for testing purposes. You just need to download the latest version, extract the archive, start the standalone node, and then open an HBase shell and play.

$> tar zxvf hbase-1.1.2-bin.tar.gz
$> cd hbase-1.1.2/bin/
$> ./start-hbase.sh
$> ./hbase shell
hbase(main):001:0> create 'DummyTable', 'cf'
hbase(main):002:0> scan 'DummyTable'
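
A scan of a freshly created table returns no rows, so it helps to write one cell and read it back. The following is a minimal sketch, not part of the original steps: the helper name "hbase_smoke_commands", the row key "row1" and the column qualifier "greeting" are made up for illustration. It only prints HBase shell commands, which you can then pipe into the shell.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print HBase shell commands that write one cell,
# read it back, and scan the table. Row key and qualifier are illustrative.
hbase_smoke_commands() {
  local table="$1" family="$2"
  printf "put '%s', 'row1', '%s:greeting', 'hello'\n" "$table" "$family"
  printf "get '%s', 'row1'\n" "$table"
  printf "scan '%s'\n" "$table"
}

# Usage (from the HBase bin/ directory, with the standalone node running):
#   hbase_smoke_commands DummyTable cf | ./hbase shell
```

Keeping the commands in a function makes it easy to rerun the same smoke test against another table or column family.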

When you start HBase in standalone mode, it automatically starts a local ZooKeeper node too (listening on the default port 2181).

$> netstat -anp|grep 2181
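
A plain grep for 2181 also matches other ports that merely contain those digits, as well as connections that are not listening sockets. As a rough sketch, the check can be tightened with a small filter. The function name "is_listening" is hypothetical, and the column positions assume the Linux net-tools netstat layout (local address in column 4, state in column 6).

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `netstat -an` style output on stdin and succeed
# (exit status 0) only if a TCP socket is in LISTEN state on the given port.
# Assumes the Linux net-tools layout: local address in $4, state in $6.
is_listening() {
  local port="$1"
  awk -v port="$port" '
    $6 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
    END { exit !found }
  '
}

# Usage:
#   netstat -an | is_listening 2181 && echo "ZooKeeper port is open"
```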

HBase and Storm both use ZooKeeper as a distributed coordination service. Since a local ZooKeeper node is already running, you are ready to configure and run a local Storm cluster.

  • Download the latest Storm release
  • Extract the archive
  • Configure “STORM_HOME/conf/storm.yaml” (see below)
  • Start the local cluster (each daemon runs in the foreground, so use a separate terminal for each one):
    • $> cd STORM_HOME/bin
    • $> ./storm nimbus
    • $> ./storm supervisor
    • $> ./storm ui
  • Logs are located in the “STORM_HOME/logs/” directory
  • Check the local Storm UI at: http://localhost:8080

Contents of the new “storm.yaml” configuration file:

storm.zookeeper.servers:
- "localhost"

nimbus.host: "localhost"

supervisor.slots.ports:
- 6701
- 6702
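
If you prefer to generate this file instead of typing it by hand, for example to try a different number of worker slots, a small shell function is enough. This is only a sketch: the name "render_storm_yaml" is hypothetical, and it emits just the three settings shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a minimal storm.yaml to stdout.
# Arguments: <zookeeper-host> <nimbus-host> <slot-port>...
render_storm_yaml() {
  local zk_host="$1" nimbus_host="$2"
  shift 2
  printf 'storm.zookeeper.servers:\n'
  printf -- '- "%s"\n' "$zk_host"
  printf '\n'
  printf 'nimbus.host: "%s"\n' "$nimbus_host"
  printf '\n'
  printf 'supervisor.slots.ports:\n'
  local port
  for port in "$@"; do
    # One worker slot per port
    printf -- '- %s\n' "$port"
  done
}

# Usage (STORM_HOME is assumed to point at the extracted Storm directory):
#   render_storm_yaml localhost localhost 6701 6702 > "$STORM_HOME/conf/storm.yaml"
```

Each port listed under “supervisor.slots.ports” is one worker slot, so adding a port to the argument list adds one more Worker to the local supervisor.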

You can also set the “worker.childopts” parameter to pass JVM options to each Worker (processing node). Here is a simple example for my local JVMs, where I set the min/max heap size and the garbage collection strategy, and enable JMX and GC logs.

worker.childopts: "-server -Xms512m -Xmx2560m -XX:PermSize=128m -XX:MaxPermSize=512m -XX:+UseParallelOldGC -XX:ParallelGCThreads=3 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -verbose:gc -Xloggc:/tmp/gc-storm-worker-%ID%.log -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=1%ID% -XX:+PrintFlagsFinal -Djava.awt.headless=true -Djava.net.preferIPv4Stack=true"

The “worker.childopts” parameter is applied to every Worker JVM. The “%ID%” variable is replaced with the port (6701 or 6702) assigned to each Worker. As you can see, I have used it to give each Worker its own JMX port (16701 or 16702) and its own GC log file.
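
To see what each Worker actually receives, the “%ID%” substitution can be imitated with sed. This is only an illustration of the textual replacement, not Storm's own code, and the function name "expand_worker_opts" is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper that imitates the %ID% substitution: every occurrence
# of %ID% in the option string is replaced with the worker port.
expand_worker_opts() {
  local template="$1" port="$2"
  printf '%s\n' "$template" | sed "s/%ID%/${port}/g"
}

# Show the two port-dependent options for both configured worker slots.
opts='-Xloggc:/tmp/gc-storm-worker-%ID%.log -Dcom.sun.management.jmxremote.port=1%ID%'
for port in 6701 6702; do
  expand_worker_opts "$opts" "$port"
done
```

For port 6701 this yields the GC log file /tmp/gc-storm-worker-6701.log and the JMX port 16701, which is the port to point a JMX client such as JConsole at.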

We run Storm on JDK 7, but JDK 8 seems to be compatible too. The latest Storm release has switched from Logback to Log4j2 (see the Storm release notes for details).

Using the above instructions, you will be able to run HBase and a Storm mini cluster on your laptop without any problem.

Adrianos Dadis

Adrianos works as a senior software engineer in the telco business domain. He is particularly interested in enterprise integration, multi-tier architecture and middleware services. He mainly works with WebLogic, JBoss, Java EE, Spring, Drools, Oracle SOA Suite and various ESBs.