Here I describe a few details of the Storm and Kafka integration modules, a few important bugs that you should be aware of, and how to overcome some of them (especially in production installations). I use Apache Storm heavily in production installations, with Kafka as my main input source (Spout). Storm integration modules with Kafka and versions: Storm 0.x supports ...
Apache Storm: How to configure KafkaBolt with Flux
Flux is a mini framework that helps us define and deploy a Storm topology. Flux has various wrappers that help you define the required stream(s) and initialize your Bolts and Spouts (using a constructor with or without arguments, and calling custom configuration methods automatically via reflection). All you need to do to use Flux is add it as a dependency in ...
Configurable ETL processing using Apache Storm and Kite SDK Morphlines
Since my first days as a software engineer, I have always heard the same request from many sides: “We want to have everything configurable, we want to change everything at runtime, and we want a visual tool to apply all this logic so that non-developers can use and configure our application.” I like this generic scope ...
Real time sentiment analysis example with Apache Storm
Real-time sentiment analysis refers to processing streams of natural language text (or voice) in order to extract subjective information. A common use case is building a recommendation engine or finding social media trends. I have selected Apache Storm as the real-time processing engine. Storm is very robust (we use it in production) and very easy to ...
Local installation of standalone HBase and Apache Storm simple cluster
We mainly use Apache Storm for stream processing and Apache HBase as a NoSQL wide-column database. Although Apache Cassandra is a great NoSQL database, we mostly prefer HBase because of the Cloudera distribution and because it favors consistency more than Cassandra does (see the CAP theorem). HBase is built on HDFS, but it can easily be installed standalone for testing purposes. You ...
Storm event processor – GC log file per worker
For the last three months, I have been working with a new team building a product for Big Data analytics in the telecom domain. The Storm event processor is one of the main frameworks we use, and it is really great. You can read more details in its official documentation (which has been improved). Storm uses Workers to do your job, where each ...
Set WildFly binding address and shutdown using CLI
It’s very easy to bind WildFly to a hostname/IP using just command-line parameters. I have a simple GNU/Linux box that I use to play with various things, one of them being WildFly. I start WildFly listening on a specific IP using these commands: $> cd /opt/wildfly/wildfly-8.0.0.Beta1/bin $> ./standalone.sh -c standalone-full.xml -b=192.168.1.10 -bmanagement=192.168.1.10 ...
Add Apache Camel and Spring as jboss modules in WildFly
These days I am playing with WildFly, Apache Camel and Spring. A simple way to communicate between EARs / WARs is to use Camel's direct-vm component. There are many ways to achieve this, with or without Camel. Camel works like a charm in WildFly without any need for extra configuration. Camel is great!!! In order to avoid packing all ...
Java heap space, native heap and memory problems
Recently, I was discussing with a friend why a Java process uses more memory than the maximum heap that we set when starting it. All Java objects that the code creates are allocated in the Java heap space, whose size is defined by the -Xmx option. But a Java process consists of many spaces, not only the ...