Abhishek Jain

About Abhishek Jain

Abhishek is a lead software engineer with Impetus Technologies, pioneered in outsource product development. He has technical expertise in designing and implementing solutions around distributed application, database technologies, middle-ware technologies and SaaS platform.

The complex (event) world

This blog entry attempts to summarize the technologies in the CEP domain, touching over their prime feature(s) as well as their lackings.  It seems sometimes that the term CEP is being overused (as did happen with ‘ESB’) and the write-up below reflects our perception and version of it.

ESPER (http://esper.codehaus.org/) is a popular open source component for complex event processing (CEP) available for java. It includes rich support for pattern matching and stream processing based on sliding time or length windows. Despite the fervent discussion over the term ‘CEP’ (http://www.dbms2.com/2011/08/25/renaming-cep-or-not/), ESPER seem to be a good fit for the term CEP, as it appears to be able to really identify “complex events” from a stream of simple events, thanks to ESPER’s EPL (Event Processing Language).

Recently, while searching for an open source solution for real time CEP, our group stumbled across Twitter’s Storm project (https://github.com/nathanmarz/storm). It claims to be most comparable to Yahoo’s S4, while being in the same space as “Complex Event Processing” systems like Esper and Streambase. I am not sure about Streambase, but digging deeper into the Storm project made it look much different from CEP and from the ESPER solution. Ditto with S4 (http://incubator.apache.org/s4/). While S4 and Storm seem to be good at real time stream processing in a distributed mode and they appeared (as they claim) to be the “Hadoop for Real Time”, they didn’t seem to have provisions to match patterns (and thus to indicate complex events).
Searching for a definition for CEP (that our study can relate to) led to the following bullets, (http://colinclark.sys-con.com/node/1985400) which include the below four as prerequisite for a system/solution to be called a CEP component/project/solution:

  • Domain Specific Language 
  • Continuous Query 
  • Time or Length Windows
  • Temporal Pattern Matching
The lack of continuous query supporting time/length windows and temporal pattern matching seem to altogether missing in current versions of the S4 and Storm projects. Probably, it is due to their infancy and they will mature up to include such features in future. As of now, they only seem fit for pre-processing the event stream before passing it over to a CEP engine like ESPER. Their ability to do distributed processing (a la map-reduce mode) can help to speed up the pre-processing where events can be filtered off, or enriched via some lookup/computation etc. There have also been some attempts to integrate Storm with Esper (http://tomdzk.wordpress.com/2011/09/28/storm-esper/).

While the processing systems like S4 and Storm lack important features of CEP, ESPER based systems have the disadvantage of being memory bound. Having too many events or having a large time windows can potentially cause ESPER to run out of memory. If ESPER is used to process real time streams, for e.g. from a social media, there will be lot of data accumulating in the ESPER memory. On a high level, the problem statement is to invent a CEP solution for big data. On a finer level, the problem statement include architecting a CEP solution for handling on-board (batched) as well as in-flight (Real-Time) data.

In DarkStar’s terminology (http://www.eventprocessing-communityofpractice.org/EPS-presentations/Clark_EP.pdf), the requirement is “having matched a registered pattern in real time, discover similar patterns in the database”. Since being memory bound is a limitation, it may prove useful if some mechanism to condense the in-memory events can be arrived at. The condensed data however should still be meaningful and retain the context of the original stream.

DarkStar does this using Symbolic Aggregate Approximation (http://www.cs.ucr.edu/~eamonn/SAX.htm), and they claim to address the aforementioned requirements by using SAX together with AsterData’s nCluster which is a mpp (massively parallel processing) database with an embedded analytics engine based on SQL/MapReduce (http://www.asterdata.com/resources/mapreduce.php).

to be continued (as we research further) …

Reference: The complex (event) world from our JCG partner Abhishek Jain at the NS.Infra blog.

Related Whitepaper:

Functional Programming in Java: Harnessing the Power of Java 8 Lambda Expressions

Get ready to program in a whole new way!

Functional Programming in Java will help you quickly get on top of the new, essential Java 8 language features and the functional style that will change and improve your code. This short, targeted book will help you make the paradigm shift from the old imperative way to a less error-prone, more elegant, and concise coding style that’s also a breeze to parallelize. You’ll explore the syntax and semantics of lambda expressions, method and constructor references, and functional interfaces. You’ll design and write applications better using the new standards in Java 8 and the JDK.

Get it Now!  

Leave a Reply

9 + four =

Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd | Terms of Use | Privacy Policy
All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries.
Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.

Sign up for our Newsletter

20,709 insiders are already enjoying weekly updates and complimentary whitepapers! Join them now to gain exclusive access to the latest news in the Java world, as well as insights about Android, Scala, Groovy and other related technologies.

As an extra bonus, by joining you will get our brand new e-books, published by Java Code Geeks and their JCG partners for your reading pleasure! Enter your info and stay on top of things,

  • Fresh trends
  • Cases and examples
  • Research and insights
  • Two complimentary e-books