Author Archives: Mathieu Dumoulin

Mathieu is a Data Engineer on the MapR Professional Services team, based in the Asia-Pacific region. He started using Hadoop in 2012 at the Fujitsu Canada Innovation Lab, where he built a large-scale text classification system from scratch. Since then, Mathieu has split his time between working as a Search Engineer and managing a new Data Science team for a large Japanese HR company. His current interests focus on Apache Drill, Apache Spark, and Deep Learning. Mathieu holds both a B.A.Sc. in Computer Science and a Master of Computer Science degree from Université Laval in Canada.

Performance Tuning of an Apache Kafka/Spark Streaming System

Real-world case study in the telecom industry

Debugging a real-life distributed application can be a pretty daunting task. Most common Google searches don't turn out to be very useful, at least at first. In this blog post, I will give a fairly detailed account of how we managed to accelerate an Apache Kafka/Spark Streaming/Apache Ignite application by almost 10x and ...

Real-time Smart City Traffic Monitoring Using Microservices-based Streaming Architecture (Part 2)

Modern Open Source Complex Event Processing for IoT

This series of blog posts details my findings as I bring to production a fully modern take on Complex Event Processing, or CEP for short. In many applications, ranging from finance to retail and IoT, there is tremendous value in automating tasks that require taking action in real time. Putting ...

Better Complex Event Processing at Scale Using a Microservices-based Streaming Architecture (Part 1)

A microservices-based streaming architecture combined with an open source rule engine makes real-time business rules easy

This post is intended as a detailed account of a project I undertook to integrate an OSS business rules engine with a modern stream messaging system in the Kafka style. The goal of the project, better known as Complex Event Processing (CEP), is ...

CLDB Monitoring Using JMX as a Modern Alternative to Ganglia

There are many options for monitoring the performance and health of a MapR cluster. In this post, I will present a lesser-known method for monitoring the CLDB using Java Management Extensions (JMX). According to one of the most highly regarded MapR Data Engineers, Akihiko Kusanagi, using JMX to get CLDB metrics can be seen as a more modern and ...

Distributed Deep Learning with Caffe Using a MapR Cluster

We have experimented with CaffeOnSpark on a 5-node MapR 5.1 cluster running Spark 1.5.2, and in this blog post we will share our experience, the difficulties we ran into, and our solutions.

Deep Learning and Caffe

Deep learning has been getting a lot of attention recently, with AlphaGo beating one of the world's top players at a game that was thought so complicated as to be out of reach of ...
