Very fast Camels and Cloud Messaging

Christian Posta, February 27th, 2015 (Last Updated: February 26th, 2015)


Apache Camel is a popular, mature, open-source integration library. It implements the Enterprise Integration Patterns, a set of patterns that often come up when integrating distributed systems. I’ve written a lot about Camel in the past, including why I like it better than Spring Integration, how the routing engine works, how to use JMS selectors with AWS SQS, and a lot more.

Camel also ships 197 connectors/adapters for talking to external systems (to check, go to the root of the source code and run this: ls -lp components/ | grep / | wc -l). GitHub has a lot more, and you can write your own pretty trivially. This gives Camel a much broader range of connectivity options than any other integration library.

Recently, I was fortunate to be able to help out a top, household-name e-retailer with their use of Camel. They take online orders and process them using an event-driven architecture (EDA), which includes publishing events like “order_received”, “order_cancelled”, “order_ready_to_ship” and others. These events are handled by microservices interested in participating in the order processing flows, and those services are quite loosely coupled because of the EDA in place.

This type of retail business is very seasonal: there are times during the year (holidays, etc.) when load increases by orders of magnitude. So being able to scale without outages to meet these seasonal peaks is paramount.

Luckily, as they’re a smart bunch, they use Apache Camel for their integrations, and specifically to implement some of these services. Each order generates quite a few events, and they must be processed in a timely fashion to keep up with the rest of the load. The queueing service for this was Amazon SQS, and Camel has an AWS SQS component for that.

For nominal load, Camel was processing these events just fine. But when the queue got deeper, Camel had trouble keeping up. We were only getting 200 messages per minute, which doesn’t pass the smell test. Digging in a little deeper, we found that the AWS libraries let you scale vertically by increasing the number of connections and by batching message delivery (up to 10 messages per batch). Batching helps, and Camel was already implemented to deal with it, but it still wasn’t fast enough: only about 10K messages per hour.

Upon further digging we could see that only a single thread was handling the polling of the message queue. So instead of processing the messages inline with the thread that polls the queue, we decided to use a SEDA queue: pull messages from SQS, quickly dump them into an in-memory queue, and start the next poll right away. Something like this:

from("aws-sqs://order.queue").to("seda:incomingOrders");

from("seda:incomingOrders").process(/* do our processing in another thread... */);

This allowed us to deal with the load using the Staged Event-Driven Architecture (SEDA) pattern. This change gave us another boost in performance, to about 40K messages per hour, but we’re talking about a very popular commerce site, so that was still not enough to meet the needs of the system during peak.
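To make the handoff idea concrete without pulling in Camel or the AWS SDK, here is a minimal plain-Java sketch of the same staging pattern. It is not how Camel's SEDA component is implemented. The pollBatch method is a hypothetical stand-in for one SQS receive call (capped at 10 messages, matching the batch limit mentioned earlier), and the class name, message names and worker count are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SedaHandoffSketch {
    // Sentinel telling a worker that no more messages will arrive.
    private static final String POISON = "__STOP__";

    /** Hypothetical stand-in for one SQS receive call: returns up to 10 messages. */
    static List<String> pollBatch(int nextId, int remaining) {
        List<String> batch = new ArrayList<>();
        for (int i = 0; i < Math.min(10, remaining); i++) {
            batch.add("order-event-" + (nextId + i));
        }
        return batch;
    }

    /** Pushes totalMessages through an in-memory stage; returns how many were processed. */
    static int run(int totalMessages, int workers) {
        BlockingQueue<String> stage = new LinkedBlockingQueue<>(); // the in-memory "seda" queue
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        // Stage 2: worker threads do the processing, off the polling thread.
        for (int w = 0; w < workers; w++) {
            pool.execute(() -> {
                try {
                    while (true) {
                        String msg = stage.take();
                        if (msg.equals(POISON)) {
                            break;
                        }
                        processed.incrementAndGet(); // real order processing would go here
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // Stage 1: the single polling thread only fetches and hands off,
        // so it is free to start the next poll immediately.
        int fetched = 0;
        while (fetched < totalMessages) {
            List<String> batch = pollBatch(fetched, totalMessages - fetched);
            stage.addAll(batch);
            fetched += batch.size();
        }
        for (int w = 0; w < workers; w++) {
            stage.add(POISON); // one stop signal per worker, queued behind all real messages
        }

        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println(run(95, 4)); // number of messages processed by the workers
    }
}
```

Note that there is still only one poller here, which is why this stage on its own stops helping once fetching from the remote queue becomes the bottleneck.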

So we took one more look and wondered: why couldn’t we have multiple threads/connections polling at once? The AWS libraries were written with this in mind, but there wasn’t a way to configure Camel to do this for this specific type of endpoint. Camel can do this for other endpoints (JMS, SEDA, etc.), but we needed to make a small change in Camel SQS for this.

And here’s the beauty of using open-source, community-style, development philosophies: the code is open, the community is welcoming of changes, and now future users of Camel and its functionality can reap the benefits of this collaboration.

So we committed a patch that lets you set a concurrentConsumers option on the SQS endpoint, which ramps up the number of threads used for connecting to and polling the queue. Something like this:

from("aws-sqs://order.queue?concurrentConsumers=50").to(/* ...processing here... */);

See the documentation on camel-sqs for more information. This change will be part of the Apache Camel 2.15.0 release which should be coming out in the next couple of weeks.
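The effect of several pollers sharing one queue can be sketched in plain Java, again without Camel or the AWS SDK. This is only an illustration of the idea, not the code from the patch. The receiveBatch method is a hypothetical stand-in for a remote receive call, the 2 ms sleep is a made-up latency, and the in-memory queue plays the role of SQS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ConcurrentPollersSketch {
    /** Hypothetical stand-in for a remote receive: one simulated round trip, up to 10 messages. */
    static List<Integer> receiveBatch(Queue<Integer> remote) throws InterruptedException {
        Thread.sleep(2); // simulated network latency (made-up value)
        List<Integer> batch = new ArrayList<>();
        Integer msg;
        while (batch.size() < 10 && (msg = remote.poll()) != null) {
            batch.add(msg);
        }
        return batch;
    }

    /** Drains totalMessages using the given number of polling threads; returns the count handled. */
    static int drain(int totalMessages, int concurrentConsumers) {
        Queue<Integer> remote = new ConcurrentLinkedQueue<>(); // plays the role of the SQS queue
        for (int i = 0; i < totalMessages; i++) {
            remote.add(i);
        }

        AtomicInteger processed = new AtomicInteger();
        ExecutorService pollers = Executors.newFixedThreadPool(concurrentConsumers);
        for (int p = 0; p < concurrentConsumers; p++) {
            pollers.execute(() -> {
                try {
                    // Each poller pays its own round-trip latency, so the waits overlap
                    // across threads instead of adding up on a single thread.
                    List<Integer> batch;
                    while (!(batch = receiveBatch(remote)).isEmpty()) {
                        processed.addAndGet(batch.size());
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        pollers.shutdown();
        try {
            if (!pollers.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("pollers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println(drain(1000, 50)); // number of messages handled across all pollers
    }
}
```

Unlike a real consumer, which keeps polling indefinitely, each poller in this sketch stops as soon as it sees an empty batch so that the example terminates.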

With concurrentConsumers set, we were able to handle all of the load Black Friday and Cyber Monday could throw at the site, at one point processing more than 1.5 million messages per hour.
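The throughput figures in this post mix per-minute and per-hour units, so a quick conversion helps put them side by side. The small sketch below uses only numbers quoted above (200 messages per minute at the start, about 40K per hour with SEDA, 1.5 million per hour at peak); the class and method names are just for illustration.

```java
class ThroughputSketch {
    /** Converts a per-minute rate to a per-hour rate. */
    static long perHour(long perMinute) {
        return perMinute * 60;
    }

    /** Converts a per-hour rate to a per-minute rate (integer division). */
    static long perMinute(long perHour) {
        return perHour / 60;
    }

    /** Ratio between two rates expressed in the same unit. */
    static double speedup(long before, long after) {
        return (double) after / before;
    }

    public static void main(String[] args) {
        long initial = perHour(200);   // starting point: 200 messages/minute
        long withSeda = 40_000;        // approximate figure after the SEDA change
        long peak = 1_500_000;         // lower bound on the peak figure

        System.out.println(initial);                    // 12000 messages/hour
        System.out.println(perMinute(peak));            // 25000 messages/minute
        System.out.println(speedup(initial, peak));     // 125.0
        System.out.println(speedup(withSeda, peak));    // 37.5
    }
}
```

Put another way, the peak rate is at least 125 times the rate we started from.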

Thank you Open Source!

Reference:

Very fast Camels and Cloud Messaging from our JCG partner Christian Posta at the Christian Posta – Software Blog blog.