If there’s one thing developers hate, it’s seeing their own code become a performance hog. Code that does its job beautifully comes crashing down under a load test, and that makes a developer very, very frustrated.
My team recently had to take some beautifully written (NOT!), performance-hogging code and twist and turn it into something that doesn’t even break a sweat under a ton of load. Fortunately, or miraculously, we were able to achieve this feat after going at it for a couple of days. So this post covers a few guidelines I follow when I concentrate on improving performance.
1. Identify the critical performance path
First of all, you need to know what to fix before you start fixing it. Identify the path through your application where performance actually matters. For example, if your application receives data over the network and stores it, the critical path is a data transfer over a network followed by a write to disk, probably in an RDBMS.
After this, separate out everything else. A few ways to do this are to move the rest of the application’s functionality into periodic tasks, on-demand operations or separate applications altogether. Of course, to do this your architecture must be loosely coupled. For me, SOA and web services work best.
2. Avoid blocking
Do not block! This has been said so many times, yet you see it violated in so many places. Be async wherever possible when dealing with a critical performance path. For example, if you block on a network call, you risk hogging not only your own application but also the application that sends you data. At a high load, the sending application can run out of connections and, if this is not handled properly, messages waiting to be delivered can rapidly stack up in its memory. Results can vary from a harmless timeout to a catastrophic out-of-memory situation.
A simple solution to this kind of situation is to use a queue and a thread pool. The moment a message comes in, we queue it and allow the connection to return and close. This plays an important part in loving thy neighbouring application: it avoids message stacking and makes data transfer a smooth affair.
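As a rough illustration of the queue-and-thread-pool idea, here is a minimal Java sketch. The class name `MessageIntake`, the `store` method, the pool size of 4 and the queue capacity of 1000 are all made up for this example; they are not taken from the application described in this post.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical intake component; names and sizes are illustrative only.
class MessageIntake {
    private final AtomicInteger processed = new AtomicInteger();

    // A fixed set of 4 workers fed by a bounded queue of pending messages.
    private final ThreadPoolExecutor workers = new ThreadPoolExecutor(
            4, 4, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(1000));

    // Called by the network layer. It only enqueues the message and returns,
    // so the caller can release the connection straight away.
    boolean accept(String message) {
        try {
            workers.execute(() -> store(message));
            return true;
        } catch (RejectedExecutionException e) {
            // Queue is full (or the pool is shut down): report it to the caller.
            return false;
        }
    }

    // Stand-in for the slow part of the work, e.g. a database write.
    private void store(String message) {
        processed.incrementAndGet();
    }

    int processedCount() {
        return processed.get();
    }

    // Stops accepting new messages and waits for the queued ones to finish.
    boolean shutdownAndWait(long timeoutMillis) {
        workers.shutdown();
        try {
            return workers.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

When `accept` returns false, the sender can be told to back off and retry, which is usually kinder than letting the request hang.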
3. Fetching data – Memory vs Disk vs Network
Be mindful of this. Quoting Ryan Dahl, creator of Node.js: fetching from memory is like walking over to your colleague’s cubicle across the room and getting a note. Fetching data over a network is like going to your colleague’s office halfway around the world and getting a note (check out his talk). I love this analogy, simply because it explains what many of us choose to ignore. An unnecessary network call is asking for trouble and is most probably one of your biggest hogs. Identify the data that you read from disk (probably from an RDBMS) or over a network, and you’ll see patterns where you fetch the same data over and over again. Yes, the solution is obvious. Let’s remember the difference between memory and disk, and let’s just cache it. Use a good library if you have complex caching needs, or just do it with a simple object. Either way, it can boost your performance many times over.
Note: Just as you cache data, make sure your implementation for un-caching it (invalidating the cache) is solid as well. Otherwise, you will see some nasty stale-data bugs that will rob you of many nights of sleep.
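To make the "simple object" option concrete, here is a minimal sketch of a read-through cache with explicit invalidation. `SimpleCache`, its `loader` function and the `loadCount` counter are hypothetical names invented for this example. The map here is deliberately left without a size limit to keep the sketch short; a real cache would also need a cap or an eviction policy.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical read-through cache; the loader stands in for a disk or network fetch.
class SimpleCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;
    private final AtomicInteger loads = new AtomicInteger();

    SimpleCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the cached value, calling the slow loader only on a miss.
    V get(K key) {
        return entries.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    // Call this whenever the underlying data changes, so the next get() reloads it.
    void invalidate(K key) {
        entries.remove(key);
    }

    // Number of times the slow loader was actually invoked.
    int loadCount() {
        return loads.get();
    }
}
```

Usage could look like `new SimpleCache<String, String>(id -> fetchFromDatabase(id))`, where `fetchFromDatabase` is whatever slow call you are trying to avoid repeating.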
4. Context switching
Threads are great for async operations, but the moment you introduce them, you also introduce context switching between threads. If you have more than 1000 threads running in your application, chances are you will be doing more context switching than real work. One way to overcome this is to use ‘fixed’ thread pools. I highlight fixed because many of us are used to using unbounded pools, which is asking for trouble. There is a high chance of your app hitting an unpredicted state, suddenly having thousands of threads and just crashing.
The other common mistake is introducing unnecessary context switches through what I call bottleneck threads. For example, assume there is an application that has a queue and a large number of worker threads. Then you create a single thread that’s in charge of dequeuing objects from this queue and handing them to worker threads. It is very rare that we have control over the OS and its context switching policy. So chances are that, at a high load, this single dequeuing thread doesn’t get to run often enough to dequeue all the jobs and keep the workers busy, creating a performance bottleneck. In situations like this, a better approach might be to have each worker thread dequeue jobs by itself and do the appropriate work, or just finish if there’s nothing to dequeue. Since we are using a pool, there will hardly be any thread creation cost.
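Here is one way the "workers dequeue for themselves" idea could look in Java, sketched with a fixed thread pool and no dispatcher thread in between. The class `SelfServiceWorkers` and its `drain` method are invented for this illustration, and the 10 second wait is an arbitrary choice.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: every worker pulls its own jobs from the shared queue.
class SelfServiceWorkers {

    // Starts workerCount workers, waits for them to finish, and returns
    // how many jobs they completed.
    static int drain(BlockingQueue<Runnable> jobs, int workerCount) {
        AtomicInteger completed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workerCount);

        for (int i = 0; i < workerCount; i++) {
            pool.execute(() -> {
                Runnable job;
                // poll() returns null when the queue is empty, so the worker
                // simply finishes instead of waiting on a dispatcher.
                while ((job = jobs.poll()) != null) {
                    job.run();
                    completed.incrementAndGet();
                }
            });
        }

        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }
}
```

The thread count is fixed up front, so a burst of jobs makes the queue longer rather than creating more threads.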
5. Never leave it unbounded
You may use connection pools, thread pools, queues and many other mechanisms to gain performance. During tests, all of these will probably do very well for you, and your application will run as smooth as butter. But leaving any of them unbounded is asking for trouble. Cap every unbounded resource at a suitable maximum value; this is a good defensive practice against unpredictable conditions.
I believe that any application that requires high, or even moderate, performance should be written with the above guidelines in mind. By following them, my team took an application that crashed at around 50 concurrent requests and got it to operate smoothly even at a concurrency of 1000+.
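For resources that do not come with a built-in limit, one generic way to add a cap is a counting semaphore. The sketch below is only an illustration under that assumption; `ConcurrencyCap` and `runOrReject` are hypothetical names, and rejecting immediately (rather than waiting for a free slot) is a design choice you may not want in every case.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical guard that caps how many callers may use a resource at once.
class ConcurrencyCap {
    private final Semaphore permits;

    ConcurrencyCap(int max) {
        this.permits = new Semaphore(max);
    }

    // Runs the task if a permit is free; otherwise returns the fallback
    // immediately instead of letting work pile up without limit.
    <T> T runOrReject(Supplier<T> task, T fallback) {
        if (!permits.tryAcquire()) {
            return fallback;
        }
        try {
            return task.get();
        } finally {
            permits.release(); // always hand the permit back
        }
    }

    int available() {
        return permits.availablePermits();
    }
}
```

Picking the maximum is the hard part; a load test is usually the most reliable way to find a sensible value.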
PS: I will probably follow this up with a part II of this article to cover a few more pointers that I have missed.