
About Henn Idan

Henn works at OverOps, helping developers know when and why code breaks in production. She writes about Java, Scala and everything in between. Lover of gadgets, apps, technology and tea.

The New Way of Handling Java Errors Before They Hit Production

How we realized the old way of solving errors in pre-production is not enough, and how we were able to change that

There’s no such thing as perfect code on the first try, and we’ve all learned that the hard way. No matter how many test cycles we run, code reviews we hold or tools we use, there’s always at least one sneaky bug that manages to surprise us.

In this post, we’ll share our own story of such a bug and compare the common workflow developers use to solve it with the new way we do it at OverOps. Spoiler alert: log files don’t cut it, and it’s time to understand why.

Big shoutout to Dor Levi who helped us through this issue and post!

Act I: Detecting there’s an issue

A couple of weeks ago our AWS server started sending out distress signals. Our QA team came across it during one of their tests and complained that the server couldn’t handle their load testing. They turned to our VP R&D for help. The VP R&D pulled out his credit card, increased the server capacity and told QA that the issue was fixed.

As you can guess, the problem was far from over. After the second cycle of complaint, increased AWS payment and another complaint, we realized that there was more to this issue and that it needed further investigation. We had to figure out what went wrong, and our first step was to open the Amazon CloudWatch Monitoring Details, which gave us the following chart:

Throttled read requests on our server. Yikes.

As you can see, the “batch-get” request started to go haywire right after 10:00 AM, consuming an ever-growing amount of resources and failing operations, effectively causing an outage of the staging environment on which QA were running load tests.

While the data Amazon provides can tell us which type of operation is causing the issue, it can’t tell us where it is coming from or why it is happening in the first place. It was time to roll up our sleeves and start digging inside the code. Read the full whitepaper to find out what we did next.
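To make the gap described above concrete (the metric names the operation but not the code that issues it), here is a minimal sketch of one generic way to attribute calls to their origin. It is not the approach from the whitepaper and not an OverOps or AWS API: the class `CallSiteCounter`, its methods and the `"batch-get"` label are hypothetical names invented for illustration. It only assumes Java 9 or later for `StackWalker`.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical helper: counts how often each call site issues a named operation. */
final class CallSiteCounter {
    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

    /** Records one call, keyed by the operation name and the immediate caller's class and method. */
    void record(String operation) {
        // The first frame is record() itself, so skip it to reach the caller.
        StackWalker.StackFrame caller = StackWalker.getInstance()
                .walk(frames -> frames.skip(1).findFirst())
                .orElseThrow(IllegalStateException::new);
        String key = operation + " <- " + caller.getClassName() + "." + caller.getMethodName();
        counts.computeIfAbsent(key, k -> new LongAdder()).increment();
    }

    /** Returns a sorted copy of the counters, safe to log or print. */
    Map<String, Long> snapshot() {
        Map<String, Long> result = new TreeMap<>();
        counts.forEach((key, adder) -> result.put(key, adder.sum()));
        return result;
    }
}
```

Calling `record("batch-get")` right before each request and logging `snapshot()` periodically would show which methods are responsible for most of the traffic, which is the piece of information the chart alone cannot provide.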


Thomas (Guest)

What type of clickbait is that? If you don’t want to share your stuff with the public, don’t post it here! Why do I need to download the information from a third party site?