The Truth Behind the Big Exceptions Lie

Alex Zhitnitsky, June 21st, 2016 (Last Updated: June 15th, 2016)


Exceptions are probably the most misused Java language feature. Here’s why.

Let’s break some myths. There is no tooth fairy. Santa isn’t real. TODO comments. finalfinalversion-final.pdf. Soapless soap. And… Exceptions are in fact exceptions. The latter might need some more convincing, but we’ve got you covered.

For this post, we asked Avishai Ish-Shalom, an experienced systems architect and a longtime friend of the blog (most importantly, a big fan of furry hats), to join us for a quick chat about the current state of exceptions in Java applications. Here’s what we found out.

Exceptions are by definition far from normal

Let’s kick off with a quote from the official Java documentation: “An exception is an event that occurs during the execution of a program that DISRUPTS the normal flow of instructions”. Honest disclosure: we’ve added the caps ourselves.

In practice, the normal flow of instructions in most applications is filled with “normal” recurrences of these so-called “normal” exceptions, which cause “normal” disruptions.

There’s an increasingly high level of noise in most applications, with exceptions thrown, logged, then indexed and analyzed, which… are mostly meaningless.

This operational noise, apart from creating unnecessary stress on the system, makes you lose touch with the exceptions that really matter. Imagine an eCommerce application with a new, important exception that has started happening, signalling that something has gone wrong and that, say, 100 users aren’t able to check out. Now cover it up with thousands of useless “normal” exceptions and try to understand what went wrong.

For example, most applications have a “normal” level of error events. In the following screenshot, we can see it’s about 4k events per hour:

Takipi’s error analysis dashboard – Error trends

If we’re “lucky”, a new error would show itself as a spike in the graph, like we have right here with an IllegalStateException occurring hundreds of thousands of times around 1am (Ouch). We can immediately see what caused a spike.

The green line indicates the total number of events, and the rest of the lines indicate specific exceptions and logged errors / warnings.

The danger comes from exceptions with only a few, small, but lethal instances that are buried within the so-called “normal” level of exceptions.

What are these “normal” exceptions you’re talking about?

Unlike real errors that require code changes to fix, exceptions today indicate a plethora of other scenarios that really don’t carry any actionable insights. They only weigh down on the system. Consider these 2 scenarios that any experienced developer can anticipate:

Business Errors – Anything the user / data might do which the business flow does not permit. Think of any kind of form validation: filling in text inside a phone number form field, checking out with an empty cart, etc. We see this internally as well: NumberFormatException reached rank #2 out of the top 10 exceptions in our latest post, which covered research on over 1B events in production environments.
System Errors – Anything you ask from the OS and it might say no, things that are out of your control. Like, trying to access a file you don’t have permissions for.
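To make the “business error” case concrete, here is a minimal Java sketch of handling a malformed number as an ordinary outcome instead of letting a NumberFormatException fly. The QuantityParser class and its parseQuantity method are hypothetical names invented for this illustration, and the 9-digit limit is an assumption chosen to keep values inside the int range.

```java
import java.util.OptionalInt;

/** Hypothetical helper: bad input is an expected outcome, not an exception. */
final class QuantityParser {
    private QuantityParser() {}

    /**
     * Returns the parsed quantity, or an empty OptionalInt when the text
     * is not a plain non-negative number of at most 9 digits.
     */
    static OptionalInt parseQuantity(String text) {
        if (text == null) {
            return OptionalInt.empty();
        }
        String trimmed = text.trim();
        // Check the shape first, so Integer.parseInt has no reason to throw.
        // At most 9 digits always fits in an int.
        if (!trimmed.matches("[0-9]{1,9}")) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(Integer.parseInt(trimmed));
    }
}
```

A caller can then branch on parseQuantity(input).isPresent() and show a validation message, with nothing thrown, logged or indexed along the way.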

Real exceptions, on the other hand, are things you weren’t aware of when writing the code, like an OutOfMemoryError, or even a NullPointerException that messes things up unexpectedly. These are issues that require you to take action to resolve them.

Exceptions are designed to crash & burn

Uncaught exceptions kill your thread, and might even crash the whole application or put it in some “zombie state” when an important thread is dead and the rest are stuck waiting for it. Some applications know how to handle that, most don’t.
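As a rough illustration of the “kills your thread” part, the sketch below runs a task on its own thread and reports what ended it. UncaughtDemo and runAndCapture are hypothetical names made up for this example; the only real API involved is Thread.setUncaughtExceptionHandler.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical helper showing what happens to an exception nobody catches. */
final class UncaughtDemo {
    private UncaughtDemo() {}

    /**
     * Runs the task on its own thread and returns the exception that killed
     * the thread, or null if the task finished normally.
     */
    static Throwable runAndCapture(Runnable task) {
        AtomicReference<Throwable> captured = new AtomicReference<>();
        Thread worker = new Thread(task, "worker");
        // Without a handler, the default behavior is to print the stack trace
        // to stderr while the thread quietly dies.
        worker.setUncaughtExceptionHandler((thread, error) -> captured.set(error));
        worker.start();
        try {
            worker.join(); // wait until the worker has terminated
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return captured.get();
    }
}
```

The handler only observes the failure. The worker thread is gone either way, and anything waiting on it has to find out through some other channel.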

The exception’s main purpose in Java is to help you catch the bug and solve it, not to cross the line into application logic land. Exceptions were meant to help in debugging, which is why they try to contain as much info as possible from the application’s perspective.

Another issue this can create is inconsistent state. When the application flow gets… jumpy, it’s even worse than a goto statement. It has the same shortcomings, with some twists of its own:

It breaks the flow of the program
It’s hard to track and understand what will happen next
It’s hard to clean up, even with finally blocks
It’s heavyweight: unlike “goto”, it carries the whole stack and additional data with it
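The “heavyweight” point can be seen with a small probe. In this sketch (StackWeight and traceDepthAt are hypothetical names for illustration), an exception is created at increasing call depths, and the stack trace it records grows along with the depth.

```java
/** Hypothetical probe: an exception records the call stack where it was created. */
final class StackWeight {
    private StackWeight() {}

    /**
     * Recurses extraFrames times, then creates an exception and returns
     * how many stack frames that exception captured.
     */
    static int traceDepthAt(int extraFrames) {
        if (extraFrames > 0) {
            return traceDepthAt(extraFrames - 1);
        }
        // The stack trace is filled in when the exception object is constructed,
        // whether or not anyone ever reads it.
        return new RuntimeException("probe").getStackTrace().length;
    }
}
```

Calling traceDepthAt(50) should report a deeper trace than traceDepthAt(0), which is the extra baggage a plain jump would never carry.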

Use “error” flows without exceptions

If you try to use an exception to deal with predictable situations that should be handled by application logic, you’re in trouble. The same trouble most Java applications are in.

Issues that can be expected to happen aren’t really exceptions by the book. An interesting solution comes from Futures in Scala: handling errors without exceptions. Here’s an example from the official Scala docs:

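import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Success, Failure}

// `session` comes from the docs example and is assumed to be defined elsewhere
val f: Future[List[String]] = Future {
    session.getRecentPosts
}

// Success and failure both arrive as ordinary values to match on
f onComplete {
    case Success(posts) => for (post <- posts) println(post)
    case Failure(t) => println("An error has occurred: " + t.getMessage)
}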

Exceptions may be thrown by the code run inside the future, but they are contained and don’t leak outside. The possibility of failure is made explicit by the Failure(t) branch and it’s very easy to follow code execution.

With CompletableFuture, new in Java 8 (which we recently wrote about), we can use completeExceptionally(), although it’s not as pretty.
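For comparison, here is a rough Java 8 counterpart of the Scala snippet. It is only a sketch: the RecentPosts class, its fetch and describe methods, and the sample data are hypothetical stand-ins for session.getRecentPosts, while completeExceptionally(), handle() and join() are the real CompletableFuture methods.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Hypothetical stand-in for the session.getRecentPosts call in the Scala example. */
final class RecentPosts {
    private RecentPosts() {}

    /** Returns a future that either holds sample posts or carries a failure. */
    static CompletableFuture<List<String>> fetch(boolean fail) {
        CompletableFuture<List<String>> future = new CompletableFuture<>();
        if (fail) {
            // The failure is stored inside the future instead of being thrown at the caller.
            future.completeExceptionally(new IllegalStateException("backend unavailable"));
        } else {
            future.complete(Arrays.asList("first post", "second post"));
        }
        return future;
    }

    /** Both outcomes are handled as plain values, similar to Success / Failure in Scala. */
    static String describe(CompletableFuture<List<String>> future) {
        return future
            .handle((posts, error) -> error == null
                ? "Loaded " + posts.size() + " posts"
                : "An error has occurred: " + error.getMessage())
            .join();
    }
}
```

Here describe(fetch(true)) yields a message rather than an exception, because handle() receives the failure as its second argument.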

The plot gets thicker with APIs

Let’s say we have a system that uses a library for database access. How would the DB library expose its errors to the outside world? Welcome to the wild wild west. And keep in mind the library may still throw generic errors, like java.net.UnknownHostException or NullPointerException.

One real life example of how this can go wrong is a library that wraps JDBC, and just throws a generic DBException without giving you a chance to know what’s wrong. Maybe it’s all just fine and there’s just a connectivity error, or maybe… you actually need to change some code.

A common solution is for the DB library to use a base exception, say, DBException, from which library exceptions inherit. This allows the library user to catch all library errors with one try block. But what about the system errors that may have caused the library to err? The common solution is to wrap any exception happening inside it. So if it’s unable to resolve a DNS address, which is more of a system error than a library error, it will catch it and rethrow this higher level exception, which the user of the library should know to catch. The result is a try-catch nightmare, with a hint of nested exceptions wrapping other exceptions.
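The wrapping pattern described above, and the unwrapping it forces on the caller, might look roughly like this. DBException here is a hypothetical class for an imaginary DB library, and the Causes helper with its rule “an IOException at the root means a system error” is an assumption made purely for illustration.

```java
import java.io.IOException;

/** Hypothetical base exception of an imaginary DB library. */
class DBException extends Exception {
    DBException(String message, Throwable cause) {
        super(message, cause);
    }
}

/** Hypothetical helper the library user ends up writing to see what really happened. */
final class Causes {
    private Causes() {}

    /** Walks the cause chain and returns the innermost exception. */
    static Throwable rootCause(Throwable error) {
        Throwable current = error;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    /**
     * Assumption for this sketch: an IOException at the root (for example
     * java.net.UnknownHostException) means a system error, not a code problem.
     */
    static boolean isSystemError(Throwable error) {
        return rootCause(error) instanceof IOException;
    }
}
```

A DBException wrapping an UnknownHostException would be classified as a system error here, while one wrapping an IllegalArgumentException would not. The caller still has to dig through the nesting to find out.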

If we put Actors into the mix, the control flow gets even messier. Async programming with exceptions is a mess: a failure can kill an Actor and restart it, a message gets sent to some other Actor with the original error, and you lose the stack.

So… What can you do about it?

Starting from scratch and avoiding unnecessary exceptions is always easier, but most likely that’s not your situation. With an existing system, like a 5-year-old application, you’re in for a lot of plumbing work (if you’re lucky and get managerial approval to fix the noise).

Ideally we’d want all exceptions to be actionable, meaning, drive actions that would prevent them from happening again, and not just acknowledge that these things sometimes happen.

To sum up, unactionable exceptions make a mess of:

Performance
Stability
Monitoring / log analysis
And… they hide the real exceptions that you want to see and act on

The solution is… doing the hard work of pruning away the noise and creating control flows that make more sense. Another creative solution is changing the log levels: if it’s not an actionable exception, don’t log it as an error. That’s only a cosmetic solution, but it might get you 80% of the way there.
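The log level idea could be sketched with java.util.logging along these lines. The ErrorLogging class and its rule for what counts as actionable are assumptions for illustration only; a real application would need its own classification.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper that keeps unactionable exceptions out of the error level. */
final class ErrorLogging {
    private static final Logger LOG = Logger.getLogger(ErrorLogging.class.getName());

    private ErrorLogging() {}

    /**
     * Assumption for this sketch: IllegalArgumentException (which includes
     * NumberFormatException) signals bad input that nobody will fix in code.
     */
    static Level levelFor(Throwable error) {
        if (error instanceof IllegalArgumentException) {
            return Level.FINE;
        }
        return Level.SEVERE;
    }

    /** Logs the exception at the level chosen by levelFor. */
    static void report(String message, Throwable error) {
        LOG.log(levelFor(error), message, error);
    }
}
```

With the default logging configuration, FINE records are not published, so the error stream keeps only the entries worth acting on.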

Ultimately, logs and dashboards are only cosmetics. The real need is to fix the issue at its core and avoid unactionable exceptions altogether.

At Takipi, we recently found that on average 97% of logged errors come from the top 10 unique errors. To check out the current state of exceptions and logged errors in your application, attach the Takipi agent and you’ll have a complete understanding of how code behaves in your production environment (and how to fix it) in a matter of minutes. Check it out.

Final Thoughts

The bottom line is, do you have an Exception that doesn’t result in code changes? You shouldn’t even be wasting time looking at it.

This post is based on a lightning talk that Avishai did called “Actionable Exceptions”:

Reference:

The Truth Behind the Big Exceptions Lie from our JCG partner Alex Zhitnitsky at the Takipi blog.
