Core Java

Beware Of findFirst() And findAny()

After filtering a Java 8 Stream it is common to use findFirst() or findAny() to get the element that survived the filter. But that might not do what you really meant and subtle bugs can ensue.

So What’s Wrong With findFirst() And findAny()?

As we can see from their Javadoc (here and here) both methods return an arbitrary element from the stream – unless the stream has an encounter order, in which case findFirst() returns the first element. Easy.

A simple example looks like this:

public Optional<Customer> findCustomer(String customerId) {
	return customers.stream()
			.filter(customer -> customer.getId().equals(customerId))
			.findFirst();
}

Of course this is just the fancy version of the good old for-each-loop:

public Optional<Customer> findCustomer(String customerId) {
	for (Customer customer : customers)
		if (customer.getId().equals(customerId))
			return Optional.of(customer);
	return Optional.empty();
}

But both variants contain the same potential bug: they are built on the implicit assumption that there can only be one customer with any given ID.

Now, this might be a very reasonable assumption. Maybe this is a known invariant, guarded by dedicated parts of the system, relied upon by others. In that case this is totally fine.

Often the code relies on a unique matching element but does nothing to assert this.

But in many cases I see out in the wild, it is not. Maybe the customers were just loaded from an external source that makes no guarantees about the uniqueness of their IDs. Maybe an existing bug allowed two books with the same ISBN. Maybe the search term allows surprisingly many unforeseen matches (did anyone say regular expressions?).

Often the code’s correctness relies on the assumption that there is a unique element matching the criteria but it does nothing to enforce or assert this.

Worse, the misbehavior is entirely data-driven, which might hide it during testing. Unless we have this scenario in mind, we might simply overlook it until it manifests in production.

Even worse, it fails silently! If the assumption that there is only one such element proves to be wrong, we won’t notice this directly. Instead the system will misbehave subtly for a while before the effects are observed and the cause can be identified.

So of course there is nothing inherently wrong with findFirst() and findAny(). But it is easy to use them in a way that leads to bugs within the modeled domain logic.

Published by Steven Depolo under CC-BY 2.0
Published by Steven Depolo under CC-BY 2.0

Failing Fast

So let’s fix this! Say we’re pretty sure that there’s at most one matching element and we would like the code to fail fast if there isn’t. With a loop we have to manage some ugly state and it would look as follows:

public Optional<Customer> findOnlyCustomer(String customerId) {
	boolean foundCustomer = false;
	Customer resultCustomer = null;
	for (Customer customer : customers)
		if (customer.getId().equals(customerId))
			if (!foundCustomer) {
				foundCustomer = true;
				resultCustomer = customer;
			} else {
				throw new DuplicateCustomerException();
			}
 
	return foundCustomer
			? Optional.of(resultCustomer)
			: Optional.empty();
}

Now, streams give us a much nicer way. We can use the often neglected reduce, about which the documentation says:

Performs a reduction on the elements of this stream, using an associative accumulation function, and returns an Optional describing the reduced value, if any. This is equivalent to:

Stream.reduce

boolean foundAny = false;
T result = null;
for (T element : this stream) {
    if (!foundAny) {
        foundAny = true;
        result = element;
    }
    else
        result = accumulator.apply(result, element);
}
return foundAny ? Optional.of(result) : Optional.empty();

but is not constrained to execute sequentially.

Doesn’t that look similar to our loop above?! Crazy coincidence…

So all we need is an accumulator that throws the desired exception as soon as it is called:

public Optional<Customer> findOnlyCustomerWithId_manualException(String customerId) {
	return customers.stream()
			.filter(customer -> customer.getId().equals(customerId))
			.reduce((element, otherElement) -> {
				throw new DuplicateCustomerException();
			});
}

This looks a little strange but it does what we want. To make it more readable, we should put it into a Stream utility class and give it a nice name:

public static <T> BinaryOperator<T> toOnlyElement() {
	return toOnlyElementThrowing(IllegalArgumentException::new);
}
 
public static <T, E extends RuntimeException> BinaryOperator<T>
toOnlyElementThrowing(Supplier<E> exception) {
	return (element, otherElement) -> {
		throw exception.get();
	};
}

Now we can call it as follows:

// if a generic exception is fine
public Optional<Customer> findOnlyCustomer(String customerId) {
	return customers.stream()
			.filter(customer -> customer.getId().equals(customerId))
			.reduce(toOnlyElement());
}
 
// if we want a specific exception
public Optional<Customer> findOnlyCustomer(String customerId) {
	return customers.stream()
			.filter(customer -> customer.getId().equals(customerId))
			.reduce(toOnlyElementThrowing(DuplicateCustomerException::new));
}

How is that for intention revealing code?

This will materialize the entire stream.

It should be noted that, unlike findFirst() and findAny(), this is of course no short-circuiting operation and will materialize the entire stream. That is, if there is indeed only one element. The processing of course stops as soon as a second element is encountered.

Reflection

We have seen how findFirst() and findAny() do not suffice to express the assumption that there is at most one element left in the stream. If we want to express that assumption and make sure the code fails fast if it is violated, we need to reduce(toOnlyElement()).

Thanks to Boris Terzic for making me aware of this intention mismatch in the first place.

Reference: Beware Of findFirst() And findAny() from our JCG partner Nicolai Parlog at the CodeFx blog.

Nicolai Parlog

Nicolai is a thirty year old boy, as the narrator would put it, who has found his passion in software development. He constantly reads, thinks, and writes about it, and codes for a living as well as for fun.Nicolai is the editor of SitePoint's Java channel, writes a book about Project Jigsaw, blogs about software development on codefx.org, and is a long-tail contributor to several open source projects. You can hire him for all kinds of things.
Subscribe
Notify of
guest

This site uses Akismet to reduce spam. Learn how your comment data is processed.

3 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Michal
Michal
8 years ago

One “hackish” way to achieve almost the same thing with a lot simpler code would be to use an intermediary map: customers.stream() .collect(Collectors.toMap(customer -> customer.getId(), Function.identity())) .get(customerId); This will throw an exception in case of duplicate keys (IllegalStateException) but of course will be way slower than your solution. However, I would say you’re fixing the problem in the wrong place – your “customers” collection should have been a set guaranteeing you won’t have duplicates in the first place. In the rare cases where you’re not looking for identity based condition, stream’s findFirst() API tells you exactly what it will do… Read more »

Nicolai Parlog
8 years ago
Reply to  Michal

Not sure what the big deal is about. There is no big deal. It’s just that there is no ready alternative to ‘find…’ that expresses uniqueness if you require it. So I showed when you might need it and how to create it. I like your approach, too, although it is a little more verbose Another way is to create a specific collector for this. After someone on my blog came up with the idea, I added it to as href=https://gist.github.com/nicolaiparlog/74ac912658f0e11e9057#file-b_findfirstfindanycollecttoonlyelement-java-L58″>the gist. But why is this necessary? Often the code’s correctness relies on the assumption that there is a unique… Read more »

noob
noob
3 years ago

I dont really get OP view point about this.
Optional is applied for One element, so in case your resource contains duplicated ID, thats your dataset, nothing wrong with stream operation like findFirst or findAny.
AFAIK, you want to do 2 things at once (checking duplicated and find matching).
Why not simply separating 2 stream to do 2 thing?

Back to top button