About Tomasz Nurkiewicz

Java EE developer, Scala enthusiast. Enjoying data analysis and visualization. Strongly believes in the power of testing and automation.

Introduction to writing custom collectors in Java 8

Java 8 introduced the concept of collectors. Most of the time we barely use factory methods from Collectors class, e.g. collect(toList())toSet() or maybe something more fancy like counting() or groupingBy(). Not many of us actually bother to look how collectors are defined and implemented. Let’s start from analysing what Collector<T, A, R> really is and how it works.

Collector<T, A, R> works as a “sink” for streams – stream pushes items (one after another) into a collector, which should produce some “collected” value in the end. Most of the time it means building a collection (like toList()) by accumulating elements or reducing stream into something smaller (e.g. counting() collector that barely counts elements). Every collector accepts items of type T and produces aggregated (accumulated) value of type R (e.g. R = List<T>). Generic type A simply defines the type of intermediate mutable data structure that we are going to use to accumulate items of type T in the meantime. Type A can, but doesn’t have to be the same as R – in simple words the mutable data structure that we use to collect items from input Stream<T> can be different than the actual output collection/value. That being said, every collector must implement the following methods:

interface Collector<T,A,R> {
    Supplier<A>          supplier()
    BiConsumer<A,T>      acumulator() 
    BinaryOperator<A>    combiner() 
    Function<A,R>        finisher()
    Set<Characteristics> characteristics()
} 
  • supplier() returns a function that creates an instance of accumulator – mutable data structure that we will use to accumulate input elements of type T.
  • accumulator() returns a function that will take accumulator and one item of type T, mutating accumulator.
  • combiner() is used to join two accumulators together into one. It is used when collector is executed in parallel, splitting input Stream<T> and collecting parts independently first.
  • finisher() takes an accumulator A and turns it into a result value, e.g. collection, of type R. All of this sounds quite abstract, so let’s do a simple example.

Obviously Java 8 doesn’t provide a built-in collector for ImmutableSet<T> from Guava. However creating one is very simple. Remember that in order to iteratively build ImmutableSet we use ImmutableSet.Builder<T> – this is going to be our accumulator.

import com.google.common.collect.ImmutableSet;

public class ImmutableSetCollector<T> 
        implements Collector<T, ImmutableSet.Builder<T>, ImmutableSet<T>> {
    @Override
    public Supplier<ImmutableSet.Builder<T>> supplier() {
        return ImmutableSet::builder;
    }

    @Override
    public BiConsumer<ImmutableSet.Builder<T>, T> accumulator() {
        return (builder, t) -> builder.add(t);
    }

    @Override
    public BinaryOperator<ImmutableSet.Builder<T>> combiner() {
        return (left, right) -> {
            left.addAll(right.build());
            return left;
        };
    }

    @Override
    public Function<ImmutableSet.Builder<T>, ImmutableSet<T>> finisher() {
        return ImmutableSet.Builder::build;
    }

    @Override
    public Set<Characteristics> characteristics() {
        return EnumSet.of(Characteristics.UNORDERED);
    }
}

First of all look carefully at generic types. Our ImmutableSetCollector takes input elements of type T, so it works for any Stream<T>. In the end it will produce ImmutableSet<T> – as expected. ImmutableSet.Builder<T> is going to be our intermediate data structure.

  • supplier() returns a function that creates new ImmutableSet.Builder<T>. If you are not that familiar with lambdas in Java 8, ImmutableSet::builder is a shorthand for () -> ImmutableSet.builder().
  • accumulator() returns a function that takes builder and one element of type T. It simply adds said element to the builder.
  • combiner() returns a function that will accept two builders and turn them into one by adding all elements from one of them into the other – and returning the latter. Finally finisher() returns a function that will turn ImmutableSet.Builder<T> into ImmutableSet<T>. Again this is a shorthand syntax for: builder -> builder.build().
  • Last but not least, characteristics() informs JDK what capabilities our collector has. For example if ImmutableSet.Builder<T> was thread-safe (it isn’t), we could say Characteristics.CONCURRENT as well.

We can now use our custom collector everywhere using collect():

final ImmutableSet<Integer> set = Arrays
        .asList(1, 2, 3, 4)
        .stream()
        .collect(new ImmutableSetCollector<>());

However creating new instance is slightly verbose so I suggest creating static factory method, similar to what JDK does:

public class ImmutableSetCollector<T> implements Collector<T, ImmutableSet.Builder<T>, ImmutableSet<T>> {

    //...

    public static <T> Collector<T, ?, ImmutableSet<T>> toImmutableSet() {
        return new ImmutableSetCollector<>();
    }
}

From now on we can take full advantage of our custom collector by simply typing: collect(toImmutableSet()). In the second part we will learn how to write more complex and useful collectors.

Update

@akarazniewicz pointed out that collectors are just verbose implementation of folding. With my love and hate relationship with folds, I have to comment on that. Collectors in Java 8 are basically object-oriented encapsulation of the most complex type of fold found in Scala, namely GenTraversableOnce.aggregate[B](z: ⇒ B)(seqop: (B, A) ⇒ B, combop: (B, B) ⇒ B): Baggregate() is like fold(), but requires extra combop to combine two accumulators (of type B) into one. Comparing this to collectors, parameter z comes from a supplier()seqop()reduction operation is an accumulator() and combop is a combiner(). In pseudo-code we can write:

finisher(
    seq.aggregate(collector.supplier())
        (collector.accumulator(), collector.combiner()))

GenTraversableOnce.aggregate() is used when concurrent reduction is possible – just like with collectors.

Do you want to know how to develop your skillset to become a Java Rockstar?

Subscribe to our newsletter to start Rocking right now!

To get you started we give you two of our best selling eBooks for FREE!

JPA Mini Book

Learn how to leverage the power of JPA in order to create robust and flexible Java applications. With this Mini Book, you will get introduced to JPA and smoothly transition to more advanced concepts.

JVM Troubleshooting Guide

The Java virtual machine is really the foundation of any Java EE platform. Learn how to master it with this advanced guide!

Given email address is already subscribed, thank you!
Oops. Something went wrong. Please try again later.
Please provide a valid email address.
Thank you, your sign-up request was successful! Please check your e-mail inbox.
Please complete the CAPTCHA.
Please fill in the required fields.

Leave a Reply


nine × 1 =



Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd | Terms of Use | Privacy Policy | Contact
All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries.
Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
Do you want to know how to develop your skillset and become a ...
Java Rockstar?

Subscribe to our newsletter to start Rocking right now!

To get you started we give you two of our best selling eBooks for FREE!

Get ready to Rock!
You can download the complementary eBooks using the links below:
Close