Accumulative: Custom Java Collectors Made Easy

Tomasz Linkowski, February 19th, 2019 (last updated: February 19th, 2019)

Accumulative is an interface proposed for the intermediate accumulation type A of Collector<T, A, R> in order to make defining custom Java Collectors easier.

Introduction

If you’ve ever used Java Streams, you have most likely used some of the built-in Collectors, e.g. Collectors.toList() or Collectors.joining().

But have you ever used…

A composed Collector?
- It takes another Collector as a parameter, e.g.: Collectors.collectingAndThen.
A custom Collector?
- Its functions are specified explicitly in Collector.of.
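
To make the distinction concrete, here is a minimal sketch of both kinds. The class and method names (CollectorKinds, countingViaList, totalLength) are made up for illustration; only Collectors.collectingAndThen, Collectors.toList and Collector.of are real JDK API.

```java
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class CollectorKinds {

  // Composed: wraps another Collector (toList) and post-processes its result.
  static Collector<String, ?, Integer> countingViaList() {
    return Collectors.collectingAndThen(Collectors.toList(), List::size);
  }

  // Custom: all four functions are passed explicitly to Collector.of.
  static Collector<String, ?, Integer> totalLength() {
    return Collector.of(
        () -> new int[1],                       // supplier: one-element array as a mutable box
        (acc, s) -> acc[0] += s.length(),       // accumulator: add the element's length
        (a, b) -> { a[0] += b[0]; return a; },  // combiner: merge two partial sums
        acc -> acc[0]                           // finisher: unwrap the sum
    );
  }
}
```

Both are used the same way, e.g. Stream.of("a", "bb", "ccc").collect(CollectorKinds.totalLength()).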

This post is about custom Collectors.

Collector

Let’s recall the essence of the Collector contract (comments mine):

/**
 * @param <T> (input) element type
 * @param <A> (intermediate) mutable accumulation type (container)
 * @param <R> (output) result type
 */
public interface Collector<T, A, R> {
 
  Supplier<A> supplier(); // create a container
 
  BiConsumer<A, T> accumulator(); // add to the container
 
  BinaryOperator<A> combiner(); // combine two containers
 
  Function<A, R> finisher(); // get the final result from the container
 
  Set<Characteristics> characteristics(); // irrelevant here
}

The above contract is functional in nature, and that’s very good! This lets us create Collectors using arbitrary accumulation types (A), e.g.:

A: StringBuilder (Collectors.joining)
A: OptionalBox (Collectors.reducing)
A: long[] (Collectors.averagingLong)
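
As a rough illustration of the first case, a StringBuilder can serve as the accumulation type of a hand-written Collector. This is only a sketch in the spirit of Collectors.joining, not the OpenJDK source; ConcatSketch and concatenating are hypothetical names.

```java
import java.util.stream.Collector;

class ConcatSketch {

  // T = CharSequence, A = StringBuilder, R = String
  static Collector<CharSequence, StringBuilder, String> concatenating() {
    return Collector.of(
        StringBuilder::new,                   // create a container
        StringBuilder::append,                // add an element to the container
        (left, right) -> left.append(right),  // combine two containers
        StringBuilder::toString               // get the final result from the container
    );
  }
}
```

For instance, Stream.of("a", "b", "c").collect(ConcatSketch.concatenating()) concatenates the three strings.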

Proposal

Before I provide any rationale, I’ll present the proposal, because it’s brief. Full source code of this proposal is available as a GitHub gist.

Accumulative Interface

I propose to add the following interface dubbed Accumulative (name to be discussed) to the JDK:

public interface Accumulative<T, A extends Accumulative<T, A, R>, R> {
 
  void accumulate(T t); // target for Collector.accumulator()
 
  A combine(A other); // target for Collector.combiner()
 
  R finish(); // target for Collector.finisher()
}

This interface, as opposed to Collector, is object-oriented in nature, and classes implementing it must represent some mutable state.

Collector.of Overload

Having Accumulative, we can add the following Collector.of overload:

public static <T, A extends Accumulative<T, A, R>, R> Collector<T, ?, R> of(
        Supplier<A> supplier, Collector.Characteristics... characteristics) {
  return Collector.of(supplier, A::accumulate, A::combine, A::finish, characteristics);
}
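
Since a new static method cannot be added to java.util.stream.Collector from user code, the idea can be tried out today with a helper class. The sketch below assumes a hypothetical AccumulativeCollectors holder and a made-up LongestString container; it uses lambdas instead of method references on the type variable, which should be equivalent.

```java
import java.util.function.Supplier;
import java.util.stream.Collector;

interface Accumulative<T, A extends Accumulative<T, A, R>, R> {
  void accumulate(T t);
  A combine(A other);
  R finish();
}

// Hypothetical helper standing in for the proposed Collector.of overload; not part of the JDK.
final class AccumulativeCollectors {
  private AccumulativeCollectors() { }

  static <T, A extends Accumulative<T, A, R>, R> Collector<T, ?, R> of(
      Supplier<A> supplier, Collector.Characteristics... characteristics) {
    return Collector.of(
        supplier,
        (container, element) -> container.accumulate(element),
        (left, right) -> left.combine(right),
        container -> container.finish(),
        characteristics);
  }
}

// Hypothetical example container: remembers the longest string seen so far.
class LongestString implements Accumulative<String, LongestString, String> {
  private String longest = "";

  @Override
  public void accumulate(String s) {
    if (s.length() > longest.length()) {
      longest = s;
    }
  }

  @Override
  public LongestString combine(LongestString other) {
    accumulate(other.longest); // merging reduces to accumulating the other side's best candidate
    return this;
  }

  @Override
  public String finish() {
    return longest;
  }
}
```

A stream of strings could then be collected with AccumulativeCollectors.of(LongestString::new).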

Average-Developer Story

In this section, I show how the proposal may impact an average developer, who knows only the basics of the Collector API. If you know this API well, please do your best to imagine you don’t before reading on…

Example

Let’s reuse the example from my latest post (simplified even further). Assume that we have a Stream of:

interface IssueWiseText {
  int issueLength();
  int textLength();
}

and that we need to calculate issue coverage:

total issue length
─────────────
total text length

This requirement translates to the following signature:

Collector<IssueWiseText, ?, Double> toIssueCoverage();

Solution

An average developer may decide to use a custom accumulation type A to solve this (other solutions are possible, though). Let’s say the developer names it CoverageContainer so that:

T: IssueWiseText
A: CoverageContainer
R: Double

Below, I’ll show how such a developer may arrive at the structure of CoverageContainer.

Structure Without Accumulative

Note: This section is long in order to illustrate how complex the procedure may be for a developer inexperienced with Collectors. You may skip it if you realize this already.

Without Accumulative, the developer will look at Collector.of, and see four main parameters:

Supplier<A> supplier
BiConsumer<A, T> accumulator
BinaryOperator<A> combiner
Function<A, R> finisher

To handle Supplier<A> supplier, the developer should:

mentally substitute A in Supplier<A> to get Supplier<CoverageContainer>
mentally resolve the signature to CoverageContainer get()
recall the JavaDoc for Collector.supplier()
recall method reference of the 4th kind (reference to a constructor)
realize that supplier = CoverageContainer::new

To handle BiConsumer<A, T> accumulator, the developer should:

BiConsumer<CoverageContainer, IssueWiseText>
void accept(CoverageContainer a, IssueWiseText t)
mentally transform the signature to an instance-method one
void accumulate(IssueWiseText t)
recall method reference of the 3rd kind (reference to an instance method of an arbitrary object of a particular type)
realize that accumulator = CoverageContainer::accumulate

To handle BinaryOperator<A> combiner:

BinaryOperator<CoverageContainer>
CoverageContainer apply(CoverageContainer a, CoverageContainer b)
CoverageContainer combine(CoverageContainer other)
combiner = CoverageContainer::combine

To handle Function<A, R> finisher:

Function<CoverageContainer, Double>
Double apply(CoverageContainer a)
double issueCoverage()
finisher = CoverageContainer::issueCoverage
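
The two kinds of method reference recalled in the steps above can be compared with their lambda equivalents in a small sketch. Tally and MethodReferenceKinds are hypothetical names used only for this illustration.

```java
import java.util.function.BiConsumer;
import java.util.function.Supplier;

// Hypothetical container that just counts the elements it receives.
class Tally {
  int count;

  void accumulate(String element) {
    count++;
  }
}

class MethodReferenceKinds {

  // Reference to a constructor: get() calls the no-arg constructor.
  static final Supplier<Tally> SUPPLIER_REF = Tally::new;
  static final Supplier<Tally> SUPPLIER_LAMBDA = () -> new Tally();

  // Reference to an instance method of an arbitrary object of a particular type:
  // the first argument of accept(...) becomes the receiver of accumulate(...).
  static final BiConsumer<Tally, String> ACCUMULATOR_REF = Tally::accumulate;
  static final BiConsumer<Tally, String> ACCUMULATOR_LAMBDA =
      (tally, element) -> tally.accumulate(element);
}
```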

This long procedure results in:

class CoverageContainer {
  void accumulate(IssueWiseText t) { }
 
  CoverageContainer combine(CoverageContainer other) { }
 
  double issueCoverage() { }
}

And the developer can define toIssueCoverage() (having to provide the arguments in proper order):

Collector<IssueWiseText, ?, Double> toIssueCoverage() {
  return Collector.of(
          CoverageContainer::new, CoverageContainer::accumulate,
          CoverageContainer::combine, CoverageContainer::issueCoverage
  );
}
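
For readers who want to run this variant, here is one possible way to fill in the method bodies. It is a sketch that assumes plain summation of both lengths; the CoverageCollectors holder and the sample(...) factory are hypothetical additions for the example.

```java
import java.util.stream.Collector;

interface IssueWiseText {
  int issueLength();
  int textLength();
}

class CoverageContainer {
  private int totalIssueLength;
  private int totalTextLength;

  void accumulate(IssueWiseText t) {
    totalIssueLength += t.issueLength();
    totalTextLength += t.textLength();
  }

  CoverageContainer combine(CoverageContainer other) {
    totalIssueLength += other.totalIssueLength;
    totalTextLength += other.totalTextLength;
    return this;
  }

  double issueCoverage() {
    // Floating-point division; an empty stream yields NaN (0 / 0.0).
    return (double) totalIssueLength / totalTextLength;
  }
}

class CoverageCollectors {

  static Collector<IssueWiseText, ?, Double> toIssueCoverage() {
    return Collector.of(
        CoverageContainer::new, CoverageContainer::accumulate,
        CoverageContainer::combine, CoverageContainer::issueCoverage);
  }

  // Hypothetical factory for sample data.
  static IssueWiseText sample(int issue, int text) {
    return new IssueWiseText() {
      @Override public int issueLength() { return issue; }
      @Override public int textLength() { return text; }
    };
  }
}
```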

Structure With Accumulative

Now, with Accumulative, the developer will look at the new Collector.of overload and will see only one main parameter:

Supplier<A> supplier

and one bounded type parameter:

A extends Accumulative<T, A, R>

So the developer will start with the natural thing — implementing Accumulative<T, A, R> and resolving T, A, R for the first and last time:

class CoverageContainer implements Accumulative<IssueWiseText, CoverageContainer, Double> {
 
}

At this point, a decent IDE will complain that the class must implement all abstract methods. What’s more — and that’s the most beautiful part — it will offer a quick fix. In IntelliJ, you hit “Alt+Enter” → “Implement methods”, and… you’re done!

class CoverageContainer implements Accumulative<IssueWiseText, CoverageContainer, Double> {
 
  @Override
  public void accumulate(IssueWiseText issueWiseText) {
     
  }
 
  @Override
  public CoverageContainer combine(CoverageContainer other) {
    return null;
  }
 
  @Override
  public Double finish() {
    return null;
  }
}

So… you don’t have to juggle the types, write anything manually, or name anything!

Oh, yes — you still need to define toIssueCoverage(), but it’s simple now:

Collector<IssueWiseText, ?, Double> toIssueCoverage() {
  return Collector.of(CoverageContainer::new);
}

Isn’t that nice?

Implementation

The implementation isn’t relevant here, as it’s nearly the same for both cases (diff).
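
For completeness, a possible implementation of the Accumulative-based variant is sketched below. It assumes plain summation of both lengths. Because the proposed Collector.of overload does not exist in the JDK, a hypothetical collectorOf helper stands in for it; AccumulativeCoverage and sample(...) are also made up for the example.

```java
import java.util.function.Supplier;
import java.util.stream.Collector;

interface IssueWiseText {
  int issueLength();
  int textLength();
}

interface Accumulative<T, A extends Accumulative<T, A, R>, R> {
  void accumulate(T t);
  A combine(A other);
  R finish();
}

class CoverageContainer implements Accumulative<IssueWiseText, CoverageContainer, Double> {
  private int totalIssueLength;
  private int totalTextLength;

  @Override
  public void accumulate(IssueWiseText t) {
    totalIssueLength += t.issueLength();
    totalTextLength += t.textLength();
  }

  @Override
  public CoverageContainer combine(CoverageContainer other) {
    totalIssueLength += other.totalIssueLength;
    totalTextLength += other.totalTextLength;
    return this;
  }

  @Override
  public Double finish() {
    // Floating-point division; an empty stream yields NaN (0 / 0.0).
    return (double) totalIssueLength / totalTextLength;
  }
}

class AccumulativeCoverage {

  // Hypothetical stand-in for the proposed Collector.of(Supplier<A>) overload.
  static <T, A extends Accumulative<T, A, R>, R> Collector<T, ?, R> collectorOf(Supplier<A> supplier) {
    return Collector.of(
        supplier,
        (container, element) -> container.accumulate(element),
        (left, right) -> left.combine(right),
        container -> container.finish());
  }

  static Collector<IssueWiseText, ?, Double> toIssueCoverage() {
    return collectorOf(CoverageContainer::new);
  }

  // Hypothetical factory for sample data.
  static IssueWiseText sample(int issue, int text) {
    return new IssueWiseText() {
      @Override public int issueLength() { return issue; }
      @Override public int textLength() { return text; }
    };
  }
}
```

On a parallel stream the combine method merges the partial containers, so the result should not depend on how the stream is split.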

Rationale

Too Complex Procedure

I hope I’ve demonstrated how defining a custom Collector can be a challenge. I must say that even I always feel reluctant about defining one. However, I also feel that — with Accumulative — this reluctance would go away, because the procedure would shrink to two steps:

Implement Accumulative<T, A, R>
Call Collector.of(YourContainer::new)

Drive to Implement

JetBrains coined “the drive to develop”, and I’d like to twist it to “the drive to implement”.

Since a Collector is simply a box of functions, there’s usually no point (as far as I can tell) in implementing it (there are exceptions). However, a Google search for “implements Collector” (~5000 results) shows that people do it.

And it’s natural, because to create a “custom” TYPE in Java, one usually extends/implements TYPE. In fact, it’s so natural that even experienced developers (like Tomasz Nurkiewicz, a Java Champion) may do it.

To sum up, people feel the drive to implement, but — in this case — JDK provides them with nothing to implement. And Accumulative could fill this gap…

Relevant Examples

Finally, I searched for examples where it’d be straightforward to implement Accumulative.

In OpenJDK (which is not the target place, though), I found two:

Collectors.reducing (diff)
Collectors.teeing (diff)

On Stack Overflow, though, I found plenty: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53.

I also found a few array-based examples that could be refactored to Accumulative for better readability: a, b, c.

Naming

Accumulative is not the best name, mainly because it’s an adjective. However, I chose it because:

I wanted the name to start with A (as in <T, A, R>),
my best candidate (Accumulator) was already taken by BiConsumer<A, T> accumulator(),
AccumulativeContainer seemed too long.

In OpenJDK, A is called:

which prompts the following alternatives:

AccumulatingBox
AccumulationState
Collector.Container
MutableResultContainer

Of course, if the idea were accepted, the name would go through the “traditional” name bikeshedding.

Summary

In this post, I proposed adding the Accumulative interface and a new Collector.of overload to the JDK. With them, developers would no longer associate creating a custom Collector with a lot of effort. Instead, it’d simply become “implement the contract” & “reference the constructor”.

In other words, this proposal aims at lowering the bar of entering the custom-Collector world!

Appendix

Optional reading below.

Example Solution: JDK 12+

In JDK 12+, we’ll be able to define toIssueCoverage() as a composed Collector, thanks to Collectors.teeing (JDK-8209685):

static Collector<IssueWiseText, ?, Double> toIssueCoverage() {
  return Collectors.teeing(
          Collectors.summingInt(IssueWiseText::issueLength),
          Collectors.summingInt(IssueWiseText::textLength),
          (totalIssueLength, totalTextLength) -> (double) totalIssueLength / totalTextLength
  );
}

The above is concise, but it may be somewhat hard to follow for a Collector API newbie.
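
One way to read the teeing version is to compare it with a naive two-pass computation: teeing feeds every element to both downstream Collectors in a single pass and then merges their results. The sketch below assumes JDK 12+; TeeingExplained, its method names and the sample(...) factory are hypothetical.

```java
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;

interface IssueWiseText {
  int issueLength();
  int textLength();
}

class TeeingExplained {

  // Two passes over the data: one stream per sum.
  static double coverageInTwoPasses(List<IssueWiseText> texts) {
    int totalIssueLength = texts.stream().collect(Collectors.summingInt(IssueWiseText::issueLength));
    int totalTextLength = texts.stream().collect(Collectors.summingInt(IssueWiseText::textLength));
    return (double) totalIssueLength / totalTextLength;
  }

  // One pass: both sums are accumulated side by side, then merged by the last argument.
  static double coverageInOnePass(List<IssueWiseText> texts) {
    Collector<IssueWiseText, ?, Double> toIssueCoverage = Collectors.teeing(
        Collectors.summingInt(IssueWiseText::issueLength),
        Collectors.summingInt(IssueWiseText::textLength),
        (totalIssueLength, totalTextLength) -> (double) totalIssueLength / totalTextLength);
    return texts.stream().collect(toIssueCoverage);
  }

  // Hypothetical factory for sample data.
  static IssueWiseText sample(int issue, int text) {
    return new IssueWiseText() {
      @Override public int issueLength() { return issue; }
      @Override public int textLength() { return text; }
    };
  }
}
```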

Example Solution: the JDK Way

Alternatively, toIssueCoverage() could be defined as:

static Collector<IssueWiseText, ?, Double> toIssueCoverage() {
  return Collector.of(
          () -> new int[2],
          (a, t) -> { a[0] += t.issueLength(); a[1] += t.textLength(); },
          (a, b) -> { a[0] += b[0]; a[1] += b[1]; return a; },
          a -> (double) a[0] / a[1]
  );
}

I dubbed this the “JDK way”, because some Collectors are implemented like that in OpenJDK (e.g. Collectors.averagingInt).

Yet, while such terse code may be suitable for OpenJDK, it’s certainly not suitable for business logic because of its low readability (low to the point that I’d call it cryptic).

Published on Java Code Geeks with permission by Tomasz Linkowski, partner at our JCG program. See the original article here: Accumulative: Custom Java Collectors Made Easy

Opinions expressed by Java Code Geeks contributors are their own.
