Why I distrust wildcards and why we need them anyway

Gavin KingAugust 4th, 2014Last Updated: August 3rd, 2014

0 24 6 minutes read

In any programming language that combines subtype polymorphism (object orientation) with parametric polymorphism (generics), the question ofvariance arises. Suppose I have a list of strings, type List<String>. Can I pass that to a function which accepts List<Object>? Let’s start with this definition:

interface List<T> {
    void add(T element);
    Iterator<T> iterator();
    ...
}

Broken covariance

Intuitively, we might at first think that this should be allowed. This looks OK:

void iterate(List<Object> list) {
    Iterator<Object> it = list.iterator();
    ...
}
iterate(ArrayList<String>());

Indeed, certain languages, including Eiffel and Dart do accept this code. Sadly, it’s unsound, as can be seen in the following example:

//Eiffel/Dart-like language with
//broken covariance:
void put(List<Object> list) {
    list.add(10);
}
put(ArrayList<String>());

Here we pass a List<String> to a function accepting List<Object>, which attempts to add an Integer to the list.

Java makes this same mistake with arrays. The following code compiles:

//Java:
void put(Object[] list) {
    list[0]=10;
}
put(new String[1]);

It fails at runtime with an ArrayStoreException.

Use-site variance

Java takes a different approach, however, for generic class and interface types. By default, a class or interface type is invariant, which is to say, that:

is assignable to L<V> if and only if U is exactly the same type as V.

Since this is extremely inconvenient much of the time, Java supports something called use-site variance, where:

L<U> is assignable to L<? extends V> if U is a subtype of V, and
L<U> is assignable to L<? super V> if U is a supertype of V.

The ugly syntax ? extends V or ? super V is called a wildcard. We also say that:

L<? extends V> is covariant in V, and that
L<? super V> is contravariant in V.

Since Java’s wildcard notation is so ugly, we’re not going to use it anymore in this discussion. Instead, we’ll write wildcards using the keywords inand out for contravariance and covariance respectively. Thus:

L<out V> is covariant in V, and
L<in V> is contravariant in V.

A given V is called the bound of the wildcard:

out V is an upper-bounded wildcard, and V is its upper bound, and
in V is a lower-bounded wildcard, and V is its lower bound.

In theory, we could have a wildcard with both an upper and lower bound, for example, L<out X in Y>.
We can express multiple upper bounds or multiple lower bounds using an intersection type, for example, L<out U&V> or L<in U&V>.
Note that the type expressions L<out Anything> and L<in Nothing> refer to exactly the same type, and this type is a supertype of all instantiations of L.
You’ll often see people refer to wildcarded types as existential types. What they mean by this is that if I know that list is of type List<out Object>:

List<out Object> list;

Then I know that there exists an unknown type T, a subtype of Object, such that list is of type List<T>.
Alternatively, we can take a more Ceylonic point of view, and say that List<out Object> is the union of all types List<T> where T is a subtype ofObject.
In a system with use-site variance, the following code does not compile:

void iterate(List<Object> list) {
    Iterator<Object> it = list.iterator();
    ...
}
iterate(ArrayList<String>()); //error: List<String> not a List<Object>

But this code does:

void iterate(List<out Object> list) {
    Iterator<out Object> it = list.iterator();
    ...
}
iterate(ArrayList<String>());

Correctly, this code does not compile:

void put(List<out Object> list) {
    list.add(10); //error: Integer is not a Nothing
}
put(ArrayList<String>());

Now we’re at the entrance to the rabbit hole. In order to integrate wildcarded types into the type system, while rejecting unsound code like the above example, we need a much more complicated algorithm for type argument substitution.

Member typing in use-site variance

That is, when we have a generic type like List<T>, with a method void add(T element), instead of just straightforwardly substituting Object forT, like we do with ordinary invariant types, we need to consider the variance of the location in which the type parameter occurs. In this case, Toccurs in a contravariant location of the type List, namely, as the type of a method parameter. The complicated algorithm, which I won’t write down here, tells us that we should substitute Nothing, the bottom type, in this location.
Now imagine that our List interface has a partition() method with this signature:

interface List<T> {
    List<List<T>> partition(Integer length);
    ...
}

What is the return type of partition() for a List<out Y>? Well, without losing precision, it is:

List<in List<in Y out Nothing> out List<in Nothing out Y>>

Ouch.
Since nobody in their right mind wants to have to think about types like this, a sensible language would throw away some of those bounds, leaving something like this:

List<out List<out Y>>

Which is vaguely acceptable. Sadly, even in this very simple case, we’re already well beyond the point where the programmer can easily follow along with what the typechecker is doing.
So here’s the essence of why I distrust use-site variance:

A strong principle in the design of Ceylon is that the programmer should always be able to reproduce the reasoning of the compiler. It is verydifficult to reason about some of the complex types that arise with use-site variance.
It has a viral effect: once those wildcard types get a foothold in the code, they start to propagate, and it’s quite hard to get back to my ordinary invariant types.

Declaration-site variance

A much saner alternative to use-site variance is declaration-site variance, where we specify the variance of a generic type when we declare it. This is the system we use in Ceylon. Under this system, we need to split List into three interfaces:

interface List<out T> {
     Iterator<T> iterator();
     List<List<T>> partition(Integer length);
     ...
}
 
interface ListMutator<in T> {
    void add(T element);
}
 
interface MutableList<T>
    satisfies List<T>&ListMutator<T> {}

List is declared to be a covariant type, ListMutator a contravariant type, and MutableList an invariant subtype of both.
It might seem that the requirement for multiple interfaces is a big disadvantage of declaration-site variance, but it often turns out to be useful to separate mutation from read operations, and:

mutating operations are very often invariant, whereas
read operations are very often covariant.

Now we can write our functions like this:

void iterate(List<Object> list) {
    Iterator<Object> it = list.iterator();
    ...
}
iterate(ArrayList<String>());
 
void put(ListMutator<Integer> list) {
    list.add(10);
}
put(ArrayList<String>()); //error: List<String> is not a ListMutator<Integer>

You can read more about declaration-site variance here.

Why we need use-site variance in Ceylon

Sadly, Java doesn’t have declaration-site variance, and clean interoperation with Java is something that is very important to us. I don’t like adding a major feature to the typesystem of our language purely for the purposes of interoperation with Java, and so I’ve resisted adding wildcards to Ceylon for years. In the end, reality and practicality won, and my stubborness lost. So Ceylon 1.1 now features use-site variance with single-bounded wildcards.
I’ve tried to keep this feature as tightly constrained as possible, with just the minimum required for decent Java interop. That means that, like in Java:

there are no double-bounded wildcards, of form List<in X out Y>, and
a wildcarded type can not occur in the extends or satisfies clause of a class or interface definition.

Furthermore, unlike Java:

there are no implicitly-bounded wildcards, upper bounds must always be written in explicitly, and
there is no support for wildcard capture.

Wildcard capture is a very clever feature of Java, which makes use of the “existential” interpretation of a wildcard type. Given a generic function like this one:

List<T> unmodifiableList<T>(List<T> list) => ... :

Java would let me call unmodifiableList(), passing a wildcarded type like List<out Object>, returning another wildcarded List<out Object>, reasoning that there is some unknown X, a subtype of Object for which the invocation would be well-typed. That is, this code is considered well-typed, even though the type List<out Object> is not assignable to List<T> for any T:

List<out Object> objects = .... ;
List<out Object> unmodifiable = unmodifiableList(objects);

In Java, typing errors involving wildcard capture are almost impossible to understand, since they involve the unknown, and undenoteable, type. I have no plans to add support for wildcard capture to Ceylon.

Try it out

Use-site variance is already implemented and already works in Ceylon 1.1, which you can get from GitHub, if you’re super-motivated.
Even though the main motivation for this feature was great Java interop, there will be other, hopefully rare, occasions where wildcards will be useful. That doesn’t, however, indicate any significant shift in our approach. We will continue using declaration-site variance in the Ceylon SDK except in extreme cases.

UPDATE:
I just realized I forgot to say thanks to Ross Tate for helping me with the finer points of the member typing algorithm for use site variance. Very tricky stuff that Ross knows off the top of his head!

Reference:

Why I distrust wildcards and why we need them anyway from our JCG partner Gavin King at the Ceylon Team blog blog.