
About Christopher Meyer

Chris works as a researcher and is eagerly looking for bugs in SSL/TLS, the Java platform and various applications. In addition, he is primarily interested in secure coding and exploiting coding mistakes.

Investigating the HashDoS issue

Nearly a month ago I wrote down some thoughts on how the HashDoS problem presented at 28C3, as well as other code defects, could perhaps be fixed temporarily without vendor interaction.

Now it’s time to investigate the complexity attack more deeply and to have a look at the sources. I quietly assume that java.util.HashMap and java.util.Hashtable are the most commonly used Java data structures affected by this attack, so this article will focus only on the code behind these types.

A brief kick start to hash functions and indexed data structures

Hash indexed data structures are very popular because of their simple usage and benefits:

  • no bothering with index tables to find the right position of desired data
  • access to data by using keywords instead of index numbers
  • nearly constant time for add or delete operations

To achieve these benefits, hash indexed data structures follow a clever idea on how to index data: the index is computed by hashing a keyword which is associated with the data behind it. Consider the following code-like example:

myHashIndexedDataStructure[hash(keyword)] = particular data
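In Java, java.util.HashMap implements exactly this scheme behind the scenes. A minimal sketch (the keys and values are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class HashIndexDemo {
    public static void main(String[] args) {
        // internally, the position of each entry is derived from hash(keyword)
        Map<String, String> store = new HashMap<>();
        store.put("keyword", "particular data");

        // lookup re-hashes "keyword" and jumps straight to the right slot
        System.out.println(store.get("keyword")); // prints "particular data"
    }
}
```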

Sounds perfect, but it has one major drawback: the hash functions used are in most cases not cryptographic ones.

According to Wikipedia, the only mandatory characteristic for a function to be called a hash function is that it

“map[s] large data sets of variable length, called keys, to smaller data sets of a fixed length”

To be called a cryptographic hash function (once again, a definition taken from Wikipedia), it has to fulfill further, much stronger requirements:

  • it is easy (but not necessarily quick) to compute the hash value for any given message
  • it is infeasible to generate a message that has a given hash
  • it is infeasible to modify a message without changing the hash
  • it is infeasible to find two different messages with the same hash

To make a long story short let’s summarize what we learned and what’s to conclude with this knowledge:

  1. hash indexed data structures utilize hash functions
  2. hash functions are not necessarily collision resistant as long as they are not cryptographic ones
  3. lack of collision resistance means that multiple inputs with the same hash value can easily be computed

In case of colliding keywords, a hash indexed data structure needs some kind of plan B – a fall-back algorithm – for handling multiple data sets whose keywords have equal hash values.

In practice, several approaches are feasible:

  • probing (shift to fixed or computable intervals)
  • multiple hashing
  • entry chaining (building lists of colliding entries)
  • overriding existing entries

Which of these strategies does Java take? First we will inspect the code of java.util.Hashtable (only the interesting parts are presented; the rest of the code is left out for clarity):

public synchronized V put(K key, V value) { 
   // Makes sure the key is not already in the hashtable. 
   Entry tab[] = table; 
   int hash = key.hashCode(); 
   int index = (hash & 0x7FFFFFFF) % tab.length; 
   for (Entry<K,V> e = tab[index]; e != null; e = e.next) { 
      if ((e.hash == hash) && e.key.equals(key)) { 
         V old = e.value; 
         e.value = value; 
         return old; 
      } 
   } 

   // Creates the new entry. 
   Entry<K,V> e = tab[index]; 
   tab[index] = new Entry<>(hash, key, value, e); 
   return null; 
}

As can be seen, this class uses the hashCode() method of the key object (the keyword) to compute the hash value. The sign bit is then cleared by ANDing (& operator) with 0x7FFFFFFF, so the value is a non-negative integer, and the result is taken MODULO (% operator) the table size, building a cyclic ring structure ((table.length + 1) mod table.length ≡ 1, division with remainder) so that the result always addresses an entry in tab[].
After this, all Entry objects at that index are traversed and checked for identical hash values and identical key objects. The if-clause prevents storing multiple instances of an identical key – the older value is simply replaced by the newer one.

In case no identical object (concerning hash value and the equals() method) was found at the position identified by key.hashCode(), a new Entry is created, placed at that position and handed the old Entry object at this position as its successor.
So far it looks like java.util.Hashtable uses some kind of list as the data structure behind each tab[] slot.
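The masking step matters because hashCode() may return any int, including negative values. A small sketch of the index computation (the helper name indexFor is mine):

```java
public class IndexComputation {
    // mirrors Hashtable's computation: clear the sign bit, then reduce modulo the table size
    static int indexFor(int hashCode, int tableLength) {
        return (hashCode & 0x7FFFFFFF) % tableLength;
    }

    public static void main(String[] args) {
        int len = 11; // Hashtable's default initial capacity
        // without the mask, a negative hashCode would produce a negative index
        System.out.println(indexFor(-42, len));
        System.out.println(indexFor("0w".hashCode(), len)); // 1607 % 11 = 1
    }
}
```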

This assumption is confirmed when looking at the code for java.util.Hashtable.Entry<K,V> which is a private inner class.

private static class Entry<K,V> implements Map.Entry<K,V> { 
   int hash; 
   K key; 
   V value; 
   Entry<K,V> next;

The next Entry-object simply points to the following Entry. This represents a custom built linked list.
The code of java.util.HashMap is more complex and behaves partly differently (null keys and values are allowed, and it is unsynchronized!), but it is based on the same idea. Investigating the code here will not reveal anything new, except the fact that Entry is reimplemented once again…

Both implementations rely on linked lists behind each entry of the hash indexed array.

Attack idea

Now that we know the implementation details behind java.util.Hashtable and java.util.HashMap, we can come back to the attack referred to as HashDoS. The attack implements the idea of Crosby, S.A., Wallach, D.S.: Denial of Service via Algorithmic Complexity Attacks. In: Proceedings of the 12th USENIX Security Symposium, USENIX Association (2003).
To summarize: a hash indexed data structure can be extremely slowed down by provoking an unfavorable state. The ideal hash indexed data structure looks like this:

   table[hash(keyA)] = dataA 
   table[hash(keyB)] = dataB 
   table[hash(keyC)] = dataC 
   table[hash(keyD)] = dataD 

In this case the time for altering, deleting or adding data whose keyword has a different hash value is nearly constant: the position can easily be found by using the hash value of the keyword as index, and the data is instantly present without iterating over a list.
Let’s have a look at a different, unfavorable state of a hash indexed data structure:

   hash(keyA) == hash(keyB) == hash(keyC) == hash(keyD)
   = k 
   table[k] = dataA -> dataB -> dataC -> dataD 

With a structure like this, the days of constant-time CRUD operations are over. It is necessary to

  1. compute the hash value for the keyword
  2. iterate through the linked list
  3. compare each entry’s keyword to check whether it matches the one the application is looking for

This significantly slows down the processing thread. A very fast data structure has turned into a linked list with additional overhead (computing a hash value). All benefits of a hash indexed data structure are wiped out. As if this wasn’t bad enough, most hash indexed data structures provide a feature called rehashing: when the data structure exceeds a defined load factor (in Java, for example, 75%), the table is rehashed for optimization reasons. Most of the time this feature is absolutely desirable, but in this special case it slows down the whole process even more.
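The effect is easy to reproduce without attacking String at all, using a deliberately bad key class of my own whose hashCode() is constant, so that every entry lands in the same bucket. (On Java 7 and earlier the chain is a plain linked list; Java 8’s HashMap later mitigated this by converting long chains into balanced trees.)

```java
import java.util.HashMap;
import java.util.Map;

public class WorstCaseDemo {
    // every instance hashes to the same bucket, forcing one long collision chain
    static final class BadKey {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        long start = System.nanoTime();
        for (int i = 0; i < 20_000; i++) {
            map.put(new BadKey(i), i); // each put first walks the entire chain
        }
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println("20,000 colliding inserts took " + ms + " ms");
    }
}
```

Absolute timings vary per machine; the interesting comparison is against the same loop with a normal key type, which finishes orders of magnitude faster.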

Exploiting the problem

To exploit this behavior it is necessary to compute a whole bunch of colliding keywords. If we assume the keyword to be of type java.lang.String for example we may have a look at its hashCode() function:

public int hashCode() { 
   int h = hash; 
   if (h == 0) { 
      int off = offset; 
      char val[] = value; 
      int len = count; 

      for (int i = 0; i < len; i++) { 
         h = 31*h + val[off++]; 
      } 
      hash = h; 
   } 
   return h; 
}

This is a customized version of a multiplicative hash function known as DJBX33A, designed by D. J. Bernstein (Java uses the multiplier 31 instead of 33), for which collisions can easily be found.
The function has an interesting property, which is demonstrated by the following example:

   "0w".hashCode() = 1607 
   "1X".hashCode() = 1607 
   "29".hashCode() = 1607 
   "0w1X".hashCode() = 1545934 
   "0w29".hashCode() = 1545934 
   "1X0w".hashCode() = 1545934 
   "1X29".hashCode() = 1545934 
   "290w".hashCode() = 1545934 
   "291X".hashCode() = 1545934 

We see that concatenating colliding values again yields colliding values. We can repeat this and obtain a huge collection of colliding keywords. This makes finding collisions even easier than simple brute-forcing.
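A sketch of such a generator, built from the three colliding seeds above (the method name collidingStrings is mine):

```java
import java.util.ArrayList;
import java.util.List;

public class CollisionGenerator {
    private static final String[] SEEDS = {"0w", "1X", "29"}; // all hash to 1607

    // builds 3^n strings of length 2n that all share the same hashCode()
    static List<String> collidingStrings(int n) {
        List<String> result = new ArrayList<>();
        result.add("");
        for (int round = 0; round < n; round++) {
            List<String> next = new ArrayList<>(result.size() * SEEDS.length);
            for (String prefix : result)
                for (String seed : SEEDS)
                    next.add(prefix + seed);
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> keys = collidingStrings(10); // 3^10 = 59,049 colliding keys
        System.out.println(keys.size());
        System.out.println(keys.get(0).hashCode() == keys.get(1).hashCode()); // true
    }
}
```

Because all strings in a round have the same length and are built from equal-hash blocks, every round preserves the collision property, so the set grows exponentially with n.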

We tested this against a local web service and could significantly slow down the running web application server by using the colliding keywords as tag attributes.
I’m uncertain whether one could actually crash the machine or whether there is some non-obvious mechanism that prevents the server from killing itself (we have not investigated the processing code at the server side), but one can definitely prevent the server from responding within an acceptable period of time. Requests to the web service can thus easily be delayed.

Maybe I will put some effort into collecting measurement data (number of colliding keys vs. system response time) in the near future. If I do, you will find the data on this blog…

Key points to take with you

  • never rely on hashCode() alone – it can be error prone
  • avoid things like
    if (password.hashCode() == referencePassword.hashCode()) {
       return godMode;
    } else {
       return simpleMode;
    }
  • spend a few seconds on implementation details when making your decision for/against a data type/structure
  • filter incoming data – crop its size, reject overlong parameters etc.
  • be careful and always look out for coding best practices!
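For the password example above, a safer pattern is to compare the actual values – for instance stored digests via MessageDigest.isEqual, which also compares in constant time (the method and variable names are mine):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class PasswordCheck {
    // compare full digests, never hashCode() values
    static boolean matches(String candidate, byte[] storedDigest)
            throws NoSuchAlgorithmException {
        byte[] candidateDigest = MessageDigest.getInstance("SHA-256")
                .digest(candidate.getBytes(StandardCharsets.UTF_8));
        return MessageDigest.isEqual(candidateDigest, storedDigest);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        byte[] stored = MessageDigest.getInstance("SHA-256")
                .digest("secret".getBytes(StandardCharsets.UTF_8));
        System.out.println(matches("secret", stored)); // true
        System.out.println(matches("Secret", stored)); // false
    }
}
```

In a real application one would additionally salt and stretch the password (e.g. with PBKDF2); the point here is only that hashCode() is never a substitute for an actual comparison.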

Further interesting points to look at

In this example we used java.lang.String objects as keywords. It would be interesting to see what else could be used and where, in the JRE code or in heavily used projects, colliding hash values are used for data structuring or even worse purposes.
One could have a look at how Object.hashCode() is implemented (it’s native code) – this would be a great target, since all other objects extend this base class. If an extending class does not override the hashCode() method but relies on proper, collision-free output, this could be useful for more sophisticated attacks. Consider what happens if serialization relies on corresponding code….

If anybody already knows some vulnerable places, please let us know! We are very interested, but couldn’t look as deeply as we would like to, due to lack of time.


Once again I would like to thank Juraj Somorovsky for the rich joint research work! Furthermore, our thanks go to Andrea Barisani from the oCERT team and Vincent Danen from the Red Hat Security Response Team, who discussed this problem with us!

Reference: Investigating the HashDoS issue from our JCG partner Christopher Meyer at the Java security and related topics blog.
