Concatenating two Strings with the plus operator is the source of all evil
— Anonymous Java dev
NOTE: The source code for the tests discussed here can be found on Github
Ever since university I have regarded String concatenation in Java with the ‘+’ operator as a deadly performance sin. Recently, during an internal review at Backbase R&D, this recurring mantra was dismissed as a myth, on the grounds that the compiler uses a StringBuilder under the hood any time you use the plus operator to join Strings. I set out to test that claim and check how it holds up in different environments.
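To make the claim concrete, here is a minimal sketch of the rewrite being discussed. The class and method names are hypothetical, and the second method is a hand-written expansion of what compilers of the Java 5 to 7 era covered here are said to generate, not decompiler output.

```java
// Hypothetical illustration: these names are not taken from the benchmark sources.
class PlusDesugared {

    // What the developer writes.
    static String withPlus(String a, String b) {
        return a + b;
    }

    // Hand-written equivalent of what the compiler is said to emit
    // for the expression above (an assumption, not decompiled bytecode).
    static String withBuilder(String a, String b) {
        return new StringBuilder().append(a).append(b).toString();
    }

    public static void main(String[] args) {
        System.out.println(withPlus("foo", "bar"));
        System.out.println(withBuilder("foo", "bar"));
    }
}
```

Both methods return the same String for the same inputs; the open question is how much the surrounding code shape costs at runtime.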
Relying on your compiler to optimize String concatenation means that results might change heavily depending on the JDK vendor you adopt. As far as platform support goes in my daily job, three main vendors should be considered:
- Oracle JDK
- IBM JDK
- ECJ — for developers only
Moreover, while we officially support Java 5 through 6, we are also looking into supporting Java 7 for our products, adding another threefold dimension on top of the three vendors. For the sake of simplicity (read: laziness), the ecj-compiled bytecode is run with a single JDK, namely Oracle JDK7.

I prepared a VirtualBox VM with all of the above JDKs installed, then developed some classes expressing three different concatenation methods, amounting to three to four concatenations per method invocation, depending on the specific test case. The test classes are run a thousand times in each test round, with a total of 100 rounds per test case. The same JVM runs all the rounds of a given test case and is restarted between test cases, so that the Java runtime can perform all the optimizations it can without affecting the other test cases in any way. The default options were used to start all JVMs. More details can be found in the benchmark runner script.
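The round structure described above can be sketched roughly as follows. This is a hypothetical harness, not the actual benchmark runner from the repository: the class name, the `sink` field and the placeholder input are assumptions, and only the figures of 1000 executions per round and 100 rounds come from the text.

```java
// Hypothetical harness: a sketch of the round structure described in the text,
// not the actual benchmark runner from the repository.
class RoundTimer {

    static final int EXECUTIONS_PER_ROUND = 1000; // figure taken from the text
    static final int ROUNDS = 100;                // figure taken from the text

    // Written by the workload so that its result stays observable.
    static volatile int sink;

    // Runs the task ROUNDS times, EXECUTIONS_PER_ROUND executions each, and
    // returns the elapsed wall-clock time of every round in nanoseconds.
    static long[] run(Runnable task) {
        long[] elapsed = new long[ROUNDS];
        for (int round = 0; round < ROUNDS; round++) {
            long start = System.nanoTime();
            for (int i = 0; i < EXECUTIONS_PER_ROUND; i++) {
                task.run();
            }
            elapsed[round] = System.nanoTime() - start;
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Placeholder input; the real test classes may obtain it differently.
        final String base = String.valueOf(System.nanoTime());
        // Anonymous class rather than a lambda, to stay valid on Java 5 to 7.
        long[] times = run(new Runnable() {
            public void run() {
                String result = "const1" + base;
                result = result + "const2";
                sink = result.length();
            }
        });
        for (int round = 0; round < times.length; round++) {
            System.out.println(round + "\t" + (times[round] / 1000) + " us");
        }
    }
}
```

Keeping all rounds inside one JVM means later rounds run on already optimized code, which is why per-round data points are more informative than a single total.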
Full code for both the test cases and the test suite is available on GitHub. The following test cases were produced to measure the performance difference between String concatenation with plus and the direct use of a StringBuilder:
    // String concat with plus
    String result = "const1" + base;
    result = result + "const2";

    // String concat with a StringBuilder
    new StringBuilder()
        .append("const1")
        .append(base)
        .append("const2")
        .append(append)
        .toString();

    // String concat with an initialized StringBuilder
    new StringBuilder("const1")
        .append(base)
        .append("const2")
        .append(append)
        .toString();
The general idea is to concatenate constant Strings both at the head and at the tail of a variable. The two cases that make explicit use of StringBuilder differ in that the latter uses the 1-arg constructor, which initializes the builder with the first part of the result.
Enough talking. Below you can have a look at the generated graphs, where each data point corresponds to a single test round (i.e. 1000 executions of the same test class). The discussion of the results and some more juicy details will follow.
Oracle JDK5 is the clear loser here, appearing to play in a B league when compared to the others. But that’s not really the scope of this exercise, so we’ll gloss over it for the time being. That said, there are two other interesting bits I observe in the above graph. The first is that there is indeed generally quite a difference between the plus operator and an explicit StringBuilder, especially if you’re using Oracle Java 5, which performs three times worse than the rest of the crew.
The second observation is that, while for most of the JDKs an explicit StringBuilder offers up to twice the speed of the regular plus operator, IBM JDK6 does not seem to suffer any performance loss, averaging 25ms to complete the task in all test cases. A closer look at the generated bytecode reveals some interesting details.
NOTE: the decompiled classes are also available on GitHub.

Across all the JDKs tested, StringBuilders are always used to implement String concatenation, even in the presence of a plus sign. Moreover, across all vendors and versions, there is almost no difference in the generated bytecode for the same test case. The only one that stands a bit apart is ecj, which is the only compiler to cleverly optimize the CatPlus test case to invoke the 1-arg constructor of StringBuilder instead of the 0-arg version.
Comparing the resulting bytecode exposes what could affect performance in the different scenarios:
- when concatenating with plus, a new StringBuilder instance is created every time a concatenation happens. This can easily degrade performance, due to the needless constructor invocations and the extra stress on the garbage collector from throwaway instances
- compilers take you literally and initialize StringBuilder with its 1-arg constructor only if you write it that way in the original code. This results in four and three invocations of StringBuilder.append for CatSB and CatSB2 respectively.
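The two points above can be made tangible by counting constructor and append calls. The sketch below is a hypothetical instrumentation: `CountingBuilder` is an illustrative wrapper around StringBuilder, and the three methods are hand-written expansions that follow the description of the bytecode given here (with the 0-arg constructor for CatPlus, so not the ecj variant); none of it is decompiled output.

```java
// Hypothetical instrumentation: CountingBuilder is an illustrative wrapper,
// and the three methods are hand-written expansions, not decompiled bytecode.
class BuilderCalls {

    static int constructed; // number of builders created
    static int appends;     // number of append invocations

    static final class CountingBuilder {
        private final StringBuilder delegate;

        CountingBuilder() {
            delegate = new StringBuilder();
            constructed++;
        }

        CountingBuilder(String initial) {
            delegate = new StringBuilder(initial);
            constructed++;
        }

        CountingBuilder append(String s) {
            delegate.append(s);
            appends++;
            return this;
        }

        @Override
        public String toString() {
            return delegate.toString();
        }
    }

    static void reset() {
        constructed = 0;
        appends = 0;
    }

    // CatPlus as described: one throwaway builder per concatenation statement.
    static String catPlus(String base) {
        String result = new CountingBuilder().append("const1").append(base).toString();
        result = new CountingBuilder().append(result).append("const2").toString();
        return result;
    }

    // CatSB: a single builder created with the 0-arg constructor.
    static String catSB(String base, String append) {
        return new CountingBuilder()
                .append("const1").append(base).append("const2").append(append)
                .toString();
    }

    // CatSB2: a single builder created with the 1-arg constructor.
    static String catSB2(String base, String append) {
        return new CountingBuilder("const1")
                .append(base).append("const2").append(append)
                .toString();
    }

    public static void main(String[] args) {
        reset();
        catPlus("x");
        System.out.println("CatPlus: builders=" + constructed + " appends=" + appends);
        reset();
        catSB("x", "y");
        System.out.println("CatSB:   builders=" + constructed + " appends=" + appends);
        reset();
        catSB2("x", "y");
        System.out.println("CatSB2:  builders=" + constructed + " appends=" + appends);
    }
}
```

In this sketch the plus variant creates two builders for a single method call, while each explicit variant creates one; the explicit variants differ only by one append call.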
Bytecode analysis offers the final answer to the original question. Do you need to explicitly use a StringBuilder to improve performance? Yes. The above graphs clearly show that, unless you’re using the IBM JDK6 runtime, you will lose 50% performance when using the plus operator, although IBM JDK6 is also the one that performs slightly worse than the other candidates when StringBuilders are used explicitly. Also, it’s quite interesting to see how JIT optimizations impact overall performance: for instance, even though the bytecode differs between the two explicit StringBuilder test cases, the end result is the same in the long run.