About Carlo Sciolla

An enterprise software engineer by day and Clojurian, meetup organizer, blogger and biker by night, Carlo is an Open Source enthusiast who is passionate about everything software. Currently working as Product Lead at Backbase, he also organizes the Amsterdam Clojurians meetup and the yearly October Amsterdam Clojure conference.

Java StringBuilder myth debunked

The myth

Concatenating two Strings with the plus operator is the source of all evil

– Anonymous Java dev

NOTE: The source code for the tests discussed here can be found on Github

Ever since university I have regarded String concatenation in Java with the ‘+’ operator as a deadly performance sin. Recently, an internal review at Backbase R&D dismissed this recurring mantra as a myth, on the grounds that javac uses a StringBuilder under the hood whenever you join Strings with the plus operator. I set out to test that claim and to verify what actually happens under different environments.

The test

Relying on your compiler to optimize your String concatenation means that things might change heavily depending on the JDK vendor you adopt. As far as platform support goes for my daily job, three main vendors should be considered:

  • Oracle JDK
  • IBM JDK
  • ECJ — for developers only

Moreover, while we officially support Java 5 through 6, we are also looking into supporting Java 7 for our products, which adds another three-fold dimension on top of the three vendors. For the sake of laziness (or rather, simplicity), the ecj compiled bytecode will be run with a single JDK, namely Oracle JDK7. I prepared a VirtualBox VM with all of the above JDKs installed, then developed some classes to express three different concatenation methods, amounting to three to four concatenations per method invocation, depending on the specific test case. The test classes are run a thousand times in each test round, with a total of 100 rounds per test case. The same JVM is used to run all the rounds of a given test case, and it is restarted between test cases, so that the Java runtime can perform all the optimizations it can without affecting the other test cases in any way. The default options were used to start all JVMs. More details can be found in the benchmark runner script.
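The round structure described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the benchmark runner from the repository: the class name `ConcatBenchmarkSketch`, the `run` helper, the `sink` field and the use of Java 8's `Supplier` and lambdas are all hypothetical choices made for brevity.

```java
import java.util.function.Supplier;

// Hypothetical harness: names and structure are illustrative, not the article's runner.
final class ConcatBenchmarkSketch {
    static final int ROUNDS = 100;
    static final int EXECUTIONS_PER_ROUND = 1000;

    // Written on every execution so the JIT cannot discard the concatenation as dead code.
    static volatile int sink;

    /** Returns the elapsed time of each round, in nanoseconds. */
    static long[] run(Supplier<String> testCase, int rounds, int executionsPerRound) {
        long[] elapsedNanos = new long[rounds];
        for (int round = 0; round < rounds; round++) {
            long start = System.nanoTime();
            for (int i = 0; i < executionsPerRound; i++) {
                sink = testCase.get().length();
            }
            elapsedNanos[round] = System.nanoTime() - start;
        }
        return elapsedNanos;
    }

    public static void main(String[] args) {
        String base = args.length > 0 ? args[0] : "base";
        String append = args.length > 1 ? args[1] : "append";
        long[] timings = run(() -> "const1" + base + "const2" + append,
                             ROUNDS, EXECUTIONS_PER_ROUND);
        System.out.println("first round: " + timings[0] + " ns");
        System.out.println("last round:  " + timings[timings.length - 1] + " ns");
    }
}
```

Running every round of a test case inside one JVM means the early rounds include warm-up, so comparing the first and last rounds gives a rough idea of how much the runtime optimizes along the way.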

The code

Full code for both test cases and the test suite is available on Github. The following different test cases were produced to measure performance differences of the String concatenation with plus against the direct use of a StringBuilder:

// String concat with plus
String result = "const1" + base;
result = result + "const2";
return result + append;

// String concat with a StringBuilder
return new StringBuilder()
              .append("const1")
              .append(base)
              .append("const2")
              .append(append)
              .toString();

// String concat with an initialized StringBuilder
return new StringBuilder("const1")
              .append(base)
              .append("const2")
              .append(append)
              .toString();

The general idea is to concatenate constant Strings both before and after a variable. The difference between the last two cases, both of which make explicit use of StringBuilder, is that the latter uses the 1-arg constructor, which initializes the builder with the first part of the result.
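To make the three variants easy to try side by side, they can be wrapped in one small class. The sketch below is not the test code from the repository: the class name `ConcatCases`, the method names and the two-parameter signatures are assumptions modelled on the variables `base` and `append` used in the snippets.

```java
// Hypothetical wrapper around the three concatenation variants discussed in the article.
final class ConcatCases {

    // Plus operator, deliberately split over three statements.
    static String catPlus(String base, String append) {
        String result = "const1" + base;
        result = result + "const2";
        return result + append;
    }

    // Explicit StringBuilder created with the 0-arg constructor: four append calls.
    static String catSB(String base, String append) {
        return new StringBuilder()
                .append("const1")
                .append(base)
                .append("const2")
                .append(append)
                .toString();
    }

    // Explicit StringBuilder seeded through the 1-arg constructor: three append calls.
    static String catSB2(String base, String append) {
        return new StringBuilder("const1")
                .append(base)
                .append("const2")
                .append(append)
                .toString();
    }

    public static void main(String[] args) {
        // All three variants build the same String; only the way it is built differs.
        System.out.println(catPlus("-base-", "-append"));
        System.out.println(catSB("-base-", "-append"));
        System.out.println(catSB2("-base-", "-append"));
    }
}
```

Each call in `main` prints `const1-base-const2-append`, which confirms that the benchmark compares equivalent work.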

The results

Enough talking: below are the generated graphs, where each data point corresponds to a single test round (i.e. 1000 executions of the same test class). A discussion of the results and some juicier details follow.

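[Graph: catplus (concatenation with the plus operator)]

[Graph: catsb (explicit StringBuilder)]

[Graph: catsb2 (explicit StringBuilder, 1-arg constructor)]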

The discussion

Oracle JDK5 is the clear loser here, playing in a B league compared to the others. But that’s not really the scope of this exercise, so we’ll gloss over it for the time being. That said, there are two other interesting things I observe in the graphs above. The first is that there is indeed generally quite a difference between the plus operator and an explicit StringBuilder, especially if you’re using Oracle Java 5, which performs three times worse than the rest of the crew.

The second observation is that, while for most of the JDKs an explicit StringBuilder offers up to twice the speed of the regular plus operator, IBM JDK6 does not seem to suffer any performance loss, always averaging 25ms to complete the task in all test cases. A closer look at the generated bytecode reveals some interesting details.

The bytecode

NOTE: the decompiled classes are also available on Github

Across all possible JDKs, StringBuilders are always used to implement String concatenation, even in the presence of a plus sign. Moreover, across all vendors and versions, there is almost no difference at all for the same test case. The only one that stands a bit apart is ecj, which is the only one to cleverly optimize the CatPlus test case to invoke the 1-arg constructor of the StringBuilder instead of the 0-arg version.

Comparing the resulting bytecode exposes what could affect performance in the different scenarios:

  • when concatenating with plus, a new instance of StringBuilder is created every time a concatenation happens. This can easily degrade performance, both because of the needless constructor invocations and because the throwaway instances put more stress on the garbage collector
  • compilers will take you literally and initialize StringBuilder with its 1-arg constructor only if you write it that way in the original code. This results in four and three invocations of StringBuilder.append for CatSB and CatSB2 respectively.
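The first point can be made concrete by writing out by hand what the plus-based test case turns into. The sketch below is a hand-written Java approximation of the bytecode produced by the pre-Java 9 compilers tested here, assuming the 0-arg constructor that the article reports for the non-ecj compilers. The class and method names are hypothetical and it is not the decompiled output from the repository.

```java
// Hand-written approximation of the compiled plus-based test case (names are hypothetical).
final class DesugaredCatPlus {

    // One StringBuilder, and one toString() copy, per source statement: three in total.
    static String catPlusDesugared(String base, String append) {
        String result = new StringBuilder().append("const1").append(base).toString();
        result = new StringBuilder().append(result).append("const2").toString();
        return new StringBuilder().append(result).append(append).toString();
    }

    // The original source form, for comparison.
    static String catPlus(String base, String append) {
        String result = "const1" + base;
        result = result + "const2";
        return result + append;
    }

    public static void main(String[] args) {
        // Both forms produce the same String; the first just spells out the hidden builders.
        System.out.println(catPlusDesugared("-base-", "-append")
                .equals(catPlus("-base-", "-append")));
    }
}
```

Seen this way, the plus version allocates three builders and three intermediate Strings where the explicit StringBuilder versions allocate one of each.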

The conclusion

Bytecode analysis offers the final answer to the original question. Do you need to explicitly use a StringBuilder to improve performance? Yes. The graphs above clearly show that, unless you’re using the IBM JDK6 runtime, you will lose 50% performance when using the plus operator, although IBM JDK6 is also the one that performs slightly worse than the other candidates when StringBuilders are used explicitly. Also, it’s quite interesting to see how JIT optimizations impact the overall performance: for instance, even though the two explicit StringBuilder test cases compile to different bytecode, the end result is absolutely the same in the long run.

[Image: myth-confirmed]
Reference: Java StringBuilder myth debunked from our JCG partner Carlo Sciolla at the Skuro blog.


12 Responses to "Java StringBuilder myth debunked"

  1. Why is the title “Java StringBuilder myth debunked”? Didn’t you confirm that the myth is actually true?

  2. Sergei says:

    >> Concatenating two Strings

    Maybe you should test something like this?

    String a = c1.getString();
    String b = c2.getString();
    String c = a + b;

  3. I have a feeling that your unnecessarily splitting a concatenation that can and should be done in a single statement into 3 separate statements has a significant effect on your results here.

    String result = "const1" + base;
    result = result + "const2";
    return result + append;

    should just be

    return "const1" + base + "const2" + append;

    It’s not surprising that the compiler doesn’t optimize perfectly when given code like that.

    (As an aside, don’t use System.currentTimeMillis() for measuring elapsed time… only System.nanoTime() can be relied on for that. System.currentTimeMillis() provides wall clock time, which can be adjusted by the system at any time.)

  4. Jan Schoubo says:

    …also I find the more common case has multiple string +’es:

    res = "File: " + filename + " was open for " + n + " sec. but less than " + nw + " records were written";

    Perhaps that would make the difference less dramatic?

  5. Mike Brock says:

    I’m not sure that I consider this much of a revelation. Your test code using the ‘+’ operator will generate three StringBuilders instead of one. The Java compiler has no optimization where it will determine, across expressions, that it’s safe to reuse the same StringBuilder instance.

    These results don’t surprise me at all, and I think may lead people to the wrong conclusions about when it is and is not appropriate to use StringBuilder versus the string append operator.

    If you look at an example like this:

    public String getSomeString() {
    return foo + ":" + bar;
    }

    and then:

    public String getSomeString() {
    return new StringBuilder(foo).append(":").append(bar).toString();
    }

    … These two versions will produce EXACTLY the same bytecode. But the first version is certainly easier to read.

    But yes, if you do something like this:

    public String getSomeString() {
    String str = foo;
    str += ":";
    str += bar;
    return str;
    }

    vs something like this:

    public String getSomeString() {
    StringBuilder str = new StringBuilder(foo);
    str.append(":");
    str.append(bar);
    return str.toString();
    }

    The second example will be faster. I can easily intuit this, because the decompiled version of the first example will look something like this:

    public String getSomeString() {
    String str = foo;
    StringBuilder a0 = new StringBuilder(str);
    a0.append(":");
    str = a0.toString();
    StringBuilder a1 = new StringBuilder(str);
    a1.append(bar);
    str = a1.toString();
    return str;
    }

    … But I already knew this. =)

    • Good for you that you knew all the details, I’m no expert on compiler optimizations, and for me it’s not obvious what could be faster and what not.

      As an example, ecj shows that something can be improved here and there over the basic approach of always creating fresh StringBuilders on demand. Moreover, I would have expected JIT optimizations to kick in with a somewhat visible impact, and it was a surprise for me not to see any.

      • Mike Brock says:

        The JIT optimizations *do* kick in. But there is no HotSpot operation to coalesce StringBuilders. The dataflow analysis to determine whether it was truly safe to do so would be prohibitively expensive.

  6. @twitter-14392123:disqus the first draft of the post was a follow-up to an internal discussion, where the claim to debunk was that compilers are clever enough to optimize string concatenations, relieving programmers from manually creating and managing StringBuilders. Re-reading it in its final version, I agree that the title could have been adjusted.

  7. Jakob Jenkov says:

    The real performance killer of the + operator is when it is used inside loops. For each iteration of the loop, new StringBuilder instances are created, and turned into String instances using .toString(). Every time a StringBuilder is created from the String being built, all the characters must be copied into the StringBuilder to keep the original String immutable. And, when the StringBuilder is turned into a String again, all the characters are again copied from the StringBuilder into the String, again to make sure that the new String is immutable. As the String grows, the same characters are copied again and again. Each iteration in the loop will be slower than the previous. That is the real performance killer of the + operator. Who cares about a small overhead when doing a single concatenation?

  8. Good explanation. I think about this situation all the time. I usually like to use str.concat(), but I never had the time to compare which one is better.

  9. Myth Buster says:

    I like how you titled this “myth debunked” and confirmed it.
    Really helpful! Not confusing at all! Durp




Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd | Terms of Use | Privacy Policy
All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries.
Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.