Guava Splitter vs StringUtils

So I recently wrote a post about good old reliable Apache Commons StringUtils, which provoked a couple of comments, one of which was that Google Guava provides better mechanisms for joining and splitting Strings. I have to admit, this is a corner of Guava I’ve yet to explore. So I thought I ought to take a closer look and compare it with StringUtils, and I was surprised at what I found.

Splitting strings, eh? There can’t be many different ways of doing this, surely?

Well, Guava and StringUtils do take stylistically different approaches. Let’s start with the basic usage.
 

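import com.google.common.base.Splitter;
import org.apache.commons.lang3.StringUtils; // or org.apache.commons.lang.StringUtils on Commons Lang 2.x

// Apache StringUtils...
String[] tokens1 = StringUtils.split("one,two,three", ',');

// Guava splitter...
Iterable<String> tokens2 = Splitter.on(',').split("one,two,three");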

So, my first observation is that Splitter is more object-oriented. You have to create a splitter object, which you then use to do the splitting. The StringUtils split methods, on the other hand, use a more functional style, with static methods.

Here I much prefer Splitter. Need a reusable splitter that splits comma separated lists? A splitter that also trims leading and trailing white space, and ignores empty elements? Not a problem:

Splitter niceCommaSplitter = Splitter.on(',')
                              .omitEmptyStrings()
                              .trimResults();

niceCommaSplitter.split("one,, two,  three"); // "one", "two", "three"
niceCommaSplitter.split("  four  ,  five  "); // "four", "five"
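For comparison, here is roughly what that configuration amounts to when written against the JDK alone. This is only a minimal sketch, not how Guava implements it; the class and method names (PlainCommaSplit, splitTrimmed) are made up for illustration. It trims each token first and then drops the empty ones, which as far as I know is also the order Guava applies regardless of how the two configuration calls are chained.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of Guava or Commons Lang.
class PlainCommaSplit {

    // Splits on commas, trims each token, then drops tokens that are empty.
    static List<String> splitTrimmed(String input) {
        return Arrays.stream(input.split(","))
                .map(String::trim)
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(splitTrimmed("one,, two,  three")); // [one, two, three]
        System.out.println(splitTrimmed("  four  ,  five  ")); // [four, five]
    }
}
```

With Splitter, the same rules live in one reusable object rather than in a helper method you have to write yourself.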

That looks really useful. Any other differences?

The other thing to notice is that Splitter returns an Iterable<String>, whereas StringUtils.split returns a String array.

Don’t really see that making much of a difference, most of the time I just want to loop through the tokens in order anyway!

I also didn’t think it was a big deal, until I examined the performance of the two approaches. To do this I tried running the following code:

final String numberList = "One,Two,Three,Four,Five,Six,Seven,Eight,Nine,Ten";

// Commons Lang: the result of each split is thrown away
long start = System.currentTimeMillis();
for (int i = 0; i < 1000000; i++) {
    StringUtils.split(numberList, ',');
}
System.out.println(System.currentTimeMillis() - start);

// Guava: a new Splitter per iteration, and the Iterable is never iterated
start = System.currentTimeMillis();
for (int i = 0; i < 1000000; i++) {
    Splitter.on(',').split(numberList);
}
System.out.println(System.currentTimeMillis() - start);

On my machine this output the following times:

594
31

Guava’s Splitter is almost 20 times faster!

Now this is a much bigger difference than I was expecting: Splitter is nearly 20 times faster than StringUtils. How can this be? Well, I suspect it’s something to do with the return type. Splitter returns an Iterable<String>, whereas StringUtils.split gives you an array of Strings, so Splitter doesn’t actually need to create any new String objects up front.

It’s also worth noting you can cache your Splitter object, which results in an even faster runtime.

Blimey, end of argument? Guava’s Splitter wins every time?

Hold on a second. This isn’t quite the full story. Notice we’re not actually doing anything with the resulting Strings? As I mentioned, it looks like the Splitter isn’t actually creating any new Strings. I suspect it’s deferring this work to the Iterator it returns.
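To make that suspicion concrete, here is a minimal sketch of what a lazily evaluated split could look like. To be clear, this is not Guava’s actual implementation; LazySplitDemo, lazySplit and the substringsCreated counter are invented purely for illustration. The point is that the method returns immediately and substrings only get created as the caller iterates.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical illustration of lazy splitting; NOT Guava's actual implementation.
class LazySplitDemo {

    // Counts how many substrings have been created so far.
    static int substringsCreated = 0;

    // Returns an Iterable that does no splitting until it is iterated.
    static Iterable<String> lazySplit(final String input, final char separator) {
        return () -> new Iterator<String>() {
            private int position = 0; // start of the next token; -1 once exhausted

            @Override
            public boolean hasNext() {
                return position >= 0;
            }

            @Override
            public String next() {
                if (position < 0) {
                    throw new NoSuchElementException();
                }
                int end = input.indexOf(separator, position);
                String token;
                if (end < 0) {
                    token = input.substring(position); // last token
                    position = -1;
                } else {
                    token = input.substring(position, end);
                    position = end + 1;
                }
                substringsCreated++;
                return token;
            }
        };
    }

    public static void main(String[] args) {
        Iterable<String> tokens = lazySplit("One,Two,Three", ',');
        System.out.println(substringsCreated); // 0: nothing has been split yet

        for (String token : tokens) {
            token.length();
        }
        System.out.println(substringsCreated); // 3: the work happened during iteration
    }
}
```

A loop that calls a method like this and never iterates the result is timing little more than object creation, which would explain a suspiciously small number.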

So can we test this?

Sure thing. Here’s some code to repeatedly check the lengths of the generated substrings:

final String numberList = "One,Two,Three,Four,Five,Six,Seven,Eight,Nine,Ten";

// Commons Lang: split eagerly, then touch every token
long start = System.currentTimeMillis();
for (int i = 0; i < 1000000; i++) {
    final String[] numbers = StringUtils.split(numberList, ',');
    for (String number : numbers) {
        number.length();
    }
}
System.out.println(System.currentTimeMillis() - start);

// Guava: reuse one Splitter, then iterate over every token
Splitter splitter = Splitter.on(',');
start = System.currentTimeMillis();
for (int i = 0; i < 1000000; i++) {
    Iterable<String> numbers = splitter.split(numberList);
    for (String number : numbers) {
        number.length();
    }
}
System.out.println(System.currentTimeMillis() - start);

On my machine this outputs:

609
2048

Guava’s Splitter is more than 3 times slower!

Indeed, I was expecting them to be about the same, or maybe Guava slightly faster, so this is another surprising result. It looks like, by returning an Iterable, Splitter is trading immediate gains for longer-term pain. There’s also a moral here about making sure performance tests are actually testing something useful.
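On that last point, here is a rough sketch of a slightly more careful timing loop. It is an assumption-laden illustration rather than a proper benchmark: BenchmarkSketch and its methods are hypothetical names, and the JDK’s own String.split stands in for the code under test so the example needs no extra libraries. The two ideas it shows are a warm-up pass before measuring, and consuming every result (here by summing token lengths) so the JIT has less room to discard the work. For numbers you intend to rely on, a dedicated harness such as JMH is likely a better choice.

```java
import java.util.function.ToIntFunction;

// Hypothetical harness sketch; names are illustrative only.
class BenchmarkSketch {

    // Runs the task `iterations` times and returns {elapsedMillis, checksum}.
    static long[] time(ToIntFunction<String> task, String input, int iterations) {
        long checksum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            // Consume the result so the work is harder for the JIT to discard.
            checksum += task.applyAsInt(input);
        }
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        return new long[] { elapsedMillis, checksum };
    }

    // Stand-in for the code under test: splits with the JDK and sums token lengths.
    static int sumOfTokenLengths(String input) {
        int total = 0;
        for (String token : input.split(",")) {
            total += token.length();
        }
        return total;
    }

    public static void main(String[] args) {
        String numberList = "One,Two,Three,Four,Five,Six,Seven,Eight,Nine,Ten";

        // Warm-up pass; its timing is ignored.
        time(BenchmarkSketch::sumOfTokenLengths, numberList, 100_000);

        long[] result = time(BenchmarkSketch::sumOfTokenLengths, numberList, 1_000_000);
        // Prints the checksum followed by the elapsed milliseconds.
        System.out.println(result[1] + ": " + result[0]);
    }
}
```

Printing the checksum alongside the time also gives a quick sanity check that both implementations being compared produced the same tokens.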

In conclusion I think I’ll still use Splitter most of the time. On small lists the difference in performance is going to be negligible, and Splitter just feels much nicer to use. Still I was surprised by the result, and if you’re splitting lots of Strings and performance is an issue, it might be worth considering switching back to Commons StringUtils.
 

Reference: Guava Splitter vs StringUtils from our JCG partner Tom Jefferys at the Tom’s Programming Blog blog.


4 Responses to "Guava Splitter vs StringUtils"

  1. oussama zoghlami says:

    Except for Guava’s Splitter and Joiner, I think StringUtils is richer. It’s time for the Guava team to improve their ‘Strings’ utility class ;)

  2. It’s worth mentioning that the Splitter Iterator delays the actual splitting, whereas StringUtils.split does it all up front. This may make a difference in certain use cases, like when searching for the first match and still needing all the preceding values, but not the following ones, or when you only need a subset of the return values and never store the others to variables. It’s also a boon when parsing large strings as it doesn’t have to store the whole array in memory at one time. There might also be cases where returning an Iterator makes the code simpler than having to wrap an array with one. Sometimes maintenance cost is worth more than actual performance, especially if it’s not in the critical path.

    Granted it’s probably true that most splitting is done on short strings and optimizing on edge cases may not be the best idea for general purpose tools. But now you have two tools that are good at two different scenarios!

  3. assylias says:

    Beware of micro benchmarks: http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java For example, it is conceivable that the second loop in your first example is simply ignored by the JVM because it does not have any side effects. There are many factors that could significantly affect your results.

  4. Sam_Sonite says:

    “There’s also a moral here about making sure performance tests are actually testing something useful.”

    Maybe follow your own advice. How is your test useful?

    Also, were your tests written in Groovy. All the strings have single quotes?

    here are my results if you sum the lengths and print it out:

    run:
    39000000: 375
    39000000: 427

    summing the lengths with a leading whitespace in one value

    run:
    40000000: 357
    40000000: 436

    … and trimming the results
    run:
    39000000: 456
    39000000: 586

    Not nearly as dramatic as you exclaim. Now, this difference is over 1 million trials, so the only question, I would think, is one of style. If you are using StringUtils and not using Guava anywhere else, is it worth loading the jar just to split a string with a more functional vs imperative style? Vice versa, if you are not using StringUtils, is it worth switching for negligible impact? You might even make the same argument for using java.lang.String’s own split.




