
Writing 2 Characters into a Single Java char

Here’s another nice trick we used when creating the ultra low latency Chronicle FIX-Engine.

When it comes to reading data off a stream of bytes, it’s far more efficient, where possible, to store the data in a char than to read it into a String. (At the very least you avoid creating a String object. This can be mitigated by using a cache, or by working with CharSequence rather than String, but that’s the subject of another post.)

Using JMH benchmarks I found the following timings. (I haven’t included the source code here, because another post will describe the different methodologies in more detail.)

Reading 2 ASCII characters off a byte stream into:

String: 34.48 ns
Pooled String: 28.57 ns
StringBuilder: 21.27 ns
char (using the 2-chars method): 6.75 ns

The point is that every String-based approach takes at least 3 times longer than reading into a char, and that doesn’t even account for the garbage created.

So it goes without saying that when you know that you are expecting data that is always a single character, rather than reading that data into a String variable you should read it into a char.
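To illustrate the difference, the sketch below reads a single ASCII byte once into a String and once into a char. It is not the benchmarked code, which the post does not include. It assumes the bytes arrive in a java.nio.ByteBuffer, and the class and method names are made up for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class SingleCharRead {
    // Reads one ASCII byte as a String: allocates a byte[] and a new String on every call.
    static String readAsString(ByteBuffer buf) {
        byte[] one = { buf.get() };
        return new String(one, StandardCharsets.US_ASCII);
    }

    // Reads one ASCII byte as a char: no allocation, just a masked widening conversion.
    static char readAsChar(ByteBuffer buf) {
        return (char) (buf.get() & 0xFF);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap("XY".getBytes(StandardCharsets.US_ASCII));
        String s = readAsString(buf); // consumes 'X'
        char c = readAsChar(buf);     // consumes 'Y'
        System.out.println(s + " " + c);
    }
}
```

Both methods consume one byte from the buffer. Only the String version creates objects that the garbage collector must later reclaim.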

Now, what if you know that the data you are expecting on the stream is never more than 2 characters? (You find this situation, for example, in FIX 5.0 tag 35, msgType.) Do you have to use a String so that you can accommodate the extra character? At first thought it appears so. After all, a char can only contain a single character.

Or can it?

A Java char is made up of 2 bytes, not one. Therefore, if you know that your data consists of ASCII characters, you know that only one of the char’s 2 bytes will be used. ASCII values run from 0 to 127. For example, ‘A’ is 65 and ‘z’ is 122.
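The short check below confirms these sizes. It assumes Java 8 or later for Character.BYTES, and the class and helper names are illustrative only.

```java
public class CharWidth {
    // True when c is 7-bit ASCII (0-127), so its high byte is zero.
    static boolean isAscii(char c) {
        return c < 128;
    }

    // Returns the upper 8 bits of a char as an int in the range 0-255.
    static int highByte(char c) {
        return c >>> 8;
    }

    public static void main(String[] args) {
        System.out.println(Character.BYTES); // bytes per char
        System.out.println(Character.SIZE);  // bits per char
        System.out.println((int) 'A');
        System.out.println((int) 'z');
        // For every ASCII character the high byte is unused.
        for (char c = 0; c < 128; c++) {
            if (highByte(c) != 0) throw new IllegalStateException("unexpected high byte for " + (int) c);
        }
        System.out.println("high byte is 0 for all ASCII chars");
    }
}
```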

You can print out the values that fit into a single byte with this simple loop:

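for (int i = 0; i < 256; i++) {
    char c = (char) i;
    // 0-127 is ASCII; 128-255 map to the Latin-1 range of Unicode.
    // Values 0-31 and 127-159 are non-printing control characters.
    System.out.println(i + ":" + c);
}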

You are now free to use the other byte of the char to hold the second ASCII character.

This is the way to do it:

In this example you have read 2 bytes, ‘a’ and ‘b’, and want to store them in a single char.

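byte a = (byte) 'a';
byte b = (byte) 'b';
// Place a in the top 8 bits and b in the bottom 8 bits of a single char.
// Masking with 0xff stops a byte above 127 (negative in Java) from sign-extending.
char ab = (char) (((a & 0xff) << 8) | (b & 0xff));

// Retrieve the two characters individually.
System.out.println((char) (ab >> 8) + "" + (char) (ab & 0xff));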
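The same idea can be wrapped in a few small helpers. This is a sketch, not the Chronicle FIX-Engine code. The class TwoCharPacker and its methods are hypothetical, and the helpers assume every input character is 7-bit ASCII and that no value starts with the NUL character. Under those assumptions, a one-character value such as msgType "D" can be stored as itself, with a high byte of 0. A two-character value such as "AE" uses both bytes.

```java
import java.nio.charset.StandardCharsets;

public class TwoCharPacker {
    // Packs two ASCII bytes into one char: hi in the top 8 bits, lo in the bottom 8.
    // Masking with 0xFF stops a negative byte from sign-extending into the top bits.
    static char pack(byte hi, byte lo) {
        return (char) (((hi & 0xFF) << 8) | (lo & 0xFF));
    }

    static byte high(char packed) { return (byte) (packed >>> 8); }
    static byte low(char packed)  { return (byte) packed; } // narrowing keeps the low 8 bits

    // Hypothetical helper: encodes a 1- or 2-character ASCII value such as a msgType.
    // A single character is stored as itself, so its high byte is 0.
    static char encode(String value) {
        if (value.length() == 1) return value.charAt(0);
        if (value.length() == 2) return pack((byte) value.charAt(0), (byte) value.charAt(1));
        throw new IllegalArgumentException("expected 1 or 2 chars: " + value);
    }

    // Inverse of encode: a zero high byte means the value had only one character.
    static String decode(char packed) {
        if (high(packed) == 0) return String.valueOf(packed);
        byte[] bytes = { high(packed), low(packed) };
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        char ab = pack((byte) 'a', (byte) 'b');
        System.out.println((char) high(ab) + "" + (char) low(ab));
        System.out.println(decode(encode("D")) + " " + decode(encode("AE")));
    }
}
```

Because the encoded value is a plain char, it can be compared with == or used as a switch key without touching a String.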

To better understand this, let’s look at the binary:

byte a = (byte) 'a'; // 01100001

byte b = (byte) 'b'; // 01100010

As you can see below, when viewed as a char, the top 8 bits are not being used:

char ca = 'a'; // 00000000 01100001

char cb = 'b'; // 00000000 01100010

Combine the characters, with a taking the top 8 bits and b the bottom 8 bits:

char ab = (char) (((a & 0xff) << 8) | (b & 0xff)); // 01100001 01100010
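To reproduce these bit patterns yourself, you can use a small helper that pads Integer.toBinaryString to 16 digits. The class and method names are illustrative.

```java
public class PackedBits {
    // Formats a char as 16 binary digits, split into high byte and low byte.
    static String bits(char c) {
        String s = String.format("%16s", Integer.toBinaryString(c)).replace(' ', '0');
        return s.substring(0, 8) + " " + s.substring(8);
    }

    public static void main(String[] args) {
        char ab = (char) (('a' << 8) | 'b');
        System.out.println(bits('a')); // 00000000 01100001
        System.out.println(bits('b')); // 00000000 01100010
        System.out.println(bits(ab));  // 01100001 01100010
        System.out.println((int) ab);  // 24930, i.e. 0x6162
    }
}
```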

Summary

It’s more efficient to read data into a char than into a String. If you know that you have a maximum of 2 ASCII characters, they can be combined into a single Java char. Of course, only use this technique if you really are worried about ultra low latency!

Reference: Writing 2 Characters into a Single Java char from our JCG partner Daniel Shaya at the Rational Java blog.

Daniel Shaya

Daniel has been programming in Java since it was in beta. Working predominantly in the finance industry he has created real time trading and margin risk applications. He is currently a director at OpenHFT where we are building next generation Java low latency products.

4 Comments
DaFab
8 years ago

What kind of real-life application would require that?
My daily life of Java ecosystem user is full of applications heavily relying on databases and for which the performance is mainly database driven.
Dividing the access time to a character stream by 3 is simply meaningless …

Daniel Shaya
8 years ago
Reply to  DaFab

Welcome to the world of ultra low latency messaging systems and high frequency trading:)

Jose Garcia
8 years ago

Nice to know!

Thanks for info :D

Dmytro Vorobyov
8 years ago

The option -XX:+UseCompressedStrings is easier to use.
