Home » Java » Core Java » JLBH Examples 3 – The Affects of Throughput on Latency

About Daniel Shaya

Daniel Shaya
Daniel has been programming in Java since it was in beta. Working predominantly in the finance industry he has created real time trading and margin risk applications. He is currently a director at OpenHFT where we are building next generation Java low latency products.

JLBH Examples 3 – The Affects of Throughput on Latency

In this post:

  • A discussion about the effects of thoughput on latency
  • How use JLBH to measure TCP loopback
  • Adding probes to test both halves of the TCP round trip
  • Watching the effect of increasing throughput on latency
  • Understanding that you have to drop throughput to achieve good latencies at high percentiles.

In the post we saw the effects of accounting for coordinated omission or measuring the effect a delay to one iteration will have an affect on subsequent iterations.

Intuitively we understand that throughput will affect latency. It seems natural that a
s we raise throughput we will also raise latency.

Going into a very crowded shop will affect how fast you can select and purchase your goods. On the other hand consider a very rarely frequented shop. It could be that in such a shop the shopkeeper is away from the till having a tea break and that your purchase will be delayed as you wait for him to put down his cup of tea and make his way to the counter where he can server you.

This is exactly what you find when running benchmarks and vary throughput.

In general as you raise throughput latency increases but at some point when throughput drops below a certain threshold latency can increase as well.

The code below times a round trip TCP call over loopback.

We add two probes:

  • client2server – the amount of time taken to complete the first half of the round trip
  • server2client – the amount of time taken to complete the second half of the trip

These probes do not take into account co-ordinated omission – it is only the end to end time that takes coordinated omission into account.

Here’s the code for the benchmark:

package org.latency.tcp;

import net.openhft.affinity.Affinity;
import net.openhft.chronicle.core.Jvm;
import net.openhft.chronicle.core.jlbh.JLBHOptions;
import net.openhft.chronicle.core.jlbh.JLBHTask;
import net.openhft.chronicle.core.jlbh.JLBH;
import net.openhft.chronicle.core.util.NanoSampler;

import java.io.EOFException;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class TcpBenchmark implements JLBHTask {
    private final static int port = 8007;
    private static final boolean BLOCKING = false;
    private final int SERVER_CPU = Integer.getInteger("server.cpu", 0);
    private JLBH jlbh;
    private ByteBuffer bb;
    private SocketChannel socket;
    private NanoSampler client2serverProbe;
    private NanoSampler server2clientProbe;

    public static void main(String[] args) {
        JLBHOptions jlbhOptions = new JLBHOptions()
                .warmUpIterations(50000)
                .iterations(50000)
                .throughput(20000)
                .runs(5)
                .jlbhTask(new TcpBenchmark());
        new JLBH(jlbhOptions).start();
    }

    @Override
    public void init(JLBH jlbh) {
        this.jlbh = jlbh;
        client2serverProbe = jlbh.addProbe("client2server");
        server2clientProbe = jlbh.addProbe("server2clientProbe");
        try {
            runServer(port);
            Jvm.pause(200);

            socket = SocketChannel.open(new InetSocketAddress(port));
            socket.socket().setTcpNoDelay(true);
            socket.configureBlocking(BLOCKING);

        } catch (IOException e) {
            e.printStackTrace();
        }

        bb = ByteBuffer.allocateDirect(8).order(ByteOrder.nativeOrder());
    }

    private void runServer(int port) throws IOException {

        new Thread(() -> {
            if (SERVER_CPU > 0) {
                System.out.println("server cpu: " + SERVER_CPU);
                Affinity.setAffinity(SERVER_CPU);
            }
            ServerSocketChannel ssc = null;
            SocketChannel socket = null;
            try {
                ssc = ServerSocketChannel.open();
                ssc.bind(new InetSocketAddress(port));
                System.out.println("listening on " + ssc);

                socket = ssc.accept();
                socket.socket().setTcpNoDelay(true);
                socket.configureBlocking(BLOCKING);

                System.out.println("Connected " + socket);

                ByteBuffer bb = ByteBuffer.allocateDirect(8).order(ByteOrder.nativeOrder());
                while (true) {
                    readAll(socket, bb);

                    bb.flip();
                    long time = System.nanoTime();
                    client2serverProbe.sampleNanos(time - bb.getLong());
                    bb.clear();
                    bb.putLong(time);
                    bb.flip();

                    writeAll(socket, bb);
                }
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                System.out.println("... disconnected " + socket);
                try {
                    if (ssc != null)
                        ssc.close();
                } catch (IOException ignored) {
                }
                try {
                    if (socket != null)
                        socket.close();
                } catch (IOException ignored) {
                }
            }
        }, "server").start();

    }

    private static void readAll(SocketChannel socket, ByteBuffer bb) throws IOException {
        bb.clear();
        do {
            if (socket.read(bb) < 0)
                throw new EOFException();
        } while (bb.remaining() > 0);
    }

    @Override
    public void run(long startTimeNs) {
        bb.position(0);
        bb.putLong(System.nanoTime());
        bb.position(0);
        writeAll(socket, bb);

        bb.position(0);
        try {
            readAll(socket, bb);
            server2clientProbe.sampleNanos(System.nanoTime() - bb.getLong(0));
        } catch (IOException e) {
            e.printStackTrace();
        }

        jlbh.sample(System.nanoTime() - startTimeNs);
    }

    private static void writeAll(SocketChannel socket, ByteBuffer bb) {
        try {
            while (bb.remaining() > 0 && socket.write(bb) >= 0) ;
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    @Override
    public void complete() {
        System.exit(0);
    }
}

And here are the results when run at a throughput of 20,000 iterations / second:

Warm up complete (50000 iterations took 2.296s)
-------------------------------- BENCHMARK RESULTS (RUN 1) ---------Run time: 2.5s
Correcting for co-ordinated:true
Target throughput:20000/s = 1 message every 50us
End to End: (50,000)                            50/90 99/99.9 99.99 - worst was 34 / 2,950  19,400 / 20,450  20,450 - 20,450
client2server (50,000)                          50/90 99/99.9 99.99 - worst was 16 / 26  38 / 72  287 - 336
server2clientProbe (50,000)                     50/90 99/99.9 99.99 - worst was 16 / 27  40 / 76  319 - 901
OS Jitter (26,960)                              50/90 99/99.9 99.99 - worst was 9.0 / 16  44 / 1,340  10,220 - 11,800
--------------------------------------------------------------------
-------------------------------- BENCHMARK RESULTS (RUN 2) ---------
Run time: 2.5s
Correcting for co-ordinated:true
Target throughput:20000/s = 1 message every 50us
End to End: (50,000)                            50/90 99/99.9 99.99 - worst was 42 / 868  4,590 / 5,110  5,370 - 5,370
client2server (50,000)                          50/90 99/99.9 99.99 - worst was 20 / 27  38 / 92  573 - 2,560
server2clientProbe (50,000)                     50/90 99/99.9 99.99 - worst was 19 / 27  38 / 72  868 - 1,740
OS Jitter (13,314)                              50/90 99/99.9 99.99 - worst was 9.0 / 16  32 / 96  303 - 672
--------------------------------------------------------------------
-------------------------------- BENCHMARK RESULTS (RUN 3) ---------
Run time: 2.5s
Correcting for co-ordinated:true
Target throughput:20000/s = 1 message every 50us
End to End: (50,000)                            50/90 99/99.9 99.99 - worst was 34 / 152  999 / 2,160  2,290 - 2,290
client2server (50,000)                          50/90 99/99.9 99.99 - worst was 17 / 26  36 / 54  201 - 901
server2clientProbe (50,000)                     50/90 99/99.9 99.99 - worst was 16 / 25  36 / 50  225 - 1,740
OS Jitter (14,306)                              50/90 99/99.9 99.99 - worst was 9.0 / 15  23 / 44  160 - 184
---------------------------------------------------------------------------------------------------- SUMMARY (end to end)---------------
Percentile   run1         run2         run3      % Variation   var(log)
50:            33.79        41.98        33.79        13.91       
90:          2949.12       868.35       151.55        75.92       
99:         19398.66      4587.52       999.42        70.53       
99.9:       20447.23      5111.81      2162.69        47.62     99.99:      20447.23      5373.95      2293.76        47.24       
worst:      20447.23      5373.95      2293.76        47.24       
--------------------------------------------------------------------
-------------------------------- SUMMARY (client2server)------------
Percentile   run1         run2         run3      % Variation   
50:            16.13        19.97        16.90        10.81       
90:            26.11        27.14        26.11         2.55       
99:            37.89        37.89        35.84         3.67       
99.9:          71.68        92.16        54.27        31.76       
99.99:        286.72       573.44       200.70        55.32       
worst:        335.87      2555.90       901.12        55.04       
--------------------------------------------------------------------
-------------------------------- SUMMARY (server2clientProbe)-------
Percentile   run1         run2         run3      % Variation   
50:            16.13        18.94        16.13        10.43       
90:            27.14        27.14        25.09         5.16       
99:            39.94        37.89        35.84         3.67       
99.9:          75.78        71.68        50.18        22.22       
99.99:        319.49       868.35       225.28        65.55       
worst:        901.12      1736.70      1736.70         0.00       
--------------------------------------------------------------------

What should happen is that:

client2server + server2client ~= endToEnd

And this more of less what happens at the 50th percentile

Take the second run for the purposes of this demonstration:

19.97 + 18.94 ~= 41.98

If that was all you were measuring you might say that there would be no problem running 20k/s messages through my machine.

However my laptop clearly can’t handle this volume and if we look at the second run again for the 90th percentile.

27.14 + 27.14 !~= 868.35

And as you move up the percentiles it just gets worse and worse…

If however I change the throughput to 5k messages / second I see this on the 90th percentile:

32.23 + 29.38 ~= 62.46

So we see that if you want to achieve low latencies at the high percentiles you have to drop your throughput to the correct level.

This is why it so important that we are able to vary throughputs with JLBH.

(0 rating, 0 votes)
You need to be a registered member to rate this.
Start the discussion Views Tweet it!
Do you want to know how to develop your skillset to become a Java Rockstar?
Subscribe to our newsletter to start Rocking right now!
To get you started we give you our best selling eBooks for FREE!
1. JPA Mini Book
2. JVM Troubleshooting Guide
3. JUnit Tutorial for Unit Testing
4. Java Annotations Tutorial
5. Java Interview Questions
6. Spring Interview Questions
7. Android UI Design
and many more ....
I agree to the Terms and Privacy Policy

Leave a Reply

avatar

This site uses Akismet to reduce spam. Learn how your comment data is processed.

  Subscribe  
Notify of