Local variables inside a loop and performance

Overview

Sometimes a question comes up about how much work allocating a new local variable takes.  My feeling has always been that the code becomes optimised to the point where this cost is static i.e. done once, not each time the code is run.

Recently Ishwor Gurung suggested considering moving some local variables outside a loop. I suspected it wouldn’t make a difference but I had never tested to see if this was the case.
 
 

The test

This is the test I ran:

public static void main(String... args) {
    for (int i = 0; i < 10; i++) {
        testInsideLoop();
        testOutsideLoop();
    }
}

private static void testInsideLoop() {
    long start = System.nanoTime();
    int[] counters = new int[144];
    int runs = 200 * 1000;
    for (int i = 0; i < runs; i++) {
        int x = i % 12;
        int y = i / 12 % 12;
        int times = x * y;
        counters[times]++;
    }
    long time = System.nanoTime() - start;
    System.out.printf("Inside: Average loop time %.1f ns%n", (double) time / runs);
}

private static void testOutsideLoop() {
    long start = System.nanoTime();
    int[] counters = new int[144];
    int runs = 200 * 1000, x, y, times;
    for (int i = 0; i < runs; i++) {
        x = i % 12;
        y = i / 12 % 12;
        times = x * y;
        counters[times]++;
    }
    long time = System.nanoTime() - start;
    System.out.printf("Outside: Average loop time %.1f ns%n", (double) time / runs);
}

and the output ended with:

Inside: Average loop time 3.6 ns
Outside: Average loop time 3.6 ns
Inside: Average loop time 3.6 ns
Outside: Average loop time 3.6 ns

Increasing the time the test takes to 100 million iterations made little difference to the results.

Inside: Average loop time 3.8 ns
Outside: Average loop time 3.8 ns
Inside: Average loop time 3.8 ns
Outside: Average loop time 3.8 ns

Replacing the modulus and multiplication with >>, &, + I got

int x = i & 15;
int y = (i >> 4) & 15;
int times = x + y;

prints

Inside: Average loop time 1.2 ns
Outside: Average loop time 1.2 ns
Inside: Average loop time 1.2 ns
Outside: Average loop time 1.2 ns

While modulus is relatively expensive the resolution of the test is to 0.1 ns or less than 1/3 of a clock cycle. This would show any difference between the two tests to an accuracy of this.

Using Caliper

As @maaartinus comments, Caliper is a micro-benchmarking library so I was interested in how much slower it might be that doing the code by hand.

public static void main(String... args) {
    Runner.main(LoopBenchmark.class, args);
}

public static class LoopBenchmark extends SimpleBenchmark {
    public void timeInsideLoop(int reps) {
        int[] counters = new int[144];
        for (int i = 0; i < reps; i++) {
            int x = i % 12;
            int y = i / 12 % 12;
            int times = x * y;
            counters[times]++;
        }
    }

    public void timeOutsideLoop(int reps) {
        int[] counters = new int[144];
        int x, y, times;
        for (int i = 0; i < reps; i++) {
            x = i % 12;
            y = i / 12 % 12;
            times = x * y;
            counters[times]++;
        }
    }
}

The first thing to note is the code is shorter as it doesn’t include timing and printing boiler plate code.  Running this I get on the same machine as the first test.

0% Scenario{vm=java, trial=0, benchmark=InsideLoop} 4.23 ns; σ=0.01 ns @ 3 trials
50% Scenario{vm=java, trial=0, benchmark=OutsideLoop} 4.23 ns; σ=0.01 ns @ 3 trials

benchmark   ns linear runtime
InsideLoop 4.23 ==============================
OutsideLoop 4.23 =============================

vm: java
trial: 0

Replacing the modulus with shift and and

0% Scenario{vm=java, trial=0, benchmark=InsideLoop} 1.27 ns; σ=0.01 ns @ 3 trials
50% Scenario{vm=java, trial=0, benchmark=OutsideLoop} 1.27 ns; σ=0.00 ns @ 3 trials

benchmark   ns linear runtime
InsideLoop 1.27 =============================
OutsideLoop 1.27 ==============================

vm: java
trial: 0

This is consistent with the first result and only about 0.4 – 0.6 ns slower for one test. (about two clock cycles), and next to no difference for the shift, and, plus test.  This may be due to the way calliper samples the data but doesn’t change the outcome.

It is worth nothing that when running real programs, you typically get longer times than a micro-benchmark as the program will be doing more things so the caching and branch predictions is not as ideal.  A small over estimate of the time taken may be closer to what you can expect to see in a real program.

Conclusion

This indicated to me that in this case it made no difference.  I still suspect the cost of allocating local variables is don’t once when the code is compiled by the JIT and there is no per-iteration cost to consider.
 

Reference: Can synchronization be optimised away? from our JCG partner Peter Lawrey at the Vanilla Java blog.

Related Whitepaper:

Bulletproof Java Code: A Practical Strategy for Developing Functional, Reliable, and Secure Java Code

Use Java? If you do, you know that Java software can be used to drive application logic of Web services or Web applications. Perhaps you use it for desktop applications? Or, embedded devices? Whatever your use of Java code, functional errors are the enemy!

To combat this enemy, your team might already perform functional testing. Even so, you're taking significant risks if you have not yet implemented a comprehensive team-wide quality management strategy. Such a strategy alleviates reliability, security, and performance problems to ensure that your code is free of functionality errors.Read this article to learn about this simple four-step strategy that is proven to make Java code more reliable, more secure, and easier to maintain.

Get it Now!  

3 Responses to "Local variables inside a loop and performance"

  1. ikk says:

    The declaration is just to tell the compiler that the variable exists and what scope it has. The “executable code” is only on the right side of the equals sign. I would be **very** surprised if the declaration location made a difference at all.

    There could be a difference if it changed the number of allocations and deallocations. Since you are using only primitives and doing the same work (i.e. the right side of the equals sign) in both tests, the results are always identical.

    (of course, all of this depends on the language you are using: while in C and Java declarations do not generate executable code, they only contribute to the stack allocation done at the beginning of the method; OTOH in C++ a variable declaration for a custom type can also execute a costly initialization and impact performance)

  2. Alexis Bauchu says:

    I bet there will be much more difference with memory allocation inside / outside the loop. For example, use an object, a kind of “Point” to store your data. ie:

    Outside:

    {
    int[] counters = new int[144];
    Point p = new Point();
    int times;
    for (int i = 0; i < reps; i++) {
    p.x = i % 12;
    p.y = i / 12 % 12;
    times = p.x * p.y;
    counters[times]++;
    }
    }

    Inside:
    {
    int[] counters = new int[144];
    int times;
    for (int i = 0; i < reps; i++) {
    Point p = new Point();
    p.x = i % 12;
    p.y = i / 12 % 12;
    times = p.x * p.y;
    counters[times]++;
    }
    }

    I'm interested in the results you'll get from this.

  3. Chris Newland says:

    Don’t prematurely optimise at the expense of correct scoping. If you don’t need these variables outside the loop then they should be declared inside it. Leave optimisation to the compiler / JIT until you have identified a bottleneck through profiling!

Leave a Reply


six − = 2



Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd | Terms of Use | Privacy Policy
All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries.
Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.

Sign up for our Newsletter

20,709 insiders are already enjoying weekly updates and complimentary whitepapers! Join them now to gain exclusive access to the latest news in the Java world, as well as insights about Android, Scala, Groovy and other related technologies.

As an extra bonus, by joining you will get our brand new e-books, published by Java Code Geeks and their JCG partners for your reading pleasure! Enter your info and stay on top of things,

  • Fresh trends
  • Cases and examples
  • Research and insights
  • Two complimentary e-books