Core Java

Scalable Counters For Multi Core

Counters are required everywhere , for e.g. to find key KPI of application, load on application, total number of request served, some KPI for finding throughput of application & many more.

With all these requirement complexity of concurrency is also added & that makes this problem interesting.

How to implement concurrent counter

 
 

  • Synchronized – This was the only option before JDK 1.5, since now we are waiting for JDK8 release , so definitely this is not an option.
  • Lock based  – You should never attempt this for counter , it will perform very badly
  • Wait Free – Java does’t have support for Fetch-and-add, so bit difficult to implement it.
  • Lock free – With very good support of Compare-and-swap, this looks good option to use.

How does Compare-and-Swap based counter performs

I used AtomicInteger for this test and this counter is incremented for 1 Million time by each thread & to increase the contention number of threads are increased gradually.

Test Machine Details

OS : Windows 8

JDK : 1.7.0.25

CPU : Intel i7-3632QM , 8 Core

RAM : 8 GB

Atomic Int

Y Axis – Time taken to increment 1 Million times

X Axis – Number of threads

As number of threads are increased, time taken to increment counter increases and it is due to contention. For CAS based counter , it is CAS failure that causes slowdown.

Is this best performance that we can get ? no definitely their are better solution to implement concurrent counter, lets have look at them.

Alternate Concurrent Counter

Lets look at some of the solution to implement counter that handles contention in better way:

  • Core Based Counter – Maintain counter for each logical core, so that way you will have less contention. Only issue you have this type of counter is that if number of threads are more than logical core then you will start noticing contention.
  • Thread Based Counter – Maintain counters for total number of threads that will be using system. This works well when number of threads are more than number of logical cores.

Lets test it

Time taken by different types of counter

All Counter - Time Taken

Y Axis – Time taken to increment 1 Million times

X Axis – Number of threads

Concurrent Counter performs much better than Atomic based counter, for 16 threads it is around 5X times better, that is huge difference!

CAS Failure Rate

Counter CAS Failure

Y Axis – CAS Failure in 100Ks

X Axis – Number of threads

Due to contention, Atomic based counter see lot of failure and it goes up exponential as i add more threads & other counters performs pretty well.

Observation

Multi core machines becoming easily available & we have to change the way we handle concurrency, traditional way of doing concurrency is not going to scale in today’s time when having 24 or 48 core server is very common.

  • To reduce the contention you have to use multiple counters and then aggregate them later
  • Core based counter works well if number of threads will be less or same as number of cores
  • Thread based counter is good when number of threads are much more than available core
  • Key to reduce contention is identify counter to which thread will write,i have used simple approach based on thread id, but much better approach are available, look at ThreadLocalRandom of JDK 8 for some ideas.
  • Thread based approach is used in LongAdder of JDK8, which creates many slots to reduce contention.

Code for all the counters used in this test are available @ Github
 

Reference: Scalable Counters For Multi Core from our JCG partner Ashkrit Sharma at the Are you ready blog.

Ashkrit Sharma

Pragmatic software developer who loves practice that makes software development fun and likes to develop high performance & low latency system.
Subscribe
Notify of
guest

This site uses Akismet to reduce spam. Learn how your comment data is processed.

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Back to top button