About Ashkrit Sharma

A pragmatic software developer who loves practices that make software development fun and likes to build high-performance, low-latency systems.

Which memory is faster: Heap, ByteBuffer, or Direct?

Java is becoming the new C/C++: it is used extensively for developing high-performance systems. Good news for millions of Java developers like me!
In this post I will share my experiments with the different types of memory allocation available in Java and the benefits each one gives you.

Memory Allocation In Java

What kinds of memory allocation does Java support?

– Heap Memory

I don’t think I have to explain this; every Java application starts with it. Every object allocated with the “new” keyword lives in heap memory.

– Non Direct ByteBuffer

It is a wrapper over a byte array, just another flavor of heap memory.
ByteBuffer.allocate() creates this type of object; it is very useful if you want to deal in terms of bytes rather than objects.
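
As a minimal sketch of the non-direct case: the class and method names below are made up for illustration, and only the ByteBuffer calls come from the JDK.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration class; not part of the benchmark code.
final class HeapBufferSketch {
    // Writes an int and a long into a heap-backed buffer and reads them back.
    static long roundTrip(int id, long value) {
        ByteBuffer buffer = ByteBuffer.allocate(12); // backed by a byte[] on the Java heap
        buffer.putInt(id).putLong(value);            // relative puts advance the position
        buffer.flip();                               // switch from writing to reading
        return buffer.getInt() + buffer.getLong();
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(12);
        System.out.println(buffer.hasArray());  // true: the backing byte[] is accessible
        System.out.println(buffer.isDirect());  // false: storage is ordinary heap memory
        System.out.println(roundTrip(7, 35L));  // 42
    }
}
```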

– Direct ByteBuffer

This is the real stuff, available in Java since JDK 1.4.
The Javadoc describes a direct ByteBuffer as follows:

“A direct byte buffer may be created by invoking the allocateDirect factory method of this class. The buffers returned by this method typically have somewhat higher allocation and deallocation costs than non-direct buffers. The contents of direct buffers may reside outside of the normal garbage-collected heap, and so their impact upon the memory footprint of an application might not be obvious. It is therefore recommended that direct buffers be allocated primarily for large, long-lived buffers that are subject to the underlying system’s native I/O operations. In general it is best to allocate direct buffers only when they yield a measureable gain in program performance.”

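Important things to note about a direct buffer:

  • Its contents live outside the Java heap (though still inside the JVM process).
  • Its contents are out of the garbage collector’s reach: the GC never scans or moves them. Only the small wrapper object lives on the heap, and the native memory is released once that wrapper is collected.

These are very important properties if you care about performance.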
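
A small sketch of allocating a direct buffer, following the Javadoc advice of creating one large, long-lived buffer rather than many short-lived ones. The class name, method name and buffer size are assumptions chosen for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration class; not part of the benchmark code.
final class DirectBufferSketch {
    // Stores a long at offset 0 with an absolute put and reads it back.
    static long roundTrip(ByteBuffer buffer, long value) {
        buffer.putLong(0, value);
        return buffer.getLong(0);
    }

    public static void main(String[] args) {
        // Allocate once and reuse; allocation and deallocation are the costly parts.
        ByteBuffer buffer = ByteBuffer.allocateDirect(64 * 1024);
        System.out.println(buffer.isDirect());       // true
        System.out.println(roundTrip(buffer, 99L));  // 99
    }
}
```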
Memory-mapped files are also a flavor of direct byte buffer; I shared some of my findings on them in earlier blog posts.

– Off Heap or Direct Memory

This is almost the same as a direct ByteBuffer, with one small difference: it is allocated with unsafe.allocateMemory. Because it is direct memory, it creates no GC overhead. This type of memory must be released manually.

In theory, Java programmers are not allowed to do this kind of allocation, and I think the reasons could be:

  • It is complex to manipulate this type of memory, because you are dealing only with bytes, not objects
  • The C/C++ community will not like it
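
The off-heap allocation described above can be sketched roughly as follows. This assumes sun.misc.Unsafe is reachable through reflection on its private theUnsafe field, which is an unsupported API: recent JDKs deprecate these memory-access methods and may print warnings. The class and helper names are hypothetical.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical illustration class; relies on the unsupported sun.misc.Unsafe API.
final class UnsafeSketch {
    // Obtains the Unsafe singleton reflectively (assumption: the field is named "theUnsafe").
    static Unsafe loadUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Unsafe is not available", e);
        }
    }

    // Allocates 8 bytes off heap, zeroes them, stores a long, reads it back, frees the block.
    static long writeThenRead(long value) {
        Unsafe unsafe = loadUnsafe();
        long address = unsafe.allocateMemory(8);    // raw native memory, contents undefined
        try {
            unsafe.setMemory(address, 8, (byte) 0); // optional: clear the garbage first
            unsafe.putLong(address, value);
            return unsafe.getLong(address);
        } finally {
            unsafe.freeMemory(address);             // manual release; the GC will not do it
        }
    }

    public static void main(String[] args) {
        System.out.println(writeThenRead(42L)); // 42
    }
}
```

Forgetting the freeMemory call leaks native memory, and reading through a stale address can crash the JVM, which is the "drive with care" part of working off heap.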

Let’s take a deep dive into memory allocation

For the memory allocation test I will use a 13-byte message, broken down into:

  • int – 4 bytes
  • long – 8 bytes
  • byte – 1 byte

I will only test write/read performance; I am not testing memory consumption or allocation speed.
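
To make the 13-byte layout concrete, here is a sketch that packs such messages back to back in a ByteBuffer. The field names (id, value, flag) and the class are hypothetical; the actual benchmark classes may lay the record out differently.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration class for the 13-byte message layout.
final class MessageLayoutSketch {
    static final int ID_OFFSET = 0;     // int, 4 bytes
    static final int VALUE_OFFSET = 4;  // long, 8 bytes
    static final int FLAG_OFFSET = 12;  // byte, 1 byte
    static final int MESSAGE_SIZE = 13;

    // Writes message number `index` at byte position index * 13.
    static void write(ByteBuffer buffer, int index, int id, long value, byte flag) {
        int base = index * MESSAGE_SIZE;
        buffer.putInt(base + ID_OFFSET, id);
        buffer.putLong(base + VALUE_OFFSET, value);
        buffer.put(base + FLAG_OFFSET, flag);
    }

    // Reads all three fields of message number `index` and returns their sum.
    static long readSum(ByteBuffer buffer, int index) {
        int base = index * MESSAGE_SIZE;
        return buffer.getInt(base + ID_OFFSET)
                + buffer.getLong(base + VALUE_OFFSET)
                + buffer.get(base + FLAG_OFFSET);
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(3 * MESSAGE_SIZE);
        write(buffer, 2, 1, 2L, (byte) 3);
        System.out.println(readSum(buffer, 2)); // 6
    }
}
```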

Write Performance

[Figure: Allocation - Write TP. X axis: reading number; Y axis: operations per second, in millions.]

5 million 13-byte objects are written using 4 types of allocation.

  1. Direct ByteBuffer and off heap are best in this case, with throughput close to 350 Million/Sec
  2. Normal ByteBuffer is very slow, with a throughput of just 85 Million/Sec
  3. Direct/off heap is around 1.5 times faster than heap

I ran the same test with 50 million objects to check how it scales; the graph is below.
[Figure: Allocation - Write TP 50. X axis: reading number; Y axis: operations per second, in millions.]
The numbers are almost the same as for 5 million.

Read Performance

Let’s look at read performance.
[Figure: Allocation - Read TP 5. X axis: reading number; Y axis: operations per second, in millions.]
This number is interesting: off heap is blazing fast, with a throughput of 12,000 Million/Sec. The only one that comes close is heap read, which is around 6 times slower than off heap.
Look at direct ByteBuffer: it tanks at just 400 Million/Sec; I am not sure why.

Let’s have a look at the numbers for 50 million objects.
[Figure: Allocation - Read TP 50. X axis: reading number; Y axis: operations per second, in millions.]
Not much different.


Off heap via Unsafe is blazing fast, at 330/11,200 Million/Sec (write/read).
Every other type of allocation is good for either read or write; none of them is good for both.
A special note about ByteBuffer: it is pathetic, and I am sure you will not use it after seeing such numbers. Direct ByteBuffer sucks at read speed; I am not sure why it is so slow.

So if memory read/write is becoming a bottleneck in your system, then off heap is definitely the way to go. Remember, it is a highway, so drive with care.

Code is available on GitHub.



  1. Ashkrit,
    Thanks for this analysis.

    ByteBuffer.allocateDirect(); also uses Unsafe by using DirectByteBuffer, so how is Unsafe.allocateMemory() different?

    The results you have shown are really fascinating and favor off-heap via Unsafe. A few questions for you:

    1. You never used the setMemory() function. Why? Is it because Unsafe.putXXX() was going to overwrite any pre-existing values anyway?

    2. When you perform operations inside the off-heap class, like identifyIndex(), where are they executed: on heap or off heap? I ask because Unsafe.putXXX() uses off-heap memory, but where do all the other operations run, such as calling getDeclaredFields() on a class to get its data, or reading the value of an object’s field before storing it off heap? If they run on the heap, then aren’t we just copying data from the JVM heap to non-heap memory?

    Please let me know. Thanks

    • 1 – Regarding your question of why I didn’t use setMemory:
      This function sets an initial value in the allocated memory; if you don’t call it, you will see garbage, just like in C/C++.
      I didn’t do it because it was OK for my test to skip that step, but in the real world we must set the values to 0.

      2 – Regarding your second question:
      All the functions of Unsafe operate in JVM space. There would be a transfer cost because all of them are native, but the native functions of Unsafe are special: they are intrinsics, and intrinsic functions don’t have the overhead of a plain native/JNI call.

      Intrinsic functions are optimized, so you don’t see a big byte-conversion overhead for direct operations.

      I think Java intrinsics deserve a blog post.

      • Ashkrit,
        Thanks for your reply.
        I understood the answer to the first question. I asked because I thought setMemory(offset, size, 0) was standard practice with Unsafe, to make sure the entire block is set to zero. I suspected you didn’t do it because it could cost some additional time. Additional time vs. data corruption: which would you risk?

        For the 2nd question, I understand that Unsafe method calls such as Unsafe.allocateMemory(), Unsafe.putXXX() and Unsafe.getXXX() use Java intrinsic functions, which are native. But my question is different. I want to know where a function like identifyIndex() –> which is not intrinsic to Unsafe <– is executed: heap or off heap?

        Since you said that intrinsic functions are different from JNI calls and don’t have the JNI overhead, how does an intrinsic differ from a C function called through JNI in terms of execution, rather than time/overhead?

        Do you think the intrinsic functions of the Unsafe class could be used for something like getting a handle to low-level registers, say, to get to a particular core of the processor? Wouldn’t you still need JNI for that? It would be great if you could use Unsafe-like intrinsic functions for such jobs. Can I not just call some C code from Unsafe?

        Thanks, Ashkrit, for this blog. Very informative.

        • For 1st –
          The risk is data corruption more than time. If a data structure using off-heap memory can manage its pointers so that it never allows access to corrupted or invalid/stale data, then we can get rid of the reset.

          2nd –
          identifyIndex is executed off heap.
          The JVM does a lot of smart things for intrinsics. For example, it inlines them, which I think does not happen with native methods, and in many cases an intrinsic will use features of the underlying platform.
          For example, Integer.bitCount() is an intrinsic. If you look at the source code, it has a Java implementation to find the bit count, but since it is an intrinsic, it will use the POPCNT machine instruction, which is very fast.
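
To illustrate the point about Integer.bitCount(), the sketch below compares it with a plain loop that computes the same result. The loop is a hypothetical stand-in written for this example, not the JDK's own fallback implementation; whether the JIT actually emits POPCNT depends on the CPU and JVM.

```java
// Hypothetical illustration class.
final class BitCountSketch {
    // Portable loop: clears the lowest set bit until none remain.
    static int bitCountLoop(int value) {
        int count = 0;
        while (value != 0) {
            value &= value - 1; // drops the lowest set bit
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(Integer.bitCount(0b1011)); // 3
        System.out.println(bitCountLoop(0b1011));     // 3
        System.out.println(Integer.bitCount(-1));     // 32
    }
}
```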

          I am not sure if you have looked at http://hg.openjdk.java.net/jdk8/awt/hotspot/file/d61761bf3050/src/share/vm/classfile/vmSymbols.hpp

          It has the list of all such functions.

          Unsafe is the gate into the C world; right now not everything is exposed via Unsafe.
          It would be great if you could find out more about the processor, such as which core a thread is running on, or pin a thread to a specific core, or get access to fetch-and-add (i.e. an alternative to CAS).

          There is talk of removing Unsafe from HotSpot, but that would be a setback for many high-performance applications, unless Java decides to offer such an API as a main API.

          You might be interested in the Unsafe survey.

          • Ashkrit,
            Thanks again for the information and explanation. My question about off-heap usage is more about code execution that necessarily uses the heap: code like TestMemoryAllocator.java should be using the Java heap, while, as you said, identifyIndex() should still be off heap because it is related to Unsafe’s positioning.

            So, if I am designing a system for high speed, what should I presume to be off heap? Would it be all the code restricted to using Unsafe? If that’s the case, then what about the communication between objects on the heap and data off heap?

            Intrinsic functions are very useful. Thanks for the link; I took the survey. I don’t think Unsafe is going away from HotSpot, because the buzz is that Oracle is trying to make inroads into the low-latency finance field and is a bit behind the IBM JVM and a lot behind Azul. So they are trying to come up with a low-jitter Unsafe API that is actually safe to use. One of the things they have already implemented is the native ByteBuffer, but it sucks at reading. Any particular reason you didn’t use the ByteBuffer API functions in your tests?

            I am very interested in low-level programming and am trying to go in that direction so I don’t have to enter the oceanic world of programming in C for low-latency apps. Java is so much easier to maintain, faster to code in, and nicely object oriented. I live in the US close to NYC, and a lot of folks here already use Java for low latency; however, concepts matter. As long as Unsafe and advanced bitwise operations are not going away from Java, I anticipate a lot of companies moving to Java from C/C++.

  2. Forgot to mention that the DirectByteBuffer implementation and the off-heap implementation look identical. However, their read times differ, most likely because both underlying implementations use Unsafe when writing, but when reading, DirectByteBuffer brings the data onto the heap. Any data brought from native memory into the JVM needs to be converted into a byte array (allocating it on the heap as an object takes time); that’s why it seems to take so much more time, even more than the direct heap implementation. This is a thought, but I haven’t verified it through tests.

    • From the numbers it looked like DirectByteBuffer brings all the data onto the heap, but I didn’t spend time confirming that.
      I still have to run some tests to confirm it.

      • OK, sure, some test data would be great, but reading a DirectByteBuffer into the JVM heap is done via byte[], and that’s when it takes a lot of time.

    • I don’t follow this explanation – the off-heap implementation is calling:

      public byte getByte() {
          return unsafe.getByte(pos + byte_offset);
      }

      the DirectMemoryBuffer implementation is calling:

      public byte get() {
          return ((unsafe.getByte(ix(nextGetIndex()))));
      }

      Where is the “bringing memory to heap space” happening in DMB vs off heap objects?

  3. Ashkrit,
    One more question: why haven’t you used the ByteBuffer class functions?

    • The reply option is disabled on the original thread, so I am using this one.

      I use the approach below to build high-speed applications:
      Stay away from a class-like object model, because it hurts performance due to memory indirection.

      – Keep data in memory using simple array-like structures. Best is a column-like approach, which gives excellent performance because the data is laid out linearly and you benefit from the hardware: CPU caches, the prefetcher, etc.
      With this approach you can get reasonable performance, but GC comes into the picture and you have to deal with it.

      – If I want to keep GC out, I start looking at Unsafe for direct memory allocation.
      It is possible to do better in the type of API you provide to access/manipulate off-heap data: internally I think it has to be streams of bytes, but to the outside world a nice object view.

      Once you have some objects off heap, there is the issue of how on-heap objects refer to off-heap ones. I have only used a pointer-based approach (i.e. reference by index or section of memory) to achieve this.

      Another thing that comes to mind is creating an in-memory index/dictionary of off-heap objects, so that you don’t deal with raw numbers and can still get to the objects in reasonable time.
      In-memory indexes are heavily used in MongoDB, which uses memory-mapped files to keep all the data but keeps the index in memory, and the index contains references to byte indexes.

      I didn’t get your question about the ByteBuffer class; which functions are you talking about?

      A lot of interesting stuff happens in the US!
      I am based in Singapore and have very few options for places where people really use Java for high-performance applications.

      I think application performance is decided by the data structures, algorithms and design you choose, not just by the language.
      So there is a lot of myth around Java being slow, and hopefully that will go away.

  4. Ashkrit,
    Thanks for your reply again. I certainly appreciate it.

    I mean ByteBuffer functions like position(), limit(), put()/get(), etc., which are defined in the API. I think you haven’t used them because they would work on the Java heap, especially for writing. So you got results similar to Unsafe for writing. However, I think the read results are poor because reads still bring the data onto the Java heap. I just wanted to check with you whether that’s true.

    I am going to read your memory-mapped file articles later. Thanks for the great postings.

    Yes, there is a lot of great opportunity here, but I believe London is ahead of NYC in low-latency development.

    I agree that the outside world sees a nice class view while, internally, it uses fast off-heap execution. One area I am a bit nervous about is thread safety when manipulating off-heap data. How do you port large data structures off heap and make them thread safe?

    Thanks again for your article.

  5. I have one further question based on the article presented here:


    It seems to me you are using the same technique to access Off-heap memory, and yet you experience much better results, clearly unaffected by the JNI calls. Could you explain why that might be the case?

  6. This article grew out of my curiosity about the article you mention above; I commented on it with my results before writing this blog!

    The difference is that I allocate one big array for all the objects, while the blog you mention allocates memory per object.

    Allocating one big array has a lot of nice benefits:
    – Predictable memory access
    – Most of the data will be prefetched

    This type of allocation is more CPU-cache friendly.
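
The difference between one big allocation and per-object allocation can be sketched on the heap as well. The Holder class and both methods below are hypothetical and only contrast the two layouts; they make no claim about timings.

```java
// Hypothetical illustration class contrasting two memory layouts.
final class ContiguousLayoutSketch {
    // One small object per value: the array holds references, values are scattered.
    static final class Holder {
        final long value;
        Holder(long value) { this.value = value; }
    }

    // Per-object allocation: every read follows a reference to a separate object.
    static long sumObjects(int count) {
        Holder[] holders = new Holder[count];
        for (int i = 0; i < count; i++) {
            holders[i] = new Holder(i);
        }
        long sum = 0;
        for (Holder holder : holders) {
            sum += holder.value;
        }
        return sum;
    }

    // One big array: the values sit next to each other and are read sequentially.
    static long sumFlat(int count) {
        long[] values = new long[count];
        for (int i = 0; i < count; i++) {
            values[i] = i;
        }
        long sum = 0;
        for (long value : values) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumObjects(100)); // 4950
        System.out.println(sumFlat(100));    // 4950
    }
}
```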

  7. It seems that your implementation of HeapAllocatedObject is not an apples-to-apples comparison, as you added an extra layer, HeapValue, which causes an additional index lookup and memory fragmentation.

    Since object construction and GC happen only in HeapAllocatedObject and not in the others, it is bound to be the slowest.

  8. One of the reasons for writing this blog is to show the overhead you get with plain Java objects.
    HeapValue has all the overhead associated with any object due to the layout Java uses. All heap allocation also has GC overhead; direct memory is free from it, and that is one of the big reasons for the write speed you get with direct memory.

    ByteBuffer shows an interesting result: it has the worst write performance, although it is just backed by a byte array and is the most compact way to store data on the heap.

    Access by index does not add any significant overhead.

    • Thanks for your interesting posts. So what makes ByteBuffer slower than heap in both read and write performance?

      • I have to do some more investigation to work out the cause, but some factors to consider are:

        – For ByteBuffer, data is stored in a byte array, so every time you ask for a long, a fair number of shift operations happen to convert between bytes and the long value, because of big-endian/little-endian handling, on both the read and the write side.

        – Another thing to consider is byte order: for little-endian, the byte array is read in reverse order (8th byte to 1st byte), so there is a high chance that the CPU prefetchers are not of much use. I have not benchmarked this assumption.
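
The byte-order point can be made concrete with a small sketch. A ByteBuffer is big-endian by default, and order() switches it. The class and method names are hypothetical, and nothing here measures whether byte order explains the slowdown.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration class.
final class ByteOrderSketch {
    // Stores a long with the given byte order and returns the byte at index 0.
    static byte firstByte(long value, ByteOrder order) {
        ByteBuffer buffer = ByteBuffer.allocate(8).order(order);
        buffer.putLong(0, value);
        return buffer.get(0);
    }

    public static void main(String[] args) {
        long value = 0x0102030405060708L;
        System.out.println(ByteBuffer.allocate(8).order());           // BIG_ENDIAN
        System.out.println(firstByte(value, ByteOrder.BIG_ENDIAN));    // 1: most significant byte first
        System.out.println(firstByte(value, ByteOrder.LITTLE_ENDIAN)); // 8: least significant byte first
    }
}
```

Calling order(ByteOrder.nativeOrder()) on a buffer matches the platform's byte order, which may avoid the byte swapping when multi-byte values are read or written.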

  9. The Unsafe is (btw) just like memcpy(&arr, (char*)&integer, 4);
    You cannot compare this to manual encoding (like bitshifting…).

  10. “This is almost same as Direct ByteBuffer but with little different, it can be allocated by unsafe.allocateMemory, as it is direct memory so it creates no GC overhead”
    Should’ve explained this properly. You are showing huge differences in results, but describing as “little different”. Shows you also don’t know much about it.

    “C/C++ community will not like it”
    Shows your immaturity… why on earth they won’t like it.

    • That sentence was written from an API usage point of view, not about the actual results. I am not an expert in this area; I am just sharing my observations, and if you don’t like them, that is fine. It would be nice if you could share your experience with direct ByteBuffer.

      Regarding the C/C++ community, you missed the whole context.

  11. Would you be willing to share the code you used for this test? I performed a similar test, but using 100M int primitives, and found on-heap (via int[]) to be fastest for both read and write. I’d be curious to see where the difference lies. I suspect your more complex data structure might kill on-heap due to additional pointer redirection and loss of locality, but as always, the devil’s in the details.

    As for my own version of the test, I’ve just started looking at output from -XX:+PrintAssembly but it seems like the JIT is better able to optimize serialized array access than the other forms. Of course, this has implications for real applications with more random access patterns, but that’s why they always say microbenchmarking is so difficult, especially in Java.

    • Link to code is broken on this page.
      Code is available @ https://github.com/ashkrit/blog/tree/master/src/main/java/allocation

      Test code is also available @ https://github.com/ashkrit/blog/blob/master/src/main/java/allocation/TestMemoryAllocator.java

      Can you share your test environment details?

      My testing was done on the specs below.

      OS : Windows 8
      JDK : 1.7 build 45+
      Processor : i7-3632QM @ 2.20 GHz

      • My test environment is:
        OS : Ubuntu 14.04
        JDK : OpenJDK 1.7.0_91
        CPU: Xeon E5-1650

        Measured average time per item over 100M integers is as follows…
        int[] – 0.952ns = 1050 Mops/sec
        Unsafe – 1.228ns = 814 Mops/sec
        direct ByteBuffer – 1.328ns = 753 Mops/sec
        heap ByteBuffer – 1.886ns = 530 Mops/sec
        int[] – 0.502ns = 1992 Mops/sec
        Unsafe – 0.872ns = 1146 Mops/sec
        direct ByteBuffer – 0.944ns = 1059 Mops/sec
        heap ByteBuffer – 2.175ns = 459 Mops/sec

        I’ve posted the source of my test here: https://docs.google.com/document/d/1_Mi8atjsYqGtuWUDmfTnnVG0qaeS2f6j3th9oeMXXKs/edit?usp=sharing

        I’ve placed each individual test into a single method and all are static in an effort to maximize compile-time optimizations. The only method calls I make are to the framework classes such as ByteBuffer. Like you, I sum the values while reading (and set that result to a volatile) to ensure the reads aren’t optimized away.
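
The technique of summing while reading and publishing the sum to a volatile can be sketched as below. The class, field and method names are hypothetical and not taken from either test; a harness such as JMH, with its Blackhole.consume, is a more robust way to get the same effect.

```java
// Hypothetical illustration class for keeping reads alive in a micro-benchmark.
final class ReadBenchmarkSketch {
    // Volatile sink: the write below is observable, so the JIT cannot drop the loop.
    static volatile long sink;

    // Reads every element; without using the sum, the loop could be removed as dead code.
    static long readAll(long[] data) {
        long sum = 0;
        for (long value : data) {
            sum += value;
        }
        sink = sum; // publish the result so the reads have a visible effect
        return sum;
    }

    public static void main(String[] args) {
        long[] data = {1, 2, 3, 4};
        System.out.println(readAll(data)); // 10
        System.out.println(sink);          // 10
    }
}
```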

        Out of curiosity I downloaded your code and ran it in my environment with 100M objects on a 16GB heap (same as I used for mine)…
        HEAP – 146 Mops/sec
        OFFHEAP – 310 Mops/sec
        DBB – 323 Mops/sec
        BB – 110 Mops/sec
        HEAP – 267 Mops/sec
        OFFHEAP – 514 Mops/sec
        DBB – 293 Mops/sec
        BB – 110 Mops/sec

        Of note, I never came anywhere near 12000 Mops/sec. Nor would I expect it to be possible: if each object is 13 bytes, as in your example, that would amount to 156 GB/sec, which is far more than any current CPU’s memory bandwidth. You may want to recheck those values.
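
The sanity check above is simple arithmetic: operations per second times bytes per operation. A sketch, with a hypothetical class and method name, assuming 1 GB = 10^9 bytes:

```java
// Hypothetical illustration class for the bandwidth sanity check.
final class BandwidthCheckSketch {
    // Converts a throughput in million ops/sec into GB/sec (1 GB = 1e9 bytes).
    static double gigabytesPerSecond(double millionOpsPerSecond, int bytesPerOp) {
        return millionOpsPerSecond * 1_000_000 * bytesPerOp / 1_000_000_000.0;
    }

    public static void main(String[] args) {
        // 12,000 million reads/sec of 13-byte objects
        System.out.println(gigabytesPerSecond(12_000, 13)); // 156.0
    }
}
```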

        It does make sense to me that a primitive array would perform far better than a reference array, and it will be similarly interesting to see how these numbers change once Java 9 value types are a thing. With those, the objects themselves can be in the array, avoiding having to dereference two pointers for each access and the corresponding loss of locality. The array will look much more like the packed bytes in the unsafe blob.

        • The benchmark code for the read function had a “dead code elimination” problem, which is why those numbers were wrong!

          The code was fixed later, but I forgot to update the blog with the latest results. The numbers you shared are very close to what I am getting too.

          Yes, I agree JDK 9 will do something to improve object layout.
          Gil Tene from Azul started a library that optimizes the memory layout of objects.
          You can have a look at http://objectlayout.github.io/ObjectLayout/ for more details.

  12. Thanks, nice post
