
About Ashkrit Sharma

Pragmatic software developer who loves practices that make software development fun and likes to build high-performance, low-latency systems.

ArrayList Using Memory Mapped File

Introduction

In-memory computing is picking up thanks to affordable hardware: most of the data is kept in RAM to meet latency and throughput goals. But keeping data in RAM creates garbage collector overhead, especially if you don't pre-allocate. So we effectively need a garbage-free (or at least garbage-light) approach to avoid GC hiccups.

Garbage free/less data structure

There are a few options to achieve it.

Object Pool

The object pool pattern is a very good solution; I wrote about it in the Lock Less Object Pool blog post.
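As a rough illustration of the idea, here is a minimal single-threaded pool. This is not the lock-less pool from the post mentioned above; the class name SimplePool and its methods are hypothetical and exist only for this sketch.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

// Hypothetical, single-threaded object pool: objects are created up front
// and reused, so steady-state use allocates nothing new.
class SimplePool<T> {
    private final ArrayDeque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;

    SimplePool(Supplier<T> factory, int preallocate) {
        this.factory = factory;
        for (int i = 0; i < preallocate; i++) {
            free.push(factory.get());
        }
    }

    // Hands out a pooled object, or creates one if the pool is empty.
    T borrow() {
        T item = free.poll();
        return item != null ? item : factory.get();
    }

    // Returns an object to the pool so a later borrow() can reuse it.
    void release(T item) {
        free.push(item);
    }

    int available() {
        return free.size();
    }

    public static void main(String[] args) {
        SimplePool<StringBuilder> pool = new SimplePool<>(StringBuilder::new, 2);
        StringBuilder first = pool.borrow();
        pool.release(first);
        // The released instance is handed out again instead of a new one.
        System.out.println(pool.borrow() == first);
    }
}
```

The caller is responsible for resetting an object's state before or after release; the sketch leaves that out.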

Off Heap Objects

The JVM has very good support for creating off-heap objects. You can get rid of GC pauses if you take this highway, but the highway has its own risks!
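One standard way to keep data off heap is a direct ByteBuffer. The sketch below assumes that approach (the post does not say which off-heap mechanism it has in mind), and the class name OffHeapLongArray is made up for the example.

```java
import java.nio.ByteBuffer;

// Hypothetical fixed-length array of longs stored in native memory.
// Only the small ByteBuffer wrapper lives on the Java heap.
class OffHeapLongArray {
    private final ByteBuffer buffer;
    private final int length;

    OffHeapLongArray(int length) {
        this.length = length;
        // allocateDirect reserves zero-filled memory outside the Java heap.
        this.buffer = ByteBuffer.allocateDirect(length * Long.BYTES);
    }

    void set(int index, long value) {
        buffer.putLong(check(index) * Long.BYTES, value);
    }

    long get(int index) {
        return buffer.getLong(check(index) * Long.BYTES);
    }

    int length() {
        return length;
    }

    private int check(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return index;
    }

    public static void main(String[] args) {
        OffHeapLongArray values = new OffHeapLongArray(1_000);
        values.set(42, 123_456_789L);
        System.out.println(values.get(42));
    }
}
```

Part of the risk mentioned above is that this memory is not sized by the normal heap settings, so it has to be budgeted separately.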

Memory Mapped File

This is a mix of heap and off-heap, the best of both worlds.

A memory mapped file lets you map part of the data into memory, and that memory is managed by the OS, so it adds very little memory overhead to the JVM process that maps the file. This helps in managing data in a garbage-free way and lets a JVM manage large data sets. A memory mapped file can also be used for IPC; I wrote about that in the power-of-java-memorymapped-file blog post. In this post I will create an ArrayList that is backed by a memory mapped file. This array list can store millions of objects with almost no GC overhead. It sounds crazy, but it is possible.
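For readers who have not used the API, mapping a file in Java goes through FileChannel.map. The following is a minimal sketch; the class name MappedFileExample, the helper map, and the temp file are assumptions made for the example and are not taken from the post's code.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedFileExample {
    // Maps the first `size` bytes of the file for reading and writing.
    // A file shorter than `size` is extended by the read-write mapping.
    static MappedByteBuffer map(Path file, long size) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // The mapping remains valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mapped-example", ".dat");
        file.toFile().deleteOnExit();

        MappedByteBuffer buffer = map(file, 4096);
        buffer.putLong(0, 123_456_789L); // written to OS-managed pages, not the Java heap
        buffer.force();                  // ask the OS to flush the change to the file
        System.out.println(buffer.getLong(0));
    }
}
```

A single mapping is limited to Integer.MAX_VALUE bytes, so larger data sets need several mappings over different regions of the file.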

Let's get into action

In this test I use an Instrument object that has the following attributes:

– int id

– double price

So each object takes 12 bytes (4 for the int, 8 for the double). The new array list holds 10 million objects, and I will measure write/read performance.
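To make the layout concrete, here is a simplified sketch of a fixed-capacity list that stores each Instrument as a 12-byte record in a mapped file. It is not the BigArrayList code from the post's GitHub repository; the class MappedInstrumentList and its methods are hypothetical, and it assumes the whole list fits into one mapping.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical list of (int id, double price) records backed by a mapped file.
class MappedInstrumentList {
    static final int RECORD_SIZE = Integer.BYTES + Double.BYTES; // 4 + 8 = 12 bytes

    private final MappedByteBuffer buffer;
    private final int capacity;
    private int size;

    MappedInstrumentList(Path file, int capacity) {
        this.capacity = capacity;
        MappedByteBuffer mapped;
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Assumption: capacity * 12 bytes fits in a single mapping.
            mapped = channel.map(FileChannel.MapMode.READ_WRITE, 0, (long) capacity * RECORD_SIZE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        this.buffer = mapped;
    }

    // Marshals the fields straight into the mapped region; no Instrument object is created.
    void add(int id, double price) {
        if (size == capacity) {
            throw new IllegalStateException("list is full");
        }
        int offset = size * RECORD_SIZE;
        buffer.putInt(offset, id);
        buffer.putDouble(offset + Integer.BYTES, price);
        size++;
    }

    int idAt(int index) {
        return buffer.getInt(offset(index));
    }

    double priceAt(int index) {
        return buffer.getDouble(offset(index) + Integer.BYTES);
    }

    int size() {
        return size;
    }

    private int offset(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return index * RECORD_SIZE;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("instruments", ".dat");
        file.toFile().deleteOnExit();
        MappedInstrumentList list = new MappedInstrumentList(file, 10_000_000);
        for (int i = 0; i < 10_000_000; i++) {
            list.add(i, i * 0.5);
        }
        System.out.println(list.size() + " records, last price " + list.priceAt(list.size() - 1));
    }
}
```

At 12 bytes per record, 10 million records need about 120 MB of file space and no per-element objects on the heap.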

Writer Performance

[Figure: BigArrayList Write. X axis: reading number. Y axis: time taken to add 10 million elements, in ms.]

Adding 10 million elements takes around 70 ms, which is pretty fast.

Writer Throughput

Let's look at another aspect of performance, throughput:

[Figure: BigArrayList WriteTP. X axis: reading number. Y axis: throughput per second, in millions.]

Writer throughput is very impressive; it ranges between 138 million and 142 million elements per second.
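The throughput numbers follow directly from the timings: 10 million writes in about 70 ms works out to roughly 142.9 million per second, and 10 million reads in about 44 ms to roughly 227 million per second. A small helper makes the conversion explicit; the class and method names are made up for the example.

```java
// Hypothetical helper that converts an operation count and an elapsed time
// (as measured with System.nanoTime()) into operations per second.
class Throughput {
    static double perSecond(long operations, long elapsedNanos) {
        return operations * 1_000_000_000.0 / elapsedNanos;
    }

    public static void main(String[] args) {
        long tenMillion = 10_000_000L;
        long seventyMillis = 70_000_000L;   // 70 ms in nanoseconds
        long fortyFourMillis = 44_000_000L; // 44 ms in nanoseconds
        System.out.println("write: " + perSecond(tenMillion, seventyMillis) / 1e6 + " million/sec");
        System.out.println("read:  " + perSecond(tenMillion, fortyFourMillis) / 1e6 + " million/sec");
    }
}
```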

Reader Performance

[Figure: BigArrayList Reader. X axis: reading number. Y axis: time taken to read 10 million elements, in ms.]

It takes around 44 ms to read 10 million entries, which is very fast. With this kind of performance you can definitely challenge a database.

Reader Throughput

[Figure: BigArrayList ReaderTP. X axis: reading number. Y axis: throughput per second, in millions.]

Wow, the throughput is great: 220+ million per second.

It looks very promising, with 138 million/sec writer throughput and 220 million/sec reader throughput.

Comparison With Array List

Let's compare the performance of BigArrayList with ArrayList.

Writer Throughput – BigArrayList Vs ArrayList

[Figure: BigArrayList Vs ArrayList WriteTP]

The throughput of BigArrayList is almost constant at around 138 million/sec, while ArrayList starts at 50 million/sec and drops under 5 million/sec.

ArrayList has a lot of hiccups, and they are due to:

– Array Allocation

– Array Copy

– Garbage Collection overhead
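To get a feel for the first two items, the sketch below counts how many times a growing backing array would be reallocated and copied on the way to 10 million elements. It assumes the growth policy used by OpenJDK's ArrayList (default capacity 10, each growth adds 50%); other implementations may differ, and the class GrowthSimulation is made up for the example.

```java
// Simulates backing-array growth under an assumed "grow by 50%" policy.
class GrowthSimulation {
    // Returns how many reallocations (each followed by a full copy) are
    // needed before the capacity reaches `target`.
    static int resizes(int target, int initialCapacity) {
        int capacity = initialCapacity;
        int count = 0;
        while (capacity < target) {
            // Assumed policy: new capacity = old + old / 2 (at least old + 1).
            capacity = Math.max(capacity + (capacity >> 1), capacity + 1);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("default start: " + resizes(10_000_000, 10) + " reallocations");
        System.out.println("pre-sized:     " + resizes(10_000_000, 10_000_000) + " reallocations");
    }
}
```

Creating the list with new ArrayList<>(expectedSize) avoids the reallocations and copies, but every element is still a separate heap object, so the garbage collection overhead remains.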

BigArrayList is the winner in this case; it is 7X faster than ArrayList.

Reader Throughput – BigArrayList Vs ArrayList

[Figure: BigArrayList Vs ArrayList ReaderTP]

ArrayList performs better than BigArrayList here; it is around 1X faster.

BigArrayList is slower in this case because:

– It has to keep mapping the file into memory as more data is requested

– There is a cost of un-marshaling

Reader throughput for BigArrayList is 220+ million/sec. That is still very fast, and only a few applications need to process messages faster than that.

So for most use cases this should work.

Reader performance can be improved by using the following techniques:

– Read messages in batches from the mapped stream

– Pre-fetch messages by using the index, like a CPU does
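The first technique could look roughly like the sketch below: instead of one buffer access per field, a whole batch of 12-byte records is copied out with a single bulk get and then decoded from a heap array. The class BatchReader and its method are hypothetical, and whether this is actually faster on a given machine would have to be measured.

```java
import java.nio.ByteBuffer;

// Hypothetical batch reader over (int id, double price) records.
class BatchReader {
    static final int RECORD_SIZE = Integer.BYTES + Double.BYTES; // 12 bytes

    // Sums the price field of `count` records starting at record `first`,
    // copying up to `batchRecords` records per bulk read.
    // Assumes the source was written with the default big-endian byte order.
    static double sumPrices(ByteBuffer source, int first, int count, int batchRecords) {
        byte[] chunk = new byte[batchRecords * RECORD_SIZE];
        ByteBuffer view = ByteBuffer.wrap(chunk);
        ByteBuffer cursor = source.duplicate(); // own position, shared content
        cursor.position(first * RECORD_SIZE);

        double total = 0;
        int remaining = count;
        while (remaining > 0) {
            int n = Math.min(remaining, batchRecords);
            cursor.get(chunk, 0, n * RECORD_SIZE); // one bulk copy for n records
            for (int i = 0; i < n; i++) {
                total += view.getDouble(i * RECORD_SIZE + Integer.BYTES);
            }
            remaining -= n;
        }
        return total;
    }

    public static void main(String[] args) {
        // A heap buffer stands in for the mapped buffer to keep the example small.
        ByteBuffer data = ByteBuffer.allocate(5 * RECORD_SIZE);
        for (int i = 0; i < 5; i++) {
            data.putInt(i * RECORD_SIZE, i);
            data.putDouble(i * RECORD_SIZE + Integer.BYTES, i + 1.0);
        }
        System.out.println(sumPrices(data, 0, 5, 2));
    }
}
```

Because a MappedByteBuffer is a ByteBuffer, the same method works unchanged on a mapped file.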

With the above changes we can improve performance by a few million per second, but I think for most cases the current performance is pretty good.

Conclusion

Memory mapped files are an interesting area for research, and they can solve many performance problems. Java is now being used for developing trading applications, and GC is one question that you have to answer from day one. You need to find a way to keep the GC happy, and a memory mapped file is one thing the GC will love.

The code used for this post is available on GitHub. I ran the tests with 2 GB of memory. The code doesn't handle some edge cases, but it is good enough to prove the point that a memory mapped file can be the winner in many cases.
 

Reference: ArrayList Using Memory Mapped File from our JCG partner Ashkrit Sharma at the Are you ready blog.


5 comments

  1. Hi,

    How do memory mapped files compare to off-heap memory? I know that one drawback of memory mapped files is that you have to create a big temp file, but what about performance? Is it similar?

    Another point: I would like to manage a list of objects which can have different sizes. How do I do that? AFAIK, you always use fixed-size objects in your examples.

    I suppose I have to manage a kind of index at the start of the structure which gives me the offset of each object. But which is more efficient: putting this index on heap or off heap?

    And then of course issues with fragmentation and compaction will arise… There must be some classic algorithms which deal with this.

    • I have not done a test to compare them. It would be interesting to compare, but I guess it will be slower. Maybe I will write a blog post on that.

      One of the main benefits of a memory mapped file is that it can be used for IPC and for loading only a section of a file into memory, so you can really work on large data sets.

      This implementation can't handle objects of different sizes, but you are very close to how it can be implemented. You have to create an index file and a data file; the index file can contain the start and end addresses.

      You may want to look at https://github.com/peter-lawrey/Java-Chronicle
      Chronicle is designed to handle messages of different sizes. Chronicle keeps both the index and the data file in memory.

      Another real example of memory mapped files is MongoDB; it uses memory mapped files to keep all the index info in memory.

      Memory mapped files are an interesting area for research, and many commercial products use them.

  2. Hi, I have a 300 MB text file and I want to load it into an ArrayList. When I try to do it on Android, it throws java.lang.OutOfMemoryError. Can you give me some suggestions for this problem, or do you have sample code for this?
    Thank you.

    • Did you profile the code?
      What is the size of an individual message?
      What is the value of the “noOfMessage” constructor parameter?

      The code that I have posted is generic; it has nothing specific to Android or to a normal server application.

  3. Hi,
    Why do you need the mask variable? When the app tries to get an index, it takes the wrong address…
