A Disk-Backed arraylist

Bozhidar BozhanovDecember 17th, 2019Last Updated: December 16th, 2019

1 286 2 minutes read

It sometimes happens that your list can become too big to fit in memory and you have to do something in order to avoid running out of memory.

The proper way to do that is streaming – instead of fitting everything in memory, you should stream data from the source and discard the entries that are already processed.

However, there are cases when code that’s outside of your control requires a List and you can’t use streaming. These cases are rather rare but in case you hit them, you have to find a workaround. One is to re-implement the code to work with streaming, but depending on the way the library is written, it may not be possible. So the other option is to use a disk-backed list – one that works as a list, but underneath stores and loads elements from disk.

Searching for existing solutions results in several 3+ years old repos like this one and this one and this one.

And then there’s MapDB, which is great and supported. It’s mostly about maps, but it does support a List as well, as shown here.

And finally, you have the option to implement something simpler yourself, in case you need just iteration and almost nothing else. I’ve done it here – DiskBackedArrayList.java. It doesn’t support many things (not all methods are overridden to throw an exception, but they should). But most importantly, it doesn’t support random adding and random getting, and also toArray(). It’s purely “fill the collection” and then “iterate the collection”. It relies on ObjectOutputStream which is not terribly efficient, but is simple to use. Note that I’ve allowed a short in-memory prependList in case small amounts of data need to be prepended to the list.

The list gets filled in memory until a specified threshold and then gets flushed to disk, clearing the memory which starts getting filled again. This too can be more efficient – with background flushing in another thread that doesn’t interfere with adding elements to the list, but optimizations complicate things and in this case the total running time was not an issue. Most importantly, the iterator() method is overridden to return a custom iterator that first streams the prepended list, then reads everything from disk and finally iterates over the latest batch which is still in memory. And finally, the clear() method should be called in the end in order to close the underlying stream. An output stream could be opened and closed on each flush, but ObjectOutputStream can’t be used in append mode due to some implementation specific about writing headers first.

So basically we hide the streaming approach underneath a List interface – it’s still streaming elements and discarding them when not needed. Ideally this should be done at the source of the data (e.g. a database, message queue, etc.) rather than using the disk as overflow space, but there are cases where using the disk is fine. This implementation is a starting point, as it’s not tested in production, but illustrates that you can adapt existing classes to use different data access patterns if needed.

Published on Java Code Geeks with permission by Bozhidar Bozhanov, partner at our JCG program. See the original article here: A Disk-Backed arraylist

Opinions expressed by Java Code Geeks contributors are their own.

Bozhidar BozhanovDecember 17th, 2019Last Updated: December 16th, 2019

1 286 2 minutes read

This site uses Akismet to reduce spam. Learn how your comment data is processed.

1 Comment

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

ElenaGillbert

4 years ago

Hi…
I’m Elena gillbert.A Java disk-based ArrayList can sometimes occur when your list becomes too big to fit in memory and you have to do something to avoid running out of memory. The proper way to do that is streaming — instead of fitting everything in memory, you should stream data from the source and discard the entries that are already processed.

A Disk-Backed arraylist

Thank you!

Bozhidar Bozhanov

Thank you!

Thank you!

Related Articles

Thank you!