Home » Java » Core Java » Choosing a fast unique identifier (UUID) for Lucene

About Michael McCandless

Michael McCandless

Choosing a fast unique identifier (UUID) for Lucene

Most search applications using Apache Lucene assign a unique id, or primary key, to each indexed document. While Lucene itself does not require this (it could care less!), the application usually needs it to later replace, delete or retrieve that one document by its external id. Most servers built on top of Lucene, such as Elasticsearch and Solr, require a unique id and can auto-generate one if you do not provide it.

Sometimes your id values are already pre-defined, for example if an external database or content management system assigned one, or if you must use a URI, but if you are free to assign your own ids then what works best for Lucene?

One obvious choice is Java’s UUID class, which generates version 4 universally unique identifiers, but it turns out this is the worst choice for performance: it is 4X slower than the fastest. To understand why requires some understanding of how Lucene finds terms.

BlockTree terms dictionary

The purpose of the terms dictionary is to store all unique terms seen during indexing, and map each term to its metadata (docFreq, totalTermFreq, etc.), as well as the postings (documents, offsets, postings and payloads). When a term is requested, the terms dictionary must locate it in the on-disk index and return its metadata.

The default codec uses the BlockTree terms dictionary, which stores all terms for each field in sorted binary order, and assigns the terms into blocks sharing a common prefix. Each block contains between 25 and 48 terms by default. It uses an in-memory prefix-trie index structure (an FST) to quickly map each prefix to the corresponding on-disk block, and on lookup it first checks the index based on the requested term’s prefix, and then seeks to the appropriate on-disk block and scans to find the term.

In certain cases, when the terms in a segment have a predictable pattern, the terms index can know that the requested term cannot exist on-disk. This fast-match test can be a sizable performance gain especially when the index is cold (the pages are not cached by the the OS’s IO cache) since it avoids a costly disk-seek. As Lucene is segment-based, a single id lookup must visit each segment until it finds a match, so quickly ruling out one or more segments can be a big win. It is also vital to keep your segment counts as low as possible!

Given this, fully random ids (like UUID V4) should perform worst, because they defeat the terms index fast-match test and require a disk seek for every segment. Ids with a predictable per-segment pattern, such as sequentially assigned values, or a timestamp, should perform best as they will maximize the gains from the terms index fast-match test.

Testing Performance

I created a simple performance tester to verify this; the full source code is here. The test first indexes 100 million ids into an index with 7/7/8 segment structure (7 big segments, 7 medium segments, 8 small segments), and then searches for a random subset of 2 million of the IDs, recording the best time of 5 runs. I used Java 1.7.0_55, on Ubuntu 14.04, with a 3.5 GHz Ivy Bridge Core i7 3770K.

Since Lucene’s terms are now fully binary as of 4.0, the most compact way to store any value is in binary form where all 256 values of every byte are used. A 128-bit id value then requires 16 bytes.

I tested the following identifier sources:

For the UUIDs and Flake IDs I also tested binary encoding in addition to their standard (base 16 or 36) encoding. Note that I only tested lookup speed using one thread, but the results should scale linearly (on sufficiently concurrent hardware) as you add threads.

chart

Zero-padded sequential ids, encoded in binary are fastest, quite a bit faster than non-zero-padded sequential ids. UUID V4 (using Java’s UUID.randomUUID()) is ~4X slower.

But for most applications, sequential ids are not practical. The 2nd fastest is UUID V1, encoded in binary. I was surprised this is so much faster than Flake IDs since Flake IDs use the same raw sources of information (time, node id, sequence) but shuffle the bits differently to preserve total ordering. I suspect the problem is the number of common leading digits that must be traversed in a Flake ID before you get to digits that differ across documents, since the high order bits of the 64-bit timestamp come first, whereas UUID V1 places the low order bits of the 64-bit timestamp first. Perhaps the terms index should optimize the case when all terms in one field share a common prefix.

I also separately tested varying the base from 10, 16, 36, 64, 256 and in general for the non-random ids, higher bases are faster. I was pleasantly surprised by this because I expected a base matching the BlockTree block size (25 to 48) would be best.

There are some important caveats to this test (patches welcome)! A real application would obviously be doing much more work than simply looking up ids, and the results may be different as hotspot must compile much more active code. The index is fully hot in my test (plenty of RAM to hold the entire index); for a cold index I would expect the results to be even more stark since avoiding a disk-seek becomes so much more important. In a real application, the ids using timestamps would be more spread apart in time; I could “simulate” this myself by faking the timestamps over a wider range. Perhaps this would close the gap between UUID V1 and Flake IDs? I used only one thread during indexing, but a real application with multiple indexing threads would spread out the ids across multiple segments at once.

I used Lucene’s default TieredMergePolicy, but it is possible a smarter merge policy that favored merging segments whose ids were more “similar” might give better results. The test does not do any deletes/updates, which would require more work during lookup since a given id may be in more than one segment if it had been updated (just deleted in all but one of them).

Finally, I used using Lucene’s default Codec, but we have nice postings formats optimized for primary-key lookups when you are willing to trade RAM for faster lookups, such as this Google summer-of-code project from last year and MemoryPostingsFormat. Likely these would provide sizable performance gains!

Reference: Choosing a fast unique identifier (UUID) for Lucene from our JCG partner Michael Mc Candless at the Changing Bits blog.
Do you want to know how to develop your skillset to become a Java Rockstar?
Subscribe to our newsletter to start Rocking right now!
To get you started we give you our best selling eBooks for FREE!
1. JPA Mini Book
2. JVM Troubleshooting Guide
3. JUnit Tutorial for Unit Testing
4. Java Annotations Tutorial
5. Java Interview Questions
6. Spring Interview Questions
7. Android UI Design
and many more ....
Email address:

Leave a Reply

Be the First to Comment!

avatar
  Subscribe  
Notify of
Want to take your Java skills to the next level?
Grab our programming books for FREE!
Here are some of the eBooks you will get:
  • Spring Interview QnA
  • Multithreading & Concurrency QnA
  • JPA Minibook
  • JVM Troubleshooting Guide
  • Advanced Java
  • Java Interview QnA
  • Java Design Patterns
The Spring Framework Cookbook
  • Learn the best practices of the Spring Framework
  • Build simple, portable, fast and flexible JVM-based systems and applications
  • Explore specific projects like Boot and Batch
The Spring Data Programming Cookbook
  • Learn how to use data access technologies and cloud-based data services
  • Set up the environment and create a basic project
  • Learn how to handle the various modules (e.g. JPA, MongoDB, Redis etc.)
The Selenium Programming Cookbook
  • Kick-start your own projects using this testing framework for web applications
  • Learn JUnit integration and Standalone Server functionality
  • Find out the most popular Interview Questions about the Selenium Framework
The Mockito Programming Cookbook
  • Kick-start your own web projects using this open source testing framework
  • Write simple test cases using the Mockito Framework
  • Learn how to integrate with JUnit, Maven and other frameworks
The JUnit Programming Cookbook
  • Learn basic usage and configuration of JUnit
  • Create multithreaded tests
  • Learn how to integrate with other testing frameworks
The JSF 2.0 Programming Cookbook
  • Build component-based user interfaces for web applications
  • Set up the environment and create a basic project
  • Learn Internationalization and Facelets Templates
The Amazon S3 Tutorial
  • Develop your own Amazon S3 based applications
  • Learn API usage and pricing
  • Get your own projects up and running in minimum time
Java Design Patterns
  • Learn how Design Patterns are implemented and utilized in Java
  • Understand the reasons why patterns are so important
  • Learn when and how to apply each one of them
Java Concurrency Essentials
  • Dive into the magic of concurrency
  • Learn about concepts like atomicity, synchronization and thread safety
  • Learn about testing concurrent applications
The IntelliJ IDEA Handbook
  • Kick-start your own programming projects using IntelliJ IDEA
  • Learn how to setup and install plugins
  • Create UIs with this Java integrated development environment
The Git Tutorial
  • Learn why Git differs from other version control systems
  • Explore Git's usage and best practises
  • Learn branching strategies
The Eclipse IDE Handbook
  • Explore the most widely used Java IDE
  • Learn how to setup and install plugins
  • Built your own projects up and running in minimum time
The Docker Containerization Cookbook
  • Explore the world’s leading software containerization platform
  • Learn how to wrap a piece of software in a complete filesystem
  • Learn how to use DNS and various commands
Developing Modern Applications With Scala
  • Develop modern Scala applications
  • Build SBT and reactive applications
  • Learn about testing and database access
The Apache Tomcat Cookbook
  • Explore Apache Tomcat open-source web server
  • Learn about installation, configuration, logging and clustering
  • Kick-start your own web projects using Apache Tomcat
The Apache Maven Cookbook
  • Explore the Apache Maven build automation tool
  • Learn about Maven's project structure and configuration
  • Learn about Maven's dependency management and plug-ins
The Apache Hadoop Cookbook
  • Explore the Apache Hadoop open-source software framework
  • Learn distributed caching and streaming
  • Kick-start your own web projects using Apache Hadoop
The Android Programming Cookbook
  • Explore the Android mobile operating system
  • Learn about services and page views
  • Learn about Google Maps and Bluetooth functionality
The Elasticsearch Tutorial
  • Explore the Elasticsearch search engine
  • Develop your own Elasticsearch based applications
  • Learn operations, Java API Integration and reporting
Amazon DynamoDB Tutorial
  • Develop your own Amazon DynamoDB based applications
  • Learn DynamoDB Concepts and Best Practices
  • Get your own projects up and running in minimum time
Java NIO Programming Cookbook
  • Learn features for intensive I/O operations
  • Follow a series of tutorials on Java NIO examples
  • Get knowledge on Java Nio Socket and Asynchronous Channels
JBoss Drools Cookbook
  • Explore Drools business rule management system
  • Follow a series of tutorials on Drools examples
  • Get knowledge on business rules for a shopping domain model
Vaadin Programming Cookbook
  • Explore Vaadin web framework for rich Internet applications
  • Learn the Architecture and Best Practices
  • Get knowledge on Data Binding and Custom Components
Groovy Programming Cookbook
  • Explore Apache Groovy object-oriented programming language
  • Create sample applications and explore interview questions
  • Get knowledge on Callback functionality and various widgets
GWT Programming Cookbook
  • Explore the open source Google Web Toolkit
  • Create sample applications and explore interview questions
  • Create and maintain complex JavaScript front-end applications in Java
Do you want to know how to develop your skillset to become a Java Rockstar?
Subscribe to our newsletter to start Rocking right now!
To get you started we give you our best selling eBooks for FREE!
1. JPA Mini Book
2. JVM Troubleshooting Guide
3. JUnit Tutorial for Unit Testing
4. Java Annotations Tutorial
5. Java Interview Questions
and many more ....
Email address:
Do you want to know how to develop your skillset to become a Java Rockstar?
Subscribe to our newsletter to start Rocking right now!
To get you started we give you our best selling eBooks for FREE!
1. JPA Mini Book
2. JVM Troubleshooting Guide
3. JUnit Tutorial for Unit Testing
4. Java Annotations Tutorial
5. Java Interview Questions
and many more ....
Email address:
Do you want to know how to develop your skillset to become a Java Rockstar?
Subscribe to our newsletter to start Rocking right now!
To get you started we give you our best selling eBooks for FREE!
1. JPA Mini Book
2. JVM Troubleshooting Guide
3. JUnit Tutorial for Unit Testing
4. Java Annotations Tutorial
5. Java Interview Questions
6. Spring Interview Questions
7. Android UI Design
and many more ....
Email address:
Want to be a DynamoDB Master ?
Subscribe to our newsletter and download the Amazon DynamoDB Tutorial right now!
In order to help you master this Amazon NoSQL database service, we have compiled a kick-ass guide with all the major DynamoDB features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Programming Interview Coming Up?
Subscribe to our newsletter and download the Ultimate Multithreading and Concurrency interview questions and answers collection right now!
In order to get you prepared for your next Programming Interview, we have compiled a huge list of relevant Questions and their respective Answers. Besides studying them online you may download the eBook in PDF format!
Email address:
Java Interview Coming Up?
Subscribe to our newsletter and download the Ultimate Spring interview questions and answers collection right now!
In order to get you prepared for your next Java Interview, we have compiled a huge list of relevant Questions and their respective Answers. Besides studying them online you may download the eBook in PDF format!
Email address:
Java Interview Coming Up?
Subscribe to our newsletter and download the Ultimate Java interview questions and answers collection right now!
In order to get you prepared for your next Java Interview, we have compiled a huge list of relevant Questions and their respective Answers. Besides studying them online you may download the eBook in PDF format!
Email address:
Want to be a Java NIO Master ?
Subscribe to our newsletter and download the Java NIO Programming Cookbook right now!
In order to help you master Java NIO Library, we have compiled a kick-ass guide with all the major Java NIO features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to be a Drools Master ?
Subscribe to our newsletter and download the JBoss Drools Cookbook right now!
In order to help you master Drools Business Rule Management System, we have compiled a kick-ass guide with all the major Drools features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to be an iText Master ?
Subscribe to our newsletter and download the iText Tutorial right now!
In order to help you master iText Library, we have compiled a kick-ass guide with all the major iText features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to be an Elasticsearch Master ?
Subscribe to our newsletter and download the Elasticsearch Tutorial right now!
In order to help you master Elasticsearch search engine, we have compiled a kick-ass guide with all the major Elasticsearch features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to be a Scala Master ?
Subscribe to our newsletter and download the Scala Cookbook right now!
In order to help you master Scala, we have compiled a kick-ass guide with all the basic concepts! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to be a JUnit Master ?
Subscribe to our newsletter and download the JUnit Programming Cookbook right now!
In order to help you master unit testing with JUnit, we have compiled a kick-ass guide with all the major JUnit features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to master Amazon Web Services ?
Subscribe to our newsletter and download the Amazon S3 Tutorial right now!
In order to help you master the leading Web Services platform, we have compiled a kick-ass guide with all its major features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to master Spring Framework ?
Subscribe to our newsletter and download the Spring Framework Cookbook right now!
In order to help you master the leading and innovative Java framework, we have compiled a kick-ass guide with all its major features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to master Eclipse IDE ?
Subscribe to our newsletter and download the Eclipse IDE Handbook right now!
In order to help you master Eclipse, we have compiled a kick-ass guide with all the basic features of the popular IDE! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to master IntelliJ IDEA ?
Subscribe to our newsletter and download the IntelliJ IDEA Handbook right now!
In order to help you master IntelliJ IDEA, we have compiled a kick-ass guide with all the basic features of the popular IDE! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to master Docker ?
Subscribe to our newsletter and download the Docker Containerization Cookbook right now!
In order to help you master Docker, we have compiled a kick-ass guide with all the basic concepts of the Docker container system! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to create a kick-ass Android App ?
Subscribe to our newsletter and download the Android Programming Cookbook right now!
With this book, you will delve into the fundamentals of Android programming. You will understand user input, views and layouts. Furthermore, you will learn how to communicate over Bluetooth and also leverage Google Maps into your application!
Email address:
Want to be a GIT Master ?
Subscribe to our newsletter and download the GIT Tutorial eBook right now!
In order to help you master GIT, we have compiled a kick-ass guide with all the basic concepts of the GIT version control system! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to be a Hadoop Master ?
Subscribe to our newsletter and download the Apache Hadoop Cookbook right now!
In order to help you master Apache Hadoop, we have compiled a kick-ass guide with all the basic concepts of a Hadoop cluster! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to master Spring Data ?
Subscribe to our newsletter and download the Spring Data Ultimate Guide right now!
In order to help you master Spring Data, we have compiled a kick-ass guide with all the major features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to create a kick-ass Android App ?
Subscribe to our newsletter and download the Android UI Design mini-book right now!
With this book, you will delve into the fundamentals of Android UI design. You will understand user input, views and layouts, as well as adapters and fragments. Furthermore, you will learn how to add multimedia to an app and also leverage themes and styles!
Email address:
Want to be a Java 8 Ninja ?
Subscribe to our newsletter and download the Java 8 Features Ultimate Guide right now!
In order to get you up to speed with the major Java 8 release, we have compiled a kick-ass guide with all the new features and goodies! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to master Java Annotations ?
Subscribe to our newsletter and download the Java Annotations Ultimate Guide right now!
In order to help you master the topic of Annotations, we have compiled a kick-ass guide with all the major features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to be a JUnit Master ?
Subscribe to our newsletter and download the JUnit Ultimate Guide right now!
In order to help you master unit testing with JUnit, we have compiled a kick-ass guide with all the major JUnit features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to master Java Abstraction ?
Subscribe to our newsletter and download the Abstraction in Java Ultimate Guide right now!
In order to help you master the topic of Abstraction, we have compiled a kick-ass guide with all the major features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to master Java Reflection ?
Subscribe to our newsletter and download the Java Reflection Ultimate Guide right now!
In order to help you master the topic of Reflection, we have compiled a kick-ass guide with all the major features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to be a JMeter Master ?
Subscribe to our newsletter and download the JMeter Ultimate Guide right now!
In order to help you master load testing with JMeter, we have compiled a kick-ass guide with all the major JMeter features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to be a Servlets Master ?
Subscribe to our newsletter and download the Java Servlet Ultimate Guide right now!
In order to help you master programming with Java Servlets, we have compiled a kick-ass guide with all the major servlet API uses and showcases! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to be a JAXB Master ?
Subscribe to our newsletter and download the JAXB Ultimate Guide right now!
In order to help you master XML Binding with JAXB, we have compiled a kick-ass guide with all the major JAXB features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to be a JDBC Master ?
Subscribe to our newsletter and download the JDBC Ultimate Guide right now!
In order to help you master database programming with JDBC, we have compiled a kick-ass guide with all the major JDBC features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to be a JPA Master ?
Subscribe to our newsletter and download the JPA Ultimate Guide right now!
In order to help you master programming with JPA, we have compiled a kick-ass guide with all the major JPA features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to be a Hibernate Master ?
Subscribe to our newsletter and download the Hibernate Ultimate Guide right now!
In order to help you master JPA and database programming with Hibernate, we have compiled a kick-ass guide with all the major Hibernate features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to be a JSF Master ?
Subscribe to our newsletter and download the JSF 2.0 Programming Cookbook right now!
In order to get you prepared for your JSF development needs, we have compiled numerous recipes to help you kick-start your projects. Besides reading them online you may download the eBook in PDF format!
Email address:
Want to be a Java Master ?
Subscribe to our newsletter and download the Advanced Java Guide right now!
In order to help you master the Java programming language, we have compiled a kick-ass guide with all the must-know advanced Java features! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to be a Java Master ?
Subscribe to our newsletter and download the Java Design Patterns right now!
In order to help you master the Java programming language, we have compiled a kick-ass guide with all the must-know Design Patterns for Java! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to be a Hadoop Master ?
Subscribe to our newsletter and download the Hadoop Tutorial right now!
In order to help you master Apache Hadoop, we have compiled a kick-ass guide with all the basic concepts of a Hadoop cluster! Besides studying them online you may download the eBook in PDF format!
Email address:
Want to be an Elastic Beanstalk Master ?
Subscribe to our newsletter and download the Amazon Elastic Beanstalk Tutorial right now!
In order to help you master AWS Elastic Beanstalk, we have compiled a kick-ass guide with all the major Elastic Beanstalk features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Amazon Elastic Beanstalk Tutorial
  • Develop your own Amazon Elastic Beanstalk based applications
  • Learn Java Integration and Command Line Interfacing
  • Get your own projects up and running in minimum time
Want to take your Java skills to the next level?
Grab our programming books for FREE!
Here are some of the eBooks you will get:
  • Spring Interview QnA
  • Multithreading & Concurrency QnA
  • JPA Minibook
  • JVM Troubleshooting Guide
  • Advanced Java
  • Java Interview QnA
  • Java Design Patterns
Insiders are already enjoying weekly updates and complimentary whitepapers!
Join them now to gain exclusive access to the latest news in the Java world, as well as insights about Android, Scala, Groovy and other related technologies.
Email address:
Want to be an ActiveMQ Master ?
Subscribe to our newsletter and download the Apache ActiveMQ Cookbook right now!
In order to help you master Apache ActiveMQ JMS, we have compiled a kick-ass guide with all the major ActiveMQ features and use cases! Besides studying them online you may download the eBook in PDF format!
Email address:
Apache ActiveMQ Cookbook
  • Explore Apache ActiveMQ Best Practices
  • Learn ActiveMQ Load Balancing and File Transfer
  • Develop your own Apache ActiveMQ projects