MongoDB Index Strategies and Types of Indexes

1. MongoDB Index Strategies and Types of Indexes – Introduction

MongoDB is an open-source, document-oriented, cross-platform database which is written in C++ and is one of the most popular NoSQL databases. It stores JSON-like documents made of key-value pairs, and documents in the same collection do not have to share a schema. It is also free to use: at the time of writing, the server was published under the GNU Affero General Public License and the drivers under the Apache License (newer server releases use the Server Side Public License).

In this lesson, we will discuss the types of indexes in MongoDB and the strategies we can use to get the best performance out of our database. We will start with why indexes matter and how an index can be both an advantage and a disadvantage for the queries and writes which run on our data. We will also study the properties which alter the behaviour of an index, so that it can do more than speed up queries, for example enforce uniqueness or expire old documents. Let’s get started.

2. What is an Index?

An index allows a query to find and filter data much faster than it could without one. The simplest example of an index is something we have all used in books. At the beginning of a book, there is a “Table of contents” which helps readers find the page number of each topic. To read a topic, we look it up in the TOC, which is an ordered list of topics, and jump straight to the page instead of leafing through the whole book. Database indexes work in the same manner. Just as a table has multiple columns, a document in a MongoDB collection has multiple fields, and an index can be built on any of them.
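The table-of-contents analogy can be sketched in a few lines of plain JavaScript. This is only a toy model under stated assumptions: MongoDB's real indexes are B-tree structures, and the sample documents and the helper names `buildIndex` and `findByIndex` are made up for illustration.

```javascript
// Toy model only: a sorted array stands in for MongoDB's B-tree index.
const books = [
  { book_name: "Dune", price: 12 },
  { book_name: "Emma", price: 8 },
  { book_name: "Ulysses", price: 15 },
  { book_name: "Beloved", price: 10 },
];

// Build an "index" on one field: sorted [key, position] entries.
function buildIndex(docs, field) {
  return docs
    .map((doc, position) => [doc[field], position])
    .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
}

// Binary search the sorted entries instead of scanning every document.
function findByIndex(docs, index, key) {
  let lo = 0;
  let hi = index.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const [k, position] = index[mid];
    if (k === key) return docs[position];
    if (k < key) lo = mid + 1;
    else hi = mid - 1;
  }
  return null; // no entry for this key
}

const bookNameIndex = buildIndex(books, "book_name");
const found = findByIndex(books, bookNameIndex, "Emma");
```

The lookup touches only a handful of index entries, while a search without the index would have to compare every document in turn.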

Multiple indexes in a collection are needed when we search and filter data on multiple parameters. For example, for a collection which contains data about books, we may filter by author name, price, book name or any other field present in the documents.

We also mentioned that indexes can slow a database down. This happens when there are too many indexes on a collection. Whenever a document is inserted, every index on that collection has to be updated to include the new data, and this work is part of the write itself rather than an asynchronous task: the write is reported as a success only once all indexes have been updated. The more indexes a collection has, the more work each insertion costs.

3. Types of Indexes in MongoDB

MongoDB provides many different ways in which an index can be formed and stored in memory (and on disk). Each of these indexes serves a different purpose, and some are applicable only to certain data types. Let’s look at these index types here.

3.1 Single Field Index

MongoDB supports single field indexes on fields of any data type, and such an index can be defined on any user-defined field of a document.

Note that for a single field index, the sort order of the index key doesn’t matter, as MongoDB can read the index in either direction. If we want to create a single field index on the field book_name, we can use the following command:

Single Field Index

db.books.createIndex( { book_name: 1 } )

In the above command, the number 1 specifies ascending order for the index (a value of -1 would specify descending order). For a single field index this choice doesn’t matter, because MongoDB can read the index in either direction.
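As a rough sketch of what reading the index in either direction means in practice, the function below creates the index and then sorts on book_name both ways. It is written as a function taking the mongosh `db` handle, which is an assumption made here so that the snippet can also be tried against a stub; the function name `sortByBookName` is hypothetical.

```javascript
// Sketch for mongosh; `db` is the shell's database handle.
function sortByBookName(db) {
  // One ascending index on book_name...
  db.books.createIndex({ book_name: 1 });
  // ...can serve a sort on that field in either direction.
  return {
    ascending: db.books.find().sort({ book_name: 1 }),
    descending: db.books.find().sort({ book_name: -1 }),
  };
}
```

In mongosh this would be called as `sortByBookName(db)`. Appending `.explain()` to either cursor is one way to check which index the query planner picked.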

3.2 Compound Index

We often need to search a collection on the basis of multiple fields. If this is the case, we might consider making Compound Indexes in MongoDB. A Compound Index holds several fields in a single index, so one index can support queries which filter or sort on more than one field.

An important thing to note while making a Compound Index is that the order of fields matters. Suppose we run the following command:

Compound Index

db.books.createIndex( { price: 1, book_name: 1 } )

In this Compound Index, values are first sorted by the price field and then, within each price value, by the book_name field. This means that the order of fields decides whether the index can support a given sort operation. It also means that the field order is part of the index definition. Suppose we now run the following command:

Compound Index

db.books.createIndex( { book_name: 1, price: 1 } )

This creates yet another index, even though the fields are the same; MongoDB won’t reuse the index we made with the previous command. If a new record is inserted in this collection, both of these indexes have to be updated, which makes write operations heavier and hence slower.
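To see why field order matters, here is a small plain JavaScript model of how the keys of the index { price: 1, book_name: 1 } are ordered. It is a toy sketch, not MongoDB's implementation, and the sample values are made up.

```javascript
// Toy model of the key order inside the index { price: 1, book_name: 1 }.
const entries = [
  { price: 10, book_name: "Emma" },
  { price: 8, book_name: "Dune" },
  { price: 10, book_name: "Beloved" },
  { price: 8, book_name: "Ulysses" },
];

// Compare by the first indexed field, then by the second within equal values.
function compareCompound(a, b) {
  if (a.price !== b.price) return a.price - b.price;
  return a.book_name < b.book_name ? -1 : a.book_name > b.book_name ? 1 : 0;
}

const indexOrder = [...entries].sort(compareCompound);
const names = indexOrder.map((e) => `${e.price}:${e.book_name}`);
```

Walking `indexOrder` yields the books sorted by price, and by name within each price. The names on their own (Dune, Ulysses, Beloved, Emma) are not in alphabetical order, which is why this index cannot serve a sort on book_name alone, while an index that starts with book_name can.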

3.3 Multikey Index

The two types of indexes we have studied so far index one value per field for each document and are applicable to all data types. A Multikey Index is an index which is made on an array field and is used to index the content stored in the array.

When the content of an array is indexed, MongoDB explodes the array and creates a separate index entry for each value in it, all under the same field name:

Exploding an Array in MongoDB Index

This allows very efficient queries which match values passed in the query against the elements of one or more array fields. The good thing is that MongoDB decides by itself to create a Multikey Index when the specified field holds an array.

One limitation we can run into when trying to fine-tune our database is that a multikey index might not be able to cover a query. Covering a query with an index means that we can get the result data entirely from the index, without reading the documents themselves. This can improve performance dramatically, as indexes are likely to be held in RAM.
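The exploding of an array described above can be modelled in plain JavaScript. This is a simplified sketch: the `tags` field, the sample documents and the helper `buildMultikeyEntries` are assumptions for illustration, and the real index stores its entries in a B-tree.

```javascript
// Toy model: one index entry per array element, each pointing back at its document.
const docs = [
  { _id: 1, book_name: "Dune", tags: ["scifi", "classic"] },
  { _id: 2, book_name: "Emma", tags: ["classic", "romance"] },
];

function buildMultikeyEntries(documents, field) {
  const entries = [];
  for (const doc of documents) {
    for (const value of doc[field]) {
      entries.push({ key: value, _id: doc._id });
    }
  }
  // Keep the entries sorted by key (and by _id for equal keys).
  return entries.sort((a, b) =>
    a.key < b.key ? -1 : a.key > b.key ? 1 : a._id - b._id
  );
}

const tagEntries = buildMultikeyEntries(docs, "tags");
// All documents whose array contains "classic" sit next to each other in the index.
const classicIds = tagEntries.filter((e) => e.key === "classic").map((e) => e._id);
```

Two documents produce four index entries here, which also hints at why a multikey index grows with the length of the arrays it indexes.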

3.4 Geospatial Index

MongoDB allows us to save geospatial shapes in our databases by storing GeoJSON objects in documents. For efficient querying of geospatial data, MongoDB provides two types of indexes:

  1. 2d indexes, which use planar geometry when returning results
  2. 2dsphere indexes, which use spherical geometry to return results

The MongoDB documentation describes how these indexes work in more detail. With geospatial shapes in our database, we can easily run queries such as finding a burger joint near our current location, and geospatial indexes help to perform this search much faster.
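A "near my location" search could look roughly like the sketch below. The `places` collection, the `location` field and the function name `findNearbyPlaces` are hypothetical; the function takes the mongosh `db` handle as a parameter so that it can also be tried against a stub.

```javascript
// Sketch for mongosh; the `places` collection and `location` field are hypothetical.
function findNearbyPlaces(db, longitude, latitude, maxMeters) {
  // A 2dsphere index works on GeoJSON geometries stored in `location`.
  db.places.createIndex({ location: "2dsphere" });
  return db.places.find({
    location: {
      $near: {
        // GeoJSON points list longitude first, then latitude.
        $geometry: { type: "Point", coordinates: [longitude, latitude] },
        $maxDistance: maxMeters, // in meters for GeoJSON points
      },
    },
  });
}
```

In mongosh, `findNearbyPlaces(db, -73.99, 40.73, 500)` would return places within 500 meters of that point, nearest first.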

3.5 Text Index

MongoDB also provides the capability to make indexes on text fields, which support searching for string content in a collection. Note that these indexes do not store stop words like “the”, “a”, “or”. Within a text index, words are stemmed so that only the root words are stored. We can use the following command to create a text index on a field:

Text Index

db.books.createIndex( { book_name: "text" } )

If the text is in a language other than English, we can specify the language when creating the index:

Text Index with Language

db.books.createIndex( { book_name: "text" }, { default_language: "french" } )

A text index is case- and diacritic-insensitive. Version 3 of the text index (the default since MongoDB 3.2) supports the common C, simple S, and special T case foldings described in the Unicode Character Database 8.0 case folding rules, and it treats characters with and without diacritical marks as equal.

With text indexes, MongoDB can serve basic full-text search by itself, which for simple use cases can save us from running a dedicated search engine such as Elasticsearch.
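Once a text index exists, it is queried with the $text operator. The following is a minimal sketch which assumes the books collection from the earlier examples; the function name `searchBookNames` is hypothetical, and the mongosh `db` handle is passed in as a parameter.

```javascript
// Sketch for mongosh; `db` is the shell's database handle.
function searchBookNames(db, phrase) {
  // A collection can have at most one text index.
  db.books.createIndex({ book_name: "text" });
  // $text tokenizes and stems the search string before matching it against the index.
  return db.books.find({ $text: { $search: phrase } });
}
```

Because of stemming, a call such as `searchBookNames(db, "running")` would also be expected to match a book name containing "run" when the index language is English.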

3.6 Hashed Index

The last type of index we will study is a Hashed Index. This type of index allows us to perform Hash-based sharding on our content. In this type of index, the value of the key is hashed. Due to this reason, these indexes can only support equality match filter queries and cannot work on range-based queries.

If we want to run range queries on the field as well, we might have to create multiple indexes on the same field: a regular index and a hashed index. Finally, hashed indexes truncate floating point numbers to integers before hashing, so values such as 2.2 and 2.9 produce the same hash. Floating point values should be avoided for hashed fields wherever possible.
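Creating both kinds of index on the same field could be sketched as follows. The `isbn` field and the function name `indexForEqualityAndRange` are assumptions made for this example, and the mongosh `db` handle is passed in as a parameter.

```javascript
// Sketch for mongosh; the `isbn` field is hypothetical.
function indexForEqualityAndRange(db) {
  return [
    // Hashed index: supports equality matches and hash-based sharding.
    db.books.createIndex({ isbn: "hashed" }),
    // Regular ascending index on the same field: supports range queries and sorts.
    db.books.createIndex({ isbn: 1 }),
  ];
}
```

Keeping both indexes doubles the index maintenance for this field on every write, so it is worth doing only when both access patterns are really needed.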

4. Properties of an Index in MongoDB

The behaviour of an index can be altered in MongoDB by specifying specific properties for that index. Some of these properties are:

4.1 Unique Index

An index can be declared unique. When a single field index is unique, MongoDB rejects any document whose value for that key already exists in the collection. Most index types can be made unique; a hashed index is one exception.

In a compound index, uniqueness is enforced on the combination of values of the keys which make up the index, not on each key separately.
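Uniqueness is requested through the options argument of createIndex. The sketch below assumes hypothetical `isbn` and `author` fields on the books collection; the function name `addUniqueIndexes` is made up, and the mongosh `db` handle is passed in as a parameter.

```javascript
// Sketch for mongosh; the `isbn` and `author` fields are hypothetical.
function addUniqueIndexes(db) {
  return [
    // No two books may share the same isbn.
    db.books.createIndex({ isbn: 1 }, { unique: true }),
    // Two books may share a name or an author, but not the same (name, author) pair.
    db.books.createIndex({ book_name: 1, author: 1 }, { unique: true }),
  ];
}
```

Creating a unique index fails if the collection already contains duplicate values for the indexed keys, so existing data may need to be cleaned up first.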

4.2 Partial Index

If we know that only some of the documents need to be indexed for a key or set of keys, we can turn an index into a partial index by specifying a filter expression. Only documents which match this filter will be indexed. As a partial index holds fewer entries, it needs less storage and is cheaper to create and maintain than a normal index.

Note that a query will use a partial index only if the index is guaranteed to contain every document the query could match; otherwise the result would be incomplete.
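The rule about when a partial index can be used is easier to see with a concrete case. In this sketch, the price threshold of 20 and the function name `indexExpensiveBooks` are arbitrary choices for illustration, and the mongosh `db` handle is passed in as a parameter.

```javascript
// Sketch for mongosh; the threshold of 20 is an arbitrary example value.
function indexExpensiveBooks(db) {
  // Index book_name only for documents whose price is greater than 20.
  db.books.createIndex(
    { book_name: 1 },
    { partialFilterExpression: { price: { $gt: 20 } } }
  );
  // Can use the partial index: price >= 30 implies price > 20.
  const eligible = db.books.find({ book_name: "Dune", price: { $gte: 30 } });
  // Cannot use it: cheaper books could match but are missing from the index.
  const notEligible = db.books.find({ book_name: "Dune" });
  return { eligible, notEligible };
}
```

The second query still returns correct results; it simply has to be answered without this index.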

4.3 Sparse Index

The sparse property of an index makes sure that the index only contains an entry for documents which actually contain the indexed field. The sparse index completely skips documents that do not have the indexed field.

Note that a partial index is preferred over a sparse index, as a partial index offers everything a sparse index does and more.
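The relationship between the two can be sketched as two alternative ways of indexing an optional field. The `subtitle` field and both function names are hypothetical, and only one of the two functions would be used on a given collection.

```javascript
// Sketch for mongosh; `subtitle` is a hypothetical optional field.

// Option 1: sparse index, skips documents that lack the field.
function createSparseIndex(db) {
  return db.books.createIndex({ subtitle: 1 }, { sparse: true });
}

// Option 2: partial index with the same effect, expressed as a filter.
function createEquivalentPartialIndex(db) {
  return db.books.createIndex(
    { subtitle: 1 },
    { partialFilterExpression: { subtitle: { $exists: true } } }
  );
}
```

The partial form can go further than the sparse one, because its filter may also refer to fields other than the indexed key.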

4.4 TTL Index

If we want to remove documents from a collection after a specified period of time, we can make a TTL (time to live) index on a date field. This is useful for data which is updated regularly and whose older entries become stale, like log data.

Expired documents are removed by a background task which runs every 60 seconds. As a result, there is no guarantee that a document is deleted at the exact moment it expires; it may persist for some time past its expiration date.
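For the log data mentioned above, a TTL index could be set up roughly like this. The `logs` collection, the `createdAt` field, the one-hour lifetime and the function name `expireOldLogs` are all assumptions for illustration; the mongosh `db` handle is passed in as a parameter.

```javascript
// Sketch for mongosh; `logs` and `createdAt` are hypothetical names.
function expireOldLogs(db) {
  // Documents become eligible for deletion 3600 seconds after their createdAt date.
  // The indexed field must hold a date value for the document to expire.
  return db.logs.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 });
}
```

A document inserted with `createdAt: new Date()` would then be removed by the background task some time after one hour has passed.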

5. Limitations of Indexes

Although we have studied many advantages of indexes in the lesson till now, indexes also have some disadvantages or limitations associated with them. Let’s read them here:

  1. A single collection in MongoDB can have a maximum of 64 indexes. This can become an issue when documents are large and have many fields to query on; we might then have to split the data across multiple collections.
  2. The fully-qualified name of an index cannot consist of more than 128 characters (this limit applies to MongoDB versions before 4.2). The FQN of an index consists of <db_name>.<collection-name>.$<index_name>.
  3. In a compound index, there cannot be more than 31 fields.
  4. A MongoDB query cannot use both text and geospatial indexes. We cannot combine $text operator with any other operator associated with a special index. For example, the $text operator and $near operator cannot be used together.
  5. Fields with 2dsphere indexes can only contain geometry data, either GeoJSON objects or legacy coordinate pairs [x, y]. If a document stores any other type of data in the indexed field, the insert or the index build will fail.
  6. As the data of an index lives mostly in RAM while a MongoDB instance is running, indexes can consume a major amount of memory on the machine. This is also what makes MongoDB indexes extremely fast.
  7. In MongoDB versions before 4.2, indexes are built in the foreground by default. This means that all operations on the collection are blocked until the index is built completely. This behaviour can be overridden by specifying the background option when creating the index.

6. Conclusion

In this lesson, we studied the various types of indexes which exist in MongoDB and how their behaviour can be altered and extended with properties and constraints. We also described some limitations of indexes which we should take care of while creating indexes on a collection. In particular, a collection with many indexes makes every insert more expensive, which can result in an overall performance loss.

The major aim of indexes is to increase the performance of a database, and they can do so by a large factor, provided that we choose indexes carefully and that the MongoDB instance has enough memory to hold them.

Read about how you can get started with a Java application which integrates with MongoDB and performs various queries on it with this post. If you prefer Javascript, read this post.

Shubham Aggarwal

Shubham is a Java EE Engineer with about 3 years of experience in building quality products with Spring Boot, Spring Data, AWS, Kafka, PrestoDB.