Bozhidar Bozhanov

About Bozhidar Bozhanov

Senior Java developer, one of the top stackoverflow users, fluent with Java and Java technology stacks - Spring, JPA, JavaEE. Founder and creator of Computoser and Welshare. Worked on Ericsson projects, Bulgarian e-government projects and large-scale online recruitment platforms.

Caveats With MySQL Pagination

We’ve all done pagination – it’s sensible for both lists in the UI and for processing data in batches. “SELECT (columns) FROM table LIMIT X, Y”. Possibly “SELECT (columns) FROM table ORDER BY some_column LIMIT X, Y”.

What’s the caveat? Especially when working with an ORM and when processing batches, where order doesn’t matter, you are likely to omit the ORDER BY clause. You would expect the results to be ordered by the primary key. And they are in most cases (though it is not guaranteed), but when you EXPLAIN the query, you can see that the index is not used for the query – a full table scan is performed to fetch the results, which is slow. Note that if we omit the LIMIT clause, again a full table scan is used, but this is something you’d rarely do – i.e. a query with no WHERE clause and

no pagination.

So, rule #1: always include an ORDER BY clause in order to use the index.

For lists in the UI you’d usually order by some column – date modified, name, etc. That also performs a full table scan, unless you have an index on that column.

So, rule #2: always have an index on the column you ORDER BY

Kind of obvious things, but they may be silent performance issues lurking in your project. And now onto something not so obvious, that surprised me. When using LIMIT X, Y, MySQL scans the whole table (up until X+Y), regardless of the index. Note that even if the “type” (in the result of EXPLAIN) is not ALL (full table scan), but “index”, MySQL may still scan most of the rows – check the “rows” column. But it can also be misleading (in case of LIMIT it shows that rows = everything, but stops when it fills the result with the desired number of rows). So if you have LIMIT 100000, 50 MySQL will scan 100050 rows. This makes the query slow (the query time reflects that – the larger the offset, the longer the query). This is due to MySQL not maintaining an index on the row number. It cannot use the primary key, because even if it is auto_increment, it has gaps. Luckily, in huge tables you normally have a WHERE clause which forces the use of an index and reduces the number of rows to scan. Also, in a UI you will rarely find yourself querying for the 100000th record. But anyway it is something to consider, especially in the case of batch processing jobs which need to go through the whole table. You do that in batches, because 1. ORMs may store everything in memory and just kill your application 2. the transaction will become really long.

So, rule #3: when performing batch operations on tables with a lot of rows, do not use LIMIT. Use WHERE id > X and id < Y (or BETWEEN)

This does not guarantee that each batch will be the same size, but that doesn’t matter.

The rules above are “rule of thumbs”, rather than “must dos”, but it’s good to have them in mind. The bottom-line here is that you should analyze your performance, find specifically slow queries, then EXPLAIN them and see how they can be optimized and what would be the impact of the optimization.

P.S. Here’s a presentation on the low level details of MySQL engines regarding indexes.
 

Reference: Caveats With MySQL Pagination from our JCG partner Bozhidar Bozhanov at the Bozho’s tech blog blog.

Do you want to know how to develop your skillset to become a Java Rockstar?

Subscribe to our newsletter to start Rocking right now!

To get you started we give you two of our best selling eBooks for FREE!

JPA Mini Book

Learn how to leverage the power of JPA in order to create robust and flexible Java applications. With this Mini Book, you will get introduced to JPA and smoothly transition to more advanced concepts.

JVM Troubleshooting Guide

The Java virtual machine is really the foundation of any Java EE platform. Learn how to master it with this advanced guide!

Given email address is already subscribed, thank you!
Oops. Something went wrong. Please try again later.
Please provide a valid email address.
Thank you, your sign-up request was successful! Please check your e-mail inbox.
Please complete the CAPTCHA.
Please fill in the required fields.

Leave a Reply


9 − eight =



Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd | Terms of Use | Privacy Policy
All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries.
Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
Do you want to know how to develop your skillset and become a ...
Java Rockstar?

Subscribe to our newsletter to start Rocking right now!

To get you started we give you two of our best selling eBooks for FREE!

Get ready to Rock!
You can download the complementary eBooks using the links below:
Close