Bozhidar Bozhanov

About Bozhidar Bozhanov

Senior Java developer, one of the top stackoverflow users, fluent with Java and Java technology stacks - Spring, JPA, JavaEE. Founder and creator of Computoser and Welshare. Worked on Ericsson projects, Bulgarian e-government projects and large-scale online recruitment platforms.

Caveats With MySQL Pagination

We’ve all done pagination – it’s sensible for both lists in the UI and for processing data in batches. “SELECT (columns) FROM table LIMIT X, Y”. Possibly “SELECT (columns) FROM table ORDER BY some_column LIMIT X, Y”.

What’s the caveat? Especially when working with an ORM and when processing batches, where order doesn’t matter, you are likely to omit the ORDER BY clause. You would expect the results to be ordered by the primary key. And they are in most cases (though it is not guaranteed), but when you EXPLAIN the query, you can see that the index is not used for the query – a full table scan is performed to fetch the results, which is slow. Note that if we omit the LIMIT clause, again a full table scan is used, but this is something you’d rarely do – i.e. a query with no WHERE clause and

no pagination.

So, rule #1: always include an ORDER BY clause in order to use the index.

For lists in the UI you’d usually order by some column – date modified, name, etc. That also performs a full table scan, unless you have an index on that column.

So, rule #2: always have an index on the column you ORDER BY

Kind of obvious things, but they may be silent performance issues lurking in your project. And now onto something not so obvious, that surprised me. When using LIMIT X, Y, MySQL scans the whole table (up until X+Y), regardless of the index. Note that even if the “type” (in the result of EXPLAIN) is not ALL (full table scan), but “index”, MySQL may still scan most of the rows – check the “rows” column. But it can also be misleading (in case of LIMIT it shows that rows = everything, but stops when it fills the result with the desired number of rows). So if you have LIMIT 100000, 50 MySQL will scan 100050 rows. This makes the query slow (the query time reflects that – the larger the offset, the longer the query). This is due to MySQL not maintaining an index on the row number. It cannot use the primary key, because even if it is auto_increment, it has gaps. Luckily, in huge tables you normally have a WHERE clause which forces the use of an index and reduces the number of rows to scan. Also, in a UI you will rarely find yourself querying for the 100000th record. But anyway it is something to consider, especially in the case of batch processing jobs which need to go through the whole table. You do that in batches, because 1. ORMs may store everything in memory and just kill your application 2. the transaction will become really long.

So, rule #3: when performing batch operations on tables with a lot of rows, do not use LIMIT. Use WHERE id > X and id < Y (or BETWEEN)

This does not guarantee that each batch will be the same size, but that doesn’t matter.

The rules above are “rule of thumbs”, rather than “must dos”, but it’s good to have them in mind. The bottom-line here is that you should analyze your performance, find specifically slow queries, then EXPLAIN them and see how they can be optimized and what would be the impact of the optimization.

P.S. Here’s a presentation on the low level details of MySQL engines regarding indexes.

Reference: Caveats With MySQL Pagination from our JCG partner Bozhidar Bozhanov at the Bozho’s tech blog blog.

Related Whitepaper:

Software Architecture

This guide will introduce you to the world of Software Architecture!

This 162 page guide will cover topics within the field of software architecture including: software architecture as a solution balancing the concerns of different stakeholders, quality assurance, methods to describe and evaluate architectures, the influence of architecture on reuse, and the life cycle of a system and its architecture. This guide concludes with a comparison between the professions of software architect and software engineer.

Get it Now!  

Leave a Reply

× 8 = sixteen

Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd | Terms of Use | Privacy Policy
All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries.
Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.

Sign up for our Newsletter

20,709 insiders are already enjoying weekly updates and complimentary whitepapers! Join them now to gain exclusive access to the latest news in the Java world, as well as insights about Android, Scala, Groovy and other related technologies.

As an extra bonus, by joining you will get our brand new e-books, published by Java Code Geeks and their JCG partners for your reading pleasure! Enter your info and stay on top of things,

  • Fresh trends
  • Cases and examples
  • Research and insights
  • Two complimentary e-books