About Lukas Eder

Lukas is a Java and SQL enthusiast developer. He created the Data Geekery GmbH. He is the creator of jOOQ, a comprehensive SQL library for Java, and he is blogging mostly about these three topics: Java, SQL and jOOQ.

SQL Trick: row_number() is to SELECT what dense_rank() is to SELECT DISTINCT

The SQL:2003 standard ranking functions are awesome companions and useful tools every now and then. The ones that are supported in almost all databases are:

  • ROW_NUMBER(): This one generates a new row number for every row, regardless of duplicates within a partition.
  • RANK(): This one generates a new row number for every distinct row, leaving gaps between groups of duplicates within a partition.
  • DENSE_RANK(): This one generates a new row number for every distinct row, leaving no gaps between groups of duplicates within a partition.

As always, the above is much easier to understand by example. Let’s assume the following PostgreSQL schema containing a table with 8 records, some of which are duplicates:

CREATE TABLE t AS
SELECT 'a' v UNION ALL
SELECT 'a'   UNION ALL
SELECT 'a'   UNION ALL
SELECT 'b'   UNION ALL
SELECT 'c'   UNION ALL
SELECT 'c'   UNION ALL
SELECT 'd'   UNION ALL
SELECT 'e'

Now, let’s select each value, along with the three aforementioned ranking functions. And for kicks and giggles, we’ll use the SQL standard WINDOW clause! Yay, it saved us 15 characters of repetitive SQL code. Note that the WINDOW clause is hardly implemented apart by PostgreSQL and Sybase SQL Anywhere…

SELECT
  v,
  ROW_NUMBER() OVER (w) row_number,
  RANK()       OVER (w) rank,
  DENSE_RANK() OVER (w) dense_rank
FROM t
WINDOW w AS (ORDER BY v)
ORDER BY v

And the above results in:

+---+------------+------+------------+
| V | ROW_NUMBER | RANK | DENSE_RANK |
+---+------------+------+------------+
| a |          1 |    1 |          1 |
| a |          2 |    1 |          1 |
| a |          3 |    1 |          1 |
| b |          4 |    4 |          2 |
| c |          5 |    5 |          3 |
| c |          6 |    5 |          3 |
| d |          7 |    7 |          4 |
| e |          8 |    8 |          5 |
+---+------------+------+------------+

(See also this SQLFiddle)

How DENSE_RANK() can help when writing SELECT DISTINCT

No doubt, ROW_NUMBER() is the most useful ranking function among the above, specifically when you need to emulate LIMIT .. OFFSET clauses as in DB2, Oracle (11g or less), Sybase SQL Anywhere (prior to version 12), SQL Server (2008 and less). Read on here about how jOOQ emulates this SQL clause in various SQL dialects.

But using ROW_NUMBER() has a subtle problem when used along with DISTINCT or UNION. It prevents the database from being able to remove duplicates, because ROW_NUMBER will always produce distinct values within a partition. In the above example, duplicate values for T.V were added on purpose. How can we first remove the duplicates and only then enumerate row numbers? Clearly, we can no longer use ROW_NUMBER(). The following query:

SELECT DISTINCT
  v, 
  ROW_NUMBER() OVER (w) row_number
FROM t
WINDOW w AS (ORDER BY v)
ORDER BY v, row_number

… yields

+---+------------+
| V | ROW_NUMBER |
+---+------------+
| a |          1 |
| a |          2 |
| a |          3 |
| b |          4 |
| c |          5 |
| c |          6 |
| d |          7 |
| e |          8 |
+---+------------+

(See also this SQLFiddle)

But we can use DENSE_RANK(), instead! With DENSE_RANK() the ranking is applied in a way that duplicate records will receive the same rank. And there are no gaps between ranks. Hence:

SELECT DISTINCT
  v, 
  DENSE_RANK() OVER (w) row_number
FROM t
WINDOW w AS (ORDER BY v)
ORDER BY v, row_number

… which yields:

+---+------------+
| V | ROW_NUMBER |
+---+------------+
| a |          1 |
| b |          2 |
| c |          3 |
| d |          4 |
| e |          5 |
+---+------------+

(See also this SQLFiddle)

Thus, remember…

Thus, remember: ROW_NUMBER() is to SELECT what DENSE_RANK() is to SELECT DISTINCT

Caveats

In order for the above to be true, however, you must ensure that all expressions from the SELECT DISTINCT clause are used in the DENSE_RANK()‘s OVER(ORDER BY ...) clause. For example:

SELECT DISTINCT
  v1,
  v2,
  v3,
  DENSE_RANK() OVER (w) row_number
FROM t
WINDOW w AS (ORDER BY v1, v2, v3)

If any of v1, v2, v3 are other ranking functions or aggregate functions, or non-deterministic expressions, etc., the above trick won’t work. But it’s still a nice trick to keep up one’s sleeves for the odd corner-case query, where distinct rows need row numbers
 

Do you want to know how to develop your skillset to become a Java Rockstar?

Subscribe to our newsletter to start Rocking right now!

To get you started we give you two of our best selling eBooks for FREE!

JPA Mini Book

Learn how to leverage the power of JPA in order to create robust and flexible Java applications. With this Mini Book, you will get introduced to JPA and smoothly transition to more advanced concepts.

JVM Troubleshooting Guide

The Java virtual machine is really the foundation of any Java EE platform. Learn how to master it with this advanced guide!

Given email address is already subscribed, thank you!
Oops. Something went wrong. Please try again later.
Please provide a valid email address.
Thank you, your sign-up request was successful! Please check your e-mail inbox.
Please complete the CAPTCHA.
Please fill in the required fields.

Leave a Reply


three − 1 =



Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd | Terms of Use | Privacy Policy | Contact
All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries.
Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
Do you want to know how to develop your skillset and become a ...
Java Rockstar?

Subscribe to our newsletter to start Rocking right now!

To get you started we give you two of our best selling eBooks for FREE!

Get ready to Rock!
You can download the complementary eBooks using the links below:
Close