About Lukas Eder

Lukas is a Java and SQL enthusiast developer. He created the Data Geekery GmbH. He is the creator of jOOQ, a comprehensive SQL library for Java, and he is blogging mostly about these three topics: Java, SQL and jOOQ.

Counting Distinct Records in SQL

The SQL language and its depths… Some of you readers might be aware of MySQL’s capability of counting distinct records through the COUNT() aggregate function. The MySQL documentation reads:

COUNT(DISTINCT expr,[expr...])

Returns a count of the number of rows with different non-NULL expr values.

In other words, you can count distinct first and last names very easily:

SELECT COUNT(DISTINCT FIRST_NAME, LAST_NAME)
FROM CUSTOMERS

That’s quite useful, but MySQL-specific (although HSQLDB also supports this particular MySQL syntax). Most other databases, however, do not offer such a feature, even if the SQL-99 standard has specified it long ago:

6.16  <set function specification>

<set function specification> ::=
    COUNT <left paren> <asterisk> <right paren>
  | <general set function>

<general set function> ::=
    <set function type>
        <left paren> [ <set quantifier> ] 
            <value expression> <right paren>

<set function type> ::=
    <computational operation>

<computational operation> ::=
    AVG | MAX | MIN | SUM
  | EVERY | ANY | SOME
  | COUNT

<set quantifier> ::= DISTINCT | ALL

The above was later on merged into 10.9 <aggregate function>, in SQL:2011, with lots of other aggregate function types and features. Now, let’s have a look at 6.23 <value expression>:

6.23 <value expression>

<value expression> ::=
<numeric value expression>
| <string value expression>
| <datetime value expression>
| <interval value expression>
| <boolean value expression>
| <user-defined type value expression>
| <row value expression>                <-- RVE!
| <reference value expression>
| <collection value expression>

Interestingly, you can put a row value expression in some of your aggregate functions. There are additional restrictions, as you cannot SUM() or AVG() records, for instance. But with COUNT() and COUNT(DISTINCT ...) this makes perfect sense. So, according to the SQL standard (and according to HSQLDB’s alternative, and PostgreSQL’s main implementation, which unfortunately isn’t really documented), the following would be the correct way to count distinct first and last names in SQL:

SELECT COUNT(DISTINCT (FIRST_NAME, LAST_NAME))
FROM CUSTOMERS

… which makes perfect sense. Watch out when reading this. This isn’t a DISTINCT() function! This is the DISTINCT keyword applied to a row value expression. See some previous blog posts about row value expressions here:

jOOQ standardises various SQL dialects and hides the above behind the DSL.countDistinct() method.

When you’re not using HSQLDB, MySQL, or PostgreSQL

In other databases, you might resort to some manual SQL transformation. Our example is easy to transform:

SELECT COUNT(*)
FROM (
  SELECT DISTINCT FIRST_NAME, LAST_NAME
  FROM CUSTOMERS
) t

But such query transformation can prove to be tricky, once you have GROUP BY clauses or other aggregate functions involved. Future versions of jOOQ might do that sort of transformation for you.
 

Reference: Counting Distinct Records in SQL from our JCG partner Lukas Eder at the JAVA, SQL, AND JOOQ blog.

Do you want to know how to develop your skillset to become a Java Rockstar?

Subscribe to our newsletter to start Rocking right now!

To get you started we give you two of our best selling eBooks for FREE!

JPA Mini Book

Learn how to leverage the power of JPA in order to create robust and flexible Java applications. With this Mini Book, you will get introduced to JPA and smoothly transition to more advanced concepts.

JVM Troubleshooting Guide

The Java virtual machine is really the foundation of any Java EE platform. Learn how to master it with this advanced guide!

Given email address is already subscribed, thank you!
Oops. Something went wrong. Please try again later.
Please provide a valid email address.
Thank you, your sign-up request was successful! Please check your e-mail inbox.
Please complete the CAPTCHA.
Please fill in the required fields.

Leave a Reply


three × 8 =



Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd | Terms of Use | Privacy Policy | Contact
All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries.
Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
Do you want to know how to develop your skillset and become a ...
Java Rockstar?

Subscribe to our newsletter to start Rocking right now!

To get you started we give you two of our best selling eBooks for FREE!

Get ready to Rock!
You can download the complementary eBooks using the links below:
Close