About Lukas Eder

Lukas is a Java and SQL enthusiast developer. He created the Data Geekery GmbH. He is the creator of jOOQ, a comprehensive SQL library for Java, and he is blogging mostly about these three topics: Java, SQL and jOOQ.

Please, Run That Calculation in Your RDBMS

There’s one thing that you can do terribly wrong when working with RDBMS. And that thing is not running your calculations in the database, when you should.

We’re not advocating to blindly move all business logic into the database, but when I see a Stack Overflow question like this, I feel the urge to gently remind you of the second item in our popular 10 Common Mistakes Java Developers Make When Writing SQL.
 
 
 
 

Gently reminding you of the second item. Some background on this illustration here (in German).

Gently reminding you of the second item.
Some background on this illustration here (in German)

The Stack Overflow question essentially boils down to this (liberally quoted):

From the following medium-sized table, I wish to count the number of documents with status 0 or 1 per application ID:

AppID | DocID | DocStatus 
------+-------+----------
1     | 100   | 0
1     | 101   | 1    
2     | 200   | 0    
2     | 300   | 1
...   | ...   | ...

Should I use Hibernate for that?

And the answer: NO! Don’t use Hibernate for that (unless you mean native querying). You should use SQL for that. Es-Queue-El! You have so many trivial options to make your SQL Server help you run this query in a fraction of the time it would take if you loaded all that data into Java memory before aggregating!

For instance (using SQL Server):

Using GROUP BY

This is the most trivial one, but it might not return result in exactly the way you wanted, i.e. different aggregation results are in different rows:

SELECT [AppID], [DocStatus], count(*)
FROM [MyTable]
GROUP BY [AppID], [DocStatus]

Example on SQLFiddle, returning something like

| APPID | DOCSTATUS | COLUMN_2 |
|-------|-----------|----------|
|     1 |         0 |        2 |
|     2 |         0 |        3 |
|     1 |         1 |        3 |
|     2 |         1 |        2 |

Using nested selects

This is probably the solution that this particular user was looking for. They probably want each aggregation in a separate column, and one very generic way to achieve this is by using nested selects. Note that this solution might prove to be a bit slow in some databases that have a hard time optimising these things

SELECT [AppID],
       (SELECT count(*) FROM [MyTable] [t2]
        WHERE [t1].[AppID] = [t2].[AppID]
        AND [DocStatus] = 0) [Status_0],
       (SELECT count(*) FROM [MyTable] [t2]
        WHERE [t1].[AppID] = [t2].[AppID]
        AND [DocStatus] = 1) [Status_1]
FROM [MyTable] [t1]
GROUP BY [AppID]

Example on SQLFiddle, returning something like

| APPID | STATUS_0 | STATUS_1 |
|-------|----------|----------|
|     1 |        2 |        3 |
|     2 |        3 |        2 |

Using SUM()

This solution is probably the optimal one. It is equivalent to the previous one with nested selects, although it only works for simple queries, whereas the nested selects version is more versatile.

SELECT [AppID],
       SUM(IIF([DocStatus] = 0, 1, 0)) [Status_0],
       SUM(IIF([DocStatus] = 1, 1, 0)) [Status_1]
FROM [MyTable] [t1]
GROUP BY [AppID]

Example on SQLFiddle, same result as before

Using PIVOT

This solution is for the SQL Aficionados among yourselves. It uses the T-SQL PIVOT clause!

SELECT [AppID], [0], [1]
FROM (
    SELECT [AppID], [DocStatus]
    FROM [MyTable]
) [t]
PIVOT (
    count([DocStatus]) 
    FOR [DocStatus] 
    IN ([0], [1])
) [pvt]

SQL aficionados use PIVOT

Example on SQLFiddle, same result as before

Conclusion

You may freely choose your weapon among the above suggestions, and I’m sure there are more alternatives. All of them will outperform any Java-based aggregation implementation by orders of magnitude, even for trivially small data sets for sure. We’ll say this time and again, and we’ll quote Gavin King time and again for the same thing:

Just because you’re using Hibernate, doesn’t mean you have to use it for everything. A point I’ve been making for about ten years now.

And in our words:

Use SQL whenever appropriate! And that is much more often than you might think!


 

Do you want to know how to develop your skillset to become a Java Rockstar?

Subscribe to our newsletter to start Rocking right now!

To get you started we give you two of our best selling eBooks for FREE!

JPA Mini Book

Learn how to leverage the power of JPA in order to create robust and flexible Java applications. With this Mini Book, you will get introduced to JPA and smoothly transition to more advanced concepts.

JVM Troubleshooting Guide

The Java virtual machine is really the foundation of any Java EE platform. Learn how to master it with this advanced guide!

Given email address is already subscribed, thank you!
Oops. Something went wrong. Please try again later.
Please provide a valid email address.
Thank you, your sign-up request was successful! Please check your e-mail inbox.
Please complete the CAPTCHA.
Please fill in the required fields.

2 Responses to "Please, Run That Calculation in Your RDBMS"

  1. Iain Elder says:

    I agree completely with the setiment in this article: get the database engine to aggregate for you, because that’s what it’s good at.

    It would be great to see this article extednded with a discussion of execution plans, especially when comparing three different queries that produce the same result. Don’t just say “this solution might prove to be a bit slow” – prove it with a plan!

    SQL Server 2012 on sqlfiddle is down right now (“can’t connect to datasource [sqlfiddle_mssql2]“). Will try again later to reproduce your queries and plans.

    “SQL aficionados use PIVOT” – only as a last resort, in my case, and always with referring to the syntax manual! Sometimes it can be significantly faster than using the standard functions, but I find it hard to read and write.

    In most cases, I’d use your SUM method and replace the proprietary IIF with the standard CASE DocStatus WHEN 1 THEN 1 ELSE 0 END.

    • Lukas Eder says:

      It would be great to see this article extednded with a discussion of execution plans, especially when comparing three different queries that produce the same result.

      Yes, that’s a good idea! I’ll do this in a follow-up blog post. It’ll be good to compare SQL Server, Oracle and 1-2 others, maybe PostgreSQL and MySQL.

Leave a Reply


one + = 5



Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd | Terms of Use | Privacy Policy | Contact
All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries.
Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
Do you want to know how to develop your skillset and become a ...
Java Rockstar?

Subscribe to our newsletter to start Rocking right now!

To get you started we give you two of our best selling eBooks for FREE!

Get ready to Rock!
You can download the complementary eBooks using the links below:
Close