About Lukas Eder

Lukas is a Java and SQL enthusiast developer. He created the Data Geekery GmbH. He is the creator of jOOQ, a comprehensive SQL library for Java, and he is blogging mostly about these three topics: Java, SQL and jOOQ.

Please, Run That Calculation in Your RDBMS

There’s one thing that you can do terribly wrong when working with RDBMS. And that thing is not running your calculations in the database, when you should.

We’re not advocating to blindly move all business logic into the database, but when I see a Stack Overflow question like this, I feel the urge to gently remind you of the second item in our popular 10 Common Mistakes Java Developers Make When Writing SQL.
 
 
 
 

Gently reminding you of the second item. Some background on this illustration here (in German).

Gently reminding you of the second item.
Some background on this illustration here (in German)

The Stack Overflow question essentially boils down to this (liberally quoted):

From the following medium-sized table, I wish to count the number of documents with status 0 or 1 per application ID:

AppID | DocID | DocStatus 
------+-------+----------
1     | 100   | 0
1     | 101   | 1    
2     | 200   | 0    
2     | 300   | 1
...   | ...   | ...

Should I use Hibernate for that?

And the answer: NO! Don’t use Hibernate for that (unless you mean native querying). You should use SQL for that. Es-Queue-El! You have so many trivial options to make your SQL Server help you run this query in a fraction of the time it would take if you loaded all that data into Java memory before aggregating!

For instance (using SQL Server):

Using GROUP BY

This is the most trivial one, but it might not return result in exactly the way you wanted, i.e. different aggregation results are in different rows:

SELECT [AppID], [DocStatus], count(*)
FROM [MyTable]
GROUP BY [AppID], [DocStatus]

Example on SQLFiddle, returning something like

| APPID | DOCSTATUS | COLUMN_2 |
|-------|-----------|----------|
|     1 |         0 |        2 |
|     2 |         0 |        3 |
|     1 |         1 |        3 |
|     2 |         1 |        2 |

Using nested selects

This is probably the solution that this particular user was looking for. They probably want each aggregation in a separate column, and one very generic way to achieve this is by using nested selects. Note that this solution might prove to be a bit slow in some databases that have a hard time optimising these things

SELECT [AppID],
       (SELECT count(*) FROM [MyTable] [t2]
        WHERE [t1].[AppID] = [t2].[AppID]
        AND [DocStatus] = 0) [Status_0],
       (SELECT count(*) FROM [MyTable] [t2]
        WHERE [t1].[AppID] = [t2].[AppID]
        AND [DocStatus] = 1) [Status_1]
FROM [MyTable] [t1]
GROUP BY [AppID]

Example on SQLFiddle, returning something like

| APPID | STATUS_0 | STATUS_1 |
|-------|----------|----------|
|     1 |        2 |        3 |
|     2 |        3 |        2 |

Using SUM()

This solution is probably the optimal one. It is equivalent to the previous one with nested selects, although it only works for simple queries, whereas the nested selects version is more versatile.

SELECT [AppID],
       SUM(IIF([DocStatus] = 0, 1, 0)) [Status_0],
       SUM(IIF([DocStatus] = 1, 1, 0)) [Status_1]
FROM [MyTable] [t1]
GROUP BY [AppID]

Example on SQLFiddle, same result as before

Using PIVOT

This solution is for the SQL Aficionados among yourselves. It uses the T-SQL PIVOT clause!

SELECT [AppID], [0], [1]
FROM (
    SELECT [AppID], [DocStatus]
    FROM [MyTable]
) [t]
PIVOT (
    count([DocStatus]) 
    FOR [DocStatus] 
    IN ([0], [1])
) [pvt]

SQL aficionados use PIVOT

Example on SQLFiddle, same result as before

Conclusion

You may freely choose your weapon among the above suggestions, and I’m sure there are more alternatives. All of them will outperform any Java-based aggregation implementation by orders of magnitude, even for trivially small data sets for sure. We’ll say this time and again, and we’ll quote Gavin King time and again for the same thing:

Just because you’re using Hibernate, doesn’t mean you have to use it for everything. A point I’ve been making for about ten years now.

And in our words:

Use SQL whenever appropriate! And that is much more often than you might think!


 

Related Whitepaper:

Software Architecture

This guide will introduce you to the world of Software Architecture!

This 162 page guide will cover topics within the field of software architecture including: software architecture as a solution balancing the concerns of different stakeholders, quality assurance, methods to describe and evaluate architectures, the influence of architecture on reuse, and the life cycle of a system and its architecture. This guide concludes with a comparison between the professions of software architect and software engineer.

Get it Now!  

2 Responses to "Please, Run That Calculation in Your RDBMS"

  1. Iain Elder says:

    I agree completely with the setiment in this article: get the database engine to aggregate for you, because that’s what it’s good at.

    It would be great to see this article extednded with a discussion of execution plans, especially when comparing three different queries that produce the same result. Don’t just say “this solution might prove to be a bit slow” – prove it with a plan!

    SQL Server 2012 on sqlfiddle is down right now (“can’t connect to datasource [sqlfiddle_mssql2]“). Will try again later to reproduce your queries and plans.

    “SQL aficionados use PIVOT” – only as a last resort, in my case, and always with referring to the syntax manual! Sometimes it can be significantly faster than using the standard functions, but I find it hard to read and write.

    In most cases, I’d use your SUM method and replace the proprietary IIF with the standard CASE DocStatus WHEN 1 THEN 1 ELSE 0 END.

    • Lukas Eder says:

      It would be great to see this article extednded with a discussion of execution plans, especially when comparing three different queries that produce the same result.

      Yes, that’s a good idea! I’ll do this in a follow-up blog post. It’ll be good to compare SQL Server, Oracle and 1-2 others, maybe PostgreSQL and MySQL.

Leave a Reply


+ 6 = ten



Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd | Terms of Use | Privacy Policy
All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries.
Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.

Sign up for our Newsletter

20,709 insiders are already enjoying weekly updates and complimentary whitepapers! Join them now to gain exclusive access to the latest news in the Java world, as well as insights about Android, Scala, Groovy and other related technologies.

As an extra bonus, by joining you will get our brand new e-books, published by Java Code Geeks and their JCG partners for your reading pleasure! Enter your info and stay on top of things,

  • Fresh trends
  • Cases and examples
  • Research and insights
  • Two complimentary e-books