About Lukas Eder

Lukas is a Java and SQL enthusiast developer. He created the Data Geekery GmbH. He is the creator of jOOQ, a comprehensive SQL library for Java, and he is blogging mostly about these three topics: Java, SQL and jOOQ.

GROUP BY ROLLUP / CUBE

Every now and then, you come across a requirement that will bring you to your SQL limits. Many of us probably give up early and calculate stuff in Java / [or your language]. Instead, it might’ve been so easy and fast to do with SQL. If you’re working with an advanced database, such as DB2, Oracle, SQL Server, Sybase SQL Anywhere, (and MySQL in this case, which supports the WITH ROLLUP clause), you can take advantage of the ROLLUP / CUBE / GROUPING SETS grouping functions.

Lets have a look at my fictional salary progression compared to that of a fictional friend, who has chosen a different career path (observe the salary boost in 2011):

select 'Lukas'      as employee, 

       'SoftSkills' as company, 

    80000        as salary, 

    2007         as year 

from dual

union all select 'Lukas', 'SoftSkills', 80000,  2008 from dual

union all select 'Lukas', 'SmartSoft',  90000,  2009 from dual

union all select 'Lukas', 'SmartSoft',  95000,  2010 from dual

union all select 'Lukas', 'jOOQ',       200000, 2011 from dual

union all select 'Lukas', 'jOOQ',       250000, 2012 from dual

union all select 'Tom',   'SoftSkills', 89000,  2007 from dual

union all select 'Tom',   'SoftSkills', 90000,  2008 from dual

union all select 'Tom',   'SoftSkills', 91000,  2009 from dual

union all select 'Tom',   'SmartSoft',  92000,  2010 from dual

union all select 'Tom',   'SmartSoft',  93000,  2011 from dual

union all select 'Tom',   'SmartSoft',  94000,  2012 from dual

Now we’re used to gathering statistics using simple grouping and simple aggregate functions. For instance, let’s calculate how much Lukas and Tom earned on average over the past few years:

with data as ([above select])

select employee, avg(salary)

from data

group by employee

This will show that Lukas has earned more:

+--------+-----------+
|EMPLOYEE|AVG(SALARY)|
+--------+-----------+
|Lukas   |     132500|
|Tom     |      91500|
+--------+-----------+

So it’s probably interesting to find out what they have earned on average in which company:

with data as (...)

select company, employee, avg(salary)

from data

group by company, employee

order by company, employee

And immediately, it becomes clear where the big bucks are and that Tom has made a bad decision ;-)

+----------+--------+-----------+
|COMPANY   |EMPLOYEE|AVG(SALARY)|
+----------+--------+-----------+
|jOOQ      |Lukas   |     225000|
|SmartSoft |Lukas   |      92500|
|SmartSoft |Tom     |      93000|
|SoftSkills|Lukas   |      80000|
|SoftSkills|Tom     |      90000|
+----------+--------+-----------+

ROLLUP

By adding grouping fields, we “lose” some aggregation information. In the above examples, the overall average salary per employee is no longer available directly from the result. That’s obvious, considering the grouping algorithm. But in nice-looking reports, we often want to display those grouping headers as well. This is where ROLLUP, CUBE (and GROUPING SETS) come into play. Consider the following query:

with data as (...)

select company, employee, avg(salary)

from data

group by rollup(company), employee

The above rollup function will now add additional rows to the grouping result set, holding useful aggregated values. In this case, when we “roll up the salaries of the company”, we will get the average of the remaining grouping fields, i.e. the average per employee:

+----------+--------+-----------+
|COMPANY   |EMPLOYEE|AVG(SALARY)|
+----------+--------+-----------+
|SmartSoft |Tom     |      93000|
|SoftSkills|Tom     |      90000|
|{null}    |Tom     |      91500|
|jOOQ      |Lukas   |     225000|
|SmartSoft |Lukas   |      92500|
|SoftSkills|Lukas   |      80000|
|{null}    |Lukas   |     132500|
+----------+--------+-----------+

Note how these rows hold the same information as the ones from the first query, where we were only grouping by employee… This becomes even more interesting, when we put more grouping fields into the rollup function:

with data as (...)

select company, employee, avg(salary)

from data

group by rollup(employee, company)

As you can see, the order of grouping fields is important in the rollup function. The result from this query now also adds the overall average salary paid to all employees in all companies

+----------+--------+-----------+
|COMPANY   |EMPLOYEE|AVG(SALARY)|
+----------+--------+-----------+
|SmartSoft |Tom     |      93000|
|SoftSkills|Tom     |      90000|
|{null}    |Tom     |      91500|
|jOOQ      |Lukas   |     225000|
|SmartSoft |Lukas   |      92500|
|SoftSkills|Lukas   |      80000|
|{null}    |Lukas   |     132500|
|{null}    |{null}  |     112000|
+----------+--------+-----------+

In order to identify the totals rows for reporting, you can use the GROUPING() function in DB2, Oracle, SQL Server and Sybase SQL Anywhere. In Oracle and SQL Server, there’s the even more useful GROUPING_ID() function:

with data as (...)

select grouping_id(employee, company) id, company, employee, avg(salary)

from data

group by rollup(employee, company)

It documents on what “grouping level” of the rollup function the current row was produced:

+----+----------+--------+-----------+
|  ID|COMPANY   |EMPLOYEE|AVG(SALARY)|
+----+----------+--------+-----------+
|   0|SmartSoft |Tom     |      93000|
|   0|SoftSkills|Tom     |      90000|
|   1|{null}    |Tom     |      91500|
|   0|jOOQ      |Lukas   |     225000|
|   0|SmartSoft |Lukas   |      92500|
|   0|SoftSkills|Lukas   |      80000|
|   1|{null}    |Lukas   |     132500|
|   3|{null}    |{null}  |     112000|
+----+----------+--------+-----------+

CUBE

The cube function works similar, except that the order of cube grouping fields becomes irrelevant, as all combinations of grouping are combined. This is a bit tricky to put in words, so lets put it in action:

with data as (...)

select grouping_id(employee, company) id, company, employee, avg(salary)

from data

group by cube(employee, company)

In the following result, you will get:

  • GROUPING_ID() = 0: Average per company and employee. This is the normal grouping result
  • GROUPING_ID() = 1: Average per employee
  • GROUPING_ID() = 2: Average per company
  • GROUPING_ID() = 3: Overall average
+----+----------+--------+-----------+
|  ID|COMPANY   |EMPLOYEE|AVG(SALARY)|
+----+----------+--------+-----------+
|   3|{null}    |{null}  |     112000|
|   2|jOOQ      |{null}  |     225000|
|   2|SmartSoft |{null}  |      92800|
|   2|SoftSkills|{null}  |      86000|
|   1|{null}    |Tom     |      91500|
|   0|SmartSoft |Tom     |      93000|
|   0|SoftSkills|Tom     |      90000|
|   1|{null}    |Lukas   |     132500|
|   0|jOOQ      |Lukas   |     225000|
|   0|SmartSoft |Lukas   |      92500|
|   0|SoftSkills|Lukas   |      80000|
+----+----------+--------+-----------+

In other words, using the CUBE() function, you will get grouping results for every possible combination of the grouping fields supplied to the CUBE() function, which results in 2^n GROUPING_ID()’s for n “cubed” grouping fields

Support in jOOQ

jOOQ 2.0 introduces support for these functions. If you want to translate the last select into jOOQ, you’d roughly get this Java code:

// assuming that DATA is an actual table...

create.select(

         groupingId(DATA.EMPLOYEE, DATA.COMPANY).as("id"),

         DATA.COMPANY, DATA.EMPLOYEE, avg(SALARY))

      .from(DATA)

      .groupBy(cube(DATA.EMPLOYEE, DATA.COMPANY));

With this powerful tool, you’re ready for all of those fancy reports and data overviews. For more details, read on about ROLLUP(), CUBE(), and GROUPING SETS() functions on the SQL Server documentation page, which explains it quite nicely:
http://msdn.microsoft.com/en-us/library/bb522495.aspx

Reference: GROUP BY ROLLUP / CUBE from our JCG partner Lukas Eder at the JAVA, SQL, AND JOOQ blog.

Related Articles :

Do you want to know how to develop your skillset to become a Java Rockstar?

Subscribe to our newsletter to start Rocking right now!

To get you started we give you two of our best selling eBooks for FREE!

JPA Mini Book

Learn how to leverage the power of JPA in order to create robust and flexible Java applications. With this Mini Book, you will get introduced to JPA and smoothly transition to more advanced concepts.

JVM Troubleshooting Guide

The Java virtual machine is really the foundation of any Java EE platform. Learn how to master it with this advanced guide!

Given email address is already subscribed, thank you!
Oops. Something went wrong. Please try again later.
Please provide a valid email address.
Thank you, your sign-up request was successful! Please check your e-mail inbox.
Please complete the CAPTCHA.
Please fill in the required fields.

Leave a Reply


one + 8 =



Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd | Terms of Use | Privacy Policy
All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries.
Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
Do you want to know how to develop your skillset and become a ...
Java Rockstar?

Subscribe to our newsletter to start Rocking right now!

To get you started we give you two of our best selling eBooks for FREE!

Get ready to Rock!
You can download the complementary eBooks using the links below:
Close