Enterprise Java

Java 8 Friday: Java 8 Will Revolutionize Database Access

At Data Geekery, we love Java. And as we’re really into jOOQ’s fluent API and query DSL, we’re absolutely thrilled about what Java 8 will bring to our ecosystem. For our Java 8 series, we’re honoured to host a very relevant guest post by Dr. Ming-Yee Iu.

Dr. Ming-Yee Iu completed a PhD on Database Queries in Java at EPFL. He has created the open source project Jinq to demonstrate some new techniques for supporting database queries in Java.

Our editorial note:
 
 
Ever since Erik Meijer has introduced LINQ to the .NET ecosystem, us Java folks have been wondering whether we could have the same. We’ve blogged about this topic before, a couple of times:

While most LINQesque APIs in the Java ecosystem operate as internal domain-specific languages like jOOQ, some try to tackle the integration on a bytecode level, like JaQu.

JINQ formalises runtime bytecode transformations through what Dr. Ming-Yee Iu calls symbolic execution. We find this very interesting to a point that we wonder if we should start building a JINQ-to-jOOQ JINQ provider, where the expressive power of the Java 8 Streams API could be combined with our great SQL standardisation and transformation features…?

Convince yourselves:

Java 8 Goodie: Java 8 Will Revolutionize Database Access

Java 8 is finally here! After years of waiting, Java programmers will finally get support for functional programming in Java. Functional programming support helps streamline existing code while providing powerful new capabilities to the Java language. One area that will be disrupted by these new features is how programmers work with databases in Java. Functional programming support opens up exciting new possibilities for simpler yet more powerful database APIs. Java 8 will enable new ways to access databases that are competitive with those of other programming languages such as C#’s LINQ.

The Functional Way of Working With Data

Java 8 not only adds functional-support to the Java language, but it extends the Java collection classes with new functional ways of working with data. Traditionally, working with large amounts of data in Java requires a lot of loops and iterators.

For example, suppose you have a collection of Customer objects:

Collection<Customer> customers;

If you were only interested in the customers from Belgium, you would have to iterate over all the customers and save the ones you wanted.

Collection<Customer> belgians = new ArrayList<>();
for (Customer c : customers) {
    if (c.getCountry().equals("Belgium"))
        belgians.add(c);
}

This takes five lines of code. It is also poorly abstracted. What happens if you have 10 million customers, and you want to speed up the code by filtering it in parallel using two threads? You would have to rewrite everything to use futures and a lot of hairy multi-threaded code.

With Java 8, you can write the same code in one line. With its support for functional programming, Java 8 lets you write a function saying which customers you are interested in (those from Belgium) and then to filter collections using that function. Java 8 has a new Streams API that lets you do this.

customers.stream().filter(
    c -> c.getCountry().equals("Belgium")
);

Not only is the Java 8 version of the code shorter, but the code is easier to understand as well. There is almost no boilerplate. The code calls the method filter(), so it’s clear that this code is used for filtering customers. You don’t have to spend your time trying to decipher the code in a loop to understand what it is doing with its data.

And what happens if you want to run the code in parallel? You just have to use a different type of stream.

customers.parallelStream().filter(
    c -> c.getCountry().equals("Belgium")
);

What’s even more exciting is that this functional-style of code works with databases as well!

The Functional Way of Working with Databases

Traditionally, programmers have needed to use special database query languages to access the data in databases. For example, below is some JDBC code for finding all the customers from Belgium:

PreparedStatement s = con.prepareStatement(
      "SELECT * "
    + "FROM Customer C "
    + "WHERE C.Country = ? ");
s.setString(1, "Belgium");
ResultSet rs = s.executeQuery();

Much of the code is in the form a string, which the compiler can’t check for errors and which can lead to security problems due to sloppy coding. There is also a lot of boilerplate code that makes writing database access code quite tedious. Tools such as jOOQ solve the problem of error-checking and security by providing a database query language that can be written using special Java libraries. Or you can use tools such as object-relational mappers to hide a lot of boring database code for common access patterns, but if you need to write non-trivial database queries, you will still need to use a special database query language again.

With Java 8, it’s possible to write database queries using the same functional-style used when working with the Streams API. For example, Jinq is an open source project that explores how future database APIs can make use of functional programming. Here is a database query written using Jinq:

customers.where(
    c -> c.getCountry().equals("Belgium")
);

This code is almost identical to the code using the Streams API. In fact, future versions of Jinq will let you write queries directly using the Streams API. When the code is run, Jinq will automatically translate the code into a database query like the JDBC query shown before.

So without having to learn a new database query language, you can write efficient database queries. You can use the same style of code you would use for Java collections. You also don’t need a special Java compiler or virtual machine. All of this code compiles and runs using the normal Java 8 JDK. If there are errors in your code, the compiler will find them and report them to you, just like normal Java code.

Jinq supports queries that can be as complicated as SQL92. Selection, projection, joins, and subqueries are all supported. The algorithm for translating Java code into database queries is also very flexible in what code it will accept and translate. For example, Jinq has no problem translating the code below into a database query, despite its complexity.

customers
    .where( c -> c.getCountry().equals("Belgium") )
    .where( c -> {
        if (c.getSalary() < 100000)
            return c.getSalary() < c.getDebt();
        else
            return c.getSalary() < 2 * c.getDebt();
        } );

As you can see, the functional programming support in Java 8 is well-suited for writing database queries. The queries are compact, and complex queries are supported.

Inner Workings

But how does this all work? How can a normal Java compiler translate Java code into database queries? Is there something special about Java 8 that makes this possible?

The key to supporting these new functional-style database APIs is a type of bytecode analysis called symbolic execution. Although your code is compiled by a normal Java compiler and run in a normal Java virtual machine, Jinq is able to analyze your compiled Java code when it is run and construct database queries from them. Symbolic execution works best when analyzing small functions, which are common when using the Java 8 Streams API.

The easiest way to understand how this symbolic execution works is with an example. Let’s examine how the following query is converted by Jinq into the SQL query language:

customers
    .where( c -> c.getCountry().equals("Belgium") )

Initially, the customers variable is a collection that represents this database query

SELECT *
  FROM Customers C

Then, the where() method is called, and a function is passed to it. In this where() method, Jinq opens the .class file of the function and gets the compiled bytecode for the function to analyze. In this example, instead of using real bytecode, let’s just use some simple instructions to represent the bytecode of the function:

  1. d = c.getCountry()
  2. e = “Belgium”
  3. e = d.equals(e)
  4. return e

Here, we pretend that the function has been compiled by the Java compiler into four instructions. This is what Jinq sees when the where() method is called. How can Jinq make sense of this code?

Jinq analyzes the code by executing it. Jinq doesn’t run the code directly though. It runs the code ‘abstractly’. Instead of using real variables and real values, Jinq uses symbols to represent all values when executing the code. This is why the analysis is called symbolic execution.

Jinq executes each instruction and keeps track of all the side-effects or all the things that the code changes in the state of the program. Below is a diagram showing all the side-effects that Jinq finds when it executes the four lines of code using symbolic execution.

Symbolic execution example
Symbolic execution example

In the diagram, you can see how after the first instruction runs, Jinq finds two side-effects: the variable d has changed and the method Customer.getCountry() has been called. With symbolic execution, the variable d is not given a real value like “USA” or “Denmark”. It is assigned the symbolic value of c.getCountry().

After all the instructions have been executed symbolically, Jinq prunes the side-effects. Since the variables d and e are local variables, any changes to them are discarded after the function exits, so those side-effects can be ignored. Jinq also knows that the methods Customer.getCountry() and String.equals() do not modify any variables or show any output, so those method calls can also be ignored. From this, Jinq can conclude that executing the function produces only one effect: it returns c.getCountry().equals("Belgium").

Once Jinq has understood what the function passed to it in the where() method does, it can then merge this knowledge with the database query underlying the customers collection to create a new database query.

Generating a database query
Generating a database query

And that’s how Jinq generates database queries from your code. The use of symbolic execution means that this approach is quite robust to the different code patterns outputted by different Java compilers. If Jinq ever encounters code with side-effects that can’t be emulated using a database query, Jinq will leave your code untouched. Since everything is written using normal Java code, Jinq can just run that code directly instead, and your code will produce the expected results.

This simple translation example should have given you an idea of how the query translation works. You should feel confident that these algorithms can correctly generate database queries from your code.

An Exciting Future

I hope I have given you a taste for how Java 8 enables new ways of working with databases in Java. The functional programming support in Java 8 allows you write database code in a similar way to writing code for working with Java collections. Hopefully, existing database APIs will soon be extended to support these styles of queries.

Lukas Eder

Lukas is a Java and SQL enthusiast developer. He created the Data Geekery GmbH. He is the creator of jOOQ, a comprehensive SQL library for Java, and he is blogging mostly about these three topics: Java, SQL and jOOQ.
Subscribe
Notify of
guest

This site uses Akismet to reduce spam. Learn how your comment data is processed.

0 Comments
Inline Feedbacks
View all comments
Back to top button