Writing Tests for Data Access Code – Don’t Forget the Database

Petri KainulainenAugust 7th, 2014Last Updated: August 6th, 2014

0 129 6 minutes read

When we write tests for our data access code, we must follow these three rules:

Our tests must use the real database schema.
Our tests must be deterministic.
Our tests must assert the right thing.

These rules are obvious.

That is why it is surprising that some developers break them (I have broken them too in the past).

This blog post describes why these rules are important and helps us to follow them.

Rule 1: We Must Use the Real Database Schema

The second part of this series taught us that we should configure our integration tests by using the same configuration which is used by our application. It also taught us that is ok to break this rule if we have a good reason for doing it.

Let’s investigate one quite common situation where our integration tests use a different configuration than our application.

We can create our database by following this approach:

We create the database of our application by using Liquibase. We use its Spring integration to make the required changes to the database when the application was started.
We let Hibernate to create the database used in our integration tests.

I have done this too, and this felt like a perfect solution because

I could enjoy the benefits of a versioned database.
Writing integration tests felt like a walk in a park because I could trust that Hibernate creates a working database for my integration tests.

However, after I started writing blog this tutorial (Writing Tests for Data Access Code), I realized that this approach has (at least) three problems:

If the database is created by Hibernate, we cannot test that our migration scripts create a working database.
The database created by Hibernate isn’t necessarily equal to the database created by our migration scripts. For example, if the database has tables which aren’t described as entities, Hibernate doesn’t (naturally) create these tables.
If we want to run performance tests in the integration test suite, we have configure the required indexes by using the @Index annotation. If we don’t do this, Hibernate doesn’t create these indexes. This means that we cannot trust the results of our performance tests.

Should we care about these problems?

Definitely.

We must remember that every test specific change creates a difference between our test configuration and production configuration. If this difference is too big, our tests are worthless.

If we don’t run our integration tests against the same database schema that is used when the application is deployed to development / testing / production environment, we face the following problems:

We cannot necessarily write integration tests for certain features because our database is missing the required tables, triggers, constraints, or indexes. This means that we must test these features manually before the application the deployed to the production environment. This is a waste of time.
The feedback loop is a lot longer than it could be because we notice some problems (such as problems caused by faulty migration scripts) after the application is deployed to the target environment.
If we notice a problem when an application is deployed to a production environment, the shit hits the fan and we are covered with it. I don’t like to be covered with poop. Do you?

If we want to avoid these problems and maximize the benefits of our data access tests, our integration tests must use the same database schema that is used when our application is deployed to the production environment.

Rule 2: Our Tests Must be Deterministic

Martin Fowler specifies non-deterministic test as follows:

A test is non-deterministic when it passes sometimes and fails sometimes, without any noticeable change in the code, tests, or environment. Such tests fail, then you re-run them and they pass. Test failures for such tests are seemingly random.

He also explains why non-deterministic tests are a problem:

The trouble with non-deterministic tests is that when they go red, you have no idea whether its due to a bug, or just part of the non-deterministic behavior. Usually with these tests a non-deterministic failure is relatively common, so you end up shrugging your shoulders when these tests go red. Once you start ignoring a regression test failure, then that test is useless and you might as well throw it away.

It should be clear to us that non-deterministic tests are harmful, and we should we should avoid them at all costs.

So, what is the most common cause of non-deterministic data access tests?

My experience has taught me that most common reason behind non-deterministic data access tests is the failure to initialize the database into a known state before each test case is run.

This is sad because this is a really easy problem to solve. In fact, we can solve it by using one of these options:

We can add information to the database by using the other methods of the tested repository.
We can write a library that initializes our database before each test is run.
We can use existing libraries such as DbUnit and NoSQLUnit.

However, we must be careful because only of these options makes sense.

The first option is the worst way to solve this problem. It clutters our test methods with unnecessary initialization code and makes them very fragile. For example, if we break the method which is used to save information to our database, every test which uses it will fail.

The second option is a bit better. However, why would we want to create a new library when we could use an existing library that is proven to work?

We should not reinvent the wheel. We should solve this problem by using the easiest and the best way. We must use an existing library.

Rule 3: We Must Assert the Right Thing

When we write tests for our data access code, we might have to write tests that

read information from the database.
write information to the information to the database.

What kind of assertions do we have to write?

First, if the write tests that read information from the database, we have to follow these rules:

If we are using a framework or a library (e.g. Spring Data) that maps the information found from the database to objects, it makes no sense to assert that every property value of the returned object is correct. In this situation we should ensure that value of the property, which identifies the returned object, is correct. The reason for this that we should only use frameworks or libraries which we trust. If we trust that our data access framework or library does its job, it makes no sense assert everything.
If we have implemented a repository that maps the information found from the database to objects, we should ensure that the every property value of the returned object is correct. If we don’t do this, we cannot be sure that our repository works correctly.

Second, if we write tests which writes information to the database, we should not add any assertions to our test method.

We must use a tool like DbUnit or NoSQLUnit to ensure that the correct information is stored to the database. This approach has two benefits:

We can write our assertions on the right level. In other words, we can be verify that the information is really saved to the used database.
We can avoid cluttering our test methods with code that finds the saved information from the used database and verifies that the correct information is found.

But what if we want to ensure that the method that saves information to the database returns the correct information?

Well, if we have implemented this method ourself, we have to write two tests for this method:

We must ensure that the correct information is stored to the database.
We must verify that the method returns the correct information.

On the other hand, if this method is provided to us by a framework or library, we should not write any tests for it.

We must remember that our goal is not to write assertions that ensure the used data access framework or library is working correctly.

Our goal is to write assertions which ensure that our code is working correctly.

Summary

This blog post has taught us four things:

If we want to maximize the benefits of our data access tests, our integration tests must use the same database schema that is used when our application is deployed to the production environment.
Getting rid of non-deterministic tests is easy. All we have to do is to initialize our database into a known state before each test case is run by using a library such as DbUnit or NoSQLUnit.
If we need to verify that the correct information is saved to the used database, we must use a library such as DbUnit or NoSQLUnit.
If we want to verify that the correct information is returned from the used database, we must write assertions that ensure that our code works.

Reference:

Writing Tests for Data Access Code – Don’t Forget the Database from our JCG partner Petri Kainulainen at the Petri Kainulainen blog.