Software Development

Stop Claiming that you’re Using a Schemaless Database

One of MongoDB’s arguments when evangelising MongoDB is the fact that MongoDB is a “schemaless” database:

Why Schemaless?

MongoDB is a JSON-style data store. The documents stored in the database can have varying sets of fields, with different types for each field.

 
 
And that’s true. But it doesn’t mean that there is no schema. There are in fact various schemas:

  • The one in your head when you designed the data structures
  • The one that your database really implemented to store your data structures
  • The one you should have implemented to fulfill your requirements

Every time you realise that you made a mistake (see point three above), or when your requirements change, you will need to migrate your data. Let’s review again MongoDB’s point of view here:

With a schemaless database, 90% of the time adjustments to the database become transparent and automatic. For example, if we wish to add GPA to the student objects, we add the attribute, resave, and all is well — if we look up an existing student and reference GPA, we just get back null. Further, if we roll back our code, the new GPA fields in the existing objects are unlikely to cause problems if our code was well written.

Everything above is true as well.

“Schema-less” vs. “Schema-ful”

But let’s translate this to SQL (or use any other “schema-ful” database instead):

ALTER TABLE student ADD gpa VARCHAR(10);

And we’re done! Gee, we’ve added a column, and we’ve added it to ALL rows. It was transparent. It was automatic. We “just get back null” on existing students. And we can even “roll back our code”:

ALTER TABLE student DROP gpa;

Not only are the existing objects unlikely to cause problems, we have actually rolled back our code AND database.

Let’s summarise:

  • We can do exactly the same in “schema-less” databases as we can in “schema-ful” ones
  • We guarantee that a migration takes place (and it’s instant, too)
  • We guarantee data integrity when we roll back the change

What about more real-world DDL?

Of course, at the beginning of projects, when they still resemble the typical cat/dog/pet-shop, book/author/library sample application, we’ll just be adding columns. But what happens if we need to change the student-teacher 1:N relationship into a student-teacher M:N relationship? Suddenly, everything changes, and not only will the relational data model prove far superior to a hierarchical one that just yields tons of data duplication, it’ll also be moderately easy to migrate, and the outcome is guaranteed to be correct and tidy!

CREATE TABLE student_to_teacher 
AS
SELECT id AS student_id, teacher_id
FROM student;

ALTER TABLE student DROP teacher_id;

… and we’re done! (of course, we’d be adding constraints and indexes)

Think about the tedious task that you’ll have transforming your JSON to the new JSON. You don’t even have XSLT or XQuery for the task, only JavaScript!

Let’s face the truth

Schemalessness is about a misleading term as much as NoSQL is:

https://twitter.com/edd/status/400190499585544192

And again, MongoDB’s blog post is telling the truth (and an interesting one, too):

Generally, there is a direct analogy between this “schemaless” style and dynamically typed languages. Constructs such as those above are easy to represent in PHP, Python and Ruby. What we are trying to do here is make this mapping to the database natural.

When you say “schemaless”, you actually say “dynamically typed schema” – as opposed to statically typed schemas as they are available from SQL databases. JSON is still a completely schema free data structure standard, as opposed to XML which allows you to specify XSD if you need, or operate on document-oriented, “schema-less” (i.e. dynamically typed) schemas.

(And don’t say there’s json-schema. That’s a ridiculous attempt to mimick XSD)

This is important to understand! You always have a schema, even if you don’t statically type it. If you’re writing JavaScript, you still have types, which you have to be fully aware of in your mental model of the code. Except that there’s no compiler (or IDE) that can help you infer the types with 100% certainty.

An example:

https://twitter.com/lukaseder/status/522311918384857088

… and more:

So, there’s absolutely nothing that is really easier with “schemaless” databases than with “schemaful” ones. You just defer the inevitable work of sanitising your schema to some other later time, a time when you might care more than today, or a time when you’re lucky enough to have a new job and someone else does the work for you. You might have believed MongoDB, when they said that “objects are unlikely to cause problems”.

But let me tell you the ugly truth:

Anything that can possibly go wrong, does

Murphy

We wish you good luck with your dynamically typed languages and your dynamically typed database schemas – while we’ll stick with type safe SQL.

Lukas Eder

Lukas is a Java and SQL enthusiast developer. He created the Data Geekery GmbH. He is the creator of jOOQ, a comprehensive SQL library for Java, and he is blogging mostly about these three topics: Java, SQL and jOOQ.
Subscribe
Notify of
guest

This site uses Akismet to reduce spam. Learn how your comment data is processed.

2 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
andre parodi
andre parodi
9 years ago

Not sure what size db you have but alter table in mysql is far from instant and copies the whole table. Whilst I get the point you are trying to make try to be accurate.

I also enjoy the simplified no downtime deployment you get with mongo for example. Schema migrations in mysql greatly complicate continuous delivery and can make downtime difficult to avoid.

Tiv Robo
Tiv Robo
9 years ago

no one cares :)

Back to top button