
Using JSON Schema Validation to Map Sparse JSON

In this post, we’ll look at a problem that comes up when you create APIs and share them. In particular, there’s a need to:

  • Express the structure of data
  • Provide for validation of that data
  • Allow for future changes of mind
  • Communicate with clients over a subset of the data you have
  • Fill in the blanks when data is missing

People generally solve this with version-numbered APIs. Each version of the API is bound to a schema, and that schema is often expressed in JSON Schema format.
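
As an illustration, a v1 schema for a hypothetical "user" resource might look like the following, held here in a Java text block (the resource and its fields are invented for this example):

    // A hypothetical v1 schema: only "id" is mandatory, the rest is optional
    static final String USER_SCHEMA_V1 = """
        {
          "$schema": "http://json-schema.org/draft-07/schema#",
          "type": "object",
          "properties": {
            "id":   { "type": "string" },
            "name": { "type": "string" },
            "age":  { "type": "integer" }
          },
          "required": ["id"]
        }
        """;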

Evolving between multiple versions of the same schema, however, is not something the usual JSON Schema validation libraries support.

Schema evolution can be supported quite well by Apache Avro. Avro can have its schemas defined using something akin to JSON Schema, and is able to read and write JSON, though you need JSON2AvroConverter to read normal-looking JSON if you’re using nullable fields via the union type in Avro.
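
To sketch the friction, assuming the allegro json-avro-converter library (which appears to be what JSON2AvroConverter refers to): plain Avro JSON decoding expects a nullable union value to be wrapped in its type name, e.g. {"name": {"string": "Ashley"}}, whereas the converter accepts the ordinary form. The record and field names here are invented:

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import tech.allegro.schema.json2avro.converter.JsonAvroConverter;

    public class AvroJsonExample {
        public static void main(String[] args) {
            // An Avro schema with a nullable field, modeled as a union with null
            Schema schema = new Schema.Parser().parse("""
                {
                  "type": "record",
                  "name": "User",
                  "fields": [
                    { "name": "id",   "type": "string" },
                    { "name": "name", "type": ["null", "string"], "default": null }
                  ]
                }
                """);

            // The converter reads normal-looking JSON, no union wrapping required
            GenericData.Record record = new JsonAvroConverter()
                .convertToGenericDataRecord("{\"id\":\"1\",\"name\":\"Ashley\"}".getBytes(), schema);
            System.out.println(record);
        }
    }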

However, Avro isn’t great at reading JSON with missing fields.

Draft 7 of JSON Schema supports defaults, and the everit json-schema library can substitute defaults into objects while validating.
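
A minimal sketch of that, using everit’s SchemaLoader builder with useDefaults(true) (the schema here is invented for the example):

    import org.everit.json.schema.Schema;
    import org.everit.json.schema.loader.SchemaLoader;
    import org.json.JSONObject;

    public class DefaultsExample {
        public static void main(String[] args) {
            JSONObject rawSchema = new JSONObject("""
                {
                  "$schema": "http://json-schema.org/draft-07/schema#",
                  "type": "object",
                  "properties": {
                    "name": { "type": "string" },
                    "age":  { "type": "integer", "default": 0 }
                  }
                }
                """);

            Schema schema = SchemaLoader.builder()
                .schemaJson(rawSchema)
                .draftV7Support()
                .useDefaults(true)   // fill in defaults while validating
                .build()
                .load()
                .build();

            JSONObject input = new JSONObject("{\"name\":\"Ashley\"}");
            schema.validate(input);   // mutates input: "age" is now 0
            System.out.println(input);
        }
    }

Note that validation mutates the input object in place, which is exactly the “fill in the blanks” behaviour we’re after.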

Putting this together

Let’s say:

  • I have a schema which supplies defaults for anything that is not mandatory
  • I have a rule that no future version of a schema can add mandatory things
  • I have data which has whichever fields it has been given, regardless of whether they’re needed by a specific version of the schema
  • I wish to return data valid against a particular schema version

I will need to:

  • Filter out fields that are in the source but not in the schema
  • Add defaults when a field is in the schema but not in the data

All of the above is explored in this POC on GitHub.

The POC

The POC uses the everit library to populate defaults, and demonstrates how to express defaults in schemas.
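
Defaults are expressed with the standard "default" keyword on a property. A fragment along these lines (the field names are mine, not the POC’s) gives the idea:

    // Any field that isn't mandatory carries a default
    static final String SCHEMA_WITH_DEFAULTS = """
        {
          "type": "object",
          "properties": {
            "currency": { "type": "string",  "default": "GBP" },
            "quantity": { "type": "integer", "default": 1 },
            "tags":     { "type": "array",   "default": [] }
          }
        }
        """;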

It’s a bit annoying to express JSON inside Java code, so the best thing to do is to extract the text of the schema and explore it in a tool like JSONEditorOnline. (Hopefully your IDE will unescape the " characters when you copy and paste – IntelliJ does.)

The POC has a basic implementation that iterates over both the schema and the input JSON, removing fields in the JSON that are not known to the schema. It is basic in that it does not cope with all the edge cases JSON Schema makes possible, and it does not tolerate data in the input JSON being of a different type than the schema describes.
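
As an illustration of the shape of that traversal, here’s a minimal sketch assuming everit’s ObjectSchema and org.json’s JSONObject; the helper name is mine, not the POC’s:

    import java.util.HashSet;
    import java.util.Map;
    import org.everit.json.schema.ObjectSchema;
    import org.everit.json.schema.Schema;
    import org.json.JSONObject;

    public final class SchemaFilter {
        // Remove every field the schema doesn't declare, recursing into
        // nested objects; combinators, additionalProperties and type
        // mismatches are ignored, like the basic implementation described above
        public static void filterUnknownFields(Schema schema, JSONObject json) {
            if (!(schema instanceof ObjectSchema objectSchema)) {
                return;
            }
            Map<String, Schema> known = objectSchema.getPropertySchemas();
            for (String key : new HashSet<>(json.keySet())) {
                if (!known.containsKey(key)) {
                    json.remove(key);
                } else if (json.get(key) instanceof JSONObject child) {
                    filterUnknownFields(known.get(key), child);
                }
            }
        }
    }

Copying the key set before iterating avoids a ConcurrentModificationException while removing.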

That said, it passes some useful tests, so it’s definitely a starting point for future investigation.

Conclusion

Schema evolution can be done very precisely. Apache Avro allows for modeling of multiple versions of the same schema, loading in one and transforming to another, but it’s not a great friend of JSON and requires the source JSON to be in the right format for one of the schemas.

JSON Schema is a fundamental building block of REST API definition. With the right libraries, it can be coerced into operating as a filter, on top of its ability to supply defaults during validation.
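
Putting the two sketches above together (the names are from my earlier snippets, not the POC’s API):

    // Trim fields the target schema version doesn't know about,
    // then let validation fill in the defaults that version declares
    SchemaFilter.filterUnknownFields(schema, input);
    schema.validate(input);   // input now matches this schema version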

I hope the POC code here proves useful to someone.

Published on Java Code Geeks with permission by Ashley Frieze, partner at our JCG program. See the original article here: Using JSON Schema Validation to Map Sparse JSON
