Jackson Tips: sorting JSON using JsonNode

(aka “how to get almost canonical output” with custom JsonNodeFactory)

TL;DNR: quick way to get it “all sorted out”

If you just want to see the magic, here’s how you can make JsonNode automatically sort properties alphabetically by name (by default, properties are stored in insertion order; for reads, that means the order in which they appear in the input document):
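A minimal sketch of such a setup, assuming Jackson 2.10 or later (for `JsonMapper.builder()`). The class names `SortingNodeFactory` and `SortedTreeDemo` are illustrative, not part of Jackson:

```java
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.json.JsonMapper;
import com.fasterxml.jackson.databind.node.JsonNodeFactory;
import com.fasterxml.jackson.databind.node.ObjectNode;

import java.io.UncheckedIOException;
import java.util.TreeMap;

// Node factory whose ObjectNodes store properties in a sorting TreeMap
// (natural String order) instead of the default LinkedHashMap.
class SortingNodeFactory extends JsonNodeFactory {
    @Override
    public ObjectNode objectNode() {
        return new ObjectNode(this, new TreeMap<String, JsonNode>());
    }
}

class SortedTreeDemo {
    // Mapper that creates all ObjectNodes through the sorting factory
    static final ObjectMapper MAPPER = JsonMapper.builder()
            .nodeFactory(new SortingNodeFactory())
            .build();

    // Adds properties out of order, then serializes them
    static String demo() {
        ObjectNode node = MAPPER.createObjectNode();
        node.put("b", 1);
        node.put("a", 2);
        try {
            return MAPPER.writeValueAsString(node); // {"a":2,"b":1}
        } catch (JsonProcessingException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```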

What happens here is that ObjectNode (the JsonNode implementation for JSON Objects) is configured to use a sorting TreeMap for storing properties instead of the default LinkedHashMap. And since the serializer simply iterates over properties in the order stored, the output will be sorted. Note that this happens recursively, not just for the top-level properties.

And here’s some background on where and why this might be useful.

On JSON canonicalization: tale of 2 specifications

Today I got into a discussion about how to canonicalize JSON for use cases like comparing the logical equality of two JSON documents or calculating a checksum/hash code over JSON content (for example, to check against tampering, or for other security-related handling).
Since I have been asked about some aspects of canonicalization over the past couple of years (mostly as feature requests for Jackson), I thought I should go read a bit more about the current state of things.

I found that while there isn’t necessarily just one “Canonical JSON” specification, there is RFC-8785 (JSON Canonicalization Scheme (JCS)), which offers a way to canonicalize JSON documents and in turn builds on RFC-7493 (I-JSON (“Internet JSON”) Message Format).

First, I-JSON creates the foundation by defining a set of “more interoperable” JSON documents (from a set of all valid JSON documents) that:

  • MUST be encoded in UTF-8
  • MUST NOT contain duplicate properties
  • SHOULD have JSON Object or JSON Arrays as the root value
  • SHOULD NOT include numbers with greater magnitude or precision than can be expressed as IEEE 754–2008 64-bit double-precision values (this is mostly for JavaScript compatibility)

and beyond this, JCS further requires that:

  • White space between JSON tokens MUST NOT be emitted (aka “no indentation”)
  • Properties of JSON Objects MUST be ordered lexicographically (~= “alphabetically”) on serialization, comparing names as UTF-16 code units (the basis for the String representations of Java, JavaScript and .NET)
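The ordering rule lines up with Java's natural String order, since `String.compareTo` compares `char` values, which are UTF-16 code units. A small sketch of what that means in practice (the class and method names here are hypothetical):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class Utf16OrderDemo {
    // Sorts property names by UTF-16 code units, which is what
    // String's natural ordering (String.compareTo) does in Java.
    static List<String> sortByCodeUnits(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        Collections.sort(sorted);
        return sorted;
    }
}
```

One consequence: a name starting with a character outside the Basic Multilingual Plane (stored as a surrogate pair beginning at 0xD800 or above) sorts before a name starting with, say, U+FB33, even though its code point is larger.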

How to canonicalize JSON with Jackson?

If we are to output JSON in canonical form, how could we achieve that with Jackson? Some parts are easy:

  1. If only Objects and Arrays are to be serialized, we will only serialize values that are expressed as such (most likely POJOs)
  2. To prevent output of intervening white space characters we need to do nothing: just avoid enabling indentation
  3. To prevent output of duplicate properties we may not need to do anything if we use the POJO model (as POJOs have no duplicates), but we can guarantee no duplicates by enabling StreamWriteFeature.STRICT_DUPLICATE_DETECTION (or, when reading, StreamReadFeature.STRICT_DUPLICATE_DETECTION)
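As a rough sketch of points 2 and 3, assuming Jackson 2.10 or later: the mapper below enables both duplicate-detection features and leaves indentation at its default (off). The class name `StrictJsonDemo` and its helper methods are made up for illustration:

```java
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.core.StreamReadFeature;
import com.fasterxml.jackson.core.StreamWriteFeature;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.json.JsonMapper;

import java.io.UncheckedIOException;

class StrictJsonDemo {
    // Indentation (SerializationFeature.INDENT_OUTPUT) is disabled by
    // default, so it is simply not enabled here.
    static final ObjectMapper MAPPER = JsonMapper.builder()
            .enable(StreamReadFeature.STRICT_DUPLICATE_DETECTION)
            .enable(StreamWriteFeature.STRICT_DUPLICATE_DETECTION)
            .build();

    // Returns true if the document parses; false if the parser rejects it
    // (for example because a property name is repeated within an Object).
    static boolean isAccepted(String json) {
        try {
            MAPPER.readTree(json);
            return true;
        } catch (JsonProcessingException e) {
            return false;
        }
    }

    // Reads and re-writes a document; output has no white space between tokens.
    static String compact(String json) {
        try {
            return MAPPER.writeValueAsString(MAPPER.readTree(json));
        } catch (JsonProcessingException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```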

This leaves 2 main requirements: limiting numbers and sorting properties.
The first one is a bit trickier and for now requires either careful modeling (do not use floating-point numbers at all, for example; or at least avoid BigDecimal and BigInteger), or simply avoiding values outside of the range.
I hope to improve this handling in the future; but at least there is a simple way to handle sorting (note: there are other existing mechanisms one could use with Jackson, but this one is not well documented or, I think, widely used).
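For the number requirement, one possible pre-check (not a Jackson feature, just a heuristic using the JDK) is to test whether a value survives a round trip through `double`. The helper name `fitsInDouble` is hypothetical:

```java
import java.math.BigDecimal;

class DoubleRangeCheck {
    // Heuristic: true if the value, converted to a double and back via the
    // double's shortest decimal representation, is numerically unchanged.
    static boolean fitsInDouble(BigDecimal value) {
        double d = value.doubleValue();
        if (Double.isInfinite(d)) {
            return false; // magnitude too large for a 64-bit double
        }
        // BigDecimal.valueOf(double) goes through Double.toString(double)
        return BigDecimal.valueOf(d).compareTo(value) == 0;
    }
}
```

With this check, 0.1 passes, while 1E400 (too large) and 9007199254740993 (2^53 + 1, more precision than a double holds) do not.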

So: if you wanted to canonicalize a JSON document, you could do it by:
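A sketch of that read-then-write round trip, assuming Jackson 2.10 or later and a mapper whose node factory backs ObjectNode with a TreeMap as described at the top of this post. The class and method names are illustrative, and the result is “almost canonical” only: it covers property sorting and white space, not number handling:

```java
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.json.JsonMapper;
import com.fasterxml.jackson.databind.node.JsonNodeFactory;
import com.fasterxml.jackson.databind.node.ObjectNode;

import java.io.UncheckedIOException;
import java.util.TreeMap;

class CanonicalizeDemo {
    // Mapper whose ObjectNodes keep their properties sorted by name
    static final ObjectMapper SORTING_MAPPER = JsonMapper.builder()
            .nodeFactory(new JsonNodeFactory() {
                @Override
                public ObjectNode objectNode() {
                    return new ObjectNode(this, new TreeMap<String, JsonNode>());
                }
            })
            .build();

    // Reads the document into a (sorted) tree, then writes it back out
    // without indentation. Array element order is left untouched.
    static String canonicalize(String json) {
        try {
            JsonNode root = SORTING_MAPPER.readTree(json);
            return SORTING_MAPPER.writeValueAsString(root);
        } catch (JsonProcessingException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, `canonicalize("{\"b\":1,\"a\":{\"z\":true,\"c\":[3,2,1]}}")` should yield `{"a":{"c":[3,2,1],"z":true},"b":1}`: nested Objects are sorted too, while Arrays keep their order.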

That’s all, Folks!

Written by

Open Source developer, most known for Jackson data processor (nee “JSON library”), author of many, many other OSS libraries for Java, from ClassMate to Woodstox
