Handling non-compliant JSON with Jackson

aka How to read Broken JSON with Jackson

author’s note: (paraphrasing Tolstoy) all compliant JSON looks alike; but all non-compliant content is broken in its own way.

Background: why/whence invalid JSON?

But simplicity itself can be problematic: many developers feel the JSON format is missing things they consider important, from native date/time types to the ability to indicate type information and support for cyclic data structures. This has led to the development of various JSON derivative formats; some claim to be extensions, others simply mention their heritage.
As long as those formats do not claim to be JSON that is fine: they are not JSON and you need to use different tools for processing them. Right format for the job and so on.

But beyond clear forks there are also some “features”: ways to change the definition of JSON only slightly. These deviations have proven popular enough to be used by a significant part of the developer population, despite obvious interoperability concerns (things that are not part of the specification may or may not be supported by various tools).
Such features include:

  • Ability to include comments (something earliest JSON specification proposal actually included, before being removed by the author due to fear of likely abuse)
  • Ability to avoid “unnecessary” quoting — that is, to use unquoted Object keys — or to use alternate quote character like apostrophe.
  • Ability to avoid “unnecessary” values (omit null values in array)

but there are obviously many more potential tweaks in existence.

Background: Jackson reads/writes VALID JSON

But due to user requests (and existence and use of non-compliant “JSON” content) support has been added to work with a set of commonly seen “JSON extensions”: this support must be explicitly enabled for reading and/or writing.

Jackson support for non-compliant JSON: mechanisms

  • JsonReadFeatures for JsonParser (via JsonFactory): to allow/disallow reading of non-compliant constructs
  • JsonWriteFeatures for JsonGenerator (via JsonFactory): to change behavior of some write operations to use non-compliant constructs

There are, however, multiple ways to configure and override settings when using databinding: it is possible to specify defaults for JsonFactory, but also to override settings with ObjectReader (for JsonParser features) and ObjectWriter (for JsonGenerator features).

So, for example: to allow use of “Java-style” comments (// or /*…*/) in “JSON” content being read you could use:

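import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.json.JsonReadFeature;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.json.JsonMapper;

JsonFactory f = JsonFactory.builder()
    .enable(JsonReadFeature.ALLOW_JAVA_COMMENTS)
    .build();
ObjectMapper mapper = JsonMapper.builder(f)
    .build();
JsonNode root = mapper.readTree("// Stuff!\n{ \"value\":42}");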

or, if you already have a pre-constructed mapper, you can override the setting for an individual reader:

ObjectMapper mapper = ...;
JsonNode root = mapper.readerFor(JsonNode.class)
    .with(JsonReadFeature.ALLOW_JAVA_COMMENTS)
    .readValue("[ 123, /* second */ 456 ]");
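The writer side can be overridden per call in much the same way. The following is a minimal sketch, assuming Jackson 2.10 or later on the classpath; the class name `WriterOverrideExample`, the helper `writeEscaped` and the sample value are made up for illustration:

```java
import com.fasterxml.jackson.core.json.JsonWriteFeature;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.ObjectWriter;
import com.fasterxml.jackson.databind.json.JsonMapper;

public class WriterOverrideExample {
    // Hypothetical helper: serializes a String with non-ASCII escaping
    // enabled on the ObjectWriter only; the mapper keeps its defaults.
    static String writeEscaped(String value) throws Exception {
        ObjectMapper mapper = JsonMapper.builder().build();
        ObjectWriter writer = mapper.writer()
                .with(JsonWriteFeature.ESCAPE_NON_ASCII);
        return writer.writeValueAsString(value);
    }

    public static void main(String[] args) throws Exception {
        // The accented character should come out as a backslash-u escape
        System.out.println(writeEscaped("caf\u00e9"));
    }
}
```

Because `ObjectWriter` instances are immutable, the override only affects values written through that particular writer.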

Non-compliant JSON content: Comments

  • Java-style (or, originally, C/C++ style) comments: either end-of-line starting with // or a potentially longer section that starts with /* and ends with */
  • “YAML-style” (or in general “scripting” comments): end of line starting with # character.

In both cases a comment may only start at a point where non-significant white space would be allowed; in fact, the Jackson parser treats comments (if enabled) exactly as if they were white space. There is currently no way to access the content of comments, so they are not meant to be processed in any way. This differs from formats like XML, where comment sections may be exposed by the API as distinct events.

But beyond allowing skipping of incoming comments (in one or both styles), is there any support for writing comments?
Currently, no; there is no explicit support for writing comments in JSON content. But it IS possible to write them using “raw” writes. So we should be able to do something like this:

JsonFactory f = JsonFactory.builder()
    .enable(JsonReadFeature.ALLOW_JAVA_COMMENTS)
    .enable(JsonReadFeature.ALLOW_YAML_COMMENTS)
    .build();
StringWriter sw = new StringWriter();
try (JsonGenerator gen = f.createGenerator(sw)) {
    // NOTE! not "writeRawValue()" but "writeRaw()"!
    gen.writeRaw("# File generated by Foobar\n");
    gen.writeStartArray();
    gen.writeNumber(123);
    gen.writeEndArray();
}
String json = sw.toString();
// We should have:
// # File generated by Foobar
// [123]
try (JsonParser p = f.createParser(json)) {
    // note: comment quietly skipped instead of exception
    assertEquals(JsonToken.START_ARRAY, p.nextToken());
    assertEquals(JsonToken.VALUE_NUMBER_INT, p.nextToken());
    assertEquals(123, p.getIntValue());
    assertEquals(JsonToken.END_ARRAY, p.nextToken());
}

Non-compliant JSON content: extra commas/missing values

  • Allow leaving out values in JSON Arrays so that [123,,4] would basically be equivalent to [123,null,4] (handled same way) — that is, a “missing” value between two commas becomes null token.
    Handling can be enabled with JsonReadFeature.ALLOW_MISSING_VALUES.
  • Allow addition of one extra “trailing” comma after last Array element or Object property: [123, 35, ] and {"name":"Bob",} — usually so that for longer Arrays or Object literals all entries may have trailing comma and the last entry is not different from any other.
    Handling can be enabled with JsonReadFeature.ALLOW_TRAILING_COMMA.

As with most other features, there is currently no writer-side feature to skip null elements or to add a trailing comma, although the latter can theoretically be written by using writeRaw().
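To make the two read features concrete, here is a small sketch using databinding, assuming Jackson 2.10 or later; the class `ExtraCommaExample` and its helper methods are names invented for this illustration:

```java
import com.fasterxml.jackson.core.json.JsonReadFeature;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.json.JsonMapper;

public class ExtraCommaExample {
    // Hypothetical helper: a gap between two commas is read as a null value.
    static JsonNode readWithMissingValues(String doc) throws Exception {
        ObjectMapper mapper = JsonMapper.builder()
                .enable(JsonReadFeature.ALLOW_MISSING_VALUES)
                .build();
        return mapper.readTree(doc);
    }

    // Hypothetical helper: one trailing comma after the last entry is skipped.
    static JsonNode readWithTrailingComma(String doc) throws Exception {
        ObjectMapper mapper = JsonMapper.builder()
                .enable(JsonReadFeature.ALLOW_TRAILING_COMMA)
                .build();
        return mapper.readTree(doc);
    }

    public static void main(String[] args) throws Exception {
        JsonNode missing = readWithMissingValues("[123,,4]");
        System.out.println(missing.size() + " elements; second is null: "
                + missing.get(1).isNull());

        JsonNode trailing = readWithTrailingComma("{\"name\":\"Bob\",}");
        System.out.println(trailing.get("name").asText());
    }
}
```

The two features are kept on separate mappers here so that each behavior can be observed in isolation.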

Non-compliant JSON content: alternate String quoting

  • Use of alternate quote character (single-quote aka apostrophe) for Object keys and all String values — usually to reduce need for escaping for String values that contain double quotes
  • Allow fully unquoted Object keys (but not String values): presumably just for … less typing?

Usage pattern is similar to other features:

JsonFactory f = JsonFactory.builder()
    .enable(JsonReadFeature.ALLOW_SINGLE_QUOTES)
    .enable(JsonReadFeature.ALLOW_UNQUOTED_FIELD_NAMES)
    .build();
final String input = "{ value : 'great!' }";
try (JsonParser p = f.createParser(input)) {
    assertEquals(JsonToken.START_OBJECT, p.nextToken());
    assertEquals("value", p.nextFieldName());
    assertEquals("great!", p.nextTextValue());
    assertEquals(JsonToken.END_OBJECT, p.nextToken());
}

So how about output side? No support? Actually, there IS some support in this case: it is possible to:

  • Avoid quoting of Object keys by disabling JsonWriteFeature.QUOTE_FIELD_NAMES
  • Allow use of an alternate quote character (for both Object keys and String values) — with specific configuration of JsonFactory

And here’s an example:

// quoteChar() is declared on JsonFactoryBuilder, so construct that type directly
JsonFactory f = new JsonFactoryBuilder()
    // leave Object keys without any quotes
    .disable(JsonWriteFeature.QUOTE_FIELD_NAMES)
    // and with String values... Let's not settle for apostrophe...
    // use asterisk for funsies!
    .quoteChar('*')
    .build();
StringWriter sw = new StringWriter();
try (JsonGenerator gen = f.createGenerator(sw)) {
    gen.writeStartObject();
    gen.writeFieldName("name");
    gen.writeString("Leia");
    gen.writeEndObject();
}
// And we shall have:
//
// {name:*Leia*}

Non-compliant JSON content: number variations

On reader side we have:

  • JsonReadFeature.ALLOW_LEADING_ZEROS_FOR_NUMBERS : enable to allow values like 0020 to be interpreted same as if leading zeroes were not included (that is, same as 20)
  • JsonReadFeature.ALLOW_LEADING_DECIMAL_POINT_FOR_NUMBERS: enable to allow otherwise invalid numeric values like .123 (to mean same as 0.123)
  • JsonReadFeature.ALLOW_NON_NUMERIC_NUMBERS: allow a set of “Not-a-Number” constants — positive and negative infinity (INF / Infinity, -INF / -Infinity), not-a-number (NaN) — to be decoded into matching floating-point value (usually double)

and on writer side

  • JsonWriteFeature.WRITE_NAN_AS_STRINGS: support serialization of above-mentioned “Not-a-Number” constants as JSON String values
  • JsonWriteFeature.WRITE_NUMBERS_AS_STRINGS: write ALL numeric values as JSON Strings instead of JSON Numbers
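The number-related features follow the same usage pattern as the others. A rough sketch, assuming Jackson 2.11 or later (the leading-decimal-point feature is not available before that); `NumberVariationExample` and its helpers are hypothetical names used only here:

```java
import com.fasterxml.jackson.core.json.JsonReadFeature;
import com.fasterxml.jackson.core.json.JsonWriteFeature;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.json.JsonMapper;

import java.util.Arrays;

public class NumberVariationExample {
    // One mapper with the lenient number settings for reading and writing
    static final ObjectMapper LENIENT = JsonMapper.builder()
            .enable(JsonReadFeature.ALLOW_LEADING_ZEROS_FOR_NUMBERS)
            .enable(JsonReadFeature.ALLOW_LEADING_DECIMAL_POINT_FOR_NUMBERS)
            .enable(JsonReadFeature.ALLOW_NON_NUMERIC_NUMBERS)
            .enable(JsonWriteFeature.WRITE_NUMBERS_AS_STRINGS)
            .build();

    // Hypothetical helper: parse content using the relaxed number rules
    static JsonNode read(String doc) throws Exception {
        return LENIENT.readTree(doc);
    }

    // Hypothetical helper: numbers are written out as JSON Strings
    static String write(Object value) throws Exception {
        return LENIENT.writeValueAsString(value);
    }

    public static void main(String[] args) throws Exception {
        JsonNode values = read("[0020, .123, NaN]");
        System.out.println(values.get(0).asInt());
        System.out.println(values.get(1).asDouble());
        System.out.println(Double.isNaN(values.get(2).asDouble()));
        System.out.println(write(Arrays.asList(1, 2)));
    }
}
```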

Non-compliant JSON content: other

  • JsonReadFeature.ALLOW_UNESCAPED_CONTROL_CHARS — by default all ASCII/Unicode control characters (code points under 32) must be escaped in String values; enable this feature to allow unescaped control characters (usually tabs, \t)
  • JsonReadFeature.ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER: by default only characters explicitly specified as requiring escaping can use single character escape sequence. Enable this feature to allow this for all other characters.
  • JsonWriteFeature.ESCAPE_NON_ASCII: enable this to force escaping of all Unicode characters above value 127.
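As an illustration of the two read-side features in this list, here is a short sketch, assuming Jackson 2.10 or later; the class `EscapeLeniencyExample` and the helper `readString` are invented for the example:

```java
import com.fasterxml.jackson.core.json.JsonReadFeature;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.json.JsonMapper;

public class EscapeLeniencyExample {
    // Hypothetical helper: reads a single JSON String value leniently
    static String readString(String doc) throws Exception {
        ObjectMapper mapper = JsonMapper.builder()
                .enable(JsonReadFeature.ALLOW_UNESCAPED_CONTROL_CHARS)
                .enable(JsonReadFeature.ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER)
                .build();
        return mapper.readValue(doc, String.class);
    }

    public static void main(String[] args) throws Exception {
        // The content contains a raw (unescaped) tab character inside the String
        String withTab = readString("\"a\tb\"");
        System.out.println(withTab.length());

        // The content is "\q": a backslash before a character that needs no escaping
        String lenientEscape = readString("\"\\q\"");
        System.out.println(lenientEscape);
    }
}
```

With default settings both inputs would be rejected with a parse exception.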

Open Source developer, most known for Jackson data processor (nee “JSON library”), author of many, many other OSS libraries for Java, from ClassMate to Woodstox