Jackson Jr for casual JSON reading/writing from Java

Jackson jr is not a very new library at this point: the first official version (2.3.0) was released 2 years ago. But little has been written about it so far; most of the documentation that exists is in the README on the project home page.

Background

One of the complaints or misgivings some Java developers have about Jackson is that it is perceived as heavy and somewhat hard to use. These are subjective assessments, difficult to quantify or compare, but the perception exists.

I think the main reason for this perception is the size of the API, which is itself largely a function of the amount of functionality it contains: Jackson does A LOT, and exposing it all through a single library and a coherent API is neither trivial nor something that leads to a minimal(istic) interface. And while the practical subset most developers use is quite small, it is easy to become overwhelmed by multiple tutorials or when browsing the full javadocs.

There is also another aspect that is sometimes relevant: startup time. Startup time of full Jackson databind is indeed longer than that of simpler JSON processing libraries, mostly due to extensive support for annotations and advanced introspection: the first read or write call takes significantly longer than later calls. While startup time is commonly not a big issue for long-running services, it can be an issue for casual use (like reading a single config file) on platforms like Android.

So… I started wondering: what if I was building Jackson again, from the ground up, to address these issues? Sort of similar to what I did originally with Jackson 0.x (streaming API only), based on what I had learnt from the XML world, having written the Woodstox XML parser.

Learnt lessons

Writing something for the first time tends to produce kinda-failures: interesting, creative, but at some level sub-optimal things. The second time around you will create something less flawed (and generally flawed in different ways), and the third or fourth time you nail it. What you need to do is make sure you don’t keep on writing the “first new X”, but aim for the Nth iteration. A big part of this is figuring out both what you are trying to solve and which parts “ain’t broken”.

In case of Jackson, I figured that the core Streaming API works rather well, and the issues mentioned are due to data-binding. So there’s little point in rewriting `jackson-core`: in fact, it probably makes sense to use it exactly as-is. This gives us:

  • Mature, well-tested incremental/streaming decoder (“parser”) and encoder (“generator”)
  • High performance, low-overhead
  • Potential access to many other data formats (by latest count Jackson has dataformat modules for at least a dozen formats, from Avro to XML)

But I also do like some parts of Jackson databind API, especially parts that focus on immutability. One specific example is use of `ObjectReader` and `ObjectWriter`, instead of `ObjectMapper`: reader and writer use “mutant factory” approach that is similar to fluent-style or builders:

MyPojo value = mapper.readerFor(MyPojo.class).readValue(source);

where the resulting (and any intermediate) `ObjectReader` instance is immutable, so it may be freely shared across threads and references to it retained.

Another thing I have come to realize is that reading and writing are fundamentally different operations from an interface perspective: while they are conceptually two sides of the same coin, the approach to designing a good reading (input) interface can differ a lot from that of designing a good writing (output) interface. I specifically think that low-level writing is often very different from low-level reading, and that this is an area where improvements are possible.

Approach

Given the above thinking (some of it cleansed by 2 years of perspective), I decided to do the following:

The end result was a total uber-jar size of about 300kB.

Something old

So, `jackson-core` is used without modifications, and versioning is synchronized with “standard” Jackson, for simplicity. Two versions of jars are produced:

  • Modular components like `jackson-jr-objects`, which depend on `jackson-core`
  • Uber-jar version `jackson-jr-all`, which shades in `jackson-core` (in a different Java package) and includes all jackson-jr components

The uber-jar is included both as a convenience (a single jar) and to avoid versioning conflicts. Frameworks that have “casual” JSON processing needs could, for example, use `jackson-jr-all` for reading configuration settings and avoid a dependency on standard Jackson components; their users can then use whatever Jackson version they want without versioning conflicts or concerns.

Something borrowed

For databinding, jackson-jr supports Java Beans, along with basic JDK types (Lists, Maps, enums, primitives, wrappers). For our purposes Bean means:

Functionality is simple and straight-forward, leading to a small codebase and fast startup time; downside being lack of configurability.

Typical usage looks like:

String json = "{\"a\":[1,2,{\"b\":true},3],\"c\":3}";
Object ob = JSON.std.anyFrom(json); // will be `Map`
// or
Map<String,Object> map = JSON.std.mapFrom(json);
// or with bean that has properties 'a' and 'c':
MyBean bean = JSON.std.beanFrom(MyBean.class, json);
// writing: as a String, or directly into a File
String output = JSON.std.asString(map);
JSON.std.write(ob, new File("/tmp/stuff.json"));
// and with indentation; but skip writing of null properties
byte[] bytes = JSON.std
    .with(JSON.Feature.PRETTY_PRINT_OUTPUT)
    .without(JSON.Feature.WRITE_NULL_PROPERTIES)
    .asBytes(bean);

In addition to Bean types, enums, arrays, Collections and Maps are supported, along with limited support for `java.util.Date` and `java.util.Calendar`. For values of other types, serialization simply uses the `toString()` method; deserialization requires a single-argument constructor that takes a String, int or long.
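The fallback for "other" types can be pictured with a few lines of reflection. This is only a sketch of the idea under stated assumptions, not jackson-jr's actual code: the names `ScalarFallback` and `Version` are made up for illustration, and only the single-String constructor case is shown.

```java
import java.lang.reflect.Constructor;

// Hypothetical user type: no bean properties, just a String constructor.
final class Version {
    private final String text;

    public Version(String text) { this.text = text; }

    @Override public String toString() { return text; }
}

// Illustrative helper, NOT part of jackson-jr.
final class ScalarFallback {
    // Writing: an otherwise unknown type is rendered through toString().
    static String write(Object value) {
        return String.valueOf(value);
    }

    // Reading: look up a public single-String constructor and invoke it.
    static <T> T read(Class<T> type, String text) {
        try {
            Constructor<T> ctor = type.getConstructor(String.class);
            return ctor.newInstance(text);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                "No usable single-String constructor on " + type.getName(), e);
        }
    }
}
```

With these helpers, `ScalarFallback.read(Version.class, ScalarFallback.write(new Version("2.7.0")))` round-trips the value through its String form.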

Minor improvements and additions are possible, to perhaps allow very basic pluggability of code that could change naming or perhaps allow alternate instantiators. But the goal is to keep things simple and straight-forward.

Something new

Perhaps the only really new thing is the addition of the “Composer” interface for generating JSON (etc) content. This was based on my thinking that the generation side need not (and should not) simply try to emulate the reader side.

From the `README.md` on the jackson-jr home page, we get:

String json = JSON.std
    .with(JSON.Feature.PRETTY_PRINT_OUTPUT)
    .composeString()
    .startObject()
        .put("a", 1)
        .startArrayField("arr")
            .add(1).add(2).add(3)
        .end()
        .startObjectField("ob")
            .put("x", 3)
            .put("y", 4)
            .startArrayField("args").add("none").end()
        .end()
        .put("last", true)
    .end()
    .finish();

for producing output:

{
  "a" : 1,
  "arr" : [1,2,3],
  "ob" : {
    "x" : 3,
    "y" : 4,
    "args" : ["none"]
  },
  "last" : true
}

Note, in particular, that there is no need to use `endArray()` or `endObject()` (the type is inferred from the contextual composer), and that only applicable methods are offered for auto-completion (try it!): if you start an Array, only value addition methods are available; similarly, for an Object only “put” style methods may be called. In other words, composition is type-safe within the JSON content model.
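One way to get this kind of compile-time safety is to give each composer a type parameter for its parent, so that a single `end()` method always returns exactly the composer that opened it. The following is a minimal sketch of that technique with made-up classes (`ObjComposer`, `ArrComposer`, `ComposerSketch`); it builds plain Maps and Lists and is not how jackson-jr itself is implemented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical mini-composers: each remembers its parent type P,
// so end() hands back the enclosing composer with its exact type.
final class ObjComposer<P> {
    private final P parent;
    final Map<String, Object> values = new LinkedHashMap<>();

    ObjComposer(P parent) { this.parent = parent; }

    // Objects expose only "put"-style methods: every value needs a name.
    ObjComposer<P> put(String name, Object value) {
        values.put(name, value);
        return this;
    }

    ArrComposer<ObjComposer<P>> startArrayField(String name) {
        ArrComposer<ObjComposer<P>> child = new ArrComposer<>(this);
        values.put(name, child.values);
        return child;
    }

    ObjComposer<ObjComposer<P>> startObjectField(String name) {
        ObjComposer<ObjComposer<P>> child = new ObjComposer<>(this);
        values.put(name, child.values);
        return child;
    }

    // One end() for all containers; the compiler knows what comes back.
    P end() { return parent; }
}

final class ArrComposer<P> {
    private final P parent;
    final List<Object> values = new ArrayList<>();

    ArrComposer(P parent) { this.parent = parent; }

    // Arrays expose only "add"-style methods: values have no names.
    ArrComposer<P> add(Object value) {
        values.add(value);
        return this;
    }

    P end() { return parent; }
}

final class ComposerSketch {
    static Map<String, Object> build() {
        ObjComposer<Void> root = new ObjComposer<>(null);
        root.put("a", 1)
            .startArrayField("arr").add(1).add(2).add(3).end()
            .startObjectField("ob").put("x", 3).end()
            .put("last", true);
        return root.values;
    }
}
```

In this sketch, calling `.put("x", 3)` right after `.startArrayField("arr")` does not compile, because `ArrComposer` has no `put` method; that is the same kind of guidance auto-completion gives with the real Composer.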

Also worth noting is that a Composer can be used not only to build a String or `byte[]`, but also to stream output directly into an `OutputStream` or `Writer`; it may even be used to construct Java `List`s or `Map`s!

On configuration

We have already seen a few examples of usage, starting with the `JSON` root object and using fluent-style configuration. But there is one important thing about configurability: every `JSON` instance is FULLY IMMUTABLE, and thereby thread-safe: you can create a configured instance, store it as a static singleton, pass it between threads, whatever. An instance may also be used as a base, with further changes applied to produce new instances. Construction and reconfiguration are cheap operations as well; none of the objects is costly to construct.
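The pattern behind this is easy to show in isolation. Below is a minimal sketch of an immutable object with `with()`/`without()` "mutant factory" methods; the `Settings` class and its two features are hypothetical stand-ins chosen for illustration and do not mirror jackson-jr's internals.

```java
import java.util.EnumSet;

// Hypothetical immutable configuration holder (illustrative names only).
final class Settings {
    enum Feature { PRETTY_PRINT, WRITE_NULLS }

    // Shared default instance; safe to publish because it never changes.
    static final Settings std = new Settings(EnumSet.noneOf(Feature.class));

    private final EnumSet<Feature> enabled;

    private Settings(EnumSet<Feature> enabled) { this.enabled = enabled; }

    // Never modifies this instance: returns a new one with the feature on.
    Settings with(Feature f) {
        if (enabled.contains(f)) return this; // nothing to change
        EnumSet<Feature> copy = EnumSet.copyOf(enabled);
        copy.add(f);
        return new Settings(copy);
    }

    // Same idea for switching a feature off.
    Settings without(Feature f) {
        if (!enabled.contains(f)) return this;
        EnumSet<Feature> copy = EnumSet.copyOf(enabled);
        copy.remove(f);
        return new Settings(copy);
    }

    boolean isEnabled(Feature f) { return enabled.contains(f); }
}
```

Because `Settings.std.with(...)` leaves `Settings.std` untouched, a configured instance can be kept in a static field and used from any thread without locking, and reconfiguring costs no more than copying a small set.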

The full set of configuration options is defined as the enumeration `JSON.Feature`. I will not list them all, but a couple of the more useful ones (enable using `JSON.with()`, disable using `JSON.without()`) are:

Word on performance

One of the things developers tend to over-optimize for (and/or worry too much about) is performance: most libraries you can choose are probably fast enough for most uses. Having said that, there are cases where performance still matters, above and beyond startup time, and this is where Jackson itself has traditionally fared well: it consistently either tops the list or is within measurement error of the fastest contenders. So how does `jackson-jr` do?

According to the benchmarks I use for evaluating optimizations, it does pretty well: for Beans, reading performance is typically within 10–20% of “full” Jackson (although full Jackson can be sped up further with Afterburner), and writing Beans is not much worse off. But for reading `List`s and `Map`s, jackson-jr can actually run at the same speed. So for “untyped” use (reading JSON as Lists and Maps), there is no performance penalty from using jackson-jr.

Add-ons!

But wait! There’s more!

In addition to basic databinding, Jackson-jr 2.7 introduced a couple of optional modular additions:

I should write more about the first one, specifically, but for now I’ll just point to tests under build module.

Open Source developer, most known for Jackson data processor (nee “JSON library”), author of many, many other OSS libraries for Java, from ClassMate to Woodstox
