Serializing Java POJO as JSON Array with Jackson 2: more compact output

Jan 14, 2017

Writing Java POJOs out as simple JSON is a good general mechanism for serializing Java Objects that need to be transferred over a network (sent as input to REST services, returned as output) and sometimes stored as-is in a data store.
Due to its simplicity and general reliability, this output format has become the de facto default serialization format for public APIs of networked systems.

Challenge: verbosity of JSON

But although the “simple JSON” approach is a good strategy overall, its output is somewhat verbose. Consider a simple POJO, Point:

public class Point {
 public int x, y; // public, so that Jackson auto-detects the fields by default
}

and its default JSON output:

{ "x" : 16, "y" : 27 }

While this example is not overly verbose (we are using very short property names, for one thing), it is not the most compact textual representation either.

Improvement idea: use JSON Arrays

Could we make it more compact without losing information? One obvious idea would be to use “column-oriented” serialization as a JSON Array (similar to the CSV format):

[ 16, 27 ]

which would work if (but only if) we can guarantee ordering and unambiguously associate each position (first column, second column) with a property name (x, y). Logically, then, we could transform between this compact representation and the Object it represents.
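
To make the position-to-name association concrete, here is a minimal hand-rolled sketch in plain Java. It is not how Jackson implements the feature: the class name PositionalSketch and its helper methods are made up for illustration, and the parsing only handles a flat array of two integers.

```java
public class PositionalSketch {
    // The column order is the whole contract: position 0 is "x", position 1 is "y".
    static final String[] COLUMNS = { "x", "y" };

    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Write values in column order, without property names.
    static String toArray(Point p) {
        return "[" + p.x + "," + p.y + "]";
    }

    // Read values back by position; only handles a flat array of integers.
    static Point fromArray(String json) {
        String body = json.trim();
        if (!body.startsWith("[") || !body.endsWith("]")) {
            throw new IllegalArgumentException("Not a JSON array: " + json);
        }
        String[] cells = body.substring(1, body.length() - 1).split(",");
        if (cells.length != COLUMNS.length) {
            throw new IllegalArgumentException(
                "Expected " + COLUMNS.length + " columns, got " + cells.length);
        }
        return new Point(Integer.parseInt(cells[0].trim()),
                         Integer.parseInt(cells[1].trim()));
    }

    public static void main(String[] args) {
        Point p = fromArray("[ 16, 27 ]");
        System.out.println(COLUMNS[0] + "=" + p.x + ", " + COLUMNS[1] + "=" + p.y); // x=16, y=27
        System.out.println(toArray(p)); // [16,27]
    }
}
```

If writer and reader ever disagree about COLUMNS, values end up in the wrong properties without any error, which is why the ordering guarantee matters.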

Jackson Already Supports Doing This

Since this looks like a potentially useful idea for some use cases (especially ones where we have full control over both endpoints of communication, or over both the reader and the writer of stored data, and where the data structure changes rarely or not at all), how could we achieve this?

It turns out that Jackson databind already supports this: in fact, it has been supported since version 2.1, released 4 years ago!
The key is the annotation @JsonFormat and its property shape, which allows altering the kind of JSON value that is produced on serialization: it may, for example, be used to choose between numeric (timestamp) and textual (ISO-8601) serialization of Date/Time values. In this case we want to indicate that a POJO that would normally be serialized as a JSON Object should instead be serialized as a JSON Array. For the earlier example we simply add:

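import com.fasterxml.jackson.annotation.JsonFormat;
import com.fasterxml.jackson.annotation.JsonPropertyOrder;

@JsonFormat(shape=JsonFormat.Shape.ARRAY)
@JsonPropertyOrder({ "x", "y" }) // pin positions instead of relying on declaration order
public class Point {
 public int x, y;
}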

and the resulting output changes to the compact array form, [16,27].
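
As a round-trip check, the following sketch writes a Point as an array and reads it back. It assumes jackson-databind is on the classpath; the wrapper class PointArrayDemo and its write/read helpers are hypothetical names, and the public fields, the constructors and the @JsonPropertyOrder annotation are additions made here so that the example runs with a default ObjectMapper.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import com.fasterxml.jackson.annotation.JsonFormat;
import com.fasterxml.jackson.annotation.JsonPropertyOrder;
import com.fasterxml.jackson.databind.ObjectMapper;

public class PointArrayDemo {
    @JsonFormat(shape = JsonFormat.Shape.ARRAY)
    @JsonPropertyOrder({ "x", "y" }) // assumption: pin positions explicitly
    public static class Point {
        public int x, y; // public so default auto-detection finds the fields

        public Point() { } // no-arg constructor used when deserializing
        public Point(int x, int y) { this.x = x; this.y = y; }
    }

    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Serialize; checked exceptions are wrapped to keep call sites short.
    static String write(Point p) {
        try {
            return MAPPER.writeValueAsString(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deserialize from the positional array form.
    static Point read(String json) {
        try {
            return MAPPER.readValue(json, Point.class);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String json = write(new Point(16, 27));
        System.out.println(json); // [16,27]
        Point back = read(json);
        System.out.println(back.x + ", " + back.y); // 16, 27
    }
}
```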

Alternative ways to configure

The example above specified that all Point values should be serialized as JSON Arrays. But what if you only want to use this for a specific property? This is possible by annotating the property:

public class Stuff {
  @JsonFormat(shape=JsonFormat.Shape.ARRAY)
  public Point origin;
  // ... other fields
}

This can also be useful when working with 3rd party datatypes, to which you cannot directly add annotations (although you could use “mix-in annotations”).
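
For the mix-in route, a sketch might look like the following. It assumes jackson-databind 2.5 or later (for ObjectMapper.addMixIn) on the classpath; ThirdPartyPoint is a stand-in for a class you cannot modify, and PointMixIn and MixInDemo are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import com.fasterxml.jackson.annotation.JsonFormat;
import com.fasterxml.jackson.annotation.JsonPropertyOrder;
import com.fasterxml.jackson.databind.ObjectMapper;

public class MixInDemo {
    // Stand-in for a third-party type: note that it carries no Jackson annotations.
    public static class ThirdPartyPoint {
        public int x, y;

        public ThirdPartyPoint() { }
        public ThirdPartyPoint(int x, int y) { this.x = x; this.y = y; }
    }

    // The mix-in is never instantiated; only its annotations are used.
    @JsonFormat(shape = JsonFormat.Shape.ARRAY)
    @JsonPropertyOrder({ "x", "y" })
    abstract static class PointMixIn { }

    // Build a mapper that treats ThirdPartyPoint as if it had the mix-in's annotations.
    static ObjectMapper newMapper() {
        ObjectMapper mapper = new ObjectMapper();
        mapper.addMixIn(ThirdPartyPoint.class, PointMixIn.class);
        return mapper;
    }

    static String write(ThirdPartyPoint p) {
        try {
            return newMapper().writeValueAsString(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(write(new ThirdPartyPoint(16, 27))); // [16,27]
    }
}
```

In real code the mapper would be created once and reused rather than built on every call.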

Going even further: with Jackson 2.8 it is also possible to define “write as array” entirely without annotations, using so-called config overrides:

ObjectMapper mapper = new ObjectMapper();
mapper.configOverride(Point.class)
   .setFormat(JsonFormat.Value.forShape(JsonFormat.Shape.ARRAY));

which works effectively the same as using the @JsonFormat annotation on the class itself.

One thing to note for all of the above: the setting does not apply recursively. That is, if the annotated type has POJO-valued properties, those values are not automatically serialized as arrays, but need to be similarly configured.
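
The non-recursive behavior can be seen in a small sketch like this one, again assuming jackson-databind on the classpath. Line, PlainPoint and NonRecursiveDemo are hypothetical names: only Line is array-shaped, so its PlainPoint values are still written as JSON Objects.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import com.fasterxml.jackson.annotation.JsonFormat;
import com.fasterxml.jackson.annotation.JsonPropertyOrder;
import com.fasterxml.jackson.databind.ObjectMapper;

public class NonRecursiveDemo {
    // No shape override here, so values of this type stay JSON Objects.
    @JsonPropertyOrder({ "x", "y" })
    public static class PlainPoint {
        public int x, y;

        public PlainPoint() { }
        public PlainPoint(int x, int y) { this.x = x; this.y = y; }
    }

    // Only the outer type is written as an array.
    @JsonFormat(shape = JsonFormat.Shape.ARRAY)
    @JsonPropertyOrder({ "start", "end" })
    public static class Line {
        public PlainPoint start, end;

        public Line() { }
        public Line(PlainPoint start, PlainPoint end) { this.start = start; this.end = end; }
    }

    static String write(Line line) {
        try {
            return new ObjectMapper().writeValueAsString(line);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Line line = new Line(new PlainPoint(1, 2), new PlainPoint(3, 4));
        System.out.println(write(line)); // [{"x":1,"y":2},{"x":3,"y":4}]
    }
}
```

Annotating PlainPoint the same way (or adding a config override for it) would make the nested values arrays too.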

Benefits

Our original thinking was that using JSON Arrays instead of JSON Objects can reduce the size of the JSON payload. But beyond reducing data size there is also a direct impact on processing performance: it turns out that shrinking the payload typically makes processing faster as well. As a rough rule of thumb, the improvement is proportional to the size difference: reducing size by 30% tends to increase processing speed by about 30%.
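
To put a number on the size difference for the Point example, this small plain-Java sketch compares the two encodings byte for byte. It uses minified JSON (no whitespace) and measures size only, not speed; SizeComparison and its helpers are hypothetical names, and the savings for real payloads depend on property name lengths and value sizes.

```java
import java.nio.charset.StandardCharsets;

public class SizeComparison {
    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Whole-number percentage saved by the smaller encoding (rounded down).
    static int percentSaved(String larger, String smaller) {
        int before = utf8Bytes(larger);
        return 100 * (before - utf8Bytes(smaller)) / before;
    }

    public static void main(String[] args) {
        String asObject = "{\"x\":16,\"y\":27}";
        String asArray = "[16,27]";
        System.out.println(utf8Bytes(asObject)); // 15
        System.out.println(utf8Bytes(asArray)); // 7
        System.out.println(percentSaved(asObject, asArray) + "%"); // 53%
    }
}
```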

In this particular case the improvement can be even bigger for one specific reason: processing JSON property names is often more expensive (on a character-by-character basis) than processing values. Names need to be decoded and validated, whereas some values (like integral numbers) do not require as much work.

Still not impressed? Since Jackson supports multiple data formats beyond JSON, this feature can be very helpful with “JSON-like” formats: in particular, the “binary JSON” formats CBOR and Smile benefit a lot from this style of output. In some benchmarks (like jackson-benchmarks) the throughput improvement is 50%, on top of already efficient reads and writes.

Limits?

The ability to “Arrayize” POJOs is not omnipotent: it cannot be used in all cases. Limitations come from the fact that some other features require the use of alternate forms: for example, polymorphic type handling may require output of a variable number (and order) of properties, so as a general rule, polymorphic values (annotation @JsonTypeInfo) still get serialized as JSON Objects. The same is true when using Object Id.

Another type of limitation is that of filtering: when using @JsonView (or @JsonFilter) it is possible to serialize as an array, but values cannot be completely filtered out because of the positional nature of the output (that is, removing an element would shift the remaining values, causing a possible mismatch). As such, this output mode is usually not a good match for filtered output.

Note that as a user you do not have to enforce such limits: Jackson will only use “output as Array” in cases where it is possible. If it is not possible, the default output as JSON Object will be used; similarly, when deserializing, expectations match the output side so that end-to-end functionality works.
But it is good to be aware of the limitations, to understand why annotations may sometimes not change the output structure.

When To Use and When Not

As briefly mentioned above, use of this feature should probably be limited to cases where the benefits of a more compact representation are significant. But it is also important to consider that a positional (column-oriented) format is more fragile than name/value pairs: changes to the format are difficult to support. Because of this, it is best to use JSON Array output only for systems where you control both sides, writing and reading (client and server).