Line-delimited JSON with Jackson

(aka Streaming Read/Write of Long JSON data streams)

The JSON specification defines top-level “JSON Values” as just “[JSON] Objects and Arrays”, and most JSON usage examples focus on those: for example, how to read an HTTP request whose payload is a JSON Object, or how to write a matching HTTP response whose payload is likewise a JSON Object.
There are countless articles covering such usage: with Jackson you would usually use something like this:

final JsonMapper mapper = new JsonMapper();
MyRequest reqOb = mapper.readValue(httpRequest.getInputStream(), MyRequest.class);
// or: JsonNode reqAsTree = mapper.readTree(...)
// ... processing, and then
MyResponse respOb = new MyResponse(.....);
mapper.writeValue(httpResponse.getOutputStream(), respOb);

Streaming Large Data Sets as a Sequence of Values

While single-value input and output is sufficient for request/response patterns (as well as many other use cases), there are cases where a slightly different abstraction is useful: a sequence of values.
The most common representation is so-called “line-delimited JSON”:

{"id":1, "value":137 }
{"id":2, "value":256 }
{"id":3, "value":-89 }
Compared to expressing the same values as entries in a single JSON array, this representation has two practical benefits:

  1. If the content is a JSON array, entries cannot be split off without full JSON parsing; but with a separator like a linefeed (and NO pretty-printing), values can be split apart by the separator alone, without any further decoding (see the sketch below)
  2. When writing a content stream, there is no need to keep track of whether a separating comma is needed, nor to emit a closing “]” marker; values can simply be appended at the end (possibly even by concurrent producers)
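
To illustrate the first point, here is a minimal sketch that finds value boundaries without any JSON-level parsing: it splits input on linefeeds first, and only then decodes each line as JSON (assuming one value per line and no pretty-printing; the file name matches the examples below):

final JsonMapper mapper = new JsonMapper();
try (BufferedReader br = new BufferedReader(new FileReader("data.ldjson"))) {
    String line;
    while ((line = br.readLine()) != null) {
        if (line.isEmpty()) { // tolerate blank lines
            continue;
        }
        // each line is a complete, independently decodable JSON value
        JsonNode node = mapper.readTree(line);
        // ... process node
    }
}

In practice you would let Jackson handle the iteration (as shown below with MappingIterator), but the point stands: value boundaries can be located without decoding any JSON.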

Reading Line-Delimited JSON with Jackson

Using our earlier simple data example, we can create a MappingIterator and use that for reading:

final JsonMapper mapper = ...;
final File input = new File("data.ldjson"); // or Reader, InputStream, String, ...

try (MappingIterator<JsonNode> it = mapper.readerFor(JsonNode.class)
        .readValues(input)) {
    while (it.hasNextValue()) {
        JsonNode node = it.nextValue();
        int id = node.path("id").intValue(); // or "asInt()" to coerce
        int value = node.path("value").intValue();
        // ... do something with it
    }
}

But instead of reading entries as generic JsonNode trees, we can also bind them to a simple POJO:

static class Value {
    public int id;
    public int value;
}

try (MappingIterator<Value> it = mapper.readerFor(Value.class)
        .readValues(input)) {
    while (it.hasNextValue()) {
        Value v = it.nextValue();
        int id = v.id;
        int value = v.value;
        // ... process
    }
}

Writing Line-Delimited JSON with Jackson

How about producing line-delimited JSON? Quite similar: the abstraction to use is SequenceWriter (instead of MappingIterator):

final File outputFile = new File("data.ldjson");
try (SequenceWriter seq = mapper.writer()
        .withRootValueSeparator("\n") // Important! Default value separator is a single space
        .writeValues(outputFile)) {
    Value value = ...;
    seq.write(value);
    // ... write further values as needed
}
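
A nice consequence of the second benefit listed earlier is that new values can be added to an existing file without rewriting anything: just open the stream in append mode. A small sketch of the idea (the append-mode FileOutputStream is plain JDK; the SequenceWriter usage is the same as above):

try (OutputStream out = new FileOutputStream(outputFile, true); // true: open in append mode
        SequenceWriter seq = mapper.writer()
            .withRootValueSeparator("\n")
            .writeValues(out)) {
    out.write('\n'); // terminate the last existing value, in case file lacks a trailing linefeed
    Value toAppend = ...;
    seq.write(toAppend);
}

If the file happened to be empty, this just adds a leading blank line, which readers (including MappingIterator) skip as whitespace.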

Bonus: Error Recovery with Line-Delimited JSON

Another useful feature of streaming reads is that it is possible to catch, and gracefully handle, some deserialization issues. For example:

final int MAX_ERRORS = 5;
int line = 0;
int failCount = 0;

try (MappingIterator<Value> it = mapper.readerFor(Value.class)
        .readValues(input)) {
    while (it.hasNextValue()) {
        ++line;
        Value v;
        try {
            v = it.nextValue();
            // process normally
        } catch (JsonMappingException e) {
            ++failCount;
            System.err.println("Problem ("+failCount+") on line "+line+": "+e.getMessage());
            if (failCount == MAX_ERRORS) {
                System.err.println("Too many errors: aborting processing");
                break;
            }
        }
    }
}

Convenience methods: Read It All

Although MappingIterator fundamentally exposes an iterator-style interface, there are convenience methods to use if you really just want to read all available entries:

try (MappingIterator<Value> it = mapper.readerFor(Value.class)
        .readValues(input)) {
    List<Value> all = it.readAll();
}

And SequenceWriter has a matching convenience method for writing a whole collection:

SequenceWriter seq = ...;
List<Value> valuesToWrite = ...;
seq.writeAll(valuesToWrite);
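
Putting reads and writes together, copying or filtering a line-delimited file becomes a short round trip. A sketch using the Value POJO from above (the filtering predicate and the output file name filtered.ldjson are made up for the example):

List<Value> all;
try (MappingIterator<Value> it = mapper.readerFor(Value.class)
        .readValues(new File("data.ldjson"))) {
    all = it.readAll();
}
all.removeIf(v -> v.value < 0); // illustrative filter: drop negative values
try (SequenceWriter seq = mapper.writer()
        .withRootValueSeparator("\n")
        .writeValues(new File("filtered.ldjson"))) {
    seq.writeAll(all);
}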

Line-Delimited [INSERT FORMAT] with Jackson?

As you may have noticed, the API exposed is not JSON-specific at all, and could work on other formats as well. Whether it actually does (for reading and/or writing) depends on the format in question. At least the following Jackson backends should work:

  1. Smile (jackson-dataformat-smile): since it has the same logical structure as JSON, it will work exactly the same (see the sketch after this list)!
  2. CBOR (jackson-dataformat-cbor): similar to Smile, should work as well, as per “CBOR Sequences” (RFC-8742), but I need to verify (see jackson-dataformats-binary#238)
  3. YAML (jackson-dataformat-yaml): a single physical file or stream may contain multiple documents, so the same approach should work as well
  4. Avro (jackson-dataformat-avro): should also “just work”, and has been tested to some degree
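
For example, with the Smile backend the only change needed should be the mapper class. A sketch, assuming the jackson-dataformat-smile module (which provides SmileMapper as of Jackson 2.10) is on the classpath; note that for a self-delimiting binary format no root value separator is needed:

final SmileMapper smileMapper = new SmileMapper();
final File smileFile = new File("data.smile");

// writing a sequence of values, exactly as with JSON (no separator needed)
try (SequenceWriter seq = smileMapper.writer().writeValues(smileFile)) {
    Value v = ...;
    seq.write(v);
}
// reading the sequence back, also exactly as with JSON
try (MappingIterator<Value> it = smileMapper.readerFor(Value.class)
        .readValues(smileFile)) {
    List<Value> all = it.readAll();
}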
