Reading CSV with Jackson

“How to use jackson-dataformat-csv”

First there was just JSON

Although Jackson started simply as a JSON parser library (see “Brief history of Jackson”), over the years it has grown to support a dozen other data formats using extensions called (dataformat) Modules. At a high level the API used is identical to that of reading and writing JSON — the jackson-databind package and its API are format-agnostic — although there are typically some configuration settings that vary. So in theory it should be very easy to start using Jackson with any of the supported (*) formats.

In practice, however, some formats are handled very similarly to JSON (for example “native JSON” formats like CBOR, Smile, MessagePack and BSON), but other formats need a bit more care. CSV (Comma-Separated Values) is one such format.

This blog post covers a few ways you can read CSV with Jackson. I assume you are familiar with what the CSV format is — if not (but you are interested), please start with an article like “What is a CSV File” first.

(*) As of April 2021, there are known working format modules for at least the following formats: Avro, BSON, CBOR, CSV, JSON, MessagePack, (Java) Properties, Protobuf, Smile, TOML, XML, YAML — and there are probably others I am not familiar with.
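Before diving into Jackson, it may help to see just how simple the format is at its core. Here is a naive, purely illustrative sketch of my own (it ignores quoting and escaping entirely, so it is not how Jackson — or any real CSV parser — works):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class NaiveCsv {
    // Split a CSV document into rows of column values.
    // WARNING: toy code — no support for quoted values or escapes!
    public static List<List<String>> split(String csv) {
        return csv.lines()
                .map(line -> Arrays.asList(line.split(",")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<String>> rows = split("1,2,true\n2,9,false\n");
        System.out.println(rows); // [[1, 2, true], [2, 9, false]]
    }
}
```

Quoting, escaping, headers and type handling are exactly the parts where a real parser like Jackson’s CSV module earns its keep.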

Why and how is CSV different from JSON

Put simply, CSV differs from JSON in a couple of dimensions: it is a tabular, positional format — rows of column values identified by position rather than by name — and its values are plain text, with no type information of their own.

Since the Jackson API was designed to work with the JSON data model, some common usage patterns are not directly applicable. For example, ObjectMapper.readValue() is less commonly used (it can be used, but often is not) than the MappingIterator-based alternative.

So, most of CSV reading uses “line-oriented” reading — you may want to read “Line-delimited JSON with Jackson” first, if you have not done so yet.
I will use similar approach for all examples here.

Simplest, “untyped” reading of CSV as List<List<String>>

The simplest API some Java CSV packages expose simply returns all column values of all rows as Strings — reading them as “Lists of Lists of Strings” (or arrays). While this is not commonly used with Jackson, it is one of the supported mechanisms.

So assuming that we had a CSV file with contents like this (no header row — explained later on):

1,2,true
2,9,false
-13,0,true

you can read it using:

final String CSV_DOC = "1,2,true\n2,9,false\n-13,0,true\n";
final CsvMapper mapper = new CsvMapper();
MappingIterator<List<String>> it = mapper
        .readerForListOf(String.class)
        .with(CsvParser.Feature.WRAP_AS_ARRAY) // !!! IMPORTANT
        .readValues(CSV_DOC);
// If we want them all, we use:
List<List<String>> all = it.readAll();
// or if not, we iterate instead:
while (it.hasNextValue()) {
    List<String> row = it.nextValue();
    // process
}

There’s a bit to unpack in there: most notably, CsvParser.Feature.WRAP_AS_ARRAY is needed so that each row gets exposed as the equivalent of a JSON Array — which is what allows it to be bound to a List<String>.

Anyway, this is probably not the way you want to read CSV, but it is good to be aware of this method in case you see code using it.

Almost as simple: reading contents as Maps, POJOs

Using column positions for access is error-prone. In many cases columns have logical names — often given in the form of something called a “header”, wherein the first row of the CSV document contains column names instead of values.

But before looking into that, let’s consider the case where we simply assign names to columns ourselves. This is done by creating a CsvSchema object, defined by the CSV module:

CsvSchema schema = CsvSchema.builder()
        .addColumn("x")
        .addColumn("y")
        .addColumn("visible")
        .build();

With that, we can tell the module that we want it to map the first column into property “x”, the second one into “y” and the third into “visible”. And then we can rewrite the code like so:

MappingIterator<Map<String, String>> it = mapper
        .readerForMapOf(String.class)
        // NOTE: no wrapping needed
        .with(schema)
        .readValues(CSV_DOC);
Map<String, String> row = it.nextValue();
assertEquals("1", row.get("x"));
assertEquals("2", row.get("y"));
assertEquals("true", row.get("visible"));

This can be useful by itself, but more importantly it also lets us use POJOs.
So let’s add this:

// Property order matters when generating a CsvSchema from the class
// (see below); without this annotation, the CSV module would default
// to alphabetic ordering (visible, x, y):
@JsonPropertyOrder({ "x", "y", "visible" })
public class Point {
    public int x, y;
    public boolean visible;
}

and we can read rows into POJOs like so:

MappingIterator<Point> it = mapper
        .readerFor(Point.class)
        .with(schema)
        .readValues(CSV_DOC);
while (it.hasNextValue()) {
    Point p = it.nextValue();
    int x = p.x;
    // do something!
}
// or, you could alternatively slurp 'em all:
List<Point> points = it.readAll();

Note that here the conversion from String values into numbers and booleans happens automatically.
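The same kind of positional schema also works for Jackson’s tree model, as an aside. A hedged sketch (the class and helper names here are my own, and it assumes jackson-dataformat-csv on the classpath; note that since CSV values carry no type information, they surface as JSON String nodes):

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;
import java.util.List;

public class CsvAsTree {
    // Read each CSV row as a JsonNode, using the given column names
    public static List<JsonNode> readRows(String csv, String... columns)
            throws Exception {
        CsvMapper mapper = new CsvMapper();
        CsvSchema.Builder builder = CsvSchema.builder();
        for (String col : columns) {
            builder.addColumn(col);
        }
        MappingIterator<JsonNode> it = mapper
                .readerFor(JsonNode.class)
                .with(builder.build())
                .readValues(csv);
        return it.readAll();
    }

    public static void main(String[] args) throws Exception {
        List<JsonNode> rows = readRows("1,2,true\n2,9,false\n",
                "x", "y", "visible");
        // Untyped CSV values come through as Strings, not numbers/booleans
        System.out.println(rows.get(0));
    }
}
```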

With a little help from The Header

So far so good: we can tell Jackson CSV module how the columns should be named and this lets us map rows into POJOs (or Maps, or even JsonNode if we wanted).
But do we need to build CsvSchema by hand?

No, usually not. In the case of type Point we can actually ask for a CsvSchema that matches its definition:

CsvSchema pointSchema = mapper.schemaFor(Point.class);

and as long as we have specified the correct property order with @JsonPropertyOrder, we are all set.

But perhaps more commonly, many CSV documents start with that “header” row mentioned earlier; so we would actually have:

x,y,visible
1,2,true
2,9,false
-13,0,true

By default Jackson has no way of knowing whether the first row has this special meaning, so we need to indicate that it does. This can be done by constructing a specifically configured, column-less CsvSchema like so:

CsvSchema headerSchema = CsvSchema.emptySchema().withHeader();
String CSV_WITH_HEADER = ...; // see example above
MappingIterator<Point> it = mapper
        .readerFor(Point.class)
        .with(headerSchema)
        .readValues(CSV_WITH_HEADER);
// and read same as before
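Putting the pieces together, a self-contained sketch of header-based reading might look like this (class and helper names are my own, and it assumes jackson-dataformat-csv on the classpath):

```java
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;
import java.util.List;

public class ReadWithHeader {
    public static class Point {
        public int x, y;
        public boolean visible;
    }

    // Read a CSV document whose first row contains column names
    public static List<Point> read(String csvWithHeader) throws Exception {
        CsvMapper mapper = new CsvMapper();
        CsvSchema headerSchema = CsvSchema.emptySchema().withHeader();
        MappingIterator<Point> it = mapper
                .readerFor(Point.class)
                .with(headerSchema)
                .readValues(csvWithHeader);
        return it.readAll();
    }

    public static void main(String[] args) throws Exception {
        String csv = "x,y,visible\n1,2,true\n2,9,false\n-13,0,true\n";
        List<Point> points = read(csv);
        System.out.println(points.size() + " points; first x=" + points.get(0).x);
    }
}
```

Note that with a header row, column names come from the document itself, so the @JsonPropertyOrder annotation on Point is not needed here.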

And That’s All For Now!

The above is actually the core foundation you need to read CSV with Jackson: the biggest remaining aspects on the reader side are the various CsvSchema settings (separator and quote characters, null-value handling and so on) and CsvParser.Feature options.

Come to think of it, covering some of these settings is probably worth a separate post in the future. :)
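As a small teaser of those settings, here is a sketch of a few CsvSchema knobs (the specific separator, quote and null-marker values are hypothetical; assumes jackson-dataformat-csv on the classpath):

```java
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

public class SchemaSettings {
    // Hypothetical configuration for a semicolon-separated file that
    // uses single quotes and marks missing values as "N/A":
    public static CsvSchema semicolonSchema() {
        return CsvSchema.emptySchema()
                .withHeader()              // first row holds column names
                .withColumnSeparator(';')  // semicolon instead of comma
                .withQuoteChar('\'')       // single-quote quoting
                .withNullValue("N/A");     // map "N/A" to null when reading
    }

    public static void main(String[] args) {
        CsvSchema schema = semicolonSchema();
        System.out.println(schema.getColumnSeparator()); // ;
    }
}
```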

Open Source developer, best known for the Jackson data processor (née “JSON library”), author of many, many other OSS libraries for Java, from ClassMate to Woodstox