Reading CSV with Jackson

First there was just JSON

Although Jackson started simply as a JSON parser library (see “Brief history of Jackson”), over the years it has grown to support a dozen or so other data formats through extensions called (dataformat) Modules. At a high level the API is identical to the one used for reading and writing JSON (the jackson-databind package and its API are format-agnostic), although some configuration settings typically vary. So in theory it should be very easy to start using Jackson with any of the supported (*) formats.

In practice, however, some formats are handled very similarly to JSON (for example “native JSON” formats like CBOR, Smile, MessagePack and BSON), while other formats need a bit more care. CSV (Comma-Separated Values) is one such format.

This blog post covers a few ways you can read CSV with Jackson. I assume you are familiar with the CSV format; if not (but you are interested), please start with an article like “What is a CSV File” first.

(*) As of April 2021, there are known working format modules for at least the following formats: Avro, BSON, CBOR, CSV, JSON, MessagePack, (Java) Properties, Protobuf, Smile, TOML, XML, YAML. There are probably others I am not familiar with.

Why and how is CSV different from JSON

Put simply, CSV differs in a couple of dimensions:

  • CSV is a tabular format, with columns and rows; JSON is tree-structured, with a more flexible structure at the format level.
  • CSV is positional: columns are identified by index, whereas JSON mostly uses named properties (although JSON does have positional Arrays too). A CSV document may or may not contain logical names for its columns (more on this later).

Since the Jackson API was designed to work with the JSON data model, some common usage patterns are not directly applicable. For example, ObjectMapper.readValue() can be used but is less common than the readValues() variant, which returns a MappingIterator.

So most CSV reading is “line-oriented”: you may want to read “Line-delimited JSON with Jackson” first, if you have not done so yet.
I will use a similar approach for all examples here.

Simplest, “untyped” reading of CSV as List<List<String>>

The simplest API that some Java CSV packages offer exposes all column values of all rows as Strings, so that content is read as “Lists of Lists of Strings” (or arrays). While this is not commonly used with Jackson, it is one of the supported mechanisms.

So assuming that we had a CSV file with contents like this (no header row; explained later on):

1,2,true
2,9,false
-13,0,true

you can read it using:

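import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvParser;
import java.util.List;

final String CSV_DOC = "1,2,true\n2,9,false\n-13,0,true\n";
final CsvMapper mapper = new CsvMapper();
MappingIterator<List<String>> it = mapper
    .readerForListOf(String.class)
    .with(CsvParser.Feature.WRAP_AS_ARRAY) // !!! IMPORTANT
    .readValues(CSV_DOC);
// If we want all rows at once we use:
List<List<String>> all = it.readAll();
// or, if not, we would instead iterate row by row:
while (it.hasNextValue()) {
    List<String> row = it.nextValue();
    // process
}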

There’s a bit to unpack in there:

  • We need to construct a CsvMapper instead of a general ObjectMapper (or JsonMapper) to make sure we read/write CSV-encoded content
  • We use the convenience method readerForListOf() to get an ObjectReader for reading List<String> values
  • We MUST enable one specific CSV feature to force individual rows to be exposed as the equivalent of JSON Arrays; this is only needed when we do not use a CsvSchema (explained in the following sections)
  • We have at least 2 ways to read contents: MappingIterator gives rows one by one, but also has the convenience method readAll() for “just read them all” slurping of content

Anyway, this is probably not the way you want to do it but it is good to be aware of this method in case you see code using it.
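
Since arrays were mentioned above as an alternative to Lists, here is a small sketch of the same untyped reading with String[] rows. The class name UntypedArrayRows and the helper readRows() are made up for this illustration; the Jackson calls are the same ones used above, except that readerFor(String[].class) replaces readerForListOf().

```java
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvParser;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

// Hypothetical helper class, for illustration only
class UntypedArrayRows {
    // Reads every row of a header-less CSV document as a String[]
    public static List<String[]> readRows(String csv) {
        CsvMapper mapper = new CsvMapper();
        try (MappingIterator<String[]> it = mapper
                .readerFor(String[].class)
                // same requirement as with List<String>: no schema, so wrap rows as arrays
                .with(CsvParser.Feature.WRAP_AS_ARRAY)
                .readValues(csv)) {
            return it.readAll();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String[] row : readRows("1,2,true\n2,9,false\n-13,0,true\n")) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Every value is still a plain String here; nothing is converted to numbers or booleans.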

Almost as simple: reading contents as Maps, POJOs

Using column positions for access is error-prone. In many cases columns have logical names, often in the form of something called a “header”, wherein the first row of the CSV document actually contains column names instead of values.

But before looking into this, let’s consider the case where we simply assign names to columns ourselves. This is done by creating a CsvSchema object, defined by the CSV module:

CsvSchema schema = CsvSchema.builder()
    .addColumn("x")
    .addColumn("y")
    .addColumn("visible")
    .build();

With that, we can tell the module that we want it to map the first column into property “x”, the second one into “y” and the third into “visible”. And then we can rewrite the code like so:

MappingIterator<Map<String, String>> it = mapper
    .readerForMapOf(String.class)
    // NOTE: no wrapping needed
    .with(schema)
    .readValues(CSV_DOC);
// First row, keyed by the column names from the schema
Map<String, String> row = it.nextValue();
assertEquals("1", row.get("x"));
assertEquals("2", row.get("y"));
assertEquals("true", row.get("visible"));

This can be useful sometimes, and the same schema now also lets us use POJOs.
So let’s add this:

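// Optional when reading with the explicit schema above, which already fixes
// the column order. It matters when the schema is derived from the class:
// CsvMapper defaults to alphabetic ordering (visible, x, y).
@JsonPropertyOrder({ "x", "y", "visible" })
public class Point {
    public int x, y;
    public boolean visible;
}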

and we can read rows into POJOs like so:

MappingIterator<Point> it = mapper
    .readerFor(Point.class)
    .with(schema)
    .readValues(CSV_DOC);
while (it.hasNextValue()) {
    Point p = it.nextValue();
    int x = p.x;
    // do something!
}
// or, instead of the loop, you could slurp 'em all:
List<Point> points = it.readAll();

Note that here the conversion from String values into numbers and booleans happens automatically.

With a little help from The Header

So far so good: we can tell the Jackson CSV module how the columns should be named, and this lets us map rows into POJOs (or Maps, or even JsonNode if we wanted).
But do we need to build CsvSchema by hand?

No, usually not. In the case of type Point we can ask for a CsvSchema that matches its definition:

CsvSchema pointSchema = mapper.schemaFor(Point.class);

and as long as we have specified correct property order with @JsonPropertyOrder we are all set.
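
To make that concrete, here is a minimal sketch that derives the schema from the class and uses it to read the header-less document from before. The wrapper class SchemaFromPojo and its helper methods are hypothetical names for this example; Point is repeated as a nested class only so that the snippet is self-contained.

```java
import com.fasterxml.jackson.annotation.JsonPropertyOrder;
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

// Hypothetical wrapper class, for illustration only
class SchemaFromPojo {
    // Column order comes from this annotation: x first, then y, then visible
    @JsonPropertyOrder({ "x", "y", "visible" })
    public static class Point {
        public int x, y;
        public boolean visible;
    }

    // Name of the first column in the schema derived from Point
    public static String firstColumnName() {
        return new CsvMapper().schemaFor(Point.class).columnName(0);
    }

    // Reads all rows of a header-less CSV document into Point instances
    public static List<Point> readPoints(String csv) {
        CsvMapper mapper = new CsvMapper();
        CsvSchema pointSchema = mapper.schemaFor(Point.class);
        try (MappingIterator<Point> it = mapper
                .readerFor(Point.class)
                .with(pointSchema)
                .readValues(csv)) {
            return it.readAll();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("first column: " + firstColumnName());
        for (Point p : readPoints("1,2,true\n2,9,false\n-13,0,true\n")) {
            System.out.println(p.x + "," + p.y + " visible=" + p.visible);
        }
    }
}
```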

But perhaps more commonly, a CSV document starts with a “header” row, so we would actually have:

x,y,visible
1,2,true
2,9,false
-13,0,true

By default Jackson has no way of knowing if the first row has this special meaning, so we need to indicate that it does. This can be done by constructing a specifically configured column-less CsvSchema like so:

// Column-less schema that tells the parser to take column names from the first row
CsvSchema headerSchema = CsvSchema.emptySchema().withHeader();
String CSV_WITH_HEADER = ...; // see example above
MappingIterator<Point> it = mapper
    .readerFor(Point.class)
    .with(headerSchema)
    .readValues(CSV_WITH_HEADER);
// and read same as before
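
The header-only schema is not tied to POJOs. As a rough sketch, the same document can be read into Maps whose keys come from the header row; the class name HeaderAsMapKeys and the helper readRows() are invented for this example.

```java
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, for illustration only
class HeaderAsMapKeys {
    // Reads a CSV document whose first row is a header; each data row becomes a Map
    public static List<Map<String, String>> readRows(String csvWithHeader) {
        CsvMapper mapper = new CsvMapper();
        // No columns declared: names are taken from the first row of the document
        CsvSchema headerSchema = CsvSchema.emptySchema().withHeader();
        try (MappingIterator<Map<String, String>> it = mapper
                .readerForMapOf(String.class)
                .with(headerSchema)
                .readValues(csvWithHeader)) {
            return it.readAll();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String csv = "x,y,visible\n1,2,true\n2,9,false\n-13,0,true\n";
        for (Map<String, String> row : readRows(csv)) {
            System.out.println(row.get("x") + " / " + row.get("y") + " / " + row.get("visible"));
        }
    }
}
```

The header row itself is consumed by the parser and is not returned as a data row.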

And That’s All For Now!

The above is the core foundation you need to read CSV with Jackson. The biggest remaining aspects on the reader side are:

  1. Configuring CsvMapper with CsvParser.Feature: there are about a dozen settings that control things like support for comments, what to do with “extra” columns, whether to allow (and skip) empty lines and so on
  2. Configuring reading of specific documents by constructing and configuring the CsvSchema instance used for reading: there are settings for the separator (comma is only the default and can be changed), the escape and/or quote character (if any) and so forth
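
As a small taste of both kinds of configuration, the sketch below reads a semicolon-separated document with a header and trims spaces around values. The class name SemicolonCsv and the sample input are made up for illustration; withColumnSeparator() and CsvParser.Feature.TRIM_SPACES are just two of the available settings, picked here as examples.

```java
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvParser;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, for illustration only
class SemicolonCsv {
    public static List<Map<String, String>> readRows(String csv) {
        CsvMapper mapper = new CsvMapper();
        // Schema-level setting: header row present, ';' instead of ',' as separator
        CsvSchema schema = CsvSchema.emptySchema()
                .withHeader()
                .withColumnSeparator(';');
        try (MappingIterator<Map<String, String>> it = mapper
                .readerForMapOf(String.class)
                .with(schema)
                // Parser-level setting: drop spaces around column values
                .with(CsvParser.Feature.TRIM_SPACES)
                .readValues(csv)) {
            return it.readAll();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String csv = "x;y;visible\n1; 2;true\n2; 9;false\n";
        for (Map<String, String> row : readRows(csv)) {
            System.out.println(row);
        }
    }
}
```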

Come to think of it now, covering some of these settings is probably worth a separate post in future. :)

Open Source developer, most known for Jackson data processor (nee “JSON library”), author of many, many other OSS libraries for Java, from ClassMate to Woodstox

@cowtowncoder
