Reading CSV with Jackson

“How to use jackson-dataformat-csv”

First there was just JSON

Although Jackson started simply as a JSON parser library (see “Brief history of Jackson”), over the years it has grown to support dozen other data formats using extensions called (dataformat) Modules. At high level the API used is identical to that of reading and writing JSON — jackson-databind package and its API is format-agnostic — although there are typically some configuration settings that vary. So in theory it should be very easy to start using Jackson with one of supported (*) formats.

Why and how is CSV different from JSON

Put simply, CSV differs in a couple of dimensions:

  • CSV is positional: columns are identified by index, whereas JSON mostly uses named properties (although JSON does have positional Arrays too). It may or may not contain logical names for columns (more on this bit later)

Simplest, “untyped” reading of CSV as List<List<String>>

The simplest API some Java CSV packages expose is to simply expose all column values of all rows as Strings, to read them as “Lists of Lists of Strings” (or arrays). While this is not commonly used with Jackson, it is one of supported mechanisms.

1,2,true
2,9,false
-13,0,true
final String CSV_DOC = "1,2,true\n2,9,false\n-13,0,true\n";
final CsvMapper mapper = new CsvMapper();
MappingIterator<List<String>> it = mapper
.readerForListOf(String.class)
.with(CsvParser.Feature.WRAP_AS_ARRAY) // !!! IMPORTANT
.readValues(CSV_DOC);
// If we want them all we use:
List<List<String>> all = it.readAll();
// or if not, we would instead:
while (it.hasNextValue()) {
List<String> row = it.nextValue();
// process
}
  • We use convenience method readerForListOf() to get ObjectReader for reading List<String> values
  • We MUST enable one specific CSV feature to force individual rows to be exposed as equivalent to JSON Arrays — this is only needed when we do not use CsvSchema (explained in following sections)
  • We have at least 2 ways to read contents: MappingIterator gives rows one by one, but also has convenience method readAll() for “just read them all” slurping of content

Almost as simple: reading contents as Maps, POJOs

Using column positions for access is error-prone. In many cases columns have logical names — often in form of something called “header”, wherein the first row of the CSV document actually contains column names instead of values.

CsvSchema schema = CsvSchema.builder()
.addColumn("x")
.addColumn("y")
.addColumn("visible")
.build();
MappingIterator<Map<String, String>> it = mapper
.readerForMapOf(String.class)
// NOTE: no wrapping needed
.with(schema)
.readValues(CSV_DOC);
Map<String, String> row = it.nextValue();
assertEquals("1", map.get("x"));
assertEquals("2", map.get("y"));
assertEquals("true", map.get("visible"));
// CSV module defaults to alphabetic ordering so this is optional:
@JsonPropertyOrder({ "x", "y", "visible" })
public class Point {
public int x, y;
public boolean visible;
}
MappingIterator<Point> it = mapper
.readerFor(Point.class)
.with(schema)
.readValues(CSV_DOC);
while (it.hasNextValue()) {
Point p = it.nextValue();
int x = p.x;
// do something!
}
// or, you could alternative slurp 'em all:
List<Point> points = it.readAll();

With a little help from The Header

So far so good: we can tell Jackson CSV module how the columns should be named and this lets us map rows into POJOs (or Maps, or even JsonNode if we wanted).
But do we need to build CsvSchema by hand?

CsvSchema pointSchema = mapper.schemaFor(Point.class);
x,y,visible
1,2,true
2,9,false
-13,0,true
CsvSchema headerSchema = CsvSchema.emptySchema().withHeader();
String CSV_WITH_HEADER = ...; // see example above
MappingIterator<Map<String, String>> it = mapper
.readerFor(Point.class)
.with(headerSchema)
.readValues(CSV_WITH_HEADER);
// and read same as before

And That’s All For Now!

Above is actually the core foundation you need to read CSV with Jackson: the biggest remaining aspects on reader side are:

  1. Configuring reading of specific documents by constructing and configuring CsvSchema instance used for reading: there are a few settings related to separator used (only defaults to Comma, can be changed), escape and/or quote character (if any) used and so forth

Open Source developer, most known for Jackson data processor (nee “JSON library”), author of many, many other OSS libraries for Java, from ClassMate to Woodstox