Reading CSV with Jackson
“How to use jackson-dataformat-csv”
First there was just JSON
Although Jackson started out simply as a JSON parser library (see “Brief history of Jackson”), over the years it has grown to support a dozen other data formats using extensions called (dataformat) Modules. At a high level the API used is identical to that of reading and writing JSON — the jackson-databind package and its API are format-agnostic — although there are typically some configuration settings that vary. So in theory it should be very easy to start using Jackson with any of the supported formats (*).
In practice, however, while some formats are handled very similarly to JSON (for example “native JSON” formats like CBOR, Smile, MessagePack and BSON), other formats need a bit more care. CSV (Comma-Separated Values) is one such format.
This blog post covers a few ways you can read CSV with Jackson. I assume you are familiar with the CSV format itself — if not (but you are interested), please start with an article like “What is a CSV File” first.
(*) As of April 2021, there are known working format modules for at least the following formats: Avro, BSON, CBOR, CSV, JSON, MessagePack, (Java) Properties, Protobuf, Smile, TOML, XML, YAML — and there are probably others I am not familiar with.
Why and how is CSV different from JSON
Put simply, CSV differs in a couple of aspects:
- CSV is a tabular format, with columns and rows; JSON is tree-structured, with a more flexible structure at the format level (see the small example after this list).
- CSV is positional: columns are identified by index, whereas JSON mostly uses named properties (although JSON does have positional Arrays too). CSV may or may not contain logical names for columns (more on this a bit later).
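To make the difference concrete, here is the same record in both formats (the property names x, y and visible are just for illustration):
CSV row:
1,2,true
Equivalent JSON:
{ "x": 1, "y": 2, "visible": true }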
Since the Jackson API was designed to work with the JSON data model, some common usage patterns are not directly applicable. For example, ObjectMapper.readValue() is used less often than ObjectMapper.readValues() and other methods that return a MappingIterator.
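As a rough sketch of the difference in shape (MyValue, schema, and the mapper variables here are placeholders, not names from any specific example):
// JSON: typically bind a whole document with a single call:
MyValue value = jsonMapper.readValue(json, MyValue.class);
// CSV: typically iterate over rows via MappingIterator:
MappingIterator<MyValue> it = csvMapper
    .readerFor(MyValue.class)
    .with(schema)
    .readValues(csv);
while (it.hasNextValue()) {
    MyValue row = it.nextValue();
    // process one row at a time
}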
So, most CSV reading uses “line-oriented” reading — you may want to read “Line-delimited JSON with Jackson” first, if you have not done so yet. I will use a similar approach for all examples here.
Simplest, “untyped”, reading of CSV as List<List<String>>
The simplest API some Java CSV packages offer is to just expose all column values of all rows as Strings, reading them as “Lists of Lists of Strings” (or arrays). While this is not commonly used with Jackson, it is one of the supported mechanisms.
Assuming that we had a CSV file with contents like this (no header row — explained later on):
1,2,true
2,9,false
-13,0,true
you can read it using:
String CSV_DOC = "1,2,true\n2,9,false\n-13,0,true\n";
CsvMapper mapper = new CsvMapper();
MappingIterator<List<String>> it = mapper
    .readerForListOf(String.class)
    .with(CsvParser.Feature.WRAP_AS_ARRAY) // !!! IMPORTANT
    .readValues(CSV_DOC);
// If we want them all we use:
List<List<String>> all = it.readAll();
// or if not, we would instead:
while (it.hasNextValue()) {
    List<String> row = it.nextValue();
    // process
}
There’s quite a bit to unpack here:
- We need to construct CsvMapper instead of the general ObjectMapper (or JsonMapper) to make sure we read/write CSV-encoded content
- We use the convenience method readerForListOf() to get an ObjectReader for reading List<String> values
- We MUST enable one specific CSV feature to force individual rows to be exposed as equivalents of JSON Arrays — this is only needed when we do not use CsvSchema (explained in following sections)
- We have at least 2 ways to read contents: MappingIterator gives rows one by one, but also has the convenience method readAll() for “slurping” (“just read them all”) of content
Anyway, this is probably not the way you want to do it, but it is good to be aware of this method in case you see code using it.
Almost as simple: reading contents as Maps, POJOs
Using column positions for access is error-prone. In many cases columns have logical names — often in the form of something called a “header”, wherein the first row of the CSV document contains column names instead of values.
But before looking into this, let’s consider the case where we simply assign names to columns. This is done by creating a CsvSchema object, defined by the CSV module:
CsvSchema schema = CsvSchema.builder()
.addColumn("x")
.addColumn("y")
.addColumn("visible")
.build();
With that, we can tell the module that we want it to map the first column into property “x”, the second one into “y” and the third into “visible”. And then we can rewrite the code like so:
MappingIterator<Map<String, String>> it = mapper
    .readerForMapOf(String.class)
    // NOTE: no wrapping needed
    .with(schema)
    .readValues(CSV_DOC);
Map<String, String> row = it.nextValue();
assertEquals("1", row.get("x"));
assertEquals("2", row.get("y"));
assertEquals("true", row.get("visible"));
This can be useful on its own, but more importantly, naming columns also lets us use POJOs.
So let’s add this:
// Column order matters for CSV: without this annotation the CSV module
// would default to alphabetic ordering (visible, x, y)
@JsonPropertyOrder({ "x", "y", "visible" })
public class Point {
    public int x, y;
    public boolean visible;
}
and we can read rows into POJOs like so:
MappingIterator<Point> it = mapper
.readerFor(Point.class)
.with(schema)
.readValues(CSV_DOC);
while (it.hasNextValue()) {
Point p = it.nextValue();
int x = p.x;
// do something!
}
// or, you could alternatively slurp 'em all:
List<Point> points = it.readAll();
Note that here the conversion from String values into numbers and booleans happens automatically.
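And if a value cannot be coerced (say, “abc” where an int is expected) Jackson fails with its usual databind exception. A quick sketch of that hypothetical failure case, reusing mapper and schema from above:
String BAD_CSV = "abc,2,true\n"; // hypothetical malformed input
try {
    mapper.readerFor(Point.class)
        .with(schema)
        .readValues(BAD_CSV)
        .nextValue();
} catch (InvalidFormatException e) {
    // exception message names the offending value ("abc") and target type (int)
}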
With a little help from The Header
So far so good: we can tell the Jackson CSV module how the columns should be named, and this lets us map rows into POJOs (or Maps, or even JsonNode if we wanted).
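For example, a minimal sketch of the JsonNode variant (as with Maps, column values come out as JSON Strings):
MappingIterator<JsonNode> nodes = mapper
    .readerFor(JsonNode.class)
    .with(schema)
    .readValues(CSV_DOC);
JsonNode first = nodes.nextValue(); // {"x":"1","y":"2","visible":"true"}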
But do we need to build CsvSchema by hand?
No, usually we do not. In the case of type Point we could actually ask for a CsvSchema that matches its definition:
CsvSchema pointSchema = mapper.schemaFor(Point.class);
and as long as we have specified the correct property order with @JsonPropertyOrder we are all set.
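Putting it together, reading with a generated schema might look like this (a small sketch, reusing CSV_DOC from the earlier examples):
CsvSchema pointSchema = mapper.schemaFor(Point.class);
List<Point> points = mapper
    .readerFor(Point.class)
    .with(pointSchema)
    .readValues(CSV_DOC)
    .readAll();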
But perhaps more commonly, many CSV documents start with something called a “header row”; so we would actually have:
x,y,visible
1,2,true
2,9,false
-13,0,true
By default Jackson has no way of knowing if the first row has this special meaning, so we need to indicate it does. This can be done by constructing a specifically configured, column-less CsvSchema like so:
CsvSchema headerSchema = CsvSchema.emptySchema().withHeader();
String CSV_WITH_HEADER = ...; // see example above
MappingIterator<Point> it = mapper
.readerFor(Point.class)
.with(headerSchema)
.readValues(CSV_WITH_HEADER);
// and read same as before
Note that since column names now come from the header row itself, properties are bound by name; @JsonPropertyOrder no longer matters for matching columns to properties.
And That’s All For Now!
Above is actually the core foundation you need to read CSV with Jackson: the biggest remaining aspects on the reader side are (see the configuration sketch after this list):
- Configuring CsvMapper with CsvParser.Feature — there are about a dozen settings for configuring handling of possible comments, what to do with “extra” columns, whether to allow (and skip) empty lines, and so on
- Configuring the reading of specific documents by constructing and configuring the CsvSchema instance used for reading: there are a few settings related to the separator used (defaults to comma, can be changed), the escape and/or quote character (if any) used, and so forth
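To give a flavor of both, here is a hedged sketch; ALLOW_COMMENTS and SKIP_EMPTY_LINES are two of the existing parser features, but check your module version for the full set:
// Mapper-level configuration via CsvParser.Feature
CsvMapper mapper = CsvMapper.builder()
    .enable(CsvParser.Feature.ALLOW_COMMENTS)   // allow '#' comment lines
    .enable(CsvParser.Feature.SKIP_EMPTY_LINES) // silently skip empty input lines
    .build();
// Document-level configuration via CsvSchema, e.g. for tab-separated content
CsvSchema tsvSchema = mapper.schemaFor(Point.class)
    .withColumnSeparator('\t') // separator defaults to comma
    .withoutQuoteChar();       // disable quoting entirely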
Come to think of it now, covering some of these settings is probably worth a separate post in the future. :)