Jackson 2.12: improved XML module

(continuation of “Deeper Dive on Jackson 2.12” mini-series — see “Jackson 2.12 Features” for context)

Aside from the “big 5” features discussed so far, another major area of work this time around was the Jackson XML dataformat module: no fewer than 26 XML-specific issues were resolved for this release.
As usual, you can see the full list of changes in the 2.12 Release Notes; here we’ll dig deeper into the most notable fixes and improvements.

Nested Lists in POJOs should work reliably (esp. unwrapped)

Before 2.12, content models with `List`- or array-valued properties worked well enough for single-level cases, but various combinations of unwrapped (see `@JacksonXmlElementWrapper`) `List`s nested deeper tended to have edge cases for deserialization (serialization did not have issues), especially in the presence of attributes and polymorphic value handling.

During 2.12 development all accumulated failure cases were resolved; see, for example, [dataformat-xml#257], [dataformat-xml#307], [dataformat-xml#314] and [dataformat-xml#390].
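
As a rough illustration of the kind of structure involved, here is a minimal sketch of two levels of unwrapped lists. The `Library` and `Shelf` classes and the XML layout are made up for this example (they are not from the linked issues), and it assumes `jackson-dataformat-xml` 2.12 or later on the classpath.

```java
import com.fasterxml.jackson.dataformat.xml.XmlMapper;
import com.fasterxml.jackson.dataformat.xml.annotation.JacksonXmlElementWrapper;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

public class NestedUnwrappedLists {
    // Hypothetical model: a library has shelves, each shelf has book titles.
    public static class Library {
        // Unwrapped: <shelf> elements appear directly under <Library>
        @JacksonXmlElementWrapper(useWrapping = false)
        public List<Shelf> shelf;
    }

    public static class Shelf {
        // Unwrapped again, one level deeper: <book> directly under <shelf>
        @JacksonXmlElementWrapper(useWrapping = false)
        public List<String> book;
    }

    public static Library read(String xml) {
        try {
            return new XmlMapper().readValue(xml, Library.class);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Reading `<Library><shelf><book>A</book><book>B</book></shelf><shelf><book>C</book></shelf></Library>` with `read()` should give two shelves, the first with two books and the second with one.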

Empty elements (esp. for POJOs) vs `null`s

Before 2.12, an empty element (like `<value/>`) matched to a POJO-valued property would result in a property value of `null`; similarly for other value types.
With 2.12 this instead becomes an “empty” POJO, that is, an instance created with the default (no-argument) constructor, which is usually the more logical result for most users.
You can still enable `FromXmlParser.Feature.EMPTY_ELEMENT_AS_NULL` to keep the earlier logic if you want, but the default was changed and handling should in general be more robust.

In addition, many cases where an empty element (or a start and end element pair with only whitespace in between) simply failed with an exception are now handled the same way as “empty element” usage (see [dataformat-xml#318]).
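
The difference between the two settings can be sketched as follows. The `Root` and `Value` classes are placeholders chosen for this example, and the code assumes `jackson-dataformat-xml` 2.12 or later.

```java
import com.fasterxml.jackson.dataformat.xml.XmlMapper;
import com.fasterxml.jackson.dataformat.xml.deser.FromXmlParser;

import java.io.IOException;
import java.io.UncheckedIOException;

public class EmptyElementDemo {
    // Placeholder types for illustration only
    public static class Value { public String name; }
    public static class Root { public Value value; }

    // 2.12 default: an empty element becomes an "empty" POJO
    public static Root readWithDefaults(String xml) {
        return read(new XmlMapper(), xml);
    }

    // Opt back in to the pre-2.12 behavior: an empty element becomes null
    public static Root readWithEmptyAsNull(String xml) {
        XmlMapper mapper = XmlMapper.builder()
                .enable(FromXmlParser.Feature.EMPTY_ELEMENT_AS_NULL)
                .build();
        return read(mapper, xml);
    }

    private static Root read(XmlMapper mapper, String xml) {
        try {
            return mapper.readValue(xml, Root.class);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With input `<Root><value/></Root>`, the first method should return a `Root` whose `value` is a non-null `Value` with no fields set, while the second should leave `value` as `null`.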

Root value deserialization works similar to property values

Although deserialization of non-POJO values directly is generally discouraged (the recommendation is to always have a POJO as the root value), it is now possible to deserialize most scalar values as root values in XML, too.

So reading a scalar value directly from a root element now works, even if such usage is discouraged; supported types include Enums ([dataformat-xml#121]) as well as numbers, wrappers, booleans and Strings. Even third-party types work (see [dataformat-xml#380]) as long as the deserializers in question are updated to support this (there is a particular fix that scalar deserializers require); if this fails for a 3rd-party datatype, please file an issue against the GitHub repo for the module that contains the deserializer for the datatype.
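
A minimal sketch of root-level scalar reads is shown below. The element names and the `Color` enum are arbitrary choices for this example (the root element name is not checked when reading), and `jackson-dataformat-xml` 2.12 or later is assumed.

```java
import com.fasterxml.jackson.dataformat.xml.XmlMapper;

import java.io.IOException;
import java.io.UncheckedIOException;

public class RootScalarDemo {
    // Hypothetical enum used only for this example
    public enum Color { RED, GREEN }

    private static final XmlMapper MAPPER = new XmlMapper();

    // Reads the text content of the root element as the given scalar type
    public static <T> T read(String xml, Class<T> type) {
        try {
            return MAPPER.readValue(xml, type);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, `RootScalarDemo.read("<value>42</value>", Integer.class)` should give `42`, and `RootScalarDemo.read("<c>GREEN</c>", RootScalarDemo.Color.class)` should give `Color.GREEN`.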

Additional XML Attributes allowed for Scalars

XML content sometimes contains optional metadata as an attribute in a specific namespace; a typical example is an `xml:lang` attribute on an element that otherwise contains only text, such as a product description.

In this case, information on the language of the description is specified as a sort of orthogonal aspect: something that may or may not be used by the reading application.
If the user is not interested in language metadata, they might map the element to a plain `String` property of a POJO.

Prior to 2.12 this caused a failure, since 2 properties are found and scalar types like `String` really only expect one “embedded” text value.
Jackson 2.12 handles this case in a way that allows ignoring “unexpected” attributes, while still supporting the ability to alternatively map such attributes if needed (the main content would need a `@JacksonXmlText` or similar annotation; other attributes are fine as-is).
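
Both mappings can be sketched like this. The `Product`, `DetailedProduct` and `Description` classes and the sample document are invented for illustration, and the code assumes `jackson-dataformat-xml` 2.12 or later.

```java
import com.fasterxml.jackson.dataformat.xml.XmlMapper;
import com.fasterxml.jackson.dataformat.xml.annotation.JacksonXmlProperty;
import com.fasterxml.jackson.dataformat.xml.annotation.JacksonXmlText;

import java.io.IOException;
import java.io.UncheckedIOException;

public class ScalarWithAttributeDemo {
    // Variant 1: caller does not care about the attribute
    public static class Product {
        public String description;
    }

    // Variant 2: caller wants both the attribute and the text
    public static class DetailedProduct {
        public Description description;
    }

    public static class Description {
        @JacksonXmlProperty(isAttribute = true)
        public String lang;   // matched by local name of the attribute

        @JacksonXmlText
        public String text;   // the element's text content
    }

    private static final XmlMapper MAPPER = new XmlMapper();

    public static <T> T read(String xml, Class<T> type) {
        try {
            return MAPPER.readValue(xml, type);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Given `<Product><description xml:lang="en">Good stuff!</description></Product>`, reading as `Product` should give a `description` of "Good stuff!" with the attribute ignored, while reading as `DetailedProduct` should also expose `lang` as "en".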

May write `xsi:nil` (as well as read)

The Jackson XML module already understood an incoming `xsi:nil` attribute as indicating the “this is null” concept, but now (see [dataformat-xml#360]) it is also possible to make Jackson write the attribute, usually because other tools might expect it. This is not done by default (for backwards compatibility), but can be enabled with `ToXmlGenerator.Feature.WRITE_NULLS_AS_XSI_NIL`.
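
A small round-trip sketch follows. The `Point` class is a placeholder for this example, and the exact serialized form (namespace declaration placement, prefix) is left to the underlying Stax writer, so the example only checks that a `nil` attribute is present. It assumes `jackson-dataformat-xml` 2.12 or later.

```java
import com.fasterxml.jackson.dataformat.xml.XmlMapper;
import com.fasterxml.jackson.dataformat.xml.ser.ToXmlGenerator;

import java.io.IOException;
import java.io.UncheckedIOException;

public class XsiNilDemo {
    // Placeholder type; wrapper types so that values can be null
    public static class Point {
        public Integer x;
        public Integer y;
    }

    // Mapper that marks null values with the xsi:nil attribute on output
    private static final XmlMapper MAPPER = XmlMapper.builder()
            .enable(ToXmlGenerator.Feature.WRITE_NULLS_AS_XSI_NIL)
            .build();

    public static String write(Point p) {
        try {
            return MAPPER.writeValueAsString(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static Point read(String xml) {
        try {
            return MAPPER.readValue(xml, Point.class);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Writing a `Point` with `x == null` and `y == 2` and then reading the result back should give `x == null` and `y == 2` again.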

JsonNode/Object[] now support Repeated Elements (as Arrays)

When XML content has the same element repeated under one parent (for example, two sibling `value` elements) and is read as `JsonNode` (the name is due to historical reasons; it is not really JSON-specific), formerly only the last of the repeated values was included.
This is because the content would essentially be seen by databind as equivalent to a JSON Object with a duplicated property name, which, if not throwing an exception (which depends on whether duplicate detection is enabled), would just replace the entry every time another element is encountered, leaving the last value seen.

But with Jackson 2.12, a new capability, `StreamReadCapability.DUPLICATE_PROPERTIES`, was defined to indicate that the underlying format may have seeming duplicates, and the deserializer for `JsonNode` will in turn use secondary logic to convert these into `ArrayNode`s, effectively making the content appear as if the repeated values had been written as a single array-valued property, which is likely a more usable representation (see [dataformat-xml#403] for details).

A similar change was also made for the “untyped” deserializer (the one that matches the nominal type `java.lang.Object`), as per [dataformat-xml#205], so that repeated elements can also be read with `Object` as the target type.

This change should make `JsonNode` (and untyped binding) more usable with XML content.
One limitation is that this does NOT change handling for POJO-binding cases (except where the nominal type of a property is `JsonNode` or `Object`), because the change is not to the actual token stream, so it does not necessarily solve all potential use cases.
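
Both read styles can be sketched as below. The sample document with two `value` elements is an assumption made for this example, and the code requires `jackson-dataformat-xml` 2.12 or later.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.dataformat.xml.XmlMapper;

import java.io.IOException;
import java.io.UncheckedIOException;

public class RepeatedElementsDemo {
    // Sample input with a repeated element (made up for this example)
    public static final String XML =
            "<root><value>1</value><value>2</value></root>";

    private static final XmlMapper MAPPER = new XmlMapper();

    // Tree-model read: repeated elements are collected into an ArrayNode
    public static JsonNode asTree(String xml) {
        try {
            return MAPPER.readTree(xml);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // "Untyped" read: result is a Map, with a List for repeated elements
    public static Object asUntyped(String xml) {
        try {
            return MAPPER.readValue(xml, Object.class);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With 2.12, `asTree(XML).get("value")` should be an array node holding both values (as text), and `asUntyped(XML)` should be a `Map` whose `"value"` entry is a two-element `List`.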

NOTE: there is a remaining issue with serialization of `JsonNode` (see [dataformat-xml#441]), which adds unnecessary wrapping on output and needs to be resolved in the next minor version; until then at least reading should work better.

Some support for “Mixed Content”

One specific type of XML content that is typically not supported by data-binding XML libraries, called mixed content, is quite common for textual markup use cases like XHTML, where text and inline elements are interleaved within the same parent element.

The challenge here is that whereas most data-oriented XML only has leaf-level Strings (“CDATA”) and all branches are elements, mixed content freely mixes text segments and elements. But since text segments do not have a logical property name, they are not easy to represent in a way that makes sense for Object bindings.

Because of this difficulty, before 2.12 Jackson simply ignored any text segments that were between start elements (like a leading “Hello, ” before a child element) or between end elements (like a trailing “!” after it), only exposing “world” as the textual value contained within the child element (between its start and end tags).
So effectively databind would see only the child element and its text, as if the surrounding segments were not there.

But with 2.12 the underlying streaming parser was improved (see [dataformat-xml#405]) to expose “mixed” textual segments with the nominal name of empty String, so the token sequence now contains an empty-named entry for each such segment, in document order.

And this in turn may be read as `JsonNode` or `Object` (as per earlier notes on handling of duplicate entries).

This does not necessarily solve the whole “how do I handle mixed content” use case because

  • it is not quite clear how this would map to a POJO (there is no way to specify properties with no name)
  • if binding to `JsonNode`, order is not fully retained: all text segments end up grouped under the single empty-String name, apart from the element entries

Nonetheless, since the content is now at least exposed within the token stream, custom deserializers can access all of it and use it in whatever way makes sense.
It should also be possible to build further functionality based on submitted ideas, so feel free to file an issue with suggestions for improvements!
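
To see what the tree-model view of mixed content looks like, here is a minimal sketch. The sample document (a `root` element with a `b` child) is an assumption for this example, and the code requires `jackson-dataformat-xml` 2.12 or later.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.dataformat.xml.XmlMapper;

import java.io.IOException;
import java.io.UncheckedIOException;

public class MixedContentDemo {
    // Sample mixed content: text, then an element, then more text
    public static final String XML = "<root>Hello, <b>world</b>!</root>";

    private static final XmlMapper MAPPER = new XmlMapper();

    public static JsonNode read(String xml) {
        try {
            return MAPPER.readTree(xml);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Here `read(XML).get("b")` should hold "world", while `read(XML).get("")` should be an array with the two surrounding text segments. This also shows the ordering limitation: the tree no longer records that the element sat between the two segments.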

Miscellaneous other fixes

Aside from the bigger changes, there are other notable individual fixes; see the 2.12 Release Notes for the full list.

Future Work

While a lot of progress was made with 2.12, there remain many challenges regarding handling of XML content.
Aside from gaps mentioned already we have challenges like:

  1. “Attribute-ness” is not preserved by `JsonNode` or buffering (`TokenBuffer`); ideally it should be ([dataformat-xml#217])
  2. Property names may collide in cases where they should not: either due to namespace being ignored (i.e. local name collision occurs even if namespaces differ), or due to list-wrapper name not being used to avoid collisions, or even element-vs-attribute distinction
  3. Custom escaping of output not yet supported (see [dataformat-xml#75])
  4. Polymorphic type id cannot be used as property name (“flattening”, see [dataformat-xml#197])
  5. Customizing output of document aspects: XML Schema attribute ([dataformat-xml#90]), DOCTYPE declaration ([dataformat-xml#150]), custom namespace prefixes ([dataformat-xml#207]), encoding other than “UTF-8” in XML declaration ([dataformat-xml#315]), use of for output ([dataformat-xml#324])
  6. Handling of Maps, and especially Map keys, is… challenging, wrt XML name validity rules ([dataformat-xml#244])

So we’ll see what can be tackled with 2.13 and later.

Written by

Open Source developer, most known for Jackson data processor (nee “JSON library”), author of many, many other OSS libraries for Java, from ClassMate to Woodstox
