Another way to configure: profiles
As mentioned earlier, the standard Stax way of configuring anything is through factories, using the setProperty(name, value) method. This applies to Stax2 as well.
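As a quick reminder of what that looks like, here is a minimal sketch that uses only the standard javax.xml.stream API bundled with the JDK. The class and method names (StaxPropertyDemo, newCoalescingFactory, countTextEvents) are made up for illustration, and the two properties were picked arbitrarily.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; only standard javax.xml.stream calls are used.
class StaxPropertyDemo {

    // Configure a factory purely through setProperty(name, value).
    static XMLInputFactory newCoalescingFactory() {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, Boolean.TRUE);
        return factory;
    }

    // Count how many text events (CHARACTERS or CDATA) a reader reports.
    static int countTextEvents(XMLInputFactory factory, String xml)
            throws XMLStreamException {
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            while (reader.hasNext()) {
                int type = reader.next();
                if (type == XMLStreamConstants.CHARACTERS
                        || type == XMLStreamConstants.CDATA) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws XMLStreamException {
        XMLInputFactory factory = newCoalescingFactory();
        // Settings can be read back with getProperty(name).
        System.out.println("coalescing: "
                + factory.getProperty(XMLInputFactory.IS_COALESCING));
        System.out.println("text events: "
                + countTextEvents(factory, "<root>a<![CDATA[b]]>c</root>"));
    }
}
```

Settings made on the factory apply to readers created afterwards, so the usual pattern is to configure once and reuse the factory.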
But there is also another mechanism for applying “profiles”: groups of settings that establish configuration defaults meant to optimize a specific aspect. These methods are named configureFor[Goal], for example “configureForSpeed”.
XMLInputFactory2 has the following profile-configuration methods:
configureForConvenience: enable features that should simplify handling: enable coalescing, report all text segments as
configureForLowMemUsage: try to reduce amount of memory retained during processing by: disabling coalescing (allows parser to report smaller segments), disable
configureForRoundTripping: try preserving event information as much as possible such that direct writes would not alter physical aspects of XML — disable coalescing, preserve distinction between
CDATA, disable automatic entity expansion (so entities may be written out)
configureForSpeed: try to minimize the performance overhead of options: disable coalescing, disable intern()ing of both element/attribute names and namespace URIs
configureForXmlConformance: enable features required to conform to XML 1.x specification — namespaces, DTD processing
XMLOutputFactory2 has the following profile-configuration methods:
configureForRobustness: enable both validation and repairing options to try to ensure that output is valid, even if changes are needed (for example, in rare cases comment contents may need to be split, if caller tries to output sequence of two hyphens; or, for CDATA, two
configureForXmlConformance: enable all validation options to try to prevent any potential well-formedness problems (for example, with namespace bindings), but not all repairing options
configureForSpeed: optimizes for output performance: will disable validation operations that require scanning over contents; in a way opposite of conformance/robustness profiles.
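To make the profile methods concrete, here is a minimal sketch of calling them. It assumes the stax2-api and Woodstox jars are on the classpath and constructs the Woodstox factories directly (WstxInputFactory and WstxOutputFactory); the ProfileDemo class and its helper methods are hypothetical names used only for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;
import org.codehaus.stax2.XMLInputFactory2;
import org.codehaus.stax2.XMLOutputFactory2;
import com.ctc.wstx.stax.WstxInputFactory;
import com.ctc.wstx.stax.WstxOutputFactory;

// Hypothetical demo class; requires stax2-api and Woodstox on the classpath.
class ProfileDemo {

    // Input factory with the "speed" profile applied.
    static XMLInputFactory2 fastInputFactory() {
        XMLInputFactory2 factory = new WstxInputFactory();
        factory.configureForSpeed();
        return factory;
    }

    // Output factory with the "robustness" profile applied.
    static XMLOutputFactory2 robustOutputFactory() {
        XMLOutputFactory2 factory = new WstxOutputFactory();
        factory.configureForRobustness();
        return factory;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Read a tiny document with the speed-oriented factory.
        XMLStreamReader reader = fastInputFactory()
                .createXMLStreamReader(new StringReader("<root>text</root>"));
        while (reader.hasNext()) {
            reader.next();
        }
        reader.close();

        // Write a tiny document with the robustness-oriented factory.
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = robustOutputFactory().createXMLStreamWriter(out);
        writer.writeStartDocument();
        writer.writeStartElement("root");
        writer.writeComment("a comment");
        writer.writeEndElement();
        writer.writeEndDocument();
        writer.close();
        System.out.println(out);
    }
}
```

A profile is just a batch of property changes on the factory, so it should be applied before creating readers or writers.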
Stax2 configuration properties
Using a profile sets values for multiple properties (sometimes both plain Stax and Stax2 properties). But it is always possible to also set individual properties directly. Let’s have a look at what Stax2-extension properties exist and are supported by Woodstox. Note: most are Boolean-valued, so I only mention the type if it is something other than Boolean.
XMLInputFactory2 specifies the following Stax2 properties (along with the default values Woodstox uses):
false): if enabled, XMLStreamReader will automatically close the underlying input source when the reader is closed; if disabled, it will not do so. The Stax 1.0 specification mandates that the default behavior is “disabled”, often leading to unintended “dangling” input streams.
null, value type DTDValidationSchema): property that may be set if a specific DTD instance is to be used instead of what the DOCTYPE declaration specifies (if anything). DTDValidationSchema is worth its own article, but basically the entry point is `XMLValidationSchemaFactory.newInstance(XMLValidationSchema.SCHEMA_ID_DTD)`
true): Whether element and attribute names (“local name” part) returned will be String.intern()'ed first or not. Doing so usually saves memory and helps speed, but occasionally it may be necessary to disable this feature if the number of distinct names is unbounded: for example, if names are randomly generated (like UUIDs)
true): similar to above, but applies to namespace URIs.
true): Controls whether parsing is “lazy” or “eager”: “eager” meaning that each event is completely parsed when XMLStreamReader.next() is called; “lazy” that only a small part is parsed at that point, and the rest is only parsed if and as needed. Benefits of lazy parsing include much faster skipping of unneeded content (especially textual content, comments and processing instructions); a possible downside is that sometimes error reporting may occur later than expected (during actual content access or skipping, that is, when calling next() for the following event).
true): Controls whether XMLStreamLocation information is included in XMLEvent instances or not. Disabling this feature reduces memory usage and improves processing speed modestly, but only when using “Event API” (
true): Whether XML CDATA sections are reported as CDATA Stax event (true) or as general
false): When disabled (`false`), white-space outside the XML root element is skipped and not reported; only possible PROCESSING_INSTRUCTIONs are reported. But if enabled, additional SPACE events are reported. This is mostly (only) useful if trying to fully replicate document indentation outside of the root element
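For illustration, here is a minimal sketch of setting a few of these properties individually. The constant names (P_AUTO_CLOSE_INPUT, P_INTERN_NAMES, P_INTERN_NS_URIS, P_REPORT_PROLOG_WHITESPACE) are taken from the Stax2 XMLInputFactory2 class rather than from the list above, so treat the pairing with the entries as my reading of the descriptions. It assumes stax2-api and Woodstox are on the classpath; the InputPropertyDemo class is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.codehaus.stax2.XMLInputFactory2;
import com.ctc.wstx.stax.WstxInputFactory;

// Hypothetical demo class; requires stax2-api and Woodstox on the classpath.
class InputPropertyDemo {

    static XMLInputFactory2 newFactory() {
        XMLInputFactory2 factory = new WstxInputFactory();
        // Close the underlying Reader/InputStream when the stream reader is closed.
        factory.setProperty(XMLInputFactory2.P_AUTO_CLOSE_INPUT, Boolean.TRUE);
        // Skip intern()ing, e.g. when names are randomly generated.
        factory.setProperty(XMLInputFactory2.P_INTERN_NAMES, Boolean.FALSE);
        factory.setProperty(XMLInputFactory2.P_INTERN_NS_URIS, Boolean.FALSE);
        // Ask for white-space outside the root element to be reported.
        factory.setProperty(XMLInputFactory2.P_REPORT_PROLOG_WHITESPACE, Boolean.TRUE);
        return factory;
    }

    // Count SPACE events reported for the given document.
    static int countSpaceEvents(String xml) throws XMLStreamException {
        XMLStreamReader reader = newFactory().createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.SPACE) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<?xml version='1.0'?>\n<root/>\n";
        System.out.println("SPACE events: " + countSpaceEvents(xml));
    }
}
```

Individual settings like these can also be applied after one of the configureFor methods, to adjust a single property within a profile.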
XMLOutputFactory2 specifies the following Stax2 properties:
null, value type EscapingWriterFactory): By default, minimal escaping is used for attribute values. It is possible to fully customize escaping details, however. The value to assign has to be of type EscapingWriterFactory, which contains 2 methods for constructing the Writer used for output. Typically used to extend the set of characters that are to be escaped, although it may be used for advanced usage such as filtering or even replacing specific contents of attribute values: for example, it could be used to obfuscate certain types of ids (credit-card numbers, SSN).
null, value type EscapingWriterFactory): similar to P_ATTR_VALUE_ESCAPER but used for textual segments (“character data”, NOT including CDATA segments as they do not allow escaping). Similarly used either for changing escaping details, or for more advanced filtering/modifying of textual content to output.
false): similar to P_AUTO_CLOSE_INPUT, determines whether the underlying Writer is automatically closed when the XMLStreamWriter is closed. The default is false due to the Stax 1.0 specification mandating this behavior.
true): When a sequence of END_ELEMENT is output (with possible attributes in-between, but no child elements or textual content), it is possible to output either a so-called empty element (like <element />) or a fully written-out pair (<element></element>). If set to true, an empty element is written; if false, separate start/end tags are written.
"wstxns"): When using “repairing” writer mode, in which namespace URIs are automatically bound, namespace prefixes are generated using this String as the beginning, followed by a sequence number to keep prefixes unique.
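Here is a minimal sketch of setting some of the output-side properties. As before, the constant names (P_AUTOMATIC_EMPTY_ELEMENTS, P_AUTO_CLOSE_OUTPUT, P_AUTOMATIC_NS_PREFIX) come from the Stax2 XMLOutputFactory2 class, and their pairing with the entries above is my reading of the descriptions. It assumes stax2-api and Woodstox are on the classpath; OutputPropertyDemo and the "ns" prefix value are hypothetical choices.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
import org.codehaus.stax2.XMLOutputFactory2;
import com.ctc.wstx.stax.WstxOutputFactory;

// Hypothetical demo class; requires stax2-api and Woodstox on the classpath.
class OutputPropertyDemo {

    static XMLOutputFactory2 newFactory(boolean emptyElements) {
        XMLOutputFactory2 factory = new WstxOutputFactory();
        // Choose between an empty element and a separate start/end tag pair.
        factory.setProperty(XMLOutputFactory2.P_AUTOMATIC_EMPTY_ELEMENTS,
                Boolean.valueOf(emptyElements));
        // Close the underlying Writer when the stream writer is closed.
        factory.setProperty(XMLOutputFactory2.P_AUTO_CLOSE_OUTPUT, Boolean.TRUE);
        // Base for generated prefixes; only matters in "repairing" mode.
        factory.setProperty(XMLOutputFactory2.P_AUTOMATIC_NS_PREFIX, "ns");
        return factory;
    }

    // Write a start/end element pair with no content and return the result.
    static String writeElement(boolean emptyElements) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = newFactory(emptyElements).createXMLStreamWriter(out);
        writer.writeStartElement("element");
        writer.writeEndElement();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println("automatic empty elements on:  " + writeElement(true));
        System.out.println("automatic empty elements off: " + writeElement(false));
    }
}
```

Running both variants side by side is an easy way to see which form a given consumer of the XML expects.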
And last but not least, Woodstox-specific properties
Now that we have covered 2 out of 3 property sets, we are almost ready to have a look at the largest set of properties: ones specific (for now) to Woodstox itself. But that’s worth its own separate entry. Stay tuned!