Is Smile format Splittable? Yes

(that is, “Hadoop Splittable”)

What is “Smile format”?

General strong points of Smile format

What does “append content” mean here

But wait! It gets better: Smile data streams are splittable!

Smile settings to ensure splittability

Writing Smile 0xFF frame marker with Jackson

OutputStream out = ...;
// Need to configure underlying stream factory:
SmileFactory f = SmileFactory.builder()
    .enable(SmileGenerator.Feature.WRITE_END_MARKER)
    .enable(SmileGenerator.Feature.ENCODE_BINARY_AS_7BIT)
    .disable(StreamWriteFeature.AUTO_CLOSE_TARGET)
    .build();
// and then construct mapper with it:
SmileMapper mapper = SmileMapper.builder(f).build();
for (InputValue value : valuesToWrite) {
    // will create and close generator and write END_MARKER after every value
    mapper.writeValue(out, value);
    // There are other ways too, by controlling creation of SmileGenerator
    // and batching multiple values
}

// Alternative: keep a single generator open and write the marker by hand
OutputStream out = ...;
SmileFactory f = SmileFactory.builder()
    .enable(SmileGenerator.Feature.ENCODE_BINARY_AS_7BIT)
    // alas! Must avoid back references if we bypass the document boundary
    .disable(SmileGenerator.Feature.CHECK_SHARED_NAMES)
    .disable(SmileGenerator.Feature.CHECK_SHARED_STRING_VALUES)
    .build();
SmileMapper mapper = SmileMapper.builder(f).build();
// createGenerator() is declared to return JsonGenerator, hence the cast
try (SmileGenerator g = (SmileGenerator) mapper.createGenerator(out)) {
    for (InputValue value : valuesToWrite) {
        mapper.writeValue(g, value);
        // write after every value, or every Nth; whatever
        g.writeByte(SmileConstants.BYTE_MARKER_END_OF_CONTENT); // 0xFF
    }
}

Using this with Hadoop etc
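
On the reading side, a Hadoop-style input split starts at an arbitrary byte offset, so a reader first has to skip ahead to the next 0xFF marker before it can decode anything. Below is a minimal, JDK-only sketch of that alignment step. It is not Jackson or Hadoop API: the class name SmileSplitSketch, the method framesInSplit, and the rule that a frame belongs to the split containing its first byte are all assumptions made for illustration. It also assumes that 0xFF never occurs inside a frame, which is why binary data has to be written 7-bit encoded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only (not part of Jackson or Hadoop)
class SmileSplitSketch {
    static final byte END_MARKER = (byte) 0xFF;

    /**
     * Returns [from, to) offsets of the frames whose first byte lies in
     * [start, end). The 0xFF marker itself is excluded from each range.
     * Assumed ownership rule: a frame belongs to the split it starts in,
     * even if it runs past the end of that split.
     */
    public static List<int[]> framesInSplit(byte[] data, int start, int end) {
        List<int[]> frames = new ArrayList<>();
        int pos = 0;
        if (start > 0) {
            // Back up one byte so that a frame starting exactly at 'start'
            // (right after a marker at start - 1) is not skipped
            pos = start - 1;
            while (pos < data.length && data[pos] != END_MARKER) {
                pos++;
            }
            pos++; // first byte after the marker
        }
        while (pos < end && pos < data.length) {
            int frameEnd = pos;
            while (frameEnd < data.length && data[frameEnd] != END_MARKER) {
                frameEnd++;
            }
            if (frameEnd > pos) { // ignore empty frames (back-to-back markers)
                frames.add(new int[] { pos, frameEnd });
            }
            pos = frameEnd + 1;
        }
        return frames;
    }

    public static void main(String[] args) {
        // Three tiny made-up "frames", each terminated by the marker
        byte[] data = {1, 2, END_MARKER, 3, 4, 5, END_MARKER, 6, END_MARKER};
        for (int[] split : new int[][] {{0, 4}, {4, 9}}) {
            for (int[] frame : framesInSplit(data, split[0], split[1])) {
                System.out.println("split [" + split[0] + ", " + split[1]
                        + ") owns frame [" + frame[0] + ", " + frame[1] + ")");
            }
        }
    }
}
```

With this rule every frame is picked up by exactly one split, no matter where the split boundaries fall. Each returned range would then be handed to a Smile parser; that part is left out here.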

Next time: is LZF compression codec splittable?

Open Source developer, most known for Jackson data processor (nee “JSON library”), author of many, many other OSS libraries for Java, from ClassMate to Woodstox