Jackson 2.18: fast Vector reads/writes!
(performance of FP reads/writes for Number-heavy content)
I have written before about the improved performance of reading/writing floating-point numbers with recent Jackson versions, but one thing left as "to be investigated" is how much overall improvement to expect for "floating-point heavy" content: the examples tried so far have had a non-trivial FP number load, but are not dominated by number content.
So while useful, the improvements have appeared to be in the range of 10–30%, since other parts of reading/writing are not affected by these optimizations.
But how about the hottest new use case: reading/writing of Vector data, content that is 99% floating-point values? These are typically long JSON Arrays of Numbers expressing 32-bit IEEE-754 float
values, used for example for "Embeddings" (vector representations of entities like text paragraphs, images, or audio). These use cases should be more sensitive to raw number encoding/decoding performance.
So let’s have a look…
Finding Data: Hugging Face!
The first thing we need is a Vector-heavy data set. Based on a recommendation from a colleague (thank you, Jonathan!), I looked into the Hugging Face project's datasets, and liked the Cohere-beir embeddings one.
It is encoded as Parquet (easily convertible using, for example, the DataConverter.io web UI), and consists of:
- 1000 rows of data, where
- each row has a single 1024-element embedding vector (a JSON Array value with 1024 floating-point numbers)
totaling almost 14 megs of JSON data when converted (from ~2 megs of Parquet encoding). So that seems like a good dataset to use for a stress test.
Micro-benchmark to use/update
I added earlier tests to the existing jackson-benchmark GitHub repo.
But it felt like that repository was getting a bit big, so I decided to add the new test to another GitHub repo instead: cowtowncoder/misc-micro-benchmarks.
Both repos use the awesome micro-benchmark tool, jmh, so the testing methodology was not changed.
Testing: read and/or write Vector data with JDK 17
With that, I modified the test setup code and utility classes a bit, and then added the actual test: com.cowtowncoder.microb.jackson.vectors.VectorHeavyReadWrite
The test covers 3 basic scenarios:
- Read the above-mentioned test data into a POJO
- Write the POJO out (into a "no-op" OutputStream, that is, not aggregating output)
- Read-as-POJO-then-write (i.e. both of the above)
where:
- Test input is a single 13.9 meg Vector-heavy JSON file
- Data is read into / written from a simple "record-style" (all public fields) POJO
and this is done using 2 different JsonMapper configurations (a minimal configuration sketch follows below):
- "default" with default settings (no optimized floating-point reading/writing)
- "fastFP" with all 3 relevant settings (StreamReadFeature.USE_FAST_BIG_NUMBER_PARSER, StreamReadFeature.USE_FAST_DOUBLE_PARSER, StreamWriteFeature.USE_FAST_DOUBLE_WRITER) enabled
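For illustration, here is a minimal sketch of that setup, assuming hypothetical POJO, field and file names (the actual code lives in VectorHeavyReadWrite in the misc-micro-benchmarks repo). It shows the two JsonMapper configurations and the general shape of the JMH read/write benchmark methods:

```java
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

import com.fasterxml.jackson.core.StreamReadFeature;
import com.fasterxml.jackson.core.StreamWriteFeature;
import com.fasterxml.jackson.databind.json.JsonMapper;

@State(Scope.Benchmark)
public class VectorHeavyReadWriteSketch
{
    // Hypothetical record-style POJO: one row with a single 1024-element embedding
    public static class EmbeddingRow {
        public float[] embedding;
    }

    // "default" mapper: stock settings, no floating-point optimizations
    private final JsonMapper defaultMapper = JsonMapper.builder().build();

    // "fastFP" mapper: all 3 relevant floating-point settings enabled
    private final JsonMapper fastFPMapper = JsonMapper.builder()
            .enable(StreamReadFeature.USE_FAST_BIG_NUMBER_PARSER)
            .enable(StreamReadFeature.USE_FAST_DOUBLE_PARSER)
            .enable(StreamWriteFeature.USE_FAST_DOUBLE_WRITER)
            .build();

    private byte[] jsonInput;          // the 13.9 meg Vector-heavy JSON document
    private EmbeddingRow[] pojoInput;  // pre-read rows, used by write benchmarks

    @Setup
    public void setup() throws Exception {
        // File name is a placeholder; the real benchmark bundles its own test data
        jsonInput = Files.readAllBytes(Paths.get("cohere-beir-embeddings.json"));
        pojoInput = defaultMapper.readValue(jsonInput, EmbeddingRow[].class);
    }

    @Benchmark
    public EmbeddingRow[] fastFPRead() throws Exception {
        return fastFPMapper.readValue(jsonInput, EmbeddingRow[].class);
    }

    @Benchmark
    public void fastFPWrite() throws Exception {
        // "no-op" OutputStream: output is produced but not aggregated anywhere
        fastFPMapper.writeValue(OutputStream.nullOutputStream(), pojoInput);
    }

    // defaultRead/defaultWrite/writeAndRead variants follow the same pattern,
    // using defaultMapper and/or combining the two steps.
}
```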
The initial test is run with:
./run-single.sh VectorHeavy
using JDK 17.0.13-tem (since it is closest to the version I use for production deployments; but more on this later on).
I also used the latest Jackson release, 2.18.0.
Vector Performance with JDK 17, Jackson 2.18.0
So, with JDK 17 (sdk use java 17.0.13-tem) and Jackson 2.18, we get results like so (where a higher score means better throughput):
./run-single.sh VectorHeavy
Benchmark Mode Cnt Score Error Units
VectorHeavyReadWrite.defaultRead thrpt 9 6.185 ± 0.079 ops/s
VectorHeavyReadWrite.defaultWrite thrpt 9 6.507 ± 0.197 ops/s
VectorHeavyReadWrite.defaultWriteAndRead thrpt 9 3.201 ± 0.064 ops/s
VectorHeavyReadWrite.fastFPRead thrpt 9 12.872 ± 0.471 ops/s
VectorHeavyReadWrite.fastFPWrite thrpt 9 13.419 ± 0.080 ops/s
VectorHeavyReadWrite.fastFPWriteAndRead thrpt 9 6.270 ± 0.180 ops/s
which can be summarized roughly as: enabling the Jackson 2.18 floating-point optimizations DOUBLES both read and write speeds for this FP-heavy use case (!!!)
Nice!
But how about older Jackson versions? Or newer JDKs, specifically JDK 21? Let's see.
Vector Performance with JDK 17, Jackson 2.15.4
The earliest Jackson version with all 3 optimization settings (but without the latest optimizations) is 2.15.
Results with this combination are:
./run-single.sh VectorHeavy
Benchmark Mode Cnt Score Error Units
VectorHeavyReadWrite.defaultRead thrpt 9 5.786 ± 0.081 ops/s
VectorHeavyReadWrite.defaultWrite thrpt 9 6.143 ± 0.110 ops/s
VectorHeavyReadWrite.defaultWriteAndRead thrpt 9 2.942 ± 0.045 ops/s
VectorHeavyReadWrite.fastFPRead thrpt 9 10.218 ± 0.484 ops/s
VectorHeavyReadWrite.fastFPWrite thrpt 9 12.176 ± 0.849 ops/s
VectorHeavyReadWrite.fastFPWriteAndRead thrpt 9 5.366 ± 0.166 ops/s
TL;DNR: Jackson 2.18 has:
- 10% faster default floating-point reads/writes than 2.15
- 20% faster optimized floating-point reads/writes than 2.15
Also nice: it is worth upgrading if FP performance matters. But even the 2.15 optimizations significantly outperform "vanilla JDK" handling.
Vector Performance with JDK 21, Jackson 2.18.0
Ok. So let's also see if a newer JDK might improve things further: specifically, the latest (as of this writing) Java LTS release, JDK 21.
This is where things get interesting…
With JDK 21.0.5-tem (sdk use java 21.0.5-tem) we get:
Benchmark Mode Cnt Score Error Units
VectorHeavyReadWrite.defaultRead thrpt 9 5.853 ± 0.081 ops/s
VectorHeavyReadWrite.defaultWrite thrpt 9 14.915 ± 0.251 ops/s
VectorHeavyReadWrite.defaultWriteAndRead thrpt 9 4.171 ± 0.079 ops/s
VectorHeavyReadWrite.fastFPRead thrpt 9 10.458 ± 0.172 ops/s
VectorHeavyReadWrite.fastFPWrite thrpt 9 13.952 ± 0.368 ops/s
VectorHeavyReadWrite.fastFPWriteAndRead thrpt 9 5.931 ± 0.151 ops/s
which is… not what I expected. Basically, while read throughput is similar to that of JDK 17, the default JDK-provided floating-point number writing speed (conversion from float to String) has MORE THAN DOUBLED from JDK 17 to JDK 21?! So much so that JDK-based serialization is slightly faster than the one provided by Jackson (using the "Schubfach" algorithm from https://github.com/c4f7fcce9cb06515/Schubfach, class FloatToDecimal.java).
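To make concrete what is being compared here, below is a minimal sketch (an illustration, not the actual benchmark code) of the two write paths: the JDK's own float-to-String conversion, and Jackson's Schubfach-based writer, enabled via StreamWriteFeature.USE_FAST_DOUBLE_WRITER.

```java
import java.io.StringWriter;

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.core.StreamWriteFeature;

public class FloatWritePathsSketch {
    public static void main(String[] args) throws Exception {
        float value = 0.0014152615f;

        // Path 1: the "default" write ultimately relies on the JDK's own
        // float-to-String conversion, which got much faster in JDK 21
        String viaJdk = Float.toString(value);

        // Path 2: the "fastFP" write uses Jackson's Schubfach-based writer
        JsonFactory fastFactory = JsonFactory.builder()
                .enable(StreamWriteFeature.USE_FAST_DOUBLE_WRITER)
                .build();
        StringWriter sw = new StringWriter();
        try (JsonGenerator gen = fastFactory.createGenerator(sw)) {
            gen.writeNumber(value);
        }
        String viaSchubfach = sw.toString();

        // Both produce a decimal representation that round-trips back to the
        // same float; only the conversion speed differs.
        System.out.println(viaJdk + " vs " + viaSchubfach);
    }
}
```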
In contrast, the reader side (FastDoubleParser) seems to have slightly better performance on JDK 21 (but within the margin of error).
Cool. What next?
There is one more "trick" to try with floating-point encodings: actually avoiding the expensive translation between the internal binary (base-2) float and the external decimal (base-10) String representation, which is possible with a 2-phase conversion. But more on this in another blog post, to be Written Really Soon!