Super-Fast “Packed Base64” Vectors
aka On the benefits of using an alternate Vector representation
After verifying the performance improvements from Jackson 2.18's optimized floating-point number reading/writing (“Jackson 2.18: fast Vector reads/writes!”), there was one interesting tangent I wanted to check out: experimenting with an alternate JSON representation for Vectors, for a possible speedup.
Basic Idea: Vector data as packed “floats”, Base64 encoded
So. Instead of passing Vectors as regular JSON Arrays like:
{
  "embeddings": [ 0.001203, -0.000093, 0.38000104, 0.119302 ]
}
how about using a simple transformation:
- Convert (“pack”) each 32-bit IEEE-754 float into the matching 4-byte network-order (big-endian) byte sequence — so a vector of float[1024] becomes byte[4096]
- Encode the resulting byte[] as Base64
which gets us something like:
{
  "embeddings": "TWFueSBoYW5kcyBtYWtlIG=="
}
which can then be sent over as the Vector value of a request, and decoded symmetrically (Base64-decode, “unpack” back into an array of floats) by the receiver for use.
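To make the transformation concrete, here is a minimal standalone sketch of the round-trip, using java.nio.ByteBuffer and java.util.Base64 from the JDK (the class and method names here are just for illustration; the actual benchmark code is shown further below):

import java.nio.ByteBuffer;
import java.util.Base64;

public class PackedVectorExample
{
    // Pack floats into big-endian bytes (ByteBuffer's default order), then Base64-encode
    static String encode(float[] vector) {
        ByteBuffer buf = ByteBuffer.allocate(vector.length * 4);
        for (float f : vector) {
            buf.putFloat(f);
        }
        return Base64.getEncoder().encodeToString(buf.array());
    }

    // Base64-decode, then unpack big-endian bytes back into floats
    static float[] decode(String encoded) {
        ByteBuffer buf = ByteBuffer.wrap(Base64.getDecoder().decode(encoded));
        float[] vector = new float[buf.remaining() / 4];
        for (int i = 0; i < vector.length; i++) {
            vector[i] = buf.getFloat();
        }
        return vector;
    }
}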
The resulting encoded value would typically be slightly more compact (each float value being expressed using about 5.33 Base64 characters on average, since Base64 encodes 3 bytes as 4 characters; and with no need for separating commas) but, more importantly, could be somewhat faster to process if the transformations needed (mostly Base64 encoding) are faster than binary-FP-to/from-text transformation (which is my hypothesis). This expectation of a speedup is based on earlier measurements (like “Jackson 2.15: yet faster floating-point reads”, see notes on Smile/CBOR speedup) which showed that binary formats — which send “packed” floats — have a significant performance advantage over JSON wrt floating-point number handling.
But is this true when we also need to use Base64 encoding? Let’s find out!
Test Setup: misc-micro-benchmarks with Jackson (de)serializer
So, I modified the misc-micro-benchmarks test VectorHeavyReadWrite by adding new test cases — after implementing basic Base64FloatVectorSerializer and Base64FloatVectorDeserializer, registered to handle reading and writing of float[] values — so that the only thing varying is the JsonMapper the tests use.
With that, it was just a matter of running ./run-single.sh VectorHeavy and — voila! — we get results. And what nice little results they are… :)
But before showing the results, let me state 2 things:
- The numbers are WAY different than I expected, so
- while I have added some tests to try to verify the serializer/deserializer work as expected, there is always a chance something in the tests might be wrong and skew the numbers.
With that caveat… Behold, The Numbers!
Unexpectedly Great Speedup By Using Packed+Base64-encoded Vectors
So, we got these numbers:
Benchmark                                 Mode  Cnt    Score   Error  Units
VectorHeavyReadWrite.base64Read          thrpt    9   73.306 ± 2.995  ops/s
VectorHeavyReadWrite.base64Write         thrpt    9  119.410 ± 2.607  ops/s
VectorHeavyReadWrite.base64WriteAndRead  thrpt    9   44.490 ± 2.045  ops/s
VectorHeavyReadWrite.defaultRead         thrpt    9    6.237 ± 0.152  ops/s
VectorHeavyReadWrite.defaultWrite        thrpt    9    6.275 ± 0.194  ops/s
VectorHeavyReadWrite.defaultWriteAndRead thrpt    9    3.084 ± 0.025  ops/s
VectorHeavyReadWrite.fastFPRead          thrpt    9   12.516 ± 0.306  ops/s
VectorHeavyReadWrite.fastFPWrite         thrpt    9   12.631 ± 0.804  ops/s
VectorHeavyReadWrite.fastFPWriteAndRead  thrpt    9    6.329 ± 0.139  ops/s
where “default” means regular float[] without read/write optimizations; “fastFP” means regular float[] with read/write optimizations; and “base64” refers to the “Pack+Base64-encode” alternative.
What we see is quite… spectacular:
- Read performance of the Pack/Base64 variant is 6 times (!) that of optimized reads (and 12 times that of non-optimized reads)
- Write performance is 10 times (!!) that of optimized writes (and 20 times that of non-optimized writes)
- Combined (write-then-read) performance averages 7 times that of the optimized variant
And the accompanying serialization size change was:
Input length (array): 12865881
Input length (base64): 5605733
So Pack/Base64 reduced the output size by ~57% as well (5605733 / 12865881 ≈ 0.44; the encoded form is about 44% of the original size).
This is — to put it mildly — an unusually (and perhaps hard-to-believe-level) big improvement.
Hence I am a bit suspicious of it all. But until I can find alternative explanations, I consider it possible that for a round-trip (write-send-read), use of the “Base64-encoded packed floats” approach yields 7 times the throughput of the best comparable “plain array” approach.
Why so much faster?
So: the reason I think this speedup might be real comes down to 2 factors:
- Size reduction (over 50%) itself tends to speed things up linearly — this would give a factor of 2x on its own (in general)
- Conversion between textual decimal floating-point numbers and binary floating-point values (especially if compliant with the IEEE 754 spec) is an incredibly complicated process — and hence very slow. Converting between a float and its int32 bit representation (obtained with Float.floatToIntBits()) is almost free in comparison, as the snippet below illustrates
which, combined, could be enough to explain the speedup.
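To show what “almost free” means here, compare the two round-trips; the textual path hides a lot of decimal formatting/parsing work behind two innocent-looking calls, while the bit-level path is essentially a raw move of the same 32 bits:

// Textual round-trip: full decimal format + parse (the slow path)
String text = Float.toString(0.38000104f);
float fromText = Float.parseFloat(text);

// Bit-level round-trip: just reinterpreting the bits (the fast path)
int bits = Float.floatToIntBits(0.38000104f);
float fromBits = Float.intBitsToFloat(bits);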
Show me the code!
Ok, so, the code for reading/writing is included in https://github.com/cowtowncoder/misc-micro-benchmarks but here’s an inline version to show how simple things are (plus maybe someone can spot a problem that I missed? :) ):
import java.io.IOException;

import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.databind.SerializerProvider;
import com.fasterxml.jackson.databind.ser.std.StdScalarSerializer;

public class Base64FloatVectorSerializer extends StdScalarSerializer<float[]>
{
    public Base64FloatVectorSerializer() {
        super(float[].class);
    }

    @Override
    public void serialize(float[] value, JsonGenerator gen, SerializerProvider provider)
        throws IOException
    {
        // First: "pack" the floats into bytes, 4 bytes per float, big-endian
        final int vectorLen = value.length;
        final byte[] b = new byte[vectorLen << 2];
        for (int i = 0, out = 0; i < vectorLen; i++) {
            final int floatBits = Float.floatToIntBits(value[i]);
            b[out++] = (byte) (floatBits >> 24);
            b[out++] = (byte) (floatBits >> 16);
            b[out++] = (byte) (floatBits >> 8);
            b[out++] = (byte) (floatBits);
        }
        // Second: write packed bytes (for JSON, Base64-encoded)
        gen.writeBinary(b);
    }
}
import java.io.IOException;

import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.DeserializationContext;
import com.fasterxml.jackson.databind.deser.std.StdScalarDeserializer;

public class Base64FloatVectorDeserializer extends StdScalarDeserializer<float[]>
{
    public Base64FloatVectorDeserializer() {
        super(float[].class);
    }

    @Override
    public float[] deserialize(JsonParser p, DeserializationContext ctxt) throws IOException {
        final JsonToken t = p.currentToken();
        if (t == JsonToken.VALUE_EMBEDDED_OBJECT) {
            // Binary formats (Smile, CBOR, ...) may expose binary data directly
            Object emb = p.getEmbeddedObject();
            if (emb instanceof byte[]) {
                return unpack(ctxt, (byte[]) emb);
            } else if (emb instanceof float[]) {
                return (float[]) emb;
            }
        } else if (t == JsonToken.VALUE_STRING) {
            // For JSON, getBinaryValue() Base64-decodes the String value
            return unpack(ctxt, p.getBinaryValue());
        }
        return (float[]) ctxt.handleUnexpectedToken(_valueClass, p);
    }

    private float[] unpack(DeserializationContext ctxt, byte[] bytes) throws IOException {
        final int bytesLen = bytes.length;
        if ((bytesLen & 3) != 0) {
            return (float[]) ctxt.reportInputMismatch(_valueClass,
                    "Vector length (%d) not a multiple of 4 bytes", bytesLen);
        }
        final int vectorLen = bytesLen >> 2;
        final float[] floats = new float[vectorLen];
        for (int in = 0, out = 0; in < bytesLen; ) {
            // Reassemble 4 big-endian bytes into the float's int bits
            int packed = (bytes[in++] << 24)
                    | ((bytes[in++] & 0xFF) << 16)
                    | ((bytes[in++] & 0xFF) << 8)
                    | (bytes[in++] & 0xFF);
            floats[out++] = Float.intBitsToFloat(packed);
        }
        return floats;
    }
}
and these are registered via JsonMapper.addModule() (see the repo for details if interested).
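For completeness, a minimal sketch of what registration and a round-trip look like (assuming the two classes above; the actual benchmark wiring in the repo differs slightly, and the class name here is just for illustration):

import com.fasterxml.jackson.databind.json.JsonMapper;
import com.fasterxml.jackson.databind.module.SimpleModule;

public class Base64VectorExample {
    public static void main(String[] args) throws Exception {
        SimpleModule module = new SimpleModule("Base64VectorModule");
        module.addSerializer(float[].class, new Base64FloatVectorSerializer());
        module.addDeserializer(float[].class, new Base64FloatVectorDeserializer());
        JsonMapper mapper = JsonMapper.builder()
                .addModule(module)
                .build();

        // Round-trip: float[] -> Base64 JSON String -> float[]
        String json = mapper.writeValueAsString(new float[] { 0.001203f, -0.000093f });
        float[] decoded = mapper.readValue(json, float[].class);
    }
}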
What Next? Add support in Jackson 2.19?
Originally I was just curious about the benefits, if any, without necessarily planning to do anything wrt OSS projects.
But if the speedup holds, this might well be something to explicitly support in Jackson (jackson-databind). On the deserialization side, support for the alternate format (for float[], and probably for double[] as well) could be added by simply improving the existing deserializer to add handling as shown above — so both the “old” default format (JSON Array) and the new Pack+Base64 format would be supported.
Adding support for serialization would be slightly more complicated as there has to be a way to indicate which representation to use.
This could be done with the help of the @JsonFormat annotation, like so:
public class Request {
    public int id;

    // uses "new" Pack+Base64 representation
    @JsonFormat(shape = Shape.BINARY) // or Shape.STRING
    public float[] embedding;
}
Support like this would also allow use of “Config Overrides” to configure the setting globally (to apply to every float[] instance).
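As a hypothetical sketch (the Shape.BINARY handling itself being the proposed new functionality, not something Jackson supports yet), such a global config override might look like this:

// Hypothetical: apply Pack+Base64 representation to all float[] properties
JsonMapper mapper = JsonMapper.builder().build();
mapper.configOverride(float[].class)
    .setFormat(JsonFormat.Value.forShape(JsonFormat.Shape.BINARY));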