Super-Fast “Packed Base64” Vectors

aka On the benefits of using an alternate Vector representation

@cowtowncoder
5 min read · Oct 25, 2024

After verifying the performance improvements from Jackson 2.18’s optimized floating-point number reading/writing (“Jackson 2.18: fast Vector reads/writes!”), there was one interesting tangent I wanted to check out: experimenting with an alternate JSON representation for Vectors, for a possible speedup.

Basic Idea: Vector data as packed “floats”, Base64 encoded

So. Instead of passing Vectors as regular JSON Arrays like:

{
  "embeddings": [ 0.001203, -0.000093, 0.38000104, 0.119302 ]
}

how about using a simple transformation:

  1. Convert (“pack”) 32-bit IEEE-754 floats into matching 4-byte Network-order (Big-Endian) byte sequences, so a vector of float[1024] becomes byte[4096]
  2. Encode the resulting byte[] as Base64

which gets us something like:

{
  "embeddings": "TWFueSBoYW5kcyBtYWtlIG=="
}

which can then be sent over as the Vector value of a request, and then decoded symmetrically (Base64-decode, “unpack” back into an array of floats) by the receiver for use.
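To make the transformation concrete, here is a minimal standalone sketch of the round trip, using just the JDK’s java.nio.ByteBuffer and java.util.Base64 (the class name is mine for illustration, not from the benchmark repo):

import java.nio.ByteBuffer;
import java.util.Base64;

public class PackedVectorRoundTrip {
    public static void main(String[] args) {
        float[] vector = { 0.001203f, -0.000093f, 0.38000104f, 0.119302f };

        // Pack: 4 bytes per float; ByteBuffer defaults to Big-Endian (network order)
        ByteBuffer buf = ByteBuffer.allocate(vector.length * 4);
        for (float f : vector) {
            buf.putFloat(f);
        }
        String encoded = Base64.getEncoder().encodeToString(buf.array());
        System.out.println(encoded);

        // Unpack: Base64-decode, then read the floats back out
        ByteBuffer in = ByteBuffer.wrap(Base64.getDecoder().decode(encoded));
        float[] decoded = new float[in.remaining() / 4];
        for (int i = 0; i < decoded.length; i++) {
            decoded[i] = in.getFloat();
        }
    }
}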

The resulting encoded value would typically be slightly more compact (each float value taking about 5.33 Base64 characters on average, i.e. 4 bytes at 4 characters per 3 bytes, with no separating commas needed) but, more importantly, could be somewhat faster to process if the transformations needed (mostly Base64 encoding) are faster than binary-FP-to/from-text conversion (which is my hypothesis). This expectation of a speedup is based on earlier measurements (like “Jackson 2.15: yet faster floating-point reads”, see notes on Smile/CBOR speedup) which showed that binary formats, which send “packed” floats, have a significant performance advantage over JSON wrt floating-point number handling.

But is this true when we also need to use Base64 encoding? Let’s find out!

Test Setup: misc-micro-benchmarks with Jackson (de)serializer

So, I modified the misc-micro-benchmarks test VectorHeavyReadWrite by adding new test cases, after implementing a basic Base64FloatVectorSerializer and Base64FloatVectorDeserializer registered to handle reading and writing of float[] values. That way, the only thing varying is the JsonMapper the tests use.

With that it was just a matter of running ./run-single.sh VectorHeavy and, voilà, we get results. And what nice little results they are… :)

But before showing the results, let me state 2 things:

  1. The numbers are WAY different than I expected, so
  2. while I have added some tests to try to verify that the serializer/deserializer work as expected, there is always a chance something in the tests might be wrong and skew the numbers.

With that caveat… Behold, The Numbers!

Unexpectedly Great Speedup By Using Packed+Base64-encoded Vectors

So, we got these numbers:

Benchmark                                  Mode  Cnt    Score   Error  Units
VectorHeavyReadWrite.base64Read           thrpt    9   73.306 ± 2.995  ops/s
VectorHeavyReadWrite.base64Write          thrpt    9  119.410 ± 2.607  ops/s
VectorHeavyReadWrite.base64WriteAndRead   thrpt    9   44.490 ± 2.045  ops/s
VectorHeavyReadWrite.defaultRead          thrpt    9    6.237 ± 0.152  ops/s
VectorHeavyReadWrite.defaultWrite         thrpt    9    6.275 ± 0.194  ops/s
VectorHeavyReadWrite.defaultWriteAndRead  thrpt    9    3.084 ± 0.025  ops/s
VectorHeavyReadWrite.fastFPRead           thrpt    9   12.516 ± 0.306  ops/s
VectorHeavyReadWrite.fastFPWrite          thrpt    9   12.631 ± 0.804  ops/s
VectorHeavyReadWrite.fastFPWriteAndRead   thrpt    9    6.329 ± 0.139  ops/s

where “default” means regular float[] handling without read/write optimizations; “fastFP” means regular float[] with read/write optimizations; and “base64” refers to the “Pack+Base64-encode” alternative.

What we see is quite… spectacular:

  1. Read performance of the Pack/Base64 variant is 6 times (!) that of optimized reads (and 12 times that of non-optimized)
  2. Write performance is 10 times (!!) that of optimized writes (and 20 times that of non-optimized)
  3. Combined (write-then-read) performance averages 7 times that of optimized

And the accompanying serialization size change was:

Input length (array):  12865881
Input length (base64): 5605733

So Pack/Base64 reduced the serialized size by ~56% as well.

This is, to put it mildly, an unusually big (perhaps hard-to-believe) improvement.
Hence I am a bit suspicious of it all. But until I can find alternative explanations, I consider it possible that for the round trip (write-send-read), use of the “Base64-encoded packed floats” approach yields 7 times the throughput of the best comparable “plain array” approach.

Why so much faster?

So: the reason I think this speedup might be real comes down to 2 factors:

  1. The size reduction (over 50%) itself tends to speed things up roughly linearly; this would give a factor of 2x on its own (in general)
  2. Conversion between textual decimal floating-point numbers and binary floating-point values (especially if compliant with the IEEE 754 spec) is an incredibly complicated process, and hence very slow. Converting to/from the int32 representation of a float (obtained with Float.floatToIntBits) is almost free in comparison.

which, combined, could be enough to explain the speedup.
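As a concrete illustration of factor 2, here is what each direction has to do per value (plain JDK; meant only to show the contrast, not to serve as a benchmark):

public class FloatConversionContrast {
    public static void main(String[] args) {
        // Textual path: full decimal-to-binary conversion, with IEEE 754 rounding
        float viaText = Float.parseFloat("0.38000104");

        // Packed path: a raw 32-bit bit copy in each direction; no parsing at all
        int bits = Float.floatToIntBits(0.38000104f);
        float viaBits = Float.intBitsToFloat(bits);

        System.out.println(viaText == viaBits); // true: same value, very different cost
    }
}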

Show me the code!

Ok, so, the code for reading/writing is included in https://github.com/cowtowncoder/misc-micro-benchmarks, but here’s an inline version to show how simple things are (plus maybe someone can spot a problem that I missed? :) ):

import java.io.IOException;

import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.DeserializationContext;
import com.fasterxml.jackson.databind.SerializerProvider;
import com.fasterxml.jackson.databind.deser.std.StdScalarDeserializer;
import com.fasterxml.jackson.databind.ser.std.StdScalarSerializer;

public class Base64FloatVectorSerializer extends StdScalarSerializer<float[]>
{
    public Base64FloatVectorSerializer() { super(float[].class); }

    @Override
    public void serialize(float[] value, JsonGenerator gen, SerializerProvider provider)
        throws IOException
    {
        // First: "pack" the floats into bytes (4 per float, Big-Endian)
        final int vectorLen = value.length;
        final byte[] b = new byte[vectorLen << 2];
        for (int i = 0, out = 0; i < vectorLen; i++) {
            final int floatBits = Float.floatToIntBits(value[i]);
            b[out++] = (byte) (floatBits >> 24);
            b[out++] = (byte) (floatBits >> 16);
            b[out++] = (byte) (floatBits >> 8);
            b[out++] = (byte) (floatBits);
        }
        // Second: write packed bytes (for JSON, Base64 encoded)
        gen.writeBinary(b);
    }
}

public class Base64FloatVectorDeserializer extends StdScalarDeserializer<float[]>
{
    public Base64FloatVectorDeserializer() { super(float[].class); }

    @Override
    public float[] deserialize(JsonParser p, DeserializationContext ctxt) throws IOException {
        final JsonToken t = p.currentToken();
        if (t == JsonToken.VALUE_EMBEDDED_OBJECT) {
            Object emb = p.getEmbeddedObject();
            if (emb instanceof byte[]) {
                return unpack(ctxt, (byte[]) emb);
            } else if (emb instanceof float[]) {
                return (float[]) emb;
            }
        } else if (t == JsonToken.VALUE_STRING) {
            return unpack(ctxt, p.getBinaryValue());
        }
        return (float[]) ctxt.handleUnexpectedToken(_valueClass, p);
    }

    private final float[] unpack(DeserializationContext ctxt, byte[] bytes) throws IOException {
        final int bytesLen = bytes.length;
        if ((bytesLen & 3) != 0) {
            return (float[]) ctxt.reportInputMismatch(_valueClass,
                    "Vector length (%d) not a multiple of 4 bytes", bytesLen);
        }
        final int vectorLen = bytesLen >> 2;
        final float[] floats = new float[vectorLen];
        for (int in = 0, out = 0; in < bytesLen; ) {
            // Reassemble Big-Endian bytes; mask all but the topmost to avoid sign extension
            int packed = (bytes[in++] << 24)
                    | ((bytes[in++] & 0xFF) << 16)
                    | ((bytes[in++] & 0xFF) << 8)
                    | (bytes[in++] & 0xFF);
            floats[out++] = Float.intBitsToFloat(packed);
        }
        return floats;
    }
}

and these are registered via JsonMapper.builder().addModule() (see the repo for details if interested).
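For completeness, a minimal registration sketch (the factory class and module name are mine for illustration, not from the repo):

import com.fasterxml.jackson.databind.json.JsonMapper;
import com.fasterxml.jackson.databind.module.SimpleModule;

public class VectorMapperFactory {
    public static JsonMapper createMapper() {
        // Attach the (de)serializers to float[] via a SimpleModule
        SimpleModule module = new SimpleModule("Base64FloatVectorModule");
        module.addSerializer(float[].class, new Base64FloatVectorSerializer());
        module.addDeserializer(float[].class, new Base64FloatVectorDeserializer());
        return JsonMapper.builder()
                .addModule(module)
                .build();
    }
}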

What Next? Add support in Jackson 2.19?

Originally I was just curious about benefits, if any, without plans to necessarily do anything wrt OSS projects.

But if the speedup holds, this might well be something to explicitly support in Jackson (jackson-databind). On the deserialization side, support for the alternate format (for float[], and probably for double[] as well) could be added by simply improving the existing deserializer to add handling like that shown above, so that both the “old” default format (JSON Array) and the new Pack+Base64 format would be supported; see the sketch after this paragraph.
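A rough sketch of such dual-format handling (my speculation, not existing jackson-databind code), extending the deserializer shown earlier with an array-reading branch:

@Override
public float[] deserialize(JsonParser p, DeserializationContext ctxt) throws IOException {
    JsonToken t = p.currentToken();
    if (t == JsonToken.START_ARRAY) {
        // "Old" default representation: plain JSON Array of numbers
        // (boxed List used for brevity; a real version would use a growable float buffer)
        java.util.List<Float> values = new java.util.ArrayList<>();
        while (p.nextToken() != JsonToken.END_ARRAY) {
            values.add(p.getFloatValue());
        }
        float[] result = new float[values.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = values.get(i);
        }
        return result;
    }
    if (t == JsonToken.VALUE_STRING) {
        // "New" packed representation: Base64-decode, then unpack (method shown earlier)
        return unpack(ctxt, p.getBinaryValue());
    }
    return (float[]) ctxt.handleUnexpectedToken(_valueClass, p);
}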

Adding support for serialization would be slightly more complicated, as there has to be a way to indicate which representation to use.
This could be done with the help of the @JsonFormat annotation, like so:

public class Request {
    public int id;

    // uses "new" Pack+Base64 representation
    @JsonFormat(shape = JsonFormat.Shape.BINARY) // or Shape.STRING
    public float[] embedding;
}

Such support would also allow use of “Config Overrides” to configure the setting globally (i.e. to use it for every float[] instance); a speculative example follows.
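If such support existed, global configuration could look something like this (speculative: config overrides are a real jackson-databind mechanism, but the Shape.BINARY handling for float[] is not yet implemented):

import com.fasterxml.jackson.annotation.JsonFormat;
import com.fasterxml.jackson.databind.json.JsonMapper;

public class GlobalVectorConfig {
    public static JsonMapper createMapper() {
        JsonMapper mapper = JsonMapper.builder().build();
        // Apply the Binary shape to ALL float[] properties, with no per-property annotation
        mapper.configOverride(float[].class)
              .setFormat(JsonFormat.Value.forShape(JsonFormat.Shape.BINARY));
        return mapper;
    }
}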

Written by @cowtowncoder

Open Source developer, best known for the Jackson data processor (née “JSON library”), author of many, many other OSS libraries for Java, from ClassMate to Woodstox