Measuring performance of Java String.format() (or lack thereof)

aka “Why is String.format() SO slow?”

Aug 30, 2021

Background

In the past I have found that the functionality in the JDK is usually well optimized and performs reasonably well for the task, considering the general nature and applicability of the functionality offered. So as a general guideline I tend to trust that if the JDK has a method for doing something in a simple way, it is perfectly fine for most uses and likely performs well enough.

But I have also found that occasionally some specific classes or methods may perform surprisingly poorly. Knowing these outliers can be useful when dealing with performance-sensitive code (such as tight loops, or anything called thousands of times per second or more).

This is a post about one such case: the surprisingly poor performance of String.format() for simple String concatenation.

More Specifically…

The specific case that caught my eye — and that represents a wider class of similar use cases — is that of String concatenation for creating Strings like compound keys. For example:

String key = String.format("%s.%s", keyspace, tableName);

This is functionally equivalent to:

String key = keyspace + "." + tableName; // or
key = new StringBuilder().append(keyspace)
  .append('.').append(tableName).toString();

and one might expect these approaches to perform similarly, given that from the user perspective they perform the same task.
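To make the equivalence concrete, here is a minimal sketch that builds the same key all three ways and compares the results. The class and method names are illustrative only and do not come from the benchmark repository.

```java
// Sketch: three ways of building the same compound key.
// CompoundKeys and its method names are hypothetical, chosen for illustration.
class CompoundKeys {
    static String viaFormat(String keyspace, String tableName) {
        return String.format("%s.%s", keyspace, tableName);
    }

    static String viaPlus(String keyspace, String tableName) {
        return keyspace + "." + tableName;
    }

    static String viaBuilder(String keyspace, String tableName) {
        return new StringBuilder().append(keyspace)
                .append('.').append(tableName).toString();
    }

    public static void main(String[] args) {
        String a = viaFormat("ks", "users");
        String b = viaPlus("ks", "users");
        String c = viaBuilder("ks", "users");
        // All three produce the same String
        System.out.println(a + " " + a.equals(b) + " " + b.equals(c));
    }
}
```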

But Should String.format() Have Low Overhead?

One thing to note, however, is that while the usage here is for the same task, the underlying implementation of String.format() is much, much more complicated: it is quite a powerful tool for all types of String formatting. We just happen to use this power tool for a low-power use case.

The important part to note is that decoding the format String is overhead that either occurs on each call, or the JDK has to use some elaborate caching to avoid it. And unlike with something like Pattern.compile() for regular expressions, there is no way to pre-process the format String.
Similarly, it may be necessary to build secondary data structures for handling details of formatting, at least for some cases; and even if not for our case, the design has to allow for that.
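The contrast with regular expressions can be sketched as follows: a Pattern is parsed once and reused, while the format String is handed over as a plain String on every call. The class and method names here are hypothetical.

```java
import java.util.regex.Pattern;

// Sketch of the API difference; PrecompileContrast is a hypothetical name.
class PrecompileContrast {
    // Regular expression: parsed once here, then reused for every call
    static final Pattern DOT = Pattern.compile("\\.");

    static String[] splitKey(String key) {
        return DOT.split(key);
    }

    // Format String: passed as a plain String each time, so any decoding
    // work has to happen (or be cached internally) on every call
    static String makeKey(String keyspace, String tableName) {
        return String.format("%s.%s", keyspace, tableName);
    }

    public static void main(String[] args) {
        String key = makeKey("ks", "users");
        System.out.println(key + " has " + splitKey(key).length + " parts");
    }
}
```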

Given this, it is reasonable to suspect that String.format() may have some overhead for our simple concatenation use case.

But does it? (did the JDK folks pull off another impressive feat here? :) ) And if so, how much?

So Let’s Ask Our Friend JMH To Have a Look

Similar to my earlier post ‘Measuring “String.indexOfAny(String)” performance’, I added a new JMH test class, StringConcatenation, to the https://github.com/cowtowncoder/misc-micro-benchmarks repository.
You can run the test cases with:

java  -jar target/microbenchmarks.jar StringConcatenation

and you would get results that look something like what I see on my desktop (2018 Mac Mini, 3.2 GHz 6-core Intel Core i7):

m1_StringFormat           thrpt   15    61337.088 ±   654.370  ops/s
m2_StringBuilder          thrpt   15  2683849.107 ± 22092.481  ops/s
m3_StringBuilderPrealloc  thrpt   15  2654994.965 ± 36881.162  ops/s
m4_ManualConcatenation    thrpt   15  2700825.252 ± 27906.924  ops/s

Your numbers will obviously vary a bit (and I have omitted part of the identifiers to fit the output nicely).

Test cases

Before considering the numbers, let’s first introduce the test cases:

m1_StringFormat: this uses String.format("%s.%s", first, second);
m2_StringBuilder: equivalent concatenation using StringBuilder as shown earlier
m3_StringBuilderPrealloc: same as m2 but calculates the exact initial size of the StringBuilder to avoid the need to re-allocate its buffer. This is an attempt to further optimize the m2 case
m4_ManualConcatenation: use of the + operator: String str = first+"."+second; which should internally become the same as, or similar to, the m2 case

For all test cases we use all combinations of 4 by 4 Strings (16 pairs), then again with first/second switched, that is, calling the functionality 32 times per iteration. Test Strings are somewhat trivial and may not be representative: we are not trying to emulate a specific use case here (but you should if you have useful data!) but hope this gives us at least a baseline for comparison.
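The shape of one iteration can be sketched in plain Java, without the JMH harness. The input Strings below are made up for illustration and the class name is hypothetical; the actual inputs and the JMH annotations are in the repository.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of one benchmark iteration: 4 x 4 pairs, each in both orders.
// ConcatInputs and the input values are assumptions, not the repository's code.
class ConcatInputs {
    static final String[] FIRST = {"a", "bc", "def", "ghij"};
    static final String[] SECOND = {"k", "lm", "nop", "qrst"};

    // Returns the results so the concatenations cannot simply be discarded;
    // a real JMH test would hand them to a Blackhole or return a value instead.
    static List<String> concatenateAll() {
        List<String> results = new ArrayList<>(32);
        for (String first : FIRST) {
            for (String second : SECOND) {
                results.add(first + "." + second);
                results.add(second + "." + first); // switched order
            }
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(concatenateAll().size()); // 4 * 4 * 2 calls
    }
}
```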

So Let’s Unpack The Results

So, let’s have a look at the numbers we saw earlier. Cases m2, m3 and m4 look about equivalent, giving us roughly 2.7 million iterations per second. With 32 concatenations per iteration that comes to about 85 million concatenations per second (a figure that also includes other overhead such as garbage collection). Not bad.

But the first case, that of using String.format(), fares much, much worse, giving us about 61,000 iterations per second. While that is still almost 2 million concatenations per second (which is plenty for most uses), it is well over an order of magnitude slower than using StringBuilder directly or indirectly.
Put another way, using StringBuilder here is over 40 TIMES faster than String.format().
This is probably quite a bit bigger a difference than most Java developers would have guessed.
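For anyone wanting to redo the arithmetic, here is a small sketch that derives the speedup and the per-second concatenation counts from the throughput figures listed earlier. The class name is hypothetical; the constants are copied from the m1 and m2 result lines.

```java
// Sketch: deriving the headline numbers from the JMH output above.
class ThroughputMath {
    static final double FORMAT_OPS = 61337.088;     // m1_StringFormat, ops/s
    static final double BUILDER_OPS = 2683849.107;  // m2_StringBuilder, ops/s
    static final int CALLS_PER_OP = 32;             // concatenations per iteration

    static double speedup() {
        return BUILDER_OPS / FORMAT_OPS;
    }

    static double concatsPerSecond(double opsPerSecond) {
        return opsPerSecond * CALLS_PER_OP;
    }

    public static void main(String[] args) {
        System.out.printf("speedup: %.1fx%n", speedup());
        System.out.printf("String.format(): %.0f concatenations/s%n",
                concatsPerSecond(FORMAT_OPS));
        System.out.printf("StringBuilder:   %.0f concatenations/s%n",
                concatsPerSecond(BUILDER_OPS));
    }
}
```

The ratio comes out between 43 and 44, which is where the "over 40 times" figure comes from.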

Of the other three cases about the only interesting thing is that m3 (pre-allocating StringBuilder) is no faster than just using the default constructor (numbers for m2 and m3 overlap considering the variation so it is unclear if one of them is faster; essentially their performances are all but identical).
This may be due to our use of relatively short input Strings, wherein the default size (16) works well enough. If input Strings were longer results might be better for pre-allocation version — but probably not by a whole lot.
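For reference, the pre-allocation idea behind m3 can be sketched like this. It is a minimal illustration with hypothetical names, not the repository's exact code.

```java
// Sketch of the m3 idea: size the StringBuilder exactly up front.
// PreallocSketch is a hypothetical name.
class PreallocSketch {
    static String concatPrealloc(String first, String second) {
        // Exact final length: both parts plus the '.' separator
        int length = first.length() + 1 + second.length();
        return new StringBuilder(length)
                .append(first).append('.').append(second).toString();
    }

    public static void main(String[] args) {
        // Capacity of a StringBuilder created with the default constructor
        System.out.println(new StringBuilder().capacity());
        System.out.println(concatPrealloc("ks", "users"));
    }
}
```

With inputs this short the combined length often fits in the default capacity anyway, so the builder has little or no re-allocation to avoid.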

But What Does String.format() Do? (deeper dig)

Ok so it looks like String.format() takes its time in this case. But can we figure out what might be happening under the hood?

This is a simple enough use case that the async-profiler tool’s CPU profiling might give some clues.

First of all, I will locally modify the @Measurement annotation of the StringConcatenation.java test:

// @Measurement(iterations = 5, time = 1)
@Measurement(iterations = 3600, time = 1)

to make the JMH tests in the class run effectively indefinitely (nominally for 1 hour, instead of 5 seconds, per fork).
Then I will start the “indefinite” test run with:

java  -jar target/microbenchmarks.jar StringConcatenation.m1

(where m1 matches just the first test, uniquely identifying it)

And while it is running, I look up the process id of the test run (using top) and start the actual profiling for 30 seconds:

~/bin/async-profiler -e cpu -d 30 -f ~/profile-string-format.txt 67640

(here we indicate that we want to use CPU profiling; run it for 30 seconds; write results to the specified file [as text; there are other formats like json too]; and profile the Java process with id 67640 [you will need to use the process id of your own run])

After 30 seconds, we get an output file (mine has 2592 lines); at the end there is a summary like so:

          ns  percent  samples  top
  ----------  -------  -------  ---
  3100000000   10.97%      310  java.util.regex.Pattern$Start.match
  2970000000   10.51%      297  java.util.regex.Pattern$GroupHead.match
  2370000000    8.39%      237  java.util.Formatter.format
  2110000000    7.47%      211  java.lang.AbstractStringBuilder.ensureCapacityInternal
  1590000000    5.63%      159  jshort_disjoint_arraycopy
  1420000000    5.02%      142  java.util.Formatter$FormatSpecifier.index
  1340000000    4.74%      134  java.util.Formatter.parse
  1270000000    4.49%      127  arrayof_jint_fill
  1240000000    4.39%      124  java.util.regex.Pattern$BmpCharProperty.match
  1100000000    3.89%      110  java.util.regex.Pattern.matcher
   990000000    3.50%       99  java.util.Formatter$FormatSpecifier.width
   980000000    3.47%       98  java.util.regex.Pattern$Branch.match
   8
...

Looking at the top 2 entries we see that regular expression functionality is being used to decode the format String %s.%s (the preparation phase I talked about). The third entry is a bit more difficult to reason about, but there are a few other Pattern-related methods which are probably similarly used for format String processing, adding up to maybe 40% of the time the profiler shows.

There are a few other interesting things (internal StringBuilder reallocation seems to take quite a bit of time too, for some reason?) but this suggests that the biggest overhead is indeed the format String processing.

Too bad there does not seem to be a way to pre-process a java.util.Formatter instance (the class that actually implements the formatting functionality: all String.format() does is new Formatter().format(...).toString()) so that we could avoid repeatedly re-creating internal data structures. If this were doable we would probably still see a significant difference, but not a factor of 40.
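To illustrate what "pre-processing" could look like, here is a hand-rolled sketch that parses a template once and reuses the result. SimpleTemplate is a hypothetical class, not a JDK API, and it only understands the plain %s placeholder (no widths, flags, argument indexes or %% escapes), so it is nowhere near a replacement for Formatter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a template that is parsed once and then reused.
// Supports only bare "%s" placeholders.
final class SimpleTemplate {
    // Literal text around the placeholders; length is placeholder count + 1
    private final String[] literals;

    private SimpleTemplate(String[] literals) {
        this.literals = literals;
    }

    // One-time parse: split the format String on each "%s" marker
    static SimpleTemplate compile(String format) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        int ix;
        while ((ix = format.indexOf("%s", start)) >= 0) {
            parts.add(format.substring(start, ix));
            start = ix + 2;
        }
        parts.add(format.substring(start));
        return new SimpleTemplate(parts.toArray(new String[0]));
    }

    // Per-call work is just appending literals and arguments
    String format(Object... args) {
        if (args.length != literals.length - 1) {
            throw new IllegalArgumentException("Expected "
                    + (literals.length - 1) + " arguments, got " + args.length);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < args.length; i++) {
            sb.append(literals[i]).append(args[i]);
        }
        return sb.append(literals[args.length]).toString();
    }

    public static void main(String[] args) {
        SimpleTemplate keyTemplate = SimpleTemplate.compile("%s.%s");
        System.out.println(keyTemplate.format("ks", "users"));
    }
}
```

I have not benchmarked this sketch; it only shows where the per-call parsing cost could go if the format String could be decoded ahead of time.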

So Does This Matter?

As usual, whether this performance difference really matters to your case…. depends. :)

As to my usage patterns, I do still use String.format() for things that:

Are only called once during initialization
Are used for Exception processing (which has inherent overhead to begin with)
Are otherwise not often called

but avoid using it on perceived critical paths, including tight loops.