Measuring performance of Java String.format() (or lack thereof)
aka “Why is String.format() SO slow?”
In the past I have found that the functionality in the JDK is usually well optimized and performs reasonably well for the task, considering the general nature and applicability of the functionality offered. So as a general guideline I tend to trust that if the JDK has a method for doing something in a simple way, it is perfectly fine for most uses and likely performs well enough.
But I have also found that occasionally some specific classes or methods may perform surprisingly poorly. Knowing these outliers can be useful when dealing with performance-sensitive code (like tight loops, or something called thousands of times per second or more).
This is a post about one such case: the surprisingly poor performance of String.format() for simple String concatenation.
The specific case that caught my eye — and that represents a wider class of similar use cases — is that of String concatenation for creating Strings like compound keys. For example:
String key = String.format("%s.%s", keyspace, tableName);
This is functionally equivalent to:
String key = keyspace + "." + tableName; // or
key = new StringBuilder().append(keyspace).append('.').append(tableName).toString();
and one might expect these approaches to perform similarly, given that from the user's perspective they perform the same task.
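To make the equivalence concrete, here is a small self-contained sketch of the three forms side by side. The class and method names (KeyBuilding, viaFormat and so on) are made up for illustration, and "ks"/"users" are arbitrary example values.

```java
// Hypothetical example class: three ways to build the same "keyspace.table" key.
final class KeyBuilding {
    static String viaFormat(String keyspace, String tableName) {
        // The format String "%s.%s" is handed to the Formatter on every call
        return String.format("%s.%s", keyspace, tableName);
    }

    static String viaConcat(String keyspace, String tableName) {
        // javac turns this into StringBuilder appends (JDK 8) or an
        // invokedynamic-based concatenation (JDK 9 and later)
        return keyspace + "." + tableName;
    }

    static String viaBuilder(String keyspace, String tableName) {
        return new StringBuilder().append(keyspace).append('.').append(tableName).toString();
    }

    public static void main(String[] args) {
        String a = viaFormat("ks", "users");
        String b = viaConcat("ks", "users");
        String c = viaBuilder("ks", "users");
        System.out.println(a + " " + a.equals(b) + " " + b.equals(c)); // prints "ks.users true true"
    }
}
```

All three return the same String; only the work done to get there differs.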
But Should String.format() Have Low Overhead?
One thing to note, however, is that while the usage here is for the same task, the underlying implementation of String.format() is much, much more complicated: it is quite a powerful tool for all kinds of String formatting. We just happen to use this power tool for a low-power use case.
The important part to note is that decoding the format String is overhead that either occurs on each call, or that the JDK has to avoid with some elaborate caching. And unlike with java.util.regex.Pattern for regular expressions (where Pattern.compile() lets us pre-compile an expression once and reuse it), there is no way to pre-process a format String.
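For contrast, this is what pre-processing looks like on the regular expression side: a Pattern can be compiled once, kept in a static final field and reused, whereas String.format() only accepts its format as a plain String. The class and method names here are hypothetical.

```java
import java.util.regex.Pattern;

final class PreprocessingContrast {
    // The regular expression is parsed once, when the class is initialized,
    // and the compiled Pattern is reused by every call to splitKey().
    private static final Pattern DOT = Pattern.compile("\\.");

    static String[] splitKey(String key) {
        return DOT.split(key);
    }

    // There is no "compiled" form of a format String to hold on to:
    // "%s.%s" is passed to String.format() again on every call, and any
    // decoding happens inside the JDK (unless it caches internally).
    static String joinKey(String keyspace, String tableName) {
        return String.format("%s.%s", keyspace, tableName);
    }

    public static void main(String[] args) {
        String key = joinKey("ks", "users");
        System.out.println(key + " -> " + String.join(",", splitKey(key))); // prints "ks.users -> ks,users"
    }
}
```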
Similarly, it may be necessary to build secondary data structures for handling the details of formatting, at least for some cases; and even if not for our case, the support and design have to allow for that.
Given this, it is reasonable to suspect that String.format() may have some overhead for our simple concatenation use case.
But does it? (Did the JDK folks pull off another impressive feat here? :) ) And if so, how much?
So Let’s Ask Our Friend JMH To Have a Look
Similar to my earlier post ‘Measuring “String.indexOfAny(String)” performance’, I added a new JMH test class, StringConcatenation, in the https://github.com/cowtowncoder/misc-micro-benchmarks repository.
You can run the test cases with:
java -jar target/microbenchmarks.jar StringConcatenation
and you would get results that look something like what I see on my desktop (Mac Mini (2018), 3.2GHz 6-core Intel Core i7):
Benchmark                  Mode  Cnt        Score       Error  Units
m1_StringFormat           thrpt   15    61337.088 ±   654.370  ops/s
m2_StringBuilder          thrpt   15  2683849.107 ± 22092.481  ops/s
m3_StringBuilderPrealloc  thrpt   15  2654994.965 ± 36881.162  ops/s
m4_ManualConcatenation    thrpt   15  2700825.252 ± 27906.924  ops/s
Your numbers will obviously vary a bit (and I have omitted part of the benchmark identifiers to fit the output nicely).
Before considering the numbers, let’s first introduce the test cases:
- m1_StringFormat: uses String.format("%s.%s", first, second);
- m2_StringBuilder: equivalent concatenation using StringBuilder, as shown earlier
- m3_StringBuilderPrealloc: same as m2, but calculates the optimal initial size of the StringBuilder to avoid the need to re-allocate its buffer. This is an attempt to further optimize the m2 case
- m4_ManualConcatenation: uses String str = first + "." + second; which should internally become the same as, or similar to, the m2 case
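The pre-allocation idea behind m3 can be sketched roughly as follows. This is not the code from the benchmark repository, just an assumption about what "optimal initial size" means here: the exact length of the result, so that the builder never has to grow.

```java
// Hypothetical sketch of the m3 approach; assumes non-null inputs.
final class PresizedConcat {
    // Builds "first.second" in a StringBuilder whose initial capacity is the
    // exact final length, so appending never triggers a buffer reallocation.
    static StringBuilder presized(String first, String second) {
        int length = first.length() + 1 + second.length(); // +1 for the '.'
        return new StringBuilder(length)
                .append(first)
                .append('.')
                .append(second);
    }

    static String key(String first, String second) {
        return presized(first, second).toString();
    }

    public static void main(String[] args) {
        StringBuilder sb = presized("ks", "users");
        // Capacity was never exceeded, so it still equals the requested size
        System.out.println(sb + " capacity=" + sb.capacity()); // prints "ks.users capacity=8"
    }
}
```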
For all test cases we use pairings of 4 by 4 Strings (16 combinations), with first and second also swapped once, that is, calling the functionality 32 times per benchmark iteration. The test Strings are somewhat trivial and may not be representative: we are not trying to emulate a specific use case here (but you should, if you have useful data!), but hopefully this gives us at least a baseline for comparison.
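One possible reading of that setup, as a plain-Java sketch without the JMH harness: two hypothetical groups of 4 Strings, every first/second combination (16), each also called with the arguments swapped, for 32 concatenations per iteration. The actual test Strings and loop live in the StringConcatenation class of the repository and may differ.

```java
import java.util.ArrayList;
import java.util.List;

final class PairingLoop {
    // Hypothetical inputs; the benchmark's own test Strings may differ.
    static final String[] FIRST = {"alpha", "beta", "gamma", "delta"};
    static final String[] SECOND = {"users", "orders", "items", "events"};

    // One benchmark iteration: 4 x 4 = 16 pairings, each in both orders = 32 calls.
    static List<String> oneIteration() {
        List<String> keys = new ArrayList<>(32);
        for (String a : FIRST) {
            for (String b : SECOND) {
                keys.add(a + "." + b); // first/second as given
                keys.add(b + "." + a); // and swapped
            }
        }
        // Returning the results keeps the work observable; in a JMH benchmark
        // they should likewise be returned or consumed by a Blackhole so the
        // JIT cannot eliminate the concatenations as dead code.
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(oneIteration().size()); // prints 32
    }
}
```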
So Let’s Unpack The Results
So, let’s have a look at the numbers we saw earlier. It looks like cases m2, m3 and m4 are about equivalent, giving us roughly 2.7 million iterations per second. With 32 concatenations per iteration, that comes to about 85 million concatenations per second (including other overhead, like Garbage Collection). Not bad.
But the first case, that of using String.format(), fares much, much worse, giving us about 61,000 iterations per second. While that is still almost 2 million concatenations per second (which is plenty for most uses), it is almost TWO orders of magnitude slower than using StringBuilder directly or indirectly.
Put another way, using StringBuilder here is over 40 TIMES faster than using String.format(). This is probably quite a bit bigger difference than most Java developers would have guessed.
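The arithmetic behind these statements, using the scores from the table above: multiplying by the 32 calls per iteration gives about 1.96 million concatenations per second for m1 versus about 85.9 million for m2, and dividing the two scores gives a ratio of roughly 43.8. A tiny helper to reproduce it (the class name is hypothetical):

```java
final class ThroughputMath {
    static final int CALLS_PER_ITERATION = 32; // 16 pairings x 2 orders

    // JMH reports iterations (ops) per second; each op does 32 concatenations.
    static double concatenationsPerSecond(double opsPerSecond) {
        return opsPerSecond * CALLS_PER_ITERATION;
    }

    static double speedup(double fasterOps, double slowerOps) {
        return fasterOps / slowerOps;
    }

    public static void main(String[] args) {
        double m1 = 61337.088;   // m1_StringFormat score, ops/s
        double m2 = 2683849.107; // m2_StringBuilder score, ops/s
        System.out.println("m1 concatenations/s: " + Math.round(concatenationsPerSecond(m1)));
        System.out.println("m2 concatenations/s: " + Math.round(concatenationsPerSecond(m2)));
        System.out.println("m2 / m1 speedup: " + Math.round(speedup(m2, m1) * 10) / 10.0);
    }
}
```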
Of the three StringBuilder-based cases, about the only interesting thing is that pre-allocating (passing a pre-calculated initial capacity to the StringBuilder) is no faster than just using the default constructor (the numbers for m2 and m3 overlap once variation is considered, so it is unclear whether either one is faster; essentially their performance is all but identical).
This may be due to our use of relatively short input Strings, for which the default capacity (16 characters) works well enough. If the input Strings were longer, results might be better for the pre-allocating version, but probably not by a whole lot.
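A quick way to see why short keys do not benefit from pre-sizing: a default-constructed StringBuilder starts with room for 16 characters, so a short key like "ks.users" never causes it to grow, while a longer one does. The keys below are arbitrary examples, and the exact growth policy beyond "at least the needed length" is a JDK implementation detail, so the sketch only relies on that.

```java
final class DefaultCapacity {
    // Appends "first.second" to a default-constructed StringBuilder and reports
    // the capacity (in chars) the builder ends up with.
    static int capacityAfterAppending(String first, String second) {
        StringBuilder sb = new StringBuilder(); // documented initial capacity: 16 chars
        sb.append(first).append('.').append(second);
        return sb.capacity();
    }

    public static void main(String[] args) {
        // "ks.users" is 8 chars: fits in the initial buffer, no reallocation
        System.out.println(capacityAfterAppending("ks", "users")); // prints 16
        // "analytics_keyspace.events" is 25 chars: the buffer had to grow
        System.out.println(capacityAfterAppending("analytics_keyspace", "events") > 16); // prints true
    }
}
```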
But What Does String.format() Do? (deeper dig)
OK, so it looks like String.format() takes its time in this case. But can we figure out what might be happening under the hood?
This is a simple enough use case that CPU profiling with the async-profiler tool might give some clues.
First of all, I will locally modify the @Measurement annotation of the test class:
// @Measurement(iterations = 5, time = 1)
@Measurement(iterations = 3600, time = 1)
so that the JMH tests in the class run effectively indefinitely (nominally for 1 hour per fork, instead of 5 seconds).
Then I will start the “indefinite” test run with:
java -jar target/microbenchmarks.jar StringConcatenation.m1
(the m1 suffix matches just the first test, since it uniquely identifies it)
While it is running, I look up the process id of the test run (using top) to start the actual profiling for 30 seconds:
~/bin/async-profiler -e cpu -d 30 -f ~/profile-string-format.txt 67640
Here we indicate that we want to use CPU profiling (-e cpu), run it for 30 seconds (-d 30), write results to the specified file (-f; as text, though other output formats are available too) and profile the Java process with id 67640. You will need to use the id of the process you have running.
After 30 seconds, we get an output file (mine has 2592 lines); at the end there is a summary like so:
ns percent samples top
---------- ------- ------- ---
3100000000 10.97% 310 java.util.regex.Pattern$Start.match
2970000000 10.51% 297 java.util.regex.Pattern$GroupHead.match
2370000000 8.39% 237 java.util.Formatter.format
2110000000 7.47% 211 java.lang.AbstractStringBuilder.ensureCapacityInternal
1590000000 5.63% 159 jshort_disjoint_arraycopy
1420000000 5.02% 142 java.util.Formatter$FormatSpecifier.index
1340000000 4.74% 134 java.util.Formatter.parse
1270000000 4.49% 127 arrayof_jint_fill
1240000000 4.39% 124 java.util.regex.Pattern$BmpCharProperty.match
1100000000 3.89% 110 java.util.regex.Pattern.matcher
990000000 3.50% 99 java.util.Formatter$FormatSpecifier.width
980000000 3.47% 98 java.util.regex.Pattern$Branch.match
Looking at the top 2 entries, we see that the Regular Expression functionality is being used to decode the format String %s.%s (the preparation phase I talked about). The third entry is a bit more difficult to reason about, but there are a few other Pattern-related methods which are probably similarly used for format String processing, adding up to maybe 40% of the time that the profiler shows.
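To illustrate what regex-based format String decoding involves, here is a simplified scanner that splits a format String into literal text and conversion specifiers. The pattern is modeled on the specifier regex found in the JDK 8 java.util.Formatter source (newer JDK releases may parse format Strings differently), but the class itself is a hypothetical illustration, not the JDK's code. Even for "%s.%s" it runs a regex matcher across the whole String every time it is invoked, before any argument gets formatted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FormatScan {
    // Modeled on JDK 8's Formatter specifier regex:
    // %[argument_index$][flags][width][.precision][t|T]conversion
    private static final Pattern SPECIFIER = Pattern.compile(
            "%(\\d+\\$)?([-#+ 0,(\\<]*)?(\\d+)?(\\.\\d+)?([tT])?([a-zA-Z%])");

    // Splits a format String into literal segments and specifiers, in order.
    static List<String> scan(String format) {
        List<String> parts = new ArrayList<>();
        Matcher m = SPECIFIER.matcher(format);
        int last = 0;
        while (m.find()) {
            if (m.start() > last) {
                parts.add(format.substring(last, m.start())); // literal text
            }
            parts.add(m.group()); // a specifier such as "%s"
            last = m.end();
        }
        if (last < format.length()) {
            parts.add(format.substring(last)); // trailing literal text
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(scan("%s.%s")); // prints [%s, ., %s]
    }
}
```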
There are a few other interesting things (internal StringBuilder reallocation, ensureCapacityInternal, seems to take quite a bit of time too, for some reason?), but this suggests that the biggest overhead is indeed the format String processing.
Too bad there does not seem to be a way to pre-process a java.util.Formatter instance (the class that actually implements the formatting functionality: all String.format() does is new Formatter().format(...).toString()) so that we could avoid repeatedly re-creating internal data structures. If this were doable, we would probably still see a significant difference, but not a factor of 40.
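As a workaround sketch, a format String containing only plain %s placeholders can be split once up front and then filled in with a StringBuilder on each call. PreparsedFormat is a hypothetical helper, not a JDK class: it ignores everything else Formatter supports (flags, widths, %d, %%, Locale, Formattable) and simply appends each argument via StringBuilder.append(Object).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of the JDK: supports only plain "%s" placeholders.
final class PreparsedFormat {
    private final String[] literals; // text before, between and after placeholders

    PreparsedFormat(String format) {
        // Parse once: split the format String around each "%s"
        List<String> parts = new ArrayList<>();
        int start = 0;
        int idx;
        while ((idx = format.indexOf("%s", start)) >= 0) {
            parts.add(format.substring(start, idx));
            start = idx + 2;
        }
        parts.add(format.substring(start));
        literals = parts.toArray(new String[0]);
    }

    int argumentCount() {
        return literals.length - 1;
    }

    // Per call: no parsing, just appends of literals and arguments
    String format(Object... args) {
        if (args.length != argumentCount()) {
            throw new IllegalArgumentException(
                    "Expected " + argumentCount() + " arguments, got " + args.length);
        }
        StringBuilder sb = new StringBuilder();
        sb.append(literals[0]);
        for (int i = 0; i < args.length; i++) {
            sb.append(args[i]).append(literals[i + 1]);
        }
        return sb.toString();
    }
}
```

Keeping an instance in a static final field, e.g. static final PreparsedFormat KEY = new PreparsedFormat("%s.%s");, and calling KEY.format(keyspace, tableName) moves the parsing out of the hot path. Whether that closes the gap to plain concatenation would have to be measured, for example by adding it as another case to the benchmark.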
So Does This Matter?
As usual, whether this performance difference really matters for your case… depends. :)
As to my usage patterns, I do still use String.format() for things that:
- Are only called once during initialization
- Are used for Exception processing (which has inherent overhead to begin with)
- Are otherwise not often called
but avoid using it on perceived critical paths, including tight loops.
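A small illustration of that split, using a hypothetical TableRegistry class: the per-lookup key is built with plain concatenation, while String.format() is kept for the error message, which is only produced when something has already gone wrong.

```java
import java.util.Map;

// Hypothetical example class illustrating where String.format() fits.
final class TableRegistry {
    private final Map<String, Integer> tableIds; // keyed by "keyspace.table"

    TableRegistry(Map<String, Integer> tableIds) {
        this.tableIds = tableIds;
    }

    int idFor(String keyspace, String tableName) {
        // Hot path: plain concatenation for the lookup key
        Integer id = tableIds.get(keyspace + "." + tableName);
        if (id == null) {
            // Error path: readability of String.format() is worth its cost here
            throw new IllegalArgumentException(String.format(
                    "Unknown table '%s' in keyspace '%s'", tableName, keyspace));
        }
        return id;
    }
}
```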