Measuring “String.indexOfAny(String)” performance

(Java micro-benchmarking with jmh)

Background

String xmlToCheck = ...;
if (textToCheck.containsAnyOf("&<>'\"")) { // if JDK String had it
// do some escape/quote magic
}

Off to Benchmarking!

java -jar target/microbenchmarks.jar StringContainsChars 

Code as it was in the beginning

List<Character> charsToCheck = Arrays.asList('&', '<', '>', '\'', '"'); 
return charsToCheck.stream().anyMatch(ch -> str.indexOf(ch);

First custom alternative: loop without Stream

// charsToCheck = "&<>'\"";
public boolean containsAnyOf(String inputString, String charsToCheck)
{
for (int i = 0, len = toCheck.length(); i < len; ++i) {
if (inputString.indexOf(toCheck.charAt(i)) >= 0) {
return true;
}
}
return false;
}

Second and third custom alternatives

public boolean containsCharToQuote(String inputString)
{
for (int i = 0, len = str.length(); i < len; ++i) {
switch (str.charAt(i)) {
case '&':
case '<':
case '>':
case '\'':
case '"':
return true;
}
}
return false;
}

Or maybe JDK regexps could do it faster?

StringBuilder sb = new StringBuilder().append("[");
for (char c : new char[] { '&', '<', '>', '\'', '"' }) {
sb.append(Pattern.quote(String.valueOf(c)));
}
Pattern checker = Pattern.compile(sb.append("]").toString());
// used like:
if (checker.matcher(stringToCheck).find()) {
...
}

But isn’t this already written by someone?

  • Guava has CharMatcher.anyOf(String)
  • Apache Commons-lang has StringUtils.containsAny(str, chars) (similar to our utility method)
  • Guava CharMatcher only does 1.8M/s (similar to our initial implementation)
  • Commons-lang does much better with 5.3M/s

One final manual approach


final long mask = calculateMask(); // omitted here, check full test
final String stringToCheck = ....;
// loop over input String, check bitmask to see if we have match
for (int i = 0, len = str.length(); i < len; ++i) {
final int ch = str.charAt(i);
int offset = ch - 32;
if ((offset < 64) && (offset >= 0)
&& (CHECKED_CHARS_MASK & (1L << offset)) != 0) {
return true;
}
}
return false;

Results table

Benchmark                                              Mode  Cnt        Score        Error  Units
StringContainsChars.method1_streamWithIndexOf thrpt 15 1451222.313 ± 6683.463 ops/s
StringContainsChars.method2a_stringIndexOfTimesN thrpt 15 3485362.606 ± 147550.312 ops/s
StringContainsChars.method2b_scanStringAndIndexOf thrpt 15 6007080.938 ± 24865.942 ops/s
StringContainsChars.method2c_scanWithSwitch thrpt 15 9464115.514 ± 377737.768 ops/s
StringContainsChars.method3a_guavaBasedCheck thrpt 15 1840523.997 ± 12226.246 ops/s
StringContainsChars.method3b_commonsLang3ContainsAny thrpt 15 5335174.154 ± 19938.776 ops/s
StringContainsChars.method4a_jdkRegExpBasedCheck thrpt 15 1838346.183 ± 6484.875 ops/s
StringContainsChars.method4b_bitsetBasedCheck thrpt 15 9819436.570 ± 300016.151 ops/s

Thoughts?

--

--

Open Source developer, most known for Jackson data processor (nee “JSON library”), author of many, many other OSS libraries for Java, from ClassMate to Woodstox

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
@cowtowncoder

Open Source developer, most known for Jackson data processor (nee “JSON library”), author of many, many other OSS libraries for Java, from ClassMate to Woodstox