What is the Big-O of String.contains() in Java?
Solution 1
One of the best known algorithms is the Boyer-Moore string searching algorithm which is O(n) although it can give sublinear performance in the best case.
Which algorithm is used in Java depends on which implemetation you download. It seems that for example OpenJDK uses a naive algorithm that runs in O(nm) and linear performance in the best case. See lines 1770-1806 here.
Solution 2
.contains()
definitely uses naive approach and equivalent to O(nm)
time complexity.
-
Boyer-moore takes
O(nm)
time in the worst case. -
KMP takes
O(n)
time in the worst case.
In one of the problems related to pattern matching, I used .contains()
and it took 70 ms
whereas replacing that line with patternSearch() //KMP search
reduced the time to 14 ms
.
Java source code | KMP search vs .contains()
Jason
Full stack Python & Java web dev in sunny/snowy Maine. Other interests include geospatial data and puzzles
Updated on April 06, 2020Comments
-
Jason about 4 years
I'm working on a project, and need to optimize the running time. Is
String.contains()
runtime the same asTreeSet.contains()
, which is O(logN)?The reason I'm asking is I'm building a
TreeMap<String, TreeSet<Song>>
, where Songs contain a String of lyrics. Depending on the efficiency, I am considering including a Set of the lyric words inside the Song and running searches on that rather than the String. -
Nicholas White over 13 yearsthe article you linked to said it's O(n) as it makes at most 3n comparisons. "worst case O(n)" is a tautology - by definition O(n) is the worst case :)
-
rakslice almost 10 yearsThe jdk1.6.0_23 had the same
String.indexOf()
implementation as a contemporary OpenJDK, according to programmers.stackexchange.com/questions/65558/… . Someone could tell you if that's true forString.contains()
-
cellepo over 8 years
-
John Kugelman over 7 years@NicholasWhite O(n) is an upper bound. It could be an upper bound of the worst case, average case, or best case performance. Upper and lower bound are orthogonal to best/average/worst case. Another orthogonal dimension is loose vs. tight. O(n) is a loose upper bound. Θ(n) is a tight upper and lower bound.
-
coderz almost 5 yearsSeems Trie only supports "starts with", not "contains"?
-
sjkm over 3 yearsKMP worst case is O(n + m) since you need to create the lsp table beforehand.
-
Nishant Thapliyal over 3 yearsTrue but even in worst-case where
n==m
-> O(n+m) -> O(2n) -> O(n)