What's faster: Regex or string operations?
Solution 1
It depends
Although string manipulation will usually be somewhat faster, the actual performance heavily depends on a number of factors, including:
- How many times you parse the regex
- How cleverly you write your string code
- Whether the regex is precompiled
As the regex gets more complicated, it takes much more effort and complexity to write equivalent string manipulation code that performs well.
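To make the factors above concrete, here is a minimal sketch of the variants they describe: re-parsing the pattern on every call, reusing one instance, reusing a compiled instance, and hand-written string code. The class name, method names, and the digit pattern are hypothetical and chosen only for illustration; they do not come from any of the answers.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexReuseSketch
{
    // Hypothetical pattern, chosen only for illustration.
    private const string Pattern = @"\d+";

    // Constructed once and reused, so the pattern is parsed only once.
    private static readonly Regex Reused = new Regex(Pattern);

    // Same pattern with RegexOptions.Compiled: slower to construct,
    // usually faster to match once it exists.
    private static readonly Regex Compiled = new Regex(Pattern, RegexOptions.Compiled);

    // Constructs a new Regex on every call, so the pattern is parsed each time.
    public static bool ParsedEveryCall(string input) => new Regex(Pattern).IsMatch(input);

    public static bool ReusedInstance(string input) => Reused.IsMatch(input);

    public static bool CompiledInstance(string input) => Compiled.IsMatch(input);

    // Plain string code for the same check: does the input contain a digit?
    public static bool StringOnly(string input)
    {
        foreach (char c in input)
        {
            if (char.IsDigit(c))
            {
                return true;
            }
        }
        return false;
    }
}
```

All four methods answer the same question, so they can be timed against each other on the same input. Which one wins is likely to depend on input size and call count, as the list above suggests.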
Solution 2
String operations will always be faster than regular expression operations, unless, of course, you write the string operations inefficiently.
A regular expression has to be parsed, and code has to be generated that performs the operation using string operations. At best, the regular expression ends up doing the optimal sequence of string manipulations.
Regular expressions are not used because they can do anything faster than plain string operations; they are used because they can do very complicated operations with little code and reasonably small overhead.
Solution 3
I've done some benchmarks with two functions called FunctionOne (string operations) and FunctionTwo (Regex). They should both get all matches between '<' and '>'.
benchmark #1:
- times called: 1'000'000
- input: 80 characters
- duration (string operations // FunctionOne): 1.12 sec
- duration (regex operations // FunctionTwo): 1.88 sec
benchmark #2:
- times called: 1'000'000
- input: 2000 characters
- duration (string operations): 27.69 sec
- duration (regex operations): 41.436 sec
Conclusion: String operations will almost always beat regular expressions if programmed efficiently. But the more complex the task gets, the harder it is for string operations to keep up, not only in performance but also in maintainability.
Code FunctionOne
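// Requires: using System.Collections.Generic; using System.Text;
private void FunctionOne(string input) {
    // Matches are collected but not returned; only the work is timed.
    var matches = new List<string>();
    var match = new StringBuilder();
    bool startRecording = false;
    foreach (char c in input) {
        if (c == '<') {
            // Opening bracket: start capturing (the bracket itself is skipped).
            startRecording = true;
            continue;
        }
        if (c == '>' && startRecording) {
            // Closing bracket: store the captured text, without the brackets.
            matches.Add(match.ToString());
            match.Clear();
            startRecording = false;
            continue;
        }
        if (startRecording) {
            match.Append(c);
        }
    }
}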
Code FunctionTwo
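// Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
// Constructed once and reused, so the pattern is not re-parsed on every call.
private static readonly Regex regx = new Regex("<.*?>");
private void FunctionTwo(string input) {
    Match m = regx.Match(input);
    var results = new List<string>();
    while (m.Success) {
        // Unlike FunctionOne, m.Value includes the surrounding '<' and '>'.
        results.Add(m.Value);
        m = m.NextMatch();
    }
}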
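The answer does not show how the durations were measured. One plausible way to time functions like these, assuming a simple loop around System.Diagnostics.Stopwatch, is sketched below. The BenchmarkSketch class and its Measure helper are hypothetical names, not part of the original benchmark.

```csharp
using System;
using System.Diagnostics;

public static class BenchmarkSketch
{
    // Calls the action once as a warm-up, then times `iterations` further calls.
    public static TimeSpan Measure(Action action, int iterations)
    {
        action(); // warm-up, so JIT compilation is not included in the timing

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

A call such as BenchmarkSketch.Measure(() => FunctionOne(input), 1000000) would then correspond to one row of the results above. Absolute numbers from a loop like this vary with hardware, runtime version, and build configuration, so only the relative difference between the two functions is meaningful.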
Solution 4
I did some profiling in C# a while back, comparing the following:
1) LINQ to Objects.
2) Lambda Expressions.
3) Traditional iterative method.
All 3 methods were tested both with and without regular expressions. The conclusion in my test case was clear: regular expressions were quite a bit slower than the non-regex code in all 3 cases when searching for strings in a large amount of text.
You can read the details on my blog: http://www.midniteblog.com/?p=72
Fabian Bigler
Updated on December 31, 2020
Comments
-
Fabian Bigler over 3 years
When should I use Regex over string operations and vice versa only regarding performance?
-
SLaks almost 11 years
The actual answer is that it heavily depends what you're doing, how, and how often
-
SLaks almost 11 years
Your regex benchmark is very wrong; you're re-compiling the regex every time. If you reuse a single instance, it will become much faster. If you pass RegexOptions.Compiled, it will become even faster.
-
Fabian Bigler almost 11 years
Ok Thanks SLaks, I will post my new results here.
-
SLaks almost 11 years
Your string code is also very wrong; you're throwing away large numbers of string instances. You can make it much faster using a StringBuilder. (and probably also by tracking indices and concatenating ranges)
-
SLaks almost 11 years
Also, C# has character literals: '<'.
-
David Heffernan almost 11 years
I think you got mixed up here. The code describing what operations you want to perform should be in the question rather than the answer.
-
Fabian Bigler almost 11 years
I thought it's a good way to have benchmarks to illustrate the difference between regular expressions. Thanks for helping me out, guys!
-
David Heffernan almost 11 years
It's really a non question. Ask and answer a question if you are going to provide useful insight that can't be found elsewhere. This question is rather pointless, in my view.
-
Fabian Bigler almost 11 years
@David: No. I wanted to answer my own question. I did not want to perform anything specifically, really.
-
JulianR almost 11 years
I disagree that string operations will always be faster than Regex. A regex, when compiled, becomes a super specialized and optimized .NET function which will likely beat the string operation by a fair margin. For example, I've found that a compiled Regex is faster than an IndexOf call.
-
SLaks almost 11 years
@JulianR: I don't think that's always true, although I have no evidence to support either possibility.
-
SLaks almost 11 years
However, it would be very interesting to see the performance of a compiled regex here.
-
SLaks almost 11 years
Also, [^>]* would probably be faster.
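Two of the suggestions in these comments, tracking indices instead of appending character by character and using [^>]* with RegexOptions.Compiled, could look roughly like the sketch below. The class and method names are hypothetical, and nobody in the thread posted timings for these variants, so treat this as something to measure rather than a proven improvement.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagExtractionSketch
{
    // Negated character class instead of the lazy ".*?", compiled once.
    // Note: unlike '.', [^>] also matches newline characters.
    private static readonly Regex NegatedClass =
        new Regex("<[^>]*>", RegexOptions.Compiled);

    // Returns every "<...>" span, brackets included.
    public static List<string> WithRegex(string input)
    {
        var results = new List<string>();
        for (Match m = NegatedClass.Match(input); m.Success; m = m.NextMatch())
        {
            results.Add(m.Value);
        }
        return results;
    }

    // Same result using index tracking: find '<', find the next '>',
    // and copy the range in between with a single Substring call.
    public static List<string> WithIndexOf(string input)
    {
        var results = new List<string>();
        int start = input.IndexOf('<');
        while (start >= 0)
        {
            int end = input.IndexOf('>', start + 1);
            if (end < 0)
            {
                break; // unterminated '<': nothing more to extract
            }
            results.Add(input.Substring(start, end - start + 1));
            start = input.IndexOf('<', end + 1);
        }
        return results;
    }
}
```

Both methods return the bracketed text, like FunctionTwo, so their results can be compared directly before timing them.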