What's faster: Regex or string operations?

Solution 1

It depends

Although string manipulation will usually be somewhat faster, the actual performance heavily depends on a number of factors, including:

  • How many times you parse the regex
  • How cleverly you write your string code
  • Whether the regex is precompiled

As the regex gets more complicated, it takes much more effort and complexity to write equivalent string manipulation code that performs well.
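To illustrate the first and third bullets, here is a minimal sketch that contrasts building a Regex on every call with reusing one precompiled instance. The class and method names (TagMatcher, CountPerCall, CountCached) are hypothetical, and the pattern is borrowed from Solution 3 purely as an example. How much the cached version gains depends on the runtime and the input, so treat this as something to measure rather than a guaranteed result.

```csharp
using System.Text.RegularExpressions;

public static class TagMatcher
{
    // Built once and reused. RegexOptions.Compiled trades a slower
    // construction for (usually) faster matching afterwards.
    private static readonly Regex Cached =
        new Regex("<.*?>", RegexOptions.Compiled);

    // Constructs a new Regex object on every call, so the pattern
    // generally has to be processed again each time.
    public static int CountPerCall(string input)
    {
        return new Regex("<.*?>").Matches(input).Count;
    }

    // Reuses the single cached instance.
    public static int CountCached(string input)
    {
        return Cached.Matches(input).Count;
    }
}
```

Both methods return the same count for the same input; only the setup cost per call differs.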

Solution 2

String operations will always be faster than regular expression operations, unless, of course, you write the string operations in an inefficient way.

Regular expressions have to be parsed, and code has to be generated that performs the operation using string operations. At best, the regular expression can do what is optimal for the string manipulation.

Regular expressions are not used because they can do anything faster than plain string operations; they are used because they can do very complicated operations with little code and with reasonably small overhead.

Solution 3

I've done some benchmarks with two functions called FunctionOne (string operations) and FunctionTwo (Regex). They should both get all matches between '<' and '>'.

benchmark #1:

  • times called: 1'000'000
  • input: 80 characters
  • duration (string operations, FunctionOne): 1.12 sec
  • duration (regex operations, FunctionTwo): 1.88 sec

benchmark #2:

  • times called: 1'000'000
  • input: 2000 characters
  • duration (string operations): 27.69 sec
  • duration (regex operations): 41.436 sec

Conclusion: String operations will almost always beat regular expressions if programmed efficiently. But the more complex the task gets, the harder it becomes for hand-written string code to keep up, not only in performance but also in maintainability.

Code FunctionOne

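// Collects the text between each '<' and the following '>' (brackets excluded).
// Requires: using System.Collections.Generic; using System.Text;
private void FunctionOne(string input) {
    var matches = new List<string>();
    var match = new StringBuilder();
    bool startRecording = false;
    foreach (char c in input) {
        if (c == '<') {
            // Opening bracket: start collecting characters.
            startRecording = true;
            continue;
        }

        // Only close a match that was actually opened, so a stray '>'
        // does not add an empty entry.
        if (c == '>' && startRecording) {
            matches.Add(match.ToString());
            match = new StringBuilder();
            startRecording = false;
        }

        if (startRecording) {
            match.Append(c);
        }
    }
}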
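A possible variant of the string approach tracks indices with IndexOf and copies each range with Substring instead of appending character by character. This is only a sketch, not the code that was benchmarked above; the names IndexScanner and Extract are hypothetical, and its behavior can differ from FunctionOne on malformed input (for example a second '<' before the closing '>').

```csharp
using System;
using System.Collections.Generic;

public static class IndexScanner
{
    // Returns the text between each '<' and the next '>' (brackets excluded).
    public static List<string> Extract(string input)
    {
        var matches = new List<string>();
        int pos = 0;
        while (pos < input.Length)
        {
            int open = input.IndexOf('<', pos);
            if (open < 0) break;            // no more opening brackets

            int close = input.IndexOf('>', open + 1);
            if (close < 0) break;           // unterminated bracket: stop

            // Copy the range between the two brackets in one step.
            matches.Add(input.Substring(open + 1, close - open - 1));
            pos = close + 1;
        }
        return matches;
    }
}
```

Whether this is faster than the StringBuilder loop would have to be measured on the same inputs.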

Code FunctionTwo

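// Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
// The Regex is created once and reused across calls.
private static readonly Regex regx = new Regex("<.*?>");

private void FunctionTwo(string input) {
    Match m = regx.Match(input);
    var results = new List<string>();
    while (m.Success) {
        // m.Value includes the surrounding brackets, unlike FunctionOne.
        results.Add(m.Value);
        m = m.NextMatch();
    }
}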
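The answer does not show how the timings were taken. One way such a measurement could be set up is with System.Diagnostics.Stopwatch, as in the sketch below. The names Bench and Time are hypothetical and this is not the harness that produced the numbers above.

```csharp
using System;
using System.Diagnostics;

public static class Bench
{
    // Runs the action 'iterations' times and returns the elapsed time.
    public static TimeSpan Time(Action action, int iterations)
    {
        action(); // one warm-up call so JIT compilation is not timed

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.Elapsed;
    }
}
```

It could be called as Bench.Time(() => FunctionOne(input), 1000000) and Bench.Time(() => FunctionTwo(input), 1000000), using the same input string for both.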

Solution 4

I did some profiling in C# a while back, comparing the following:

1) LINQ to Objects.
2) Lambda Expressions.
3) Traditional iterative method.

All 3 methods were tested both with and without Regular Expressions. In my test case the conclusion was clear: Regular Expressions were quite a bit slower than the non-Regex code in all 3 cases when searching for strings in a large amount of text.

You can read the details on my blog: http://www.midniteblog.com/?p=72

Author: Fabian Bigler

Updated on December 31, 2020

Comments

  • Fabian Bigler, over 3 years ago
    When should I use Regex over string operations and vice versa, only regarding performance?
  • SLaks, almost 11 years ago
    The actual answer is that it heavily depends on what you're doing, how, and how often.
  • SLaks, almost 11 years ago
    Your regex benchmark is very wrong; you're re-compiling the regex every time. If you reuse a single instance, it will become much faster. If you pass RegexOptions.Compiled, it will become even faster.
  • Fabian Bigler, almost 11 years ago
    Ok, thanks SLaks, I will post my new results here.
  • SLaks, almost 11 years ago
    Your string code is also very wrong; you're throwing away large numbers of string instances. You can make it much faster using a StringBuilder (and probably also by tracking indices and concatenating ranges).
  • SLaks, almost 11 years ago
    Also, C# has character literals: '<'.
  • David Heffernan, almost 11 years ago
    I think you got mixed up here. The code describing what operations you want to perform should be in the question rather than the answer.
  • Fabian Bigler, almost 11 years ago
    I thought it was a good idea to have benchmarks to illustrate the difference between regular expressions and string operations. Thanks for helping me out, guys!
  • David Heffernan, almost 11 years ago
    It's really a non-question. Ask and answer a question if you are going to provide useful insight that can't be found elsewhere. This question is rather pointless, in my view.
  • Fabian Bigler, almost 11 years ago
    @David: No. I wanted to answer my own question. I did not want to perform anything specifically, really.
  • JulianR, almost 11 years ago
    I disagree that string operations will always be faster than Regex. A regex, when compiled, becomes a super specialized and optimized .NET function which will likely beat the string operation by a fair margin. For example, I've found that a compiled Regex is faster than an IndexOf call.
  • SLaks, almost 11 years ago
    @JulianR: I don't think that's always true, although I have no evidence to support either possibility.
  • SLaks, almost 11 years ago
    However, it would be very interesting to see the performance of a compiled regex here.
  • SLaks, almost 11 years ago
    Also, [^>]* would probably be faster.
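The last two suggestions in the comments (a compiled regex and a negated character class) could be combined roughly as follows. This is a sketch for experimentation, not a measured result; the names NegatedClassMatcher and Extract are hypothetical. Note that [^>] also matches newlines, which . does not by default, so the two patterns are not equivalent on multi-line input.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NegatedClassMatcher
{
    // "<[^>]*>" consumes everything up to the next '>' directly,
    // instead of expanding a lazy ".*?" one character at a time.
    private static readonly Regex Tag =
        new Regex("<[^>]*>", RegexOptions.Compiled);

    // Returns every bracketed match, brackets included (like FunctionTwo).
    public static List<string> Extract(string input)
    {
        var results = new List<string>();
        for (Match m = Tag.Match(input); m.Success; m = m.NextMatch())
        {
            results.Add(m.Value);
        }
        return results;
    }
}
```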