c# Fastest way to compare strings

48,473

Solution 1

strings operator equals does the length check before comparing the chars. So you do not save the comparison of the contents with this trick. You might still save a few CPU cycles because your length check assumes that the strings are not null, while the BCL must check that. So if the lengths are not equal most of the time, you will short-circuit a few instructions.

I might just be wrong here, though. Maybe the operator gets inlined and the checks optimized out. Who knows for sure? (He who measures.)

If you care about saving every cycle you can you should probably use a different strategy in the first place. Maybe managed code is not even the right choice. Given that, I recommend to use the shorter form and not use the additional check.

Solution 2

String.Equality Operator or == internally calls string.Equals, so use string.Equals or == provided by the framework. It is already optimized enough.

It first compare references, then length and then actual characters.

You can find the source code here

Code: (Source: http://www.dotnetframework.org/default.aspx/4@0/4@0/DEVDIV_TFS/Dev10/Releases/RTMRel/ndp/clr/src/BCL/System/String@cs/1305376/String@cs)

// Determines whether two strings match.
[ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
public override bool Equals(Object obj) {
    if (this == null)                        //this is necessary to guard against reverse-pinvokes and
        throw new NullReferenceException();  //other callers who do not use the callvirt instruction

    String str = obj as String;
    if (str == null)
        return false;

    if (Object.ReferenceEquals(this, obj))
        return true;

    return EqualsHelper(this, str);
}

and

[System.Security.SecuritySafeCritical]  // auto-generated
[ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
private unsafe static bool EqualsHelper(String strA, String strB)
{
    Contract.Requires(strA != null);
    Contract.Requires(strB != null);
    int length = strA.Length;
    if (length != strB.Length) return false;

    fixed (char* ap = &strA.m_firstChar) fixed (char* bp = &strB.m_firstChar)
    {
        char* a = ap;
        char* b = bp;

        // unroll the loop
#if AMD64
        // for AMD64 bit platform we unroll by 12 and
        // check 3 qword at a time. This is less code
        // than the 32 bit case and is shorter
        // pathlength

        while (length >= 12)
        {
            if (*(long*)a     != *(long*)b) break;
            if (*(long*)(a+4) != *(long*)(b+4)) break;
            if (*(long*)(a+8) != *(long*)(b+8)) break;
            a += 12; b += 12; length -= 12;
        }
 #else
        while (length >= 10)
        {
            if (*(int*)a != *(int*)b) break;
            if (*(int*)(a+2) != *(int*)(b+2)) break;
            if (*(int*)(a+4) != *(int*)(b+4)) break;
            if (*(int*)(a+6) != *(int*)(b+6)) break;
            if (*(int*)(a+8) != *(int*)(b+8)) break;
            a += 10; b += 10; length -= 10;
        }
  #endif

        // This depends on the fact that the String objects are
        // always zero terminated and that the terminating zero is not included
        // in the length. For odd string sizes, the last compare will include
        // the zero terminator.
        while (length > 0)
        {
            if (*(int*)a != *(int*)b) break;
            a += 2; b += 2; length -= 2;
        }

        return (length <= 0);
    }
}

Solution 3

My test results

Compare 10000 strings to 10000 other strings all the same length (256)

Time (s1 == s2): 32536889 ticks

Time (s1.Length == s2.Length) && (s1 == s2): 37380529 ticks

Compare 10000 strings to 10000 other strings Random length max 256

Time (s1 == s2): 27223517 ticks

Time (s1.Length == s2.Length) && (s1 == s2): 23419529 ticks

Compare 10000 strings to 10000 other strings Random length min 256 max 512

Time (s1 == s2): 28904898 ticks

Time (s1.Length == s2.Length) && (s1 == s2): 25442710 ticks

What I find mind boggling is that a compare of 10000 equal length strings will take longer than comparing the same amount of data that is larger.

All these test have been done with exactly the same data.

Solution 4

For the geeks among us, here's a page which does a great job at benchmarking numerous ways to compare strings.

In a nutshell, the fastest method appears to be the CompareOrdinal:

if (string.CompareOrdinal(stringsWeWantToSeeIfMatches[x], stringsWeAreComparingAgainst[x]) == 0)
{
//they're equal
}

The second best way seems to be using either a Dictionary or Hashset with the "key" as the string you want to compare.

Makes for an interesting read.

Solution 5

According ILSpy, the string == operator is defined as:

public static bool operator ==(string a, string b)
{
    return string.Equals(a, b);
}

Which is defined as

public static bool Equals(string a, string b)
{
    return a == b || (a != null && b != null && a.Length == b.Length && string.EqualsHelper(a, b));
}

I assume that first a == b is actually a reference equality check (ILSpy is just rendering it as ==), otherwise this would be an infinitely recursive method.

This means that == already checks the lengths of the strings before actually comparing their characters.

Share:
48,473
CoolCodeBro
Author by

CoolCodeBro

Updated on July 21, 2022

Comments

  • CoolCodeBro
    CoolCodeBro almost 2 years

    I've noticed that

    string1.Length == string2.Length && string1 == string2
    

    on large strings is slightly faster than just

    string1 == string2
    

    Is this true? And is this a good practice to compare large string lengths before comparing actual strings?