Go compare strings

12,993

Solution 1

Even though this comparator function exists in the strings package (strings.Compare()), even its doc advises against using it:

Compare returns an integer comparing two strings lexicographically. The result will be 0 if a==b, -1 if a < b, and +1 if a > b.

Compare is included only for symmetry with package bytes. It is usually clearer and always faster to use the built-in string comparison operators ==, <, >, and so on.

Why is it not practical to use strings.Compare()?

There are several reasons.

First, in languages where this kind of Compare() is practical / common, usually those languages support sorting that builds exactly on a function with this signature.

For example in Java there is the Comparator interface which you may pass to Collections.sort(). So in Java you are forced to have / implement this kind of comparison (that returns -1, 0 or 1).

In Go, sorting is not based on such comparator function. In Go sorting is based on a single Less(i, j int) bool function which is basically a[i] < a[j] comparison, it's just "is it less?". For that, you don't need strings.Compare(), you only need a < b. For example, see sort.Slice().

Second reason: strings.Compare() is not optimized intentionally so you don't accustomed using it. The implementation of strings.Compare() has this comment:

// NOTE(rsc): This function does NOT call the runtime cmpstring function,
// because we do not want to provide any performance justification for
// using strings.Compare. Basically no one should use strings.Compare.
// As the comment above says, it is here only for symmetry with package bytes.
// If performance is important, the compiler should be changed to recognize
// the pattern so that all code doing three-way comparisons, not just code
// using strings.Compare, can benefit.

What this means is that a < b will be faster than calling strings.Compare(a, b).

Third, the return value of strings.Compare() is a single integer carrying the information whether a is less than b, or whether a is equal to b or whether a is greater than b. If you do need to use all 3 branches (not just the "less" or "equal" branch), you usually need to do further inspection on the return value of strings.Compare(), like in this simple example:

switch strings.Compare("a", "b") {
case -1:
    fmt.Println("less")
case 0:
    fmt.Println("equal")
case 1: // or default:
    fmt.Println("greater")
}

Now if you think about it: comparisons are first performed inside strings.Compare(), and then again in your code (comparing the return value). This is redundant, and again less performant.

The above can be written like this (which will be faster):

switch {
case a == b:
    fmt.Println("equal")
case a < b:
    fmt.Println("less")
default:
    fmt.Println("greater")
}

More about efficiency

As said before, strings.Compare() is not optimized for performance deliberately. But Go's sorting library does not need the -1, 0, 1 result for sorting strings, only the result of a < b, which can be acquired with the same efficiency as acquiring the result of Compare() in other languages.

Also note that strings.Compare() first checks for equality a == b, and only if they are not equal proceeds to check a < b. This is important because string values in Go store the length of the string (for details, see reflect.StringHeader), which means if 2 strings have different lengths, it can be decided immediately that they are not equal. C and C++ use \0-terminated string values, which means to tell if 2 strings are equal always requires to compare the whole strings, even if one is a thousand characters and the other is one less. Actually this is not completely true, because if a mismatch is detected when comparing the characters, the comparison ends, but this may still be a lot slower than comparing 2 integers.

Also see related question: Using the == symbol in golang and using a loop to compare if string a equals string b,which performance is better?

Solution 2

What about Compare function ?

Golang Doc

Solution 3

Go was designed by programmers for programmers. If you want a C strcmp function, write one in Go.

For example,

package main

import "fmt"

func strcmp(s1, s2 string) int {
    lens := len(s1)
    if lens > len(s2) {
        lens = len(s2)
    }
    for i := 0; i < lens; i++ {
        if s1[i] != s2[i] {
            return int(s1[i]) - int(s2[i])
        }
    }
    return len(s1) - len(s2)
}

func main() {
    tests := []struct {
        s1, s2 string
        cmp    int
    }{
        {"", "", 0},
        {"a", "a", 0},
        {"a", "b", -1},
        {"b", "a", +1},
        {"a", "aa", -1},
        {"aa", "a", 1},
    }
    for _, t := range tests {
        cmp := strcmp(t.s1, t.s2)
        fmt.Printf("%q %q %d %t\n", t.s1, t.s2, cmp, cmp == t.cmp)
    }
}

Playground: https://play.golang.org/p/EAzV5_ouDI2

Output:

"" "" 0 true
"a" "a" 0 true
"a" "b" -1 true
"b" "a" 1 true
"a" "aa" -1 true
"aa" "a" 1 true

GNU C Library (glibc): strcmp.c

Share:
12,993

Related videos on Youtube

Richard
Author by

Richard

I'm interested in mathematics, and i love it very much!

Updated on June 04, 2022

Comments

  • Richard
    Richard about 2 years

    Given two strings a and b, sometimes I want to determine which one of the three statements: a < b, a == b or a > b is true.

    In a language like C or C++, I will get an int value v after one invoking of the corresponding function or method. Then I can determine which of the above statements is true by examining whether v < 0, v == 0 or v > 0.

    But in Go I have to do at least two comparisons (e.g. first test a < b then test a == b) to find out which one of the three statements is true.

    My question is that is there a way in Go so that I can just do a single comparison?

    It turns out that this feature is called three way comparison.

    • Richard
      Richard over 5 years
      Those who down vote this question. Please read my question again. I know that there is a function strings.Compare(). The problem is that golang seems don't allow you to achieve the same goal with only one single comparison. In language like C/C++, you need only one comparison of strings then compare integer yourself, which is more cheap compared with string comparison.
    • Adrian
      Adrian over 5 years
      If you had the integer values -1, 0, or 1, you would still have to do two comparisons to find out which of the three statements is true. So it's not clear what you concern is regarding the built-in comparison operators.
    • Michael Hampton
      Michael Hampton over 5 years
      I also don't see how you can only do "one" comparison in C/C++. I suspect you are either confused, or haven't explained what you mean well enough for us to understand it. Either way, strings.Compare() is what you want.
  • Richard
    Richard over 5 years
    If you look at the source code, you'll notice that this function is implemented using those 3 operators I mentioned in the question.
  • Richard
    Richard over 5 years
    If you use comparison function in C or C++, you just have to compare these two strings once, then compare to integers. In the case of golang, you have to compare them twice (use strings.Compare() or not). So in theory, the running time of golang will be as twice as long as the case of C/C++.
  • icza
    icza over 5 years
    @Richard That statement is unjustified. strings.Compare() first checks for equality, which if the length of 2 strings are different can be decided immediately because string also stores the length (unlike strings in C/C++).
  • icza
    icza over 5 years
    @Richard And also as mentioned in the answer, sorting strings does not require the -1, 0, 1 result of strings.Compare(), the sorting algorithm only requires the result of a < b. So while other languages may sort based on Compare(), Go's sorting library sorts only based on a < b.