c string compare vs hash compare

15,078

Solution 1

Is the comparison going to be done once or many times? If the comparison is going to be done only once then you are likely better off doing a straight comparison. If you are going to need to compare very many strings to this set of constant strings, then you can probably save time in the long run by doing it with hashes.

This is a simple enough problem that you can easily write it both ways and see which works better for a representative set of input.

Solution 2

It's difficult to get ahead, string hashing functions are O(n). String comparison is O(n) as well, with a smaller Oh. You would only be ahead if you can store the hash values you compute and use them repeatedly. For both.

Simple sample C hash functions are here.

Solution 3

If you are trying to match a subject string against a set of other strings, you might consider using the Aho-Corasick String Matching Algorithm. It uses a trie to match the subject against all of the target strings in a single pass (it's also quite simple to implement).

Solution 4

Equality of a hash value does not guarantee equality - a mismatch will guarantee inequality, though. If you're going to need to compare a lot of strings against your collection the a hash would be great - if it's a one-off comparison (unlikely I guess) then strcmp will do nicely.

Solution 5

I think if you have a static list of strings, I would store them in a sorted array and then use bsearch to determine if a string is in that list. This returns NULL if it does not exist, or a pointer to the value should it exist and is probably faster than a linear search or hashing.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* cmp function for qsort and bsearch */
static int pstrcmp(const void *a, const void *b)
{
  return strcmp(*(char * const *)a, *(char * const *)b);
}

/* check an input against the list of known strings */
static char *check_for_match(char *input)
{
  static char *static_list[] = { "one", "two", "three", "four", "five" };
  static int nelems;

  /* this sorts the list, for demonstration purposes, but if the list
     is static then it could be sorted prior to compiling */
  if (! nelems)
  {
    nelems = sizeof(static_list) / sizeof(*static_list);
    qsort(static_list, nelems, sizeof(*static_list), pstrcmp);
  }


  return bsearch(&input, static_list, nelems, sizeof(*static_list), pstrcmp);
}

int main(int argc, char *argv[])
{
  if (check_for_match("should_not_match"))
  {
    printf("Match found.\n");
  } else {
    printf("No match found.\n");
  }

  if (check_for_match("two"))
  {
    printf("Match found.\n");
  } else {
    printf("No match found.\n");
  }
  return EXIT_SUCCESS;
}
Share:
15,078
romejoe
Author by

romejoe

Updated on June 12, 2022

Comments

  • romejoe
    romejoe about 2 years

    I need to compare a string to multiple other constant strings in c. I am curious which is faster, to hash the string I am going to compare and compare it to all the other constant string hashes or just compare the strings as strings. thank you in advance

    thank you for the answers I am going to be doing many comparisons. can anyone give me a good, fast, low resource intensive algorithm to use? The only hash I know of is MD5 and I have a feeling that is over kill.

    I also want to add that the strings are maybe 20 or 30 characters long at the max with most being around 7.