Compare Strings Javascript Return %of Likely

javascript string algorithm comparison

72,603

Solution 1

Here's an answer based on Levenshtein distance https://en.wikipedia.org/wiki/Levenshtein_distance

function similarity(s1, s2) {
  var longer = s1;
  var shorter = s2;
  if (s1.length < s2.length) {
    longer = s2;
    shorter = s1;
  }
  var longerLength = longer.length;
  if (longerLength == 0) {
    return 1.0;
  }
  return (longerLength - editDistance(longer, shorter)) / parseFloat(longerLength);
}

For calculating edit distance

function editDistance(s1, s2) {
  s1 = s1.toLowerCase();
  s2 = s2.toLowerCase();

  var costs = new Array();
  for (var i = 0; i <= s1.length; i++) {
    var lastValue = i;
    for (var j = 0; j <= s2.length; j++) {
      if (i == 0)
        costs[j] = j;
      else {
        if (j > 0) {
          var newValue = costs[j - 1];
          if (s1.charAt(i - 1) != s2.charAt(j - 1))
            newValue = Math.min(Math.min(newValue, lastValue),
              costs[j]) + 1;
          costs[j - 1] = lastValue;
          lastValue = newValue;
        }
      }
    }
    if (i > 0)
      costs[s2.length] = lastValue;
  }
  return costs[s2.length];
}

Usage

similarity('Stack Overflow','Stack Ovrflw')

returns 0.8571428571428571

You can play with it below:

function checkSimilarity(){
  var str1 = document.getElementById("lhsInput").value;
  var str2 = document.getElementById("rhsInput").value;
  document.getElementById("output").innerHTML = similarity(str1, str2);
}

function similarity(s1, s2) {
      var longer = s1;
      var shorter = s2;
      if (s1.length < s2.length) {
        longer = s2;
        shorter = s1;
      }
      var longerLength = longer.length;
      if (longerLength == 0) {
        return 1.0;
      }
      return (longerLength - editDistance(longer, shorter)) / parseFloat(longerLength);
    }

    function editDistance(s1, s2) {
      s1 = s1.toLowerCase();
      s2 = s2.toLowerCase();

      var costs = new Array();
      for (var i = 0; i <= s1.length; i++) {
        var lastValue = i;
        for (var j = 0; j <= s2.length; j++) {
          if (i == 0)
            costs[j] = j;
          else {
            if (j > 0) {
              var newValue = costs[j - 1];
              if (s1.charAt(i - 1) != s2.charAt(j - 1))
                newValue = Math.min(Math.min(newValue, lastValue),
                  costs[j]) + 1;
              costs[j - 1] = lastValue;
              lastValue = newValue;
            }
          }
        }
        if (i > 0)
          costs[s2.length] = lastValue;
      }
      return costs[s2.length];
    }

<div><label for="lhsInput">String 1:</label> <input type="text" id="lhsInput" oninput="checkSimilarity()" /></div>
<div><label for="rhsInput">String 2:</label> <input type="text" id="rhsInput" oninput="checkSimilarity()" /></div>
<div>Match: <span id="output">No Input</span></div>

Solution 2

Using this library for string similarity worked like a charm for me!

Here's the Example -

var similarity = stringSimilarity.compareTwoStrings("Apples","apple");    // => 0.88

Solution 3

Here is a very simple function that does a comparison and returns a percentage based on equivalency. While it has not been tested for all possible scenarios, it may help you get started.

function similar(a,b) {
    var equivalency = 0;
    var minLength = (a.length > b.length) ? b.length : a.length;    
    var maxLength = (a.length < b.length) ? b.length : a.length;    
    for(var i = 0; i < minLength; i++) {
        if(a[i] == b[i]) {
            equivalency++;
        }
    }
    

    var weight = equivalency / maxLength;
    return (weight * 100) + "%";
}
alert(similar("test","tes"));   // 75%
alert(similar("test","test"));  // 100%
alert(similar("test","testt")); // 80%
alert(similar("test","tess"));  // 75%

Solution 4

To Find degree of similarity between two strings; we can use more than one or two methods but I am mostly inclined towards the usage of 'Dice's Coefficient' . which is better! well in my knowledge than using 'Levenshtein distance'

Using this 'string-similarity' package from npm you will be able to work on what I said above.

some easy usage examples are

var stringSimilarity = require('string-similarity');

var similarity = stringSimilarity.compareTwoStrings('healed', 'sealed'); 

var matches = stringSimilarity.findBestMatch('healed', ['edward', 'sealed', 'theatre']);

for more please visit the link given above. Thankyou.

Solution 5

Just one I quickly wrote that might be good enough for your purposes:

function Compare(strA,strB){
    for(var result = 0, i = strA.length; i--;){
        if(typeof strB[i] == 'undefined' || strA[i] == strB[i]);
        else if(strA[i].toLowerCase() == strB[i].toLowerCase())
            result++;
        else
            result += 4;
    }
    return 1 - (result + 4*Math.abs(strA.length - strB.length))/(2*(strA.length+strB.length));
}

This weighs characters that are the same but different case 1 quarter as heavily as characters that are completely different or missing. It returns a number between 0 and 1, 1 meaning the strings are identical. 0 meaning they have no similarities. Examples:

Compare("Apple", "Apple")    // 1
Compare("Apples", "Apple")   // 0.8181818181818181
Compare("Apples", "apple")   // 0.7727272727272727
Compare("a", "A")            // 0.75
Compare("Apples", "appppp")  // 0.45833333333333337
Compare("a", "b")            // 0

View more solutions

72,603

Brad Ruderman

Updated on February 16, 2022

Comments

Brad Ruderman about 2 years

I am looking for a JavaScript function that can compare two strings and return the likeliness that they are alike. I have looked at soundex but that's not really great for multi-word strings or non-names. I am looking for a function like:

    function compare(strA,strB){
    
    }
    
    compare("Apples","apple") = Some X Percentage.

The function would work with all types of strings, including numbers, multi-word values, and names. Perhaps there's a simple algorithm I could use?

Ultimately none of these served my purpose so I used this:

     function compare(c, u) {
            var incept = false;
            var ca = c.split(",");
            u = clean(u);
            //ca = correct answer array (Collection of all correct answer)
            //caa = a single correct answer word array (collection of words of a single correct answer)
            //u = array of user answer words cleaned using custom clean function
            for (var z = 0; z < ca.length; z++) {
                caa = $.trim(ca[z]).split(" ");
                var pc = 0;
                for (var x = 0; x < caa.length; x++) {
                    for (var y = 0; y < u.length; y++) {
                        if (soundex(u[y]) != null && soundex(caa[x]) != null) {
                            if (soundex(u[y]) == soundex(caa[x])) {
                                pc = pc + 1;
                            }
                        }
                        else {
                            if (u[y].indexOf(caa[x]) > -1) {
                                pc = pc + 1;
                            }
                        }
                    }
                }
                if ((pc / caa.length) > 0.5) {
                    return true;
                }
            }
            return false;
        }
        
        // create object listing the SOUNDEX values for each letter
        // -1 indicates that the letter is not coded, but is used for coding
        //  0 indicates that the letter is omitted for modern census archives
        //                              but acts like -1 for older census archives
        //  1 is for BFPV
        //  2 is for CGJKQSXZ
        //  3 is for DT
        //  4 is for L
        //  5 is for MN my home state
        //  6 is for R
        function makesoundex() {
            this.a = -1
            this.b = 1
            this.c = 2
            this.d = 3
            this.e = -1
            this.f = 1
            this.g = 2
            this.h = 0
            this.i = -1
            this.j = 2
            this.k = 2
            this.l = 4
            this.m = 5
            this.n = 5
            this.o = -1
            this.p = 1
            this.q = 2
            this.r = 6
            this.s = 2
            this.t = 3
            this.u = -1
            this.v = 1
            this.w = 0
            this.x = 2
            this.y = -1
            this.z = 2
        }
        
        var sndx = new makesoundex()
        
        // check to see that the input is valid
        function isSurname(name) {
            if (name == "" || name == null) {
                return false
            } else {
                for (var i = 0; i < name.length; i++) {
                    var letter = name.charAt(i)
                    if (!(letter >= 'a' && letter <= 'z' || letter >= 'A' && letter <= 'Z')) {
                        return false
                    }
                }
            }
            return true
        }
        
        // Collapse out directly adjacent sounds
        // 1. Assume that surname.length>=1
        // 2. Assume that surname contains only lowercase letters
        function collapse(surname) {
            if (surname.length == 1) {
                return surname
            }
            var right = collapse(surname.substring(1, surname.length))
            if (sndx[surname.charAt(0)] == sndx[right.charAt(0)]) {
                return surname.charAt(0) + right.substring(1, right.length)
            }
            return surname.charAt(0) + right
        }
        
        // Collapse out directly adjacent sounds using the new National Archives method
        // 1. Assume that surname.length>=1
        // 2. Assume that surname contains only lowercase letters
        // 3. H and W are completely ignored
        function omit(surname) {
            if (surname.length == 1) {
                return surname
            }
            var right = omit(surname.substring(1, surname.length))
            if (!sndx[right.charAt(0)]) {
                return surname.charAt(0) + right.substring(1, right.length)
            }
            return surname.charAt(0) + right
        }
        
        // Output the coded sequence
        function output_sequence(seq) {
            var output = seq.charAt(0).toUpperCase() // Retain first letter
            output += "-" // Separate letter with a dash
            var stage2 = seq.substring(1, seq.length)
            var count = 0
            for (var i = 0; i < stage2.length && count < 3; i++) {
                if (sndx[stage2.charAt(i)] > 0) {
                    output += sndx[stage2.charAt(i)]
                    count++
                }
            }
            for (; count < 3; count++) {
                output += "0"
            }
            return output
        }
        
        // Compute the SOUNDEX code for the surname
        function soundex(value) {
            if (!isSurname(value)) {
                return null
            }
            var stage1 = collapse(value.toLowerCase())
            //form.result.value=output_sequence(stage1);
        
            var stage1 = omit(value.toLowerCase())
            var stage2 = collapse(stage1)
            return output_sequence(stage2);
        
        }
        
        function clean(u) {
            var u = u.replace(/\,/g, "");
            u = u.toLowerCase().split(" ");
            var cw = ["ARRAY OF WORDS TO BE EXCLUDED FROM COMPARISON"];
            var n = [];
            for (var y = 0; y < u.length; y++) {
                var test = false;
                for (var z = 0; z < cw.length; z++) {
                    if (u[y] != "" && u[y] != cw[z]) {
                        test = true;
                        break;
                    }
                }
                if (test) {
        //Don't use & or $ in comparison
                    var val = u[y].replace("$", "").replace("&", "");
                    n.push(val);
                }
            }
            return n;
        }

Andreas almost 12 years

Efficient way of calculating likeness scores of strings when sample size is large?
Brad Ruderman almost 12 years

I am testing this out, still having trouble finding the perfect one. The classic example that breaks these. Say the question is "What are the first two presidents?" and someone answers "adams and washington". A string comparison to "george washington john adams" should be roughly 50%.
Shawn Whinnery almost 6 years

oof, depends on jQuery?

peresisUser over 8 years

The problem with this is that "atest" and "test" return 0%, which we know isn't true.
Kousha about 8 years

Not so accurate: Compare("Apple", "zApple") = 0.07 , while Compare("Apple", "Applez") = 0.84
Paul about 8 years

@Kousha , it's positional. "Apple" and "zApple" only have one letter in common (the second p).
Kousha about 8 years

@Paulpro Apple, and zApple have five letters in common logically. It is your implementation fault. Apple, zApple, Applez are similar.
Paul about 8 years

@Kousha, zApple isn't similar according to this algorithm, since it's positional. That doesn't make the algorithm incorrect.
infinito84 almost 7 years

An improvement for several words: var similarity2 = function(s1, s2){ var split1 = s1.split(' '); var split2 = s2.split(' '); var sum = 0; var max = 0; var temp = 0; for(var i=0; i<split1.length;i++){ max = 0; for(var j=0; j<split2.length;j++){ temp = similarity(split1[i], split2[j]); if(max < temp) max = temp; } console.log(max); sum += max / split1.length; } return sum; };
MarcoS almost 7 years

@Paulpro: That doesn't make your algorithm incorrect, but makes it a poor answer for this question...
QWERTY over 6 years

Is the similarity returned based on matching character? How does it evaluate the similarity
QWERTY over 6 years

@overlord1234 Does the method above works for string like this: 9e27dbb9ff6eea70821c02b4457cbc6b7eb8e12a64f46c192c3a05f1bc15‌19acd101193dac157c62‌33d9d773f9b364ca210d‌6287f9efa00bfc656746‌782905be ?
overlord1234 over 6 years

It does work with strings without a semantic attached to it. Please try and run the in-line code snippet (thanks to David). I get a similarity of 0.17857142857142858 when I input the aforementioned strings.
infinito84 about 6 years

@hyperfkcb he's implementing the edit distance algorithm, which counts how many characters are in the wrong position (more or less), so for calculating the percentage he is taking the longer possible edit distance value (longerLength) and doing (longerLength - editDistance ) / longerLength.
GrumpyCrouton almost 6 years

That's great, except stringSimilarity has a dependancy called lodash which contains over 1,000 files being put into your project just so you can get string similarity.
Tushar Walzade almost 6 years

Yeah, it happens while adding a package locally. But instead, we can use CDN to for a lesser bundle size. Here are the CDN links - jsdelivr.com/package/npm/lodash - jsdelivr.com/package/npm/string-similarity
upupming over 5 years

However, it is too slow for long strings.
Peter Warbo about 5 years

@overlord1234 If I would like to exclude chars like !,;. how could I do that? For instance matching Help me! should give 100% match of Help me
overlord1234 almost 5 years

@PeterWarbo Remove any special characters from the base strings s1 and s2 before doing the comparison. Whether you want to remove the special characters completely or introduce a blank space, that depends on your use case
Lovenkrands over 4 years

They have removed most dependencies, including lodash
David Buck over 4 years

A link to a solution is welcome, but please ensure your answer is useful without it: add context around the link so your fellow users will have some idea what it is and why it’s there, then quote the most relevant part of the page you're linking to in case the target page is unavailable. Answers that are little more than a link may be deleted.
KTibow almost 4 years

Why does it think "aabb" and "bbaa" are 0% of a match?
Admin over 2 years

But it doesn't work for caps. It's not case sensitive!
Nigrimmist about 2 years

It is a strange, but "KIA" and "Kia" will produce 0.0. ¯_(ツ)_/¯
Tushar Walzade about 2 years

@Nigrimmist, you may try with converting them toLowerCase in that case.
Nigrimmist about 2 years

@TusharWalzade it was i did, yes, thank you :)
Nigrimmist about 2 years

Not so good from performance side of view, check my post in this thread. Better to use string-similarity library, it twice faster