Compare Strings Javascript Return %of Likely

72,603

Solution 1

Here's an answer based on Levenshtein distance https://en.wikipedia.org/wiki/Levenshtein_distance

function similarity(s1, s2) {
  var longer = s1;
  var shorter = s2;
  if (s1.length < s2.length) {
    longer = s2;
    shorter = s1;
  }
  var longerLength = longer.length;
  if (longerLength == 0) {
    return 1.0;
  }
  return (longerLength - editDistance(longer, shorter)) / parseFloat(longerLength);
}

For calculating edit distance

function editDistance(s1, s2) {
  s1 = s1.toLowerCase();
  s2 = s2.toLowerCase();

  var costs = new Array();
  for (var i = 0; i <= s1.length; i++) {
    var lastValue = i;
    for (var j = 0; j <= s2.length; j++) {
      if (i == 0)
        costs[j] = j;
      else {
        if (j > 0) {
          var newValue = costs[j - 1];
          if (s1.charAt(i - 1) != s2.charAt(j - 1))
            newValue = Math.min(Math.min(newValue, lastValue),
              costs[j]) + 1;
          costs[j - 1] = lastValue;
          lastValue = newValue;
        }
      }
    }
    if (i > 0)
      costs[s2.length] = lastValue;
  }
  return costs[s2.length];
}

Usage

similarity('Stack Overflow','Stack Ovrflw')

returns 0.8571428571428571


You can play with it below:

function checkSimilarity(){
  var str1 = document.getElementById("lhsInput").value;
  var str2 = document.getElementById("rhsInput").value;
  document.getElementById("output").innerHTML = similarity(str1, str2);
}

function similarity(s1, s2) {
      var longer = s1;
      var shorter = s2;
      if (s1.length < s2.length) {
        longer = s2;
        shorter = s1;
      }
      var longerLength = longer.length;
      if (longerLength == 0) {
        return 1.0;
      }
      return (longerLength - editDistance(longer, shorter)) / parseFloat(longerLength);
    }

    function editDistance(s1, s2) {
      s1 = s1.toLowerCase();
      s2 = s2.toLowerCase();

      var costs = new Array();
      for (var i = 0; i <= s1.length; i++) {
        var lastValue = i;
        for (var j = 0; j <= s2.length; j++) {
          if (i == 0)
            costs[j] = j;
          else {
            if (j > 0) {
              var newValue = costs[j - 1];
              if (s1.charAt(i - 1) != s2.charAt(j - 1))
                newValue = Math.min(Math.min(newValue, lastValue),
                  costs[j]) + 1;
              costs[j - 1] = lastValue;
              lastValue = newValue;
            }
          }
        }
        if (i > 0)
          costs[s2.length] = lastValue;
      }
      return costs[s2.length];
    }
<div><label for="lhsInput">String 1:</label> <input type="text" id="lhsInput" oninput="checkSimilarity()" /></div>
<div><label for="rhsInput">String 2:</label> <input type="text" id="rhsInput" oninput="checkSimilarity()" /></div>
<div>Match: <span id="output">No Input</span></div>

Solution 2

Using this library for string similarity worked like a charm for me!

Here's the Example -

var similarity = stringSimilarity.compareTwoStrings("Apples","apple");    // => 0.88

Solution 3

Here is a very simple function that does a comparison and returns a percentage based on equivalency. While it has not been tested for all possible scenarios, it may help you get started.

function similar(a,b) {
    var equivalency = 0;
    var minLength = (a.length > b.length) ? b.length : a.length;    
    var maxLength = (a.length < b.length) ? b.length : a.length;    
    for(var i = 0; i < minLength; i++) {
        if(a[i] == b[i]) {
            equivalency++;
        }
    }
    

    var weight = equivalency / maxLength;
    return (weight * 100) + "%";
}
alert(similar("test","tes"));   // 75%
alert(similar("test","test"));  // 100%
alert(similar("test","testt")); // 80%
alert(similar("test","tess"));  // 75%

Solution 4

To Find degree of similarity between two strings; we can use more than one or two methods but I am mostly inclined towards the usage of 'Dice's Coefficient' . which is better! well in my knowledge than using 'Levenshtein distance'

Using this 'string-similarity' package from npm you will be able to work on what I said above.

some easy usage examples are

var stringSimilarity = require('string-similarity');

var similarity = stringSimilarity.compareTwoStrings('healed', 'sealed'); 

var matches = stringSimilarity.findBestMatch('healed', ['edward', 'sealed', 'theatre']);

for more please visit the link given above. Thankyou.

Solution 5

Just one I quickly wrote that might be good enough for your purposes:

function Compare(strA,strB){
    for(var result = 0, i = strA.length; i--;){
        if(typeof strB[i] == 'undefined' || strA[i] == strB[i]);
        else if(strA[i].toLowerCase() == strB[i].toLowerCase())
            result++;
        else
            result += 4;
    }
    return 1 - (result + 4*Math.abs(strA.length - strB.length))/(2*(strA.length+strB.length));
}

This weighs characters that are the same but different case 1 quarter as heavily as characters that are completely different or missing. It returns a number between 0 and 1, 1 meaning the strings are identical. 0 meaning they have no similarities. Examples:

Compare("Apple", "Apple")    // 1
Compare("Apples", "Apple")   // 0.8181818181818181
Compare("Apples", "apple")   // 0.7727272727272727
Compare("a", "A")            // 0.75
Compare("Apples", "appppp")  // 0.45833333333333337
Compare("a", "b")            // 0
Share:
72,603

Related videos on Youtube

Brad Ruderman
Author by

Brad Ruderman

Updated on February 16, 2022

Comments

  • Brad Ruderman
    Brad Ruderman about 2 years

    I am looking for a JavaScript function that can compare two strings and return the likeliness that they are alike. I have looked at soundex but that's not really great for multi-word strings or non-names. I am looking for a function like:

        function compare(strA,strB){
        
        }
        
        compare("Apples","apple") = Some X Percentage.
    

    The function would work with all types of strings, including numbers, multi-word values, and names. Perhaps there's a simple algorithm I could use?

    Ultimately none of these served my purpose so I used this:

         function compare(c, u) {
                var incept = false;
                var ca = c.split(",");
                u = clean(u);
                //ca = correct answer array (Collection of all correct answer)
                //caa = a single correct answer word array (collection of words of a single correct answer)
                //u = array of user answer words cleaned using custom clean function
                for (var z = 0; z < ca.length; z++) {
                    caa = $.trim(ca[z]).split(" ");
                    var pc = 0;
                    for (var x = 0; x < caa.length; x++) {
                        for (var y = 0; y < u.length; y++) {
                            if (soundex(u[y]) != null && soundex(caa[x]) != null) {
                                if (soundex(u[y]) == soundex(caa[x])) {
                                    pc = pc + 1;
                                }
                            }
                            else {
                                if (u[y].indexOf(caa[x]) > -1) {
                                    pc = pc + 1;
                                }
                            }
                        }
                    }
                    if ((pc / caa.length) > 0.5) {
                        return true;
                    }
                }
                return false;
            }
            
            // create object listing the SOUNDEX values for each letter
            // -1 indicates that the letter is not coded, but is used for coding
            //  0 indicates that the letter is omitted for modern census archives
            //                              but acts like -1 for older census archives
            //  1 is for BFPV
            //  2 is for CGJKQSXZ
            //  3 is for DT
            //  4 is for L
            //  5 is for MN my home state
            //  6 is for R
            function makesoundex() {
                this.a = -1
                this.b = 1
                this.c = 2
                this.d = 3
                this.e = -1
                this.f = 1
                this.g = 2
                this.h = 0
                this.i = -1
                this.j = 2
                this.k = 2
                this.l = 4
                this.m = 5
                this.n = 5
                this.o = -1
                this.p = 1
                this.q = 2
                this.r = 6
                this.s = 2
                this.t = 3
                this.u = -1
                this.v = 1
                this.w = 0
                this.x = 2
                this.y = -1
                this.z = 2
            }
            
            var sndx = new makesoundex()
            
            // check to see that the input is valid
            function isSurname(name) {
                if (name == "" || name == null) {
                    return false
                } else {
                    for (var i = 0; i < name.length; i++) {
                        var letter = name.charAt(i)
                        if (!(letter >= 'a' && letter <= 'z' || letter >= 'A' && letter <= 'Z')) {
                            return false
                        }
                    }
                }
                return true
            }
            
            // Collapse out directly adjacent sounds
            // 1. Assume that surname.length>=1
            // 2. Assume that surname contains only lowercase letters
            function collapse(surname) {
                if (surname.length == 1) {
                    return surname
                }
                var right = collapse(surname.substring(1, surname.length))
                if (sndx[surname.charAt(0)] == sndx[right.charAt(0)]) {
                    return surname.charAt(0) + right.substring(1, right.length)
                }
                return surname.charAt(0) + right
            }
            
            // Collapse out directly adjacent sounds using the new National Archives method
            // 1. Assume that surname.length>=1
            // 2. Assume that surname contains only lowercase letters
            // 3. H and W are completely ignored
            function omit(surname) {
                if (surname.length == 1) {
                    return surname
                }
                var right = omit(surname.substring(1, surname.length))
                if (!sndx[right.charAt(0)]) {
                    return surname.charAt(0) + right.substring(1, right.length)
                }
                return surname.charAt(0) + right
            }
            
            // Output the coded sequence
            function output_sequence(seq) {
                var output = seq.charAt(0).toUpperCase() // Retain first letter
                output += "-" // Separate letter with a dash
                var stage2 = seq.substring(1, seq.length)
                var count = 0
                for (var i = 0; i < stage2.length && count < 3; i++) {
                    if (sndx[stage2.charAt(i)] > 0) {
                        output += sndx[stage2.charAt(i)]
                        count++
                    }
                }
                for (; count < 3; count++) {
                    output += "0"
                }
                return output
            }
            
            // Compute the SOUNDEX code for the surname
            function soundex(value) {
                if (!isSurname(value)) {
                    return null
                }
                var stage1 = collapse(value.toLowerCase())
                //form.result.value=output_sequence(stage1);
            
                var stage1 = omit(value.toLowerCase())
                var stage2 = collapse(stage1)
                return output_sequence(stage2);
            
            }
            
            function clean(u) {
                var u = u.replace(/\,/g, "");
                u = u.toLowerCase().split(" ");
                var cw = ["ARRAY OF WORDS TO BE EXCLUDED FROM COMPARISON"];
                var n = [];
                for (var y = 0; y < u.length; y++) {
                    var test = false;
                    for (var z = 0; z < cw.length; z++) {
                        if (u[y] != "" && u[y] != cw[z]) {
                            test = true;
                            break;
                        }
                    }
                    if (test) {
            //Don't use & or $ in comparison
                        var val = u[y].replace("$", "").replace("&", "");
                        n.push(val);
                    }
                }
                return n;
            }
    
    • Andreas
      Andreas almost 12 years
    • Brad Ruderman
      Brad Ruderman almost 12 years
      I am testing this out, still having trouble finding the perfect one. The classic example that breaks these. Say the question is "What are the first two presidents?" and someone answers "adams and washington". A string comparison to "george washington john adams" should be roughly 50%.
    • Shawn Whinnery
      Shawn Whinnery almost 6 years
      oof, depends on jQuery?
  • peresisUser
    peresisUser over 8 years
    The problem with this is that "atest" and "test" return 0%, which we know isn't true.
  • Kousha
    Kousha about 8 years
    Not so accurate: Compare("Apple", "zApple") = 0.07 , while Compare("Apple", "Applez") = 0.84
  • Paul
    Paul about 8 years
    @Kousha , it's positional. "Apple" and "zApple" only have one letter in common (the second p).
  • Kousha
    Kousha about 8 years
    @Paulpro Apple, and zApple have five letters in common logically. It is your implementation fault. Apple, zApple, Applez are similar.
  • Paul
    Paul about 8 years
    @Kousha, zApple isn't similar according to this algorithm, since it's positional. That doesn't make the algorithm incorrect.
  • infinito84
    infinito84 almost 7 years
    An improvement for several words: var similarity2 = function(s1, s2){ var split1 = s1.split(' '); var split2 = s2.split(' '); var sum = 0; var max = 0; var temp = 0; for(var i=0; i<split1.length;i++){ max = 0; for(var j=0; j<split2.length;j++){ temp = similarity(split1[i], split2[j]); if(max < temp) max = temp; } console.log(max); sum += max / split1.length; } return sum; };
  • MarcoS
    MarcoS almost 7 years
    @Paulpro: That doesn't make your algorithm incorrect, but makes it a poor answer for this question...
  • QWERTY
    QWERTY over 6 years
    Is the similarity returned based on matching character? How does it evaluate the similarity
  • QWERTY
    QWERTY over 6 years
    @overlord1234 Does the method above works for string like this: 9e27dbb9ff6eea70821c02b4457cbc6b7eb8e12a64f46c192c3a05f1bc15‌​19acd101193dac157c62‌​33d9d773f9b364ca210d‌​6287f9efa00bfc656746‌​782905be ?
  • overlord1234
    overlord1234 over 6 years
    It does work with strings without a semantic attached to it. Please try and run the in-line code snippet (thanks to David). I get a similarity of 0.17857142857142858 when I input the aforementioned strings.
  • infinito84
    infinito84 about 6 years
    @hyperfkcb he's implementing the edit distance algorithm, which counts how many characters are in the wrong position (more or less), so for calculating the percentage he is taking the longer possible edit distance value (longerLength) and doing (longerLength - editDistance ) / longerLength.
  • GrumpyCrouton
    GrumpyCrouton almost 6 years
    That's great, except stringSimilarity has a dependancy called lodash which contains over 1,000 files being put into your project just so you can get string similarity.
  • Tushar Walzade
    Tushar Walzade almost 6 years
    Yeah, it happens while adding a package locally. But instead, we can use CDN to for a lesser bundle size. Here are the CDN links - jsdelivr.com/package/npm/lodash - jsdelivr.com/package/npm/string-similarity
  • upupming
    upupming over 5 years
    However, it is too slow for long strings.
  • Peter Warbo
    Peter Warbo about 5 years
    @overlord1234 If I would like to exclude chars like !,;. how could I do that? For instance matching Help me! should give 100% match of Help me
  • overlord1234
    overlord1234 almost 5 years
    @PeterWarbo Remove any special characters from the base strings s1 and s2 before doing the comparison. Whether you want to remove the special characters completely or introduce a blank space, that depends on your use case
  • Lovenkrands
    Lovenkrands over 4 years
    They have removed most dependencies, including lodash
  • David Buck
    David Buck over 4 years
    A link to a solution is welcome, but please ensure your answer is useful without it: add context around the link so your fellow users will have some idea what it is and why it’s there, then quote the most relevant part of the page you're linking to in case the target page is unavailable. Answers that are little more than a link may be deleted.
  • KTibow
    KTibow almost 4 years
    Why does it think "aabb" and "bbaa" are 0% of a match?
  • Admin
    Admin over 2 years
    But it doesn't work for caps. It's not case sensitive!
  • Nigrimmist
    Nigrimmist about 2 years
    It is a strange, but "KIA" and "Kia" will produce 0.0. ¯_(ツ)_/¯
  • Tushar Walzade
    Tushar Walzade about 2 years
    @Nigrimmist, you may try with converting them toLowerCase in that case.
  • Nigrimmist
    Nigrimmist about 2 years
    @TusharWalzade it was i did, yes, thank you :)
  • Nigrimmist
    Nigrimmist about 2 years
    Not so good from performance side of view, check my post in this thread. Better to use string-similarity library, it twice faster