Are there any Fuzzy Search or String Similarity Functions libraries written for C#?


Solution 1

Levenshtein distance implementation:

I have a .NET 1.1 project in which I use the latter. It's simplistic, but works perfectly for what I need. From what I remember it needed a bit of tweaking, but nothing that wasn't obvious.
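For readers who cannot reach the implementations referred to above, here is a minimal sketch of the textbook Levenshtein distance in C#. It is not the code this answer linked to. The class and method names (`LevenshteinSketch`, `Distance`) are made up for illustration, and it assumes a reasonably recent C# compiler (it uses `nameof`). It compares UTF-16 `char` values, so it does no culture-aware or case-insensitive matching.

```csharp
using System;

public static class LevenshteinSketch
{
    // Hypothetical helper: returns the minimum number of single-character
    // insertions, deletions and substitutions needed to turn source into target.
    public static int Distance(string source, string target)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (target == null) throw new ArgumentNullException(nameof(target));

        // Only two rows of the dynamic-programming table are kept in memory.
        // previous[j] is the distance between the first i-1 chars of source
        // and the first j chars of target.
        int[] previous = new int[target.Length + 1];
        int[] current = new int[target.Length + 1];

        // Turning an empty prefix into j characters takes j insertions.
        for (int j = 0; j <= target.Length; j++)
            previous[j] = j;

        for (int i = 1; i <= source.Length; i++)
        {
            // Turning i characters into an empty prefix takes i deletions.
            current[0] = i;

            for (int j = 1; j <= target.Length; j++)
            {
                int cost = source[i - 1] == target[j - 1] ? 0 : 1;
                int insertion = current[j - 1] + 1;
                int deletion = previous[j] + 1;
                int substitution = previous[j - 1] + cost;
                current[j] = Math.Min(Math.Min(insertion, deletion), substitution);
            }

            // The row just filled becomes the previous row for the next pass.
            int[] swap = previous;
            previous = current;
            current = swap;
        }

        return previous[target.Length];
    }
}
```

For example, `LevenshteinSketch.Distance("kitten", "sitting")` evaluates to 3 (two substitutions and one insertion). A smaller number means the strings are more alike.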

Solution 2

You can also look at the very impressive library titled Sam's String Metrics (https://github.com/StefH/SimMetrics.Net). It includes a host of algorithms:

  • Hamming distance
  • Levenshtein distance
  • Needleman-Wunsch distance or Sellers Algorithm
  • Smith-Waterman distance
  • Gotoh Distance or Smith-Waterman-Gotoh distance
  • Block distance or L1 distance or City block distance
  • Monge Elkan distance
  • Jaro distance metric
  • Jaro Winkler
  • SoundEx distance metric
  • Matching Coefficient
  • Dice’s Coefficient
  • Jaccard Similarity or Jaccard Coefficient or Tanimoto coefficient
  • Overlap Coefficient
  • Euclidean distance or L2 distance
  • Cosine similarity
  • Variational distance
  • Hellinger distance or Bhattacharyya distance
  • Information Radius (Jensen-Shannon divergence)
  • Harmonic Mean
  • Skew divergence
  • Confusion Probability
  • Tau
  • Fellegi and Sunter (SFS) metric
  • TFIDF or TF/IDF
  • FastA
  • BlastP
  • Maximal matches
  • q-gram
  • Ukkonen Algorithms

Solution 3

They are not my own invention, but they are my favorites. I've published my own tweaked versions of Dice Coefficient, Levenshtein Distance, Longest Common Subsequence and Double Metaphone in a blog post called "Four Functions for Finding Fuzzy String Matches in C# Extensions".
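To give a feel for one of these four, here is a minimal sketch of the Dice coefficient computed over character bigrams. It is not the tweaked version from the blog post. The names `DiceSketch` and `Coefficient` are made up for illustration, the handling of strings shorter than two characters is my own assumption, and it assumes C# 7 or later (for `out int`).

```csharp
using System;
using System.Collections.Generic;

public static class DiceSketch
{
    // Hypothetical helper: returns a similarity between 0.0 (no shared bigrams)
    // and 1.0 (identical bigram multisets), computed as
    // 2 * shared / (bigrams in a + bigrams in b).
    public static double Coefficient(string a, string b)
    {
        if (a == null) throw new ArgumentNullException(nameof(a));
        if (b == null) throw new ArgumentNullException(nameof(b));

        // Assumption: strings with no bigrams only match when they are equal.
        if (a.Length < 2 || b.Length < 2)
            return a == b ? 1.0 : 0.0;

        // Count how often each bigram occurs in a.
        var counts = new Dictionary<string, int>();
        for (int i = 0; i < a.Length - 1; i++)
        {
            string bigram = a.Substring(i, 2);
            counts.TryGetValue(bigram, out int seen);
            counts[bigram] = seen + 1;
        }

        // Each bigram of b may consume at most one matching occurrence from a.
        int shared = 0;
        for (int i = 0; i < b.Length - 1; i++)
        {
            string bigram = b.Substring(i, 2);
            if (counts.TryGetValue(bigram, out int remaining) && remaining > 0)
            {
                counts[bigram] = remaining - 1;
                shared++;
            }
        }

        int totalBigrams = (a.Length - 1) + (b.Length - 1);
        return 2.0 * shared / totalBigrams;
    }
}
```

For example, "night" and "nacht" share only the bigram "ht" out of four bigrams each, so `DiceSketch.Coefficient("night", "nacht")` is 2 * 1 / 8 = 0.25. Unlike Levenshtein distance, a higher value here means the strings are more alike.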

Solution 4

Have you taken a look at Lucene.net? It is a port of the Java Lucene search engine API to the .NET platform. That library offers a lot of search functionality. I played around with it a year or so ago, so don't take my suggestion as based on tons of experience. I saw it in the book Windows Developer Power Tools and took it for a test drive. You might look through their API documentation to see if it offers something like the fuzzy search you are looking for.

Solution 5

The following Levenshtein distance algorithm assigns a value to the similarity of two strings (well, the difference, actually) and could serve as something to build upon: http://www.merriampark.com/ldcsharp.htm

Author: Luca Molteni


Updated on July 05, 2022

Comments

  • Luca Molteni almost 2 years

    There are similar questions, but none about C# libraries I can use in my source code.

    Thank you all for your help.

    I've already seen Lucene, but I need something simpler for finding similar strings, without the overhead of the indexing part.

    The answer I marked has two very easy algorithms, and one uses LINQ too, so it's perfect.

  • Thomas Levesque over 14 years
    Why do you say "Using LINQ"? None of these implementations uses LINQ...
  • Abhishek over 14 years
    Indeed you are correct. I could have sworn that there was some LINQ-love in it, or at least that the headline claimed it was LINQy or something.
  • Eugeniu Torica over 14 years
    Could you please tell how to get the degree of similarity using Lucene?
  • Jason Jackson over 14 years
    Sorry, I have not used it professionally. As I mentioned in my post, I just played around with it probably around 2007/2008.
  • Hamish Grubijan about 13 years
    What if I have 100,000 entries to search, and I want to show the top 20 candidates each time?
  • Paul Ruane almost 13 years
    The link in this answer is giving me a 403 error. You can use the Wayback Machine instead.
  • Roman over 12 years
    Dead link may now be here: dotnetperls.com/levenshtein
  • dalenewman over 12 years
    I believe the .NET version of the library mentioned above is here. After I converted it to Visual Studio 2010, and updated NUnit references, it builds. It also passes 87 tests.
  • cjbarth almost 12 years
    These are ready-made in a class that you can just drop into your project. This is the easy man's way to go.
  • AechoLiu about 10 years
    Maybe the book Lucene in Action, 2ed could tell how to get the degree of similarity.
  • Maneesh Babu M about 9 years
  • Good Night Nerd Pride about 9 years
    Updated link to the blog post: tsjensen.com/blog/post/2011/05/27/…
  • Spiralis over 6 years
    I found a .NET version of this library, SimMetrics.Net, on GitHub. The same as the suggestion from @dalenewman, just on GitHub perhaps?