search List<string> for string .StartsWith()

16,428

Solution 1

1500 is usually too few:

  • you could search it in parallel with a simple divide and conquer of the problem. Search each half of the list in two (or divide into three, four, ..., parts) different jobs/threads.

  • Or store the strings in a (not binary) tree instead. Will be O(log n).

  • sorted in alphabetical order you can do a binary search (sort of the same as the previous one)

Solution 2

Thus 1500 is not really a huge number binary search on sorted list would be enough probably. Nevertheless most efficient algorithms for prefix search are based on the data structure named Trie or Prefix Tree. See: http://en.wikipedia.org/wiki/Trie

Following picture demonstrates the idea very briefly: enter image description here

For c# implementation see for instance .NET DATA STRUCTURES FOR PREFIX STRING SEARCH AND SUBSTRING (INFIX) SEARCH TO IMPLEMENT AUTO-COMPLETION AND INTELLI-SENSE

Solution 3

If you have the list in alpabetical order, you can use a variation of binary search to make it a lot faster.

As a starting point, this will return the index of one of the strings that match the prefix, so then you can look forward and backward in the list to find the rest:

public static int BinarySearchStartsWith(List<string> words, string prefix, int min, int max) {
  while (max >= min) {
    int mid = (min + max) / 2;
    int comp = String.Compare(words[mid].Substring(0, prefix.Length), prefix);
    if (comp < 0) {
      min = mid + 1;
    } else if (comp > 0) {
      max = mid - 1;
    } else {
      return mid;
    }
  }
  return -1;
}

int index = BinarySearchStartsWith(theList, "pre", 0, theList.Count - 1);
if (index == -1) {
  // not found
} else{
  // found
}

Note: If you use a prefix that is longer than any of the strings that are compared, it will break, so you might need to figure out how you want to handle that.

Solution 4

So many approches were analyzed to achive minimum data capacity and high performance. The first place is: all prefixes are stored in dictionary: key - prefix, values - items appropriate for prefix.

Here simple implementation of this algorithm:

public class Trie<TItem>
{
    #region Constructors

    public Trie(
        IEnumerable<TItem> items,
        Func<TItem, string> keySelector,
        IComparer<TItem> comparer)
    {
        this.KeySelector = keySelector;
        this.Comparer = comparer;
        this.Items = (from item in items
                      from i in Enumerable.Range(1, this.KeySelector(item).Length)
                      let key = this.KeySelector(item).Substring(0, i)
                      group item by key)
                     .ToDictionary( group => group.Key, group => group.ToList());
    }

    #endregion

    #region Properties

    protected Dictionary<string, List<TItem>> Items { get; set; }

    protected Func<TItem, string> KeySelector { get; set; }

    protected IComparer<TItem> Comparer { get; set; }

    #endregion

    #region Methods

    public List<TItem> Retrieve(string prefix)
    {
        return  this.Items.ContainsKey(prefix)
            ? this.Items[prefix]
            : new List<TItem>();
    }

    public void Add(TItem item)
    {
        var keys = (from i in Enumerable.Range(1, this.KeySelector(item).Length)
                    let key = this.KeySelector(item).Substring(0, i)
                    select key).ToList();
        keys.ForEach(key =>
        {
            if (!this.Items.ContainsKey(key))
            {
                this.Items.Add(key, new List<TItem> { item });
            }
            else if (this.Items[key].All(x => this.Comparer.Compare(x, item) != 0))
            {
                this.Items[key].Add(item);
            }
        });
    }

    public void Remove(TItem item)
    {
        this.Items.Keys.ToList().ForEach(key =>
        {
            if (this.Items[key].Any(x => this.Comparer.Compare(x, item) == 0))
            {
                this.Items[key].RemoveAll(x => this.Comparer.Compare(x, item) == 0);
                if (this.Items[key].Count == 0)
                {
                    this.Items.Remove(key);
                }
            }
        });
    }

    #endregion
}

Solution 5

You can use PLINQ (Parallel LINQ) to make the execution faster:

var newList = list.AsParallel().Where(x => x.StartsWith(prefixText)).ToList()
Share:
16,428

Related videos on Youtube

Scott Selby
Author by

Scott Selby

I make websites , mostly in c#

Updated on June 04, 2022

Comments

  • Scott Selby
    Scott Selby almost 2 years

    I have a

    List<string>
    

    with 1500 strings. I am now using the following code to pull out only string that start with the string prefixText.

    foreach(string a in <MYLIST>)
    {            
        if(a.StartsWith(prefixText, true, null))
        {
            newlist.Add(a);                   
        }            
    }
    

    This is pretty fast, but I'm looking for google fast. Now my question is if I arrange the List in alphabetical order, then compare char by char can I make this faster? Or any other suggestions on making this faster?

    • Scott Selby
      Scott Selby almost 12 years
      it comes from DB on page_Load then to List then to Cache, then I'm reading List from Cache here
    • walther
      walther almost 12 years
      You populate your list on every page load?
    • Scott Selby
      Scott Selby almost 12 years
      for right now yes, just testing to get it the fastest, this could be easily moved to Page_Init or any number of multiple solutions once I get this running fast
    • walther
      walther
      Why exactly do you have 1500 strings in your List? Where do you get your values from? DB?
    • Shimmy Weitzhandler
      Shimmy Weitzhandler
      Such a shame you asked the question on 1500. You should have asked it on 150000 so we get some more performance-friendly answers.
  • Scott Selby
    Scott Selby almost 12 years
    How to I go about putting List in alphabetical order? Also , it doesn't have to be a List, The strings originally come from DB in Select statement, so they can be in any format
  • Randy Minder
    Randy Minder almost 12 years
    There are a couple ways to do this. If they came from a DB, you could have the DB do the sorting, which would be the most efficient. Or you could use Ling as follows: MyList = UnsortedList.OrderBy( someField ).
  • vidstige
    vidstige almost 12 years
    a short example on how to invoke this would be convenient. For example the min and max parameters might not be superobvious?
  • L.B
    L.B almost 12 years
    .Net already has Array.BinarySearch
  • George Mamaladze
    George Mamaladze almost 12 years
    Agree. 1500 is to few. Nevertheless there are more efficient algorithms for prefix search then your three options. See my answer below.
  • Mattias Isegran Bergander
    Mattias Isegran Bergander almost 12 years
    Yes @achitaka-san and Douglas answer is O(1) even. A prefix tree is my second point above, but not nearly as well described as your solution (which I up voted early on). Should've been more clear perhaps. It's not really O(log n) I guess though as I stated, depends on the strings and key so not related to that (see achitaka-san's Wikipedia link or your favorite data structure text book). In a worst case the strings are long and only differ in the end for example etc etc.