search List<string> for string .StartsWith()
Solution 1
1500 is usually too few:
you could search it in parallel with a simple divide and conquer of the problem. Search each half of the list in two (or divide into three, four, ..., parts) different jobs/threads.
Or store the strings in a (not binary) tree instead. Will be O(log n).
sorted in alphabetical order you can do a binary search (sort of the same as the previous one)
Solution 2
Thus 1500 is not really a huge number binary search on sorted list would be enough probably. Nevertheless most efficient algorithms for prefix search are based on the data structure named Trie or Prefix Tree. See: http://en.wikipedia.org/wiki/Trie
Following picture demonstrates the idea very briefly:
For c# implementation see for instance .NET DATA STRUCTURES FOR PREFIX STRING SEARCH AND SUBSTRING (INFIX) SEARCH TO IMPLEMENT AUTO-COMPLETION AND INTELLI-SENSE
Solution 3
If you have the list in alpabetical order, you can use a variation of binary search to make it a lot faster.
As a starting point, this will return the index of one of the strings that match the prefix, so then you can look forward and backward in the list to find the rest:
public static int BinarySearchStartsWith(List<string> words, string prefix, int min, int max) {
while (max >= min) {
int mid = (min + max) / 2;
int comp = String.Compare(words[mid].Substring(0, prefix.Length), prefix);
if (comp < 0) {
min = mid + 1;
} else if (comp > 0) {
max = mid - 1;
} else {
return mid;
}
}
return -1;
}
int index = BinarySearchStartsWith(theList, "pre", 0, theList.Count - 1);
if (index == -1) {
// not found
} else{
// found
}
Note: If you use a prefix that is longer than any of the strings that are compared, it will break, so you might need to figure out how you want to handle that.
Solution 4
So many approches were analyzed to achive minimum data capacity and high performance. The first place is: all prefixes are stored in dictionary: key - prefix, values - items appropriate for prefix.
Here simple implementation of this algorithm:
public class Trie<TItem>
{
#region Constructors
public Trie(
IEnumerable<TItem> items,
Func<TItem, string> keySelector,
IComparer<TItem> comparer)
{
this.KeySelector = keySelector;
this.Comparer = comparer;
this.Items = (from item in items
from i in Enumerable.Range(1, this.KeySelector(item).Length)
let key = this.KeySelector(item).Substring(0, i)
group item by key)
.ToDictionary( group => group.Key, group => group.ToList());
}
#endregion
#region Properties
protected Dictionary<string, List<TItem>> Items { get; set; }
protected Func<TItem, string> KeySelector { get; set; }
protected IComparer<TItem> Comparer { get; set; }
#endregion
#region Methods
public List<TItem> Retrieve(string prefix)
{
return this.Items.ContainsKey(prefix)
? this.Items[prefix]
: new List<TItem>();
}
public void Add(TItem item)
{
var keys = (from i in Enumerable.Range(1, this.KeySelector(item).Length)
let key = this.KeySelector(item).Substring(0, i)
select key).ToList();
keys.ForEach(key =>
{
if (!this.Items.ContainsKey(key))
{
this.Items.Add(key, new List<TItem> { item });
}
else if (this.Items[key].All(x => this.Comparer.Compare(x, item) != 0))
{
this.Items[key].Add(item);
}
});
}
public void Remove(TItem item)
{
this.Items.Keys.ToList().ForEach(key =>
{
if (this.Items[key].Any(x => this.Comparer.Compare(x, item) == 0))
{
this.Items[key].RemoveAll(x => this.Comparer.Compare(x, item) == 0);
if (this.Items[key].Count == 0)
{
this.Items.Remove(key);
}
}
});
}
#endregion
}
Solution 5
You can use PLINQ (Parallel LINQ) to make the execution faster:
var newList = list.AsParallel().Where(x => x.StartsWith(prefixText)).ToList()
Related videos on Youtube
Comments
-
Scott Selby almost 2 years
I have a
List<string>
with 1500 strings. I am now using the following code to pull out only string that start with the string prefixText.
foreach(string a in <MYLIST>) { if(a.StartsWith(prefixText, true, null)) { newlist.Add(a); } }
This is pretty fast, but I'm looking for google fast. Now my question is if I arrange the List in alphabetical order, then compare char by char can I make this faster? Or any other suggestions on making this faster?
-
Scott Selby almost 12 yearsit comes from DB on page_Load then to List then to Cache, then I'm reading List from Cache here
-
walther almost 12 yearsYou populate your list on every page load?
-
Scott Selby almost 12 yearsfor right now yes, just testing to get it the fastest, this could be easily moved to Page_Init or any number of multiple solutions once I get this running fast
-
waltherWhy exactly do you have 1500 strings in your List? Where do you get your values from? DB?
-
Shimmy WeitzhandlerSuch a shame you asked the question on 1500. You should have asked it on 150000 so we get some more performance-friendly answers.
-
-
Scott Selby almost 12 yearsHow to I go about putting List in alphabetical order? Also , it doesn't have to be a List, The strings originally come from DB in Select statement, so they can be in any format
-
Randy Minder almost 12 yearsThere are a couple ways to do this. If they came from a DB, you could have the DB do the sorting, which would be the most efficient. Or you could use Ling as follows: MyList = UnsortedList.OrderBy( someField ).
-
vidstige almost 12 yearsa short example on how to invoke this would be convenient. For example the min and max parameters might not be superobvious?
-
L.B almost 12 years.Net already has Array.BinarySearch
-
George Mamaladze almost 12 yearsAgree. 1500 is to few. Nevertheless there are more efficient algorithms for prefix search then your three options. See my answer below.
-
Mattias Isegran Bergander almost 12 yearsYes @achitaka-san and Douglas answer is O(1) even. A prefix tree is my second point above, but not nearly as well described as your solution (which I up voted early on). Should've been more clear perhaps. It's not really O(log n) I guess though as I stated, depends on the strings and key so not related to that (see achitaka-san's Wikipedia link or your favorite data structure text book). In a worst case the strings are long and only differ in the end for example etc etc.