Getting unique items from a list
Solution 1
Use a HashSet<T>. For example:
var items = "A B A D A C".Split(' ');
var unique_items = new HashSet<string>(items);
foreach (string s in unique_items)
    Console.WriteLine(s);
prints
A B D C
Solution 2
You can use the Distinct method to return an IEnumerable<T> of distinct items:
var uniqueItems = yourList.Distinct();
And if you need the sequence of unique items returned as a List<T>, you can add a call to ToList:
var uniqueItemsList = yourList.Distinct().ToList();
Solution 3
You can use the Distinct extension method from LINQ.
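As a minimal sketch of that suggestion, the snippet below calls Distinct on a list of strings. The class name DistinctDemo and the helper DistinctValues are hypothetical names chosen for illustration; the only thing taken from the answers is the Distinct().ToList() call itself, which needs a using directive for System.Linq.

```csharp
using System;
using System.Collections.Generic;
using System.Linq; // Distinct and ToList are extension methods defined here

public static class DistinctDemo
{
    // Hypothetical helper: returns each value of the input exactly once.
    public static List<string> DistinctValues(IEnumerable<string> source)
    {
        return source.Distinct().ToList();
    }

    public static void Main()
    {
        var items = new List<string> { "A", "B", "A", "D", "A", "C" };

        // Four distinct values remain: A, B, D and C.
        Console.WriteLine(string.Join(" ", DistinctValues(items)));
    }
}
```

Distinct is lazily evaluated, so nothing is enumerated until ToList (or a foreach) consumes the sequence.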
Solution 4
In .NET 2.0, I'm pretty sure this solution works:
public IEnumerable<T> Distinct<T>(IEnumerable<T> source)
{
    List<T> uniques = new List<T>();
    foreach (T item in source)
    {
        if (!uniques.Contains(item)) uniques.Add(item);
    }
    return uniques;
}
Solution 5
Apart from the Distinct extension method of LINQ, you could use a HashSet<T> object that you initialise with your collection. This is most likely more efficient than the LINQ way, since it uses hash codes (GetHashCode) rather than an IEqualityComparer.
In fact, if it's appropriate for your situation, I would just use a HashSet for storing the items in the first place.
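A rough sketch of what storing the items in a HashSet from the start might look like is shown below. The class name TagStore and the method Collect are hypothetical; the sketch relies only on HashSet<T>.Add, which returns false and leaves the set unchanged when the item is already present.

```csharp
using System;
using System.Collections.Generic;

public static class TagStore
{
    // Hypothetical helper: accumulates incoming values directly into a set,
    // so duplicates never get stored in the first place.
    public static HashSet<string> Collect(IEnumerable<string> incoming)
    {
        var seen = new HashSet<string>();
        foreach (string item in incoming)
        {
            // Add returns false for a value that is already in the set.
            seen.Add(item);
        }
        return seen;
    }

    public static void Main()
    {
        HashSet<string> tags = Collect(new[] { "A", "B", "A", "D", "A", "C" });
        Console.WriteLine(tags.Count);         // 4
        Console.WriteLine(tags.Contains("D")); // True
        Console.WriteLine(tags.Add("A"));      // False, "A" is already stored
    }
}
```

This only fits when the rest of the code does not depend on duplicates or on positional indexing, since a set offers neither.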
Comments
-
domgreen almost 2 years
What is the fastest / most efficient way of getting all the distinct items from a list?
I have a List<string> that possibly has multiple repeating items in it and only want the unique values within the list.
-
Eduardo Pignatelli almost 6 years
The title of this question is misleading. Selecting unique items is about selecting items that occur just once in the list, as opposed to selecting each distinct element once. Given ["A", "B", "C", "C", "D", "D"], unique items would return ["A", "B"], whereas distinct items would return ["A", "B", "C", "D"].
-
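The difference between the two readings can be sketched with LINQ. This is an illustration under assumptions: the class name UniqueVsDistinct and both method names are hypothetical, and GroupBy is a swapped-in technique for the "occurs exactly once" reading that none of the answers propose.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UniqueVsDistinct
{
    // "Unique" in the strict sense: values that occur exactly once.
    public static List<string> OccurringOnce(IEnumerable<string> source)
    {
        return source.GroupBy(x => x)
                     .Where(g => g.Count() == 1)
                     .Select(g => g.Key)
                     .ToList();
    }

    // "Distinct": every value, reported a single time.
    public static List<string> EachOnce(IEnumerable<string> source)
    {
        return source.Distinct().ToList();
    }

    public static void Main()
    {
        var items = new[] { "A", "B", "C", "C", "D", "D" };
        Console.WriteLine(string.Join(" ", OccurringOnce(items))); // A B
        Console.WriteLine(EachOnce(items).Count);                  // 4
    }
}
```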
Suncat2000 over 5 years
@EduardoPignatelli Quite picky, but the question could be reworded unambiguously. The intent of this question as normally encountered means: "Given a list of values, how do I get a list of those values without duplicating any?"
-
-
Noon Silk over 14 years
Must agree; others solve the problem, yours solves the cause :)
-
LukeH over 14 years
A HashSet won't maintain any ordering, which may or may not be an issue for the OP.
-
Noldorin over 14 years
@Luke: Even so, ordering would have no meaning after calling Distinct...
-
Noldorin over 14 years
The OP was looking for a fast/efficient method. This is not it. Calling yourList.Distinct().ToList() requires two full iterations over the enumerable, and additionally is based off IEqualityComparer, which is slower than GetHashCode.
-
Vinay Sajip over 14 years
@Luke: The question asks about fastest/most efficient, and doesn't require ordering to be maintained.
-
Vinay Sajip over 14 years
Is this faster/more efficient than a HashSet<T>? I don't think so. Not bothering to downvote, though :-)
-
LukeH over 14 years
@Noldorin: Why not? Distinct should/does iterate the list in order (although I'm not sure if that's actually guaranteed in any spec).
-
Noldorin over 14 years
@Luke: Oh, I was thinking of indexing really. And anyway, efficiency was mentioned in the OP, while order wasn't (though that's an open question) - HashSet is the way to go if you want good performance.
-
domgreen over 14 years
Thanks guys, I don't require the items to be ordered. This works great.
-
LukeH over 14 years
@Noldorin, @Vinay: If the OP needs the distinct items returned as a List then they'll need to call ToList, regardless of whether they use Distinct or construct a HashSet. Having said that, you're right that a HashSet will probably have better performance than Distinct in most circumstances.
-
reavowed almost 13 years
@Noldorin: I know this is old, but it shows up easily on Google and you're wrong (at least, as of .NET 4 - I haven't checked in older versions). yourList.Distinct().ToList() performs one enumeration, new HashSet<T>(yourList).ToList() performs two. And the implementations of HashSet and Distinct's internal Set class are almost identical. They both use GetHashCode, and they both use IEqualityComparers (which they have to, as equal hashcodes don't (in general) guarantee equal objects).
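One way to see that both approaches go through an equality comparer is to hand each of them the same one. The sketch below is only an illustration: the class name ComparerDemo and its methods are hypothetical, while the HashSet<T> constructor overload and the Distinct overload that accept an IEqualityComparer<T> are part of the framework, as is StringComparer.OrdinalIgnoreCase.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ComparerDemo
{
    // Counts distinct strings with a HashSet, ignoring case.
    public static int CountWithHashSet(IEnumerable<string> source)
    {
        return new HashSet<string>(source, StringComparer.OrdinalIgnoreCase).Count;
    }

    // Counts distinct strings with LINQ, using the same comparer.
    public static int CountWithDistinct(IEnumerable<string> source)
    {
        return source.Distinct(StringComparer.OrdinalIgnoreCase).Count();
    }

    public static void Main()
    {
        var items = new[] { "apple", "APPLE", "pear" };
        Console.WriteLine(CountWithHashSet(items));  // 2
        Console.WriteLine(CountWithDistinct(items)); // 2
    }
}
```

When no comparer is supplied, both fall back to EqualityComparer<T>.Default, which calls the type's own GetHashCode and Equals.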
-
reavowed almost 13 years
@Noldorin: How would a performance benchmark make any argument for or against what I said? You can verify what I said by pulling up System.Linq.Enumerable.DistinctIterator<T> and System.Linq.Set<T> in Reflector (or other .NET decompiler), independent of relative performance.
-
Noldorin almost 13 years
@IainM: Sorry, you're right. I was reading into your post and taking the implication that they are similar in speed. I am still very interested if they actually are. I suspect the difference is still there, though it has possibly gone down since .NET 4.0.
-
Timo over 8 years
Please use a collection with faster lookup than List, such as a Dictionary or HashSet. Because currently, if source contains 100,000 items with many duplicates, then in every one of the 100,000 iterations you will be scanning a list on the order of 100,000 items, meaning you are scanning on the order of 100,000 * 100,000 items. Quadratic time complexity can become quite slow.
-
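A possible rewrite of the hand-rolled loop along the lines of this comment is sketched below. The class and method names (OrderedDistinct, DistinctInOrder) are hypothetical. Note also that HashSet<T> arrived with .NET 3.5, so this is an assumption-laden variant and not a drop-in fix for .NET 2.0, where a Dictionary<T, bool> could play the same role for non-null keys.

```csharp
using System;
using System.Collections.Generic;

public static class OrderedDistinct
{
    // Yields each item the first time it is seen, in source order.
    // The set gives expected O(1) membership checks, so the whole pass
    // is roughly linear instead of quadratic.
    public static IEnumerable<T> DistinctInOrder<T>(IEnumerable<T> source)
    {
        var seen = new HashSet<T>();
        foreach (T item in source)
        {
            // Add returns true only for items not already in the set.
            if (seen.Add(item))
            {
                yield return item;
            }
        }
    }

    public static void Main()
    {
        string[] items = "A B A D A C".Split(' ');
        Console.WriteLine(string.Join(" ", DistinctInOrder(items))); // A B D C
    }
}
```

Because items are yielded as they are first encountered, the original ordering is preserved, which also addresses the ordering concern raised about enumerating a HashSet directly.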
guneysus about 7 years
Note that Distinct is an extension method and lives in the System.Linq namespace:
public static IEnumerable<TSource> Distinct<TSource> (this IEnumerable<TSource> source)