C# Determine Duplicate in List

118,398

Solution 1

Unless I'm missing something, then you should be able to get away with something simple using Distinct(). Granted it won't be the most complex implementation you could come up with, but it will tell you if any duplicates get removed:

var list = new List<string>();

// Fill the list

if(list.Count != list.Distinct().Count())
{
     // Duplicates exist
}

Solution 2

According to Eric White's article on how to Find Duplicates using LINQ:

An easy way to find duplicates is to write a query that groups by the identifier, and then filter for groups that have more than one member. In the following example, we want to know that 4 and 3 are duplicates:

int[] listOfItems = new[] { 4, 2, 3, 1, 6, 4, 3 };
var duplicates = listOfItems
    .GroupBy(i => i)
    .Where(g => g.Count() > 1)
    .Select(g => g.Key);
foreach (var d in duplicates)
    Console.WriteLine(d); // 4,3

Solution 3

In order to allow short circuiting if the duplicate exists early in the list, you can add a HashSet<T> and check the return value of its .Add method.

By using .Any you can short circuit the enumeration as soon as you find a duplicate.

Here's a LINQ extension method in both C# and VB:

CSharp:

public static bool ContainsDuplicates<T>(this IEnumerable<T> enumerable)
{
    var knownKeys = new HashSet<T>();
    return enumerable.Any(item => !knownKeys.Add(item));
}

Visual Basic:

<Extension>
Public Function ContainsDuplicates(Of T)(ByVal enumerable As IEnumerable(Of T)) As Boolean
    Dim knownKeys As New HashSet(Of T)
    Return enumerable.Any(Function(item) Not knownKeys.Add(item))
End Function

Note: to check if there are no duplicates, just change Any to All

Solution 4

Place all items in a set and if the count of the set is different from the count of the list then there is a duplicate.

bool hasDuplicates<T>(List<T> myList) {
    var hs = new HashSet<T>();

    for (var i = 0; i < myList.Count; ++i) {
        if (!hs.Add(myList[i])) return true;
    }
    return false;
}

Should be more efficient than Distinct as there is no need to go through all the list.

Solution 5

You can use IEnumerable.GroupBy method.

var list = new List<string> {"1", "2","3", "1", "2"};
var hasDuplicates = list.GroupBy(x => x).Any(x => x.Skip(1).Any());
Share:
118,398

Related videos on Youtube

kakridge
Author by

kakridge

Updated on June 19, 2021

Comments

  • kakridge
    kakridge almost 3 years

    Requirement: In an unsorted List, determine if a duplicate exists. The typical way I would do this is an n-squared nested loop. I'm wondering how others solve this. Is there an elegant, high performance method in Linq? Something generic that takes a lambda or a comparer would be nice.

    • Peter Perháč
      Peter Perháč about 13 years
      i remember seeing this question on here before and people suggested some neat trick I can't remember what it was though... wait for it... jon skeet is around
    • Trinidad
      Trinidad about 13 years
      Your question seems to be answered, you should mark it accordingly, if not satisfied you can edit your question to explain it more clearly. ;)
  • BrokenGlass
    BrokenGlass about 13 years
    + 1 if I recall correctly Distinct() uses a Hashtable internally, so should be O(n)
  • Peter Perháč
    Peter Perháč about 13 years
    i wonder how fast distinct is... whether it's not doing the "n-square nested loop" as Kenny mentioned he'd like to avoid.
  • Justin Niessner
    Justin Niessner about 13 years
    This will definitely work but will take longer than necessary (the OP only needs to know if duplicates exist or not...not what the duplicate values are).
  • Trinidad
    Trinidad about 13 years
    HashSet seems more straight forward to use.
  • helloV
    helloV about 13 years
    Don't call list.Count() method. Use the Count property instead. I know LINQ is optimized and it'll use it internally but still I think it's better to use the property.
  • helloV
    helloV about 13 years
    Don't call list.Count() method. Use the Count property instead. I know LINQ is optimized and it'll use it internally but still I think it's better to use the property.
  • Ian P
    Ian P about 13 years
    Yeah that does make more sense.
  • B Bulfin
    B Bulfin about 13 years
    @Trinidad: but will not give you a count
  • Jim Mischel
    Jim Mischel about 13 years
    Granted that it will be more efficient if there are duplicates. But if there are no duplicates, then it does the same amount of work. Which one to use probably depends on whether the "normal" case is that there are aren't duplicates.
  • Trinidad
    Trinidad about 13 years
    @recursive, that's not part of the problem. See: In an unsorted List, determine if a duplicate exists
  • Jim Mischel
    Jim Mischel about 13 years
    @Petar Petrov: Good point. Probably should just use foreach. And make the parameter IEnumerable<T> rather than List<T>.
  • Justin Niessner
    Justin Niessner about 13 years
    @Petar - You're right. I just got a little parenthesis happy when I was writing the original post. Fixed.
  • kakridge
    kakridge about 13 years
    This was actually my first thought. Thanks to BrokenGlass for confirming that Distinct() is O(n).
  • liang
    liang almost 11 years
    This is more helpful if you need to know the duplicate values are.
  • CrnaStena
    CrnaStena almost 10 years
    This is ok, if you don't care about case-sensitivity. new List<string>{"Don","don"}; will not report duplicates. You will need to preprocess data before doing counts.
  • Drew Noakes
    Drew Noakes over 8 years
    This is nice an elegant, and is similar to the approach described here that returns duplicates as well.
  • Vincent Saelzler
    Vincent Saelzler over 7 years
    @PetarPetrov - with regards to .Count versus.Count() I need to use .Count(). If I do not, then I get an error that states Operator '!=' cannot be applied to operands of type 'method group' and 'method group'
  • Vincent Saelzler
    Vincent Saelzler over 7 years
    Edit to my above comment: My variable was an IEnumerable as opposed to a list. Never mind @PetarPetrov in the case asked by the poster you were totally right!
  • Jean-Charbel VANNIER
    Jean-Charbel VANNIER over 6 years
    This solution doesn't seems fast as you access the List 3 times. I would consider adding elements to an HasSet until it return false.
  • Fabiano Tarlao
    Fabiano Tarlao almost 6 years
    IMHO, I think your answer does not reply to the question. I have understood that the question is about, given an already existing list..finding out a duplicate. You are suggesting a way to manually populate a list, by online outputting when a duplicate is inserted. Also you named uniquelist the list BUT you permit the duplicate insertion that I suppose was not your intention (small bug). Am I right?
  • Fabiano Tarlao
    Fabiano Tarlao almost 6 years
    Edit: finding out a duplicate, -> .. finding out IF it contains a duplicate :-)
  • wilfy
    wilfy almost 6 years
    I take your point, however while doing that little exercise (I'm no expert, this is part of a course I am doing), I found this page while looking for a way of determining if the input would be a duplicate within a list. So, the answer stands for other people out there who may not know quite what they are looking for. I couldn't find the answer "if (uniqueNums.Contains(input))" while researching, so perhaps this might help somebody else in their early stages of coding life! :-) That might answer your other question, yes, the input wasn't prevented, that wasn't part of the exercise.
  • Fabiano Tarlao
    Fabiano Tarlao almost 6 years
    Got your point, but I still think that this is wrong. In the case a question doesn't fit exactly with the answer (you have), it is better to create a new question (for your answer) and simply self-answer to your same question. This is a perfectly legit way to do things. Furthermore, being this question different, it is a bit difficult for people to find out your code snippet that is for a different problem. Perhaps a small blog post makes more sense in your case. But this is only my opinion. Final remark: sometimes by writing answers that do not fit the question you risk downvotes. Regards
  • Luca Salzani
    Luca Salzani over 5 years
    First Google hit for my problem. Definitly a great piece of info to have here
  • Michael M
    Michael M almost 4 years
    This is perfect, as I am new to C#, and needed something to track the count for each instance within a set of values (e.g. 20,000+ filenames pulled from an http resource), and I want to know whether any duplicates exist before potentially overwriting files with duplicate filenames. A Dictionary is what I was considering, so it is heartening to see it recommended here.
  • nawfal
    nawfal almost 2 years
    For efficient count comparisons, you could make use of MoreLINQ. It got bunch of cool extensions like AtLeast, AtMost, Exactly etc. In this case its better to call: if (list.Distinct().Exactly(list.Count))