Distinct in Entity Framework


Solution 1

This is nice and easy:

// One post per Id: group by Id and take the first post of each group.
List<Post> posts = postsFromDatabase
    .GroupBy(x => x.Id)
    .Select(x => x.FirstOrDefault())
    .ToList();
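A minimal, self-contained sketch of the same GroupBy idea running in memory. The Post class here is a hypothetical stand-in (only Id and Title), since the real entity isn't shown. In LINQ to Objects, First() is safe because a group is never empty; the answer's FirstOrDefault() is, as far as I know, the form older Entity Framework versions accept inside a Select.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's Post entity.
public class Post
{
    public int Id { get; set; }
    public string Title { get; set; } = "";
}

public static class GroupByDedup
{
    // Groups the posts by Id and keeps the first post of every group.
    // A group always contains at least one element, so First() cannot throw here.
    public static List<Post> RemoveDuplicateIds(IEnumerable<Post> posts)
    {
        return posts
            .GroupBy(p => p.Id)
            .Select(g => g.First())
            .ToList();
    }
}
```

Groups come out in the order in which their Id first appears, so the result keeps the first post for each Id.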

But if you want to do it the proper way, I'd advise writing an equality comparer like this:

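using System.Collections.Generic;

public class PostComparer : IEqualityComparer<Post>
{
    #region IEqualityComparer<Post> Members

    // Two posts are equal when their Ids are equal.
    public bool Equals(Post x, Post y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Id.Equals(y.Id);
    }

    // Must agree with Equals: posts with the same Id get the same hash code.
    public int GetHashCode(Post obj)
    {
        return obj.Id.GetHashCode();
    }

    #endregion
}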

This gives you more freedom if you later need additional comparisons. Having written this class, you can use it like this:

List<Post> posts = postsFromDatabase.Distinct(new PostComparer()).ToList();
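Here is a runnable sketch of the comparer approach, again with a hypothetical minimal Post class. This version also treats two null references as equal and a null versus a non-null post as different. Keep in mind that Distinct(comparer) is evaluated in memory: as far as I know, Entity Framework can't translate a custom IEqualityComparer into SQL, so with an IQueryable you would call AsEnumerable() (or ToList()) before Distinct.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's Post entity.
public class Post
{
    public int Id { get; set; }
    public string Title { get; set; } = "";
}

// Two posts are considered equal when their Ids match.
public class PostComparer : IEqualityComparer<Post>
{
    public bool Equals(Post? x, Post? y)
    {
        if (ReferenceEquals(x, y)) return true;   // same instance, or both null
        if (x is null || y is null) return false; // exactly one is null
        return x.Id == y.Id;
    }

    // Must agree with Equals: equal Ids produce equal hash codes.
    public int GetHashCode(Post obj) => obj.Id.GetHashCode();
}

public static class ComparerDedup
{
    // LINQ to Objects Distinct, using PostComparer to decide equality.
    public static List<Post> RemoveDuplicateIds(IEnumerable<Post> posts) =>
        posts.Distinct(new PostComparer()).ToList();
}
```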

Solution 2

I think writing your own custom comparer is a good approach.

Here is an MSDN article that explains the topic very well: http://support.microsoft.com/kb/320727

The reason Distinct() isn't working is that it has no idea how to determine whether two posts are equal, so it uses the reference to decide whether they are the same "object". It's working as it's supposed to: the posts in the list are all different objects, even when some of them share an Id.

By writing your own comparer (it's easy), you can tell Distinct() how to compare two posts and decide whether they are equal.
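A small sketch of the reference-equality behaviour described above, assuming a hypothetical Post class that doesn't override Equals or GetHashCode: two instances with the same Id both survive a plain Distinct(), while the Ids themselves, being ints, are compared by value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Post class that does NOT override Equals/GetHashCode,
// so the default comparer falls back to reference equality.
public class Post
{
    public int Id { get; set; }
}

public static class ReferenceEqualityDemo
{
    // How many "distinct" posts the default comparer sees (one per instance).
    public static int DistinctPostCount(IEnumerable<Post> posts) =>
        posts.Distinct().Count();

    // How many distinct Ids there are (int is compared by value).
    public static int DistinctIdCount(IEnumerable<Post> posts) =>
        posts.Select(p => p.Id).Distinct().Count();
}
```

Listing the same instance twice does collapse it, because it really is the same reference; two separate instances with Id 7 do not.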

Edit: If not using Distinct isn't a problem and the situation isn't frequent, Piotr Justyna's first answer is simple and effective.

Author: Javier Hertfelder

I am a software engineer with more than 9 years of experience. For the last 4 years, I have helped FXStreet, a leading Forex market information website with more than 4 million page views every month, recover from an unexpected “hiccup”: 90% of the IT team left in less than 2 months. I was hired to accomplish two goals. The first was to rebuild the whole department from scratch. The second was to lead the reform of the whole FXStreet website and its 15-year-old monolithic architecture. We switched from a monolith to a microservices-oriented architecture that is easy to maintain, monitor and develop. After a year and a half, thanks to an exceptional group of engineers, we achieved it, not without pain and not without effort. We now have a solid department where everybody has a voice, everybody shares information and everybody enjoys agile development.

Updated on October 06, 2020

Comments

  • Javier Hertfelder, over 3 years ago

    I have a List of objects, some of which have the same Id, and I would like to remove the duplicated elements.

    I tried something like this:

    List<Post> posts = postsFromDatabase.Distinct().ToList();
    

    But it doesn't work!

    So I wrote this method in order to avoid the duplicates:

    public List<Post> PostWithOutDuplicates(List<Post> posts)
    {
        List<Post> postWithOutInclude = new List<Post>();
        var noDupes = posts.Select(x => x.Id).Distinct();
        if (noDupes.Count() < posts.Count)
        {
            foreach (int idPost in noDupes)
            {
                postWithOutInclude.Add(posts.Where(x => x.Id == idPost).First());
            }
            return postWithOutInclude;
        }
        else
        {
            return posts;
        }
    }
    

    Any ideas on how to improve the performance?

    Thanks in advance.

  • Slauma, over 12 years ago
    I think in GetHashCode you must use obj.Id.GetHashCode(), because the hash code must be the same for two objects that are equal according to the Equals method (at least MSDN says this).
  • Piotr Justyna, over 12 years ago
    Well spotted! There should be Id.GetHashCode(), you're right. If anyone's interested: msdn.microsoft.com/en-us/library/ms132151.aspx
  • Captain Kenpachi, about 11 years ago
    I would suggest using a GroupBy() if at all possible.
  • Markus Knappen Johansson, about 11 years ago
    This only does the work once the data is in memory, not in the database. Not good. Use the GroupBy approach: stackoverflow.com/questions/8560884/…
  • Piotr Justyna, about 11 years ago
    Thanks for your comment and the link, Markus. It's a pity the question is so old, because otherwise we could check with the OP whether the objects (postsFromDatabase) are already in memory. At the time I wrote this answer, I think everybody assumed they were, hence I advised using the IEqualityComparer since (judging from my experience) it proves to be less expensive.
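Coming back to the performance question: the PostWithOutDuplicates method calls posts.Where(...).First() once per distinct Id, so it rescans the list for every Id (roughly O(n²) in the worst case). A single pass with a HashSet<int> of the Ids already seen avoids that. This is a sketch of a swapped-in technique, not code from any of the answers, and the Post class is a hypothetical stand-in.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the question's Post entity.
public class Post
{
    public int Id { get; set; }
    public string Title { get; set; } = "";
}

public static class PostDedup
{
    // Single pass over the list: HashSet<int>.Add returns false when the Id
    // has already been seen, so only the first post with each Id is kept.
    public static List<Post> PostWithOutDuplicates(List<Post> posts)
    {
        var seenIds = new HashSet<int>();
        var result = new List<Post>(posts.Count);
        foreach (var post in posts)
        {
            if (seenIds.Add(post.Id))
            {
                result.Add(post);
            }
        }
        return result;
    }
}
```

Unlike the original method, this always returns a new list, even when there are no duplicates, and it keeps the first post for each Id in the original order.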
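To illustrate Slauma's point about GetHashCode, here is a sketch with hypothetical names (InconsistentComparer, IdComparer, DistinctCounter and a Views property are all made up for the example). In the current .NET implementation, Distinct() stores elements in a hash set and only calls Equals when two hash codes match, so a comparer whose Equals looks at Id but whose GetHashCode looks at something else lets duplicates through.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Post with an extra field that is NOT part of equality.
public class Post
{
    public int Id { get; set; }
    public int Views { get; set; }
}

// Broken on purpose: Equals compares Id, but GetHashCode hashes Views.
public class InconsistentComparer : IEqualityComparer<Post>
{
    public bool Equals(Post? x, Post? y) => x?.Id == y?.Id;
    public int GetHashCode(Post obj) => obj.Views.GetHashCode();
}

// Consistent: posts that are equal by Id also get the same hash code.
public class IdComparer : IEqualityComparer<Post>
{
    public bool Equals(Post? x, Post? y) => x?.Id == y?.Id;
    public int GetHashCode(Post obj) => obj.Id.GetHashCode();
}

public static class DistinctCounter
{
    // Number of elements Distinct() keeps with the given comparer.
    public static int Count(IEnumerable<Post> posts, IEqualityComparer<Post> comparer) =>
        posts.Distinct(comparer).Count();
}
```

With two posts that share Id 1 but have different Views, IdComparer keeps one of them, while InconsistentComparer keeps both because their hash codes never match.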