Entity Framework: Avoiding Inserting Duplicates

I've had the same problem with EF. Here's what I ended up doing:

  1. Instead of calling story1.Tags.Add(new Tag { Name = ".net" }) yourself, route all Tag creation through a helper method, like this: story1.Tags.Add(GetTag(".net")).
  2. The GetTag method queries the tags in the context to see whether a matching entity already exists, like you do. If one exists, it returns that.
  3. If there is no existing entity, it checks the ObjectStateManager to see whether any Tag entities have been added to the context but not yet written to the db. If it finds a matching Tag, it returns that.
  4. If it still has not found the Tag, it creates a new Tag, adds it to the context, and then returns it.

In essence this will make sure that no more than one instance of any Tag (be it already existing or just created) will be used throughout your program.
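
The lookup order in steps 2 to 4 can be sketched without EF at all. This is only an illustration under stated assumptions: TagStore, its two lists, and its SaveChanges method are hypothetical stand-ins for the database, the entities in the Added state, and the real SaveChanges, and this Tag is a plain class rather than a generated entity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Tag entity; no EF types are used here.
public class Tag
{
    public string Name { get; set; }
}

// Hypothetical stand-in for the context, to show the lookup order only.
public class TagStore
{
    // Plays the role of rows that already exist in the database.
    private readonly List<Tag> saved = new List<Tag>();

    // Plays the role of entities in the Added state (tracked, not yet written).
    private readonly List<Tag> pending = new List<Tag>();

    public Tag GetTag(string name)
    {
        // Step 2: prefer an entity that already exists.
        Tag tag = saved.FirstOrDefault(t => t.Name == name);
        if (tag != null)
        {
            return tag;
        }

        // Step 3: otherwise reuse one that is awaiting insertion.
        tag = pending.FirstOrDefault(t => t.Name == name);
        if (tag != null)
        {
            return tag;
        }

        // Step 4: nothing found, so create it and queue it for insertion.
        tag = new Tag { Name = name };
        pending.Add(tag);
        return tag;
    }

    // Plays the role of SaveChanges: pending entities become saved ones.
    // Returns how many entities were written.
    public int SaveChanges()
    {
        int written = pending.Count;
        saved.AddRange(pending);
        pending.Clear();
        return written;
    }
}
```

With this model, two calls to GetTag(".net") return the same instance whether or not SaveChanges has run in between, so only one insert is ever queued for that name.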

Some example code lifted from my project (uses InventoryItem instead of Tag, but you get the idea).

The check in step 3 is done like this:

// Second choice: maybe it's not in the database yet, but it's awaiting insertion?
inventoryItem = context.ObjectStateManager.GetObjectStateEntries(EntityState.Added)
    .Where(ose => ose.EntitySet == context.InventoryItems.EntitySet)
    .Select(ose => ose.Entity)
    .Cast<InventoryItem>()
    .Where(equalityPredicate.Compile())
    .SingleOrDefault();

if (inventoryItem != null) {
    return inventoryItem;
}

If the Tag is not found in step 3, here's the code for step 4:

inventoryItem = new InventoryItem();
context.InventoryItems.AddObject(inventoryItem);
return inventoryItem;

Update:

It should be used like this:

Story story1 = new Story();
story1.Title = "Introducing the Entity Framework";
story1.Tags.Add(GetTag(".net", category, db));
story1.Tags.Add(GetTag("database", category, db));

Author: Mike Borozdin

Updated on June 24, 2022

Comments

  • Mike Borozdin, almost 2 years ago

    Say I have the following conceptual model: there are stories that have tags (more than one, so it's a many-to-many relationship), and each tag belongs to a particular category.

    My data comes from an external source, and before inserting it I want to make sure that no duplicate tags are added.

    Updated code snippet:

    static void Main(string[] args)
    {
        Story story1 = new Story();
        story1.Title = "Introducing the Entity Framework";
        story1.Tags.Add(new Tag { Name = ".net" });
        story1.Tags.Add(new Tag { Name = "database" });

        Story story2 = new Story();
        story2.Title = "Working with Managed DirectX";
        story2.Tags.Add(new Tag { Name = ".net" });
        story2.Tags.Add(new Tag { Name = "graphics" });

        List<Story> stories = new List<Story>();
        stories.Add(story1);
        stories.Add(story2);

        EfQuestionEntities db = new EfQuestionEntities();

        Category category = (from c in db.Categories
                             where c.Name == "Programming"
                             select c).First();

        foreach (Story story in stories)
        {
            foreach (Tag tag in story.Tags)
            {
                Tag currentTag = tag;
                currentTag = GetTag(tag.Name, category, db);
            }

            db.Stories.AddObject(story);
        }

        db.SaveChanges();
    }

    public static Tag GetTag(string name, Category category, EfQuestionEntities db)
    {
        var dbTag = from t in db.Tags.Include("Category")
                    where t.Name == name
                    select t;

        if (dbTag.Count() > 0)
        {
            return dbTag.First();
        }

        var cachedTag = db.ObjectStateManager.GetObjectStateEntries(EntityState.Added).
            Where(ose => ose.EntitySet == db.Tags.EntitySet).
            Select(ose => ose.Entity).
            Cast<Tag>().Where(x => x.Name == name);

        if (cachedTag.Count() != 0)
        {
            return cachedTag.First();
        }

        Tag tag = new Tag();
        tag.Name = name;
        tag.Category = category;

        db.Tags.AddObject(tag);

        return tag;
    }
    

    However, I get an exception saying that an object with the same EntityKey is already present in the ObjectContext.

    Also, if I remove the else statement, I get an exception about violating an FK constraint, so it seems the tag's Category property is set to null.

  • Mike Borozdin, about 13 years ago
    Thank you! I will try that now. By the way, are you still using EF after running into pitfalls like that one?
  • Waihon Yew, about 13 years ago
    @Mike: I am. There was nothing that I couldn't find an acceptable way to solve given my modest needs, and what I was using before EF (SubSonic) was even worse.
  • Mike Borozdin, about 13 years ago
    @Jon: Thanks also for that code snippet. Unfortunately, I'm still getting that FK violation exception, which must mean there are tracked entities with Category = null.
  • Waihon Yew, about 13 years ago
    @Mike: Please update your code snippet if you made changes. Also, don't assign EntityKey, as Ladislav says.
  • Mike Borozdin, about 13 years ago
    @Jon, I see the point now, although I haven't tried it. But do you think it's possible somehow to check for dupes while looping?
  • Waihon Yew, about 13 years ago
    @Mike: Have Tag implement IEquatable<Tag>, and then LINQ's tags.Distinct().ToArray() will filter out the dupes automatically.
  • Mike Borozdin, about 13 years ago
    @Jon, all right. I went with your first solution. I just had to revamp my code a little bit. Thank you once again :).
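
For reference, the IEquatable<Tag> suggestion from the comments could look roughly like the sketch below. It assumes a plain Tag class compared by Name only; the TagDeduplication helper is a hypothetical name, and whether name-only equality suits a real entity (where tags also belong to a category) is a separate question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical plain Tag class; equality by Name only is an assumption.
public class Tag : IEquatable<Tag>
{
    public string Name { get; set; }

    // Used by EqualityComparer<Tag>.Default, which Distinct() relies on.
    public bool Equals(Tag other)
    {
        return other != null && Name == other.Name;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as Tag);
    }

    // Distinct() uses a hash-based set, so equal tags must return
    // the same hash code.
    public override int GetHashCode()
    {
        return Name == null ? 0 : Name.GetHashCode();
    }
}

// Hypothetical helper wrapping the call mentioned in the comment.
public static class TagDeduplication
{
    public static Tag[] DistinctTags(IEnumerable<Tag> tags)
    {
        return tags.Distinct().ToArray();
    }
}
```

GetHashCode has to be overridden together with Equals here; otherwise two tags with the same name would usually land in different hash buckets and Distinct() would keep both.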