SQL Table linking... is it better to have a linking table, or a delimited column?

13,791

Solution 1

If you have a delimited list, finding users with a given role is going to become very expensive: effectively, you need to do a FULL scan of that table, and look at all the values for that column in every row, trying to see if it contains a given role.

A separate table (normalized, many to many relation) is the way to go, and with proper indexing you will not have full scans happening.

eg:

User:  UserId, Name, ....
Role:  RoleId, Name, ....
UserRole:  UserRoleId, UserId, RoleId

(UserRoleId is optional, you could alternatively have the PK be UserId+RoleId, I won't get into the discussion here of surrogate vs compound keys here)

You'll want an index on (UserId, RoleId) that is UNIQUE, to enforce no duplicates. This will also help with any queries where you're trying to see if a specific user has a specific role (WHERE userId = x AND roleId = y)

If you are looking up all the roles a user has, you'll want an index on just UserId.

Conversely, if you are looking up all the users a given role has, an index on just roleId will speed that up. If you don't do this query, or do it very rarely, then not having this index will speed up performance slightly for insert/updates, as it is one less thing to do. This is the careful balancing act that is database tuning.

Solution 2

  1. A table scan means that you don't have any indexes, or your query doesn't allow them to be used. In a security database, you should rarely if ever have to download the entire list of users/roles, unless this is for an admin application. You need to address this in your design.

  2. Delimited lists violate first-normal-form (1NF) and almost always cause problems in the long term. What happens if you want to retrieve all users in a particular role? How do you write that query? Don't go down this road. Normalize it.

  3. If you're using correct column types (i.e. not a varchar(4000) or varchar(max) everywhere), disk space really shouldn't be an issue. Yes, it will grow "exponentially" - so what? Databases are good at this kind of scaling. Unless you're trying to run this on a 10 gig hard drive, it's not something to worry about. And if you are trying to run it on a 10 gig hard drive, you probably have bigger issues to worry about.

Short answer: Don't use a delimited list. Normalize.

Solution 3

The first option. It's called a many-to-many join table. This will perform fine if you create appropriate indexes.

Don't go with the second 'denormalised' option.

Solution 4

You could use a separate table or you could go back to cavemen with chisels. The choice is up to you.

Solution 5

A separate table is the way to go, otherwise you're trying to work around your database engine. A separate table is properly normalised - in general, as an application expands, the better it is normalised, the easier you'll find it to work with. What greg said above is also absolutely right.

Share:
13,791
Richard C
Author by

Richard C

Developer by day, by night, I have a life :) See my profiles on other SE communities to see what I'm about there. Started with Applesoft Basic on the Apple ][. From there I learned VB6 and followed the obvious progression to VB.NET and from there C#, where my focus is at the moment. Currently working on a long-term project, using C#, ASP.NET MVC, MS-SQL, Entity Framework, Bootstrap. I'm having to write a lot more JavaScript than I care for :p I also enjoy the micro-controller scene, Raspberri Pi, Arduino, ESP, IoT, etc.

Updated on June 08, 2022

Comments

  • Richard C
    Richard C almost 2 years

    My database has two tables, one contains a list of users, the other a list of roles. Each user will belong to one or more roles, and of course each role will have multiple users in it.

    I've come across two ways to link the information. The first is to add a third table which contains the ID's from both tables. A simple join will then return all the users that belong to a role, or all the roles to which a user belongs. However, as the database grows, the datasets returned by these simple queries will grow exponentially.

    The second method is to add a column to the users table in which a delimited list of roles is stored. This will eliminate the need for the third linking table, which may have a positive effect on database growth. The downside is that SQL does not have the ability to use delimited lists. The only way I've found to process that information is to use a temporary table and a custom function.

    Is viewing my execution plans, the "table scan" event is the one that takes the most resources. It makes sense that eliminating a table from the equation would speed things up. The function takes up less than 1% of the resources.

    These tests were done on a database with less than 20 records. As the size of the database grows, the table scans will take longer, so perhaps limiting them is the best choice.

    If using the delimited list is a good way to go, why is nobody doing it?

    Please tell me which is your preferred method (even if it's different from my two) and why.

    Thank you.