Many to Many Relation Design - Intersection Table Design

sql tsql database-design


Solution 1

The second version is the best for me; it avoids creating an extra index.

Solution 2

It's a topic of some debate. I prefer the first form because I consider it better to be able to look up the mapping row by a single value, and like to enforce a perhaps-foolish consistency of having a single-column primary key available in every table without fail. Others think that having that column is a silly waste of space.

We meet for a few rounds of bare-knuckle boxing at a bar in New Jersey every couple of months.

Solution 3

Use version-1 if your "intersection" actually IS an entity on its own, meaning:

- it has additional properties
- you may search for those objects directly (rather than navigating a relation)

Use version-2 if it is purely an N-M relation table. In that case, also make sure that:

- your PK (CLUSTERED) starts with the column that references the table you search by more often. For example, if your tables are Person-Address, I would assume that you search for all-addresses-of-a-person more often than for all-people-at-this-address, so your PK should list PersonID first.
- you can still have another one-column UNIQUE identifier, but:
  1. either do not make it a PK,
  2. or make it a PK but specify NONCLUSTERED, so that you can use CLUSTERED for the UNIQUE index covering the two referencing columns;
  3. unless you use GUIDs by design, you can stick to an INT IDENTITY column type for it.

In both cases you may want to create another INDEX covering the 2 columns, but in another order, if you search from the other side of the relation often.
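
A minimal T-SQL sketch of the layouts described above, using the Person-Address example. All table, column, constraint, and index names here are hypothetical, and dbo.Person and dbo.Address are assumed to exist already with INT primary keys named PersonId and AddressId.

```sql
-- Assumption: dbo.Person(PersonId INT PRIMARY KEY) and
-- dbo.Address(AddressId INT PRIMARY KEY) already exist.

-- Pure N-M relation table: composite clustered PK.
CREATE TABLE dbo.PersonAddress
(
    PersonId  INT NOT NULL REFERENCES dbo.Person (PersonId),
    AddressId INT NOT NULL REFERENCES dbo.Address (AddressId),
    -- PersonId leads because "all addresses of a person" is assumed
    -- to be the more frequent lookup.
    CONSTRAINT PK_PersonAddress PRIMARY KEY CLUSTERED (PersonId, AddressId)
);

-- Optional index for the other direction ("all people at this address").
CREATE NONCLUSTERED INDEX IX_PersonAddress_AddressId
    ON dbo.PersonAddress (AddressId, PersonId);

-- Variant: keep a one-column surrogate key, but make it NONCLUSTERED
-- so the clustered index can stay on the two referencing columns.
CREATE TABLE dbo.PersonAddressWithId
(
    PersonAddressId INT IDENTITY(1, 1) NOT NULL,
    PersonId        INT NOT NULL REFERENCES dbo.Person (PersonId),
    AddressId       INT NOT NULL REFERENCES dbo.Address (AddressId),
    CONSTRAINT PK_PersonAddressWithId PRIMARY KEY NONCLUSTERED (PersonAddressId),
    CONSTRAINT UQ_PersonAddressWithId UNIQUE CLUSTERED (PersonId, AddressId)
);
```

Whether the reverse index is worth its write cost depends on how often the relation is actually queried from the Address side.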

Solution 4

If you go with the first, just use an IDENTITY on the PK; you don't need to waste space (disk and memory cache) on a UNIQUEIDENTIFIER.

Solution 5

What you are building is called an "Intersection".

I have a very clear memory of my database professor in school saying that an intersection relationship is nearly always an entity in its own right, and so it's normally worth allocating space for it as such. This would indicate that the former is more "correct".

That said, I personally tend to prefer the latter. It really comes down to whether you will ever retrieve one of these records directly or if you'll only use the table when joining on one of the original tables.



Dan McClain

Sr. Engineer at Twitch

Updated on June 12, 2020

Comments

Dan McClain about 4 years
I'm wondering what a better design is for the intersection table for a many-to-many relationship.

The two approaches I am considering are:
```
CREATE TABLE SomeIntersection 
(
     IntersectionId UNIQUEIDENTIFIER PRIMARY KEY,
     TableAId UNIQUEIDENTIFIER REFERENCES TableA NOT NULL,
     TableBId UNIQUEIDENTIFIER REFERENCES TableB NOT NULL,
     CONSTRAINT IX_Intersection UNIQUE(TableAId, TableBId )
) 
```
or
```
CREATE TABLE SomeIntersection 
(
     TableAId UNIQUEIDENTIFIER REFERENCES TableA NOT NULL,
     TableBId UNIQUEIDENTIFIER REFERENCES TableB NOT NULL,
     PRIMARY KEY(TableAId, TableBId )
) 
```
Are there benefits to one over the other?
EDIT 2: Please note: I plan to use Entity Framework to provide an API for the database. With that in mind, does one solution work better with EF than the other?

EDIT: On a related note, for an intersection table whose two columns reference the same table (example below), is there a way to force the two fields to differ within a record?
```
CREATE TABLE SomeIntersection 
(
     ParentRecord INT REFERENCES TableA NOT NULL,
     ChildRecord INT REFERENCES TableA NOT NULL,
     PRIMARY KEY(ParentRecord, ChildRecord)
)
```
I want to prevent the following
```
ParentRecord          ChildRecord
=================================
      1                    1         --Cyclical reference! 
```
van about 15 years

To prevent your example (which is called a "self-reference"), it is enough to add a CHECK constraint at the table level: "ALTER TABLE dbo.SomeIntersection ADD CONSTRAINT CHK_SomeIntersection_SelfRefNoNoNo CHECK (ParentRecord <> ChildRecord)". But you will not solve the case A->B->C->A in this simple way.
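
The same CHECK constraint can also be declared when the table is created. This is only a sketch: it assumes dbo.TableA already exists with an INT primary key, here hypothetically named TableAId, and the constraint names are made up for illustration.

```sql
-- Assumption: dbo.TableA(TableAId INT PRIMARY KEY) already exists.
CREATE TABLE dbo.SomeIntersection
(
    ParentRecord INT NOT NULL REFERENCES dbo.TableA (TableAId),
    ChildRecord  INT NOT NULL REFERENCES dbo.TableA (TableAId),
    CONSTRAINT PK_SomeIntersection PRIMARY KEY (ParentRecord, ChildRecord),
    -- Rejects any row in which a record is its own parent.
    CONSTRAINT CHK_SomeIntersection_NoSelfRef CHECK (ParentRecord <> ChildRecord)
);

-- This insert is rejected, because ParentRecord equals ChildRecord:
-- INSERT INTO dbo.SomeIntersection (ParentRecord, ChildRecord) VALUES (1, 1);
```

As the comment notes, a row-level check like this only sees one row at a time, so longer cycles such as A->B->C->A still get through.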
dance2die about 15 years

You brought up a great point... I think what the professor said makes sense to me.
Tom H about 15 years

Why should it have an ID because other columns are added?
Brian about 15 years

The first one does as well: "CONSTRAINT IX_Intersection UNIQUE(TableAId, TableBId)"
Tim Sullivan about 15 years

Because it's suddenly an item in its own right, and you may end up needing to refer to it easily. No table requires an ID, it's just a handy thing to have. As I said, rule of thumb.
Dan McClain about 15 years

I've accepted this answer since it works best with Entity Framework. The junction/intersection table folds nicely into the Entities generated in the model.
Gilbert Le Blanc about 14 years

You will need another index in the second version if you want to access the intersection table from Table B.