Is it good practice to have foreign keys in a datawarehouse (relationships)?

24,501

Solution 1

I have no idea. But nobody is answering, so I googled and found a best practises paper who seem to say the very helpful "it depends" :-)

While foreign key constraints help data integrity, they have an associated cost on all insert, update and delete statements. Give careful attention to the use of constraints in your warehouse or ODS when you wish to ensure data integrity and validation

Solution 2

I presume that you refer to FKs in fact tables. During DW loading, indexes and any foreign keys are dropped to speed up the loading -- the ETL process takes care of keys.

Foreign key constraint "activates" during inserts and updates (this is when it needs to check that the key value exists in the parent table) and during deletes of primary keys in parent tables. It does not play part during reads. Deleting records in a DW is (should) be a controlled process which scans for any existing relationships before deleting from dimension tables.

So, most DWs do not have foreign keys implemented as constraints.

Solution 3

FK constraints work well in Kimball dimensional models on SQL Server.

Typically, your ETL will need to lookup into the dimension table (usually on the business key to handle slowly changing dimensions) to determine dimension surrogate IDs, and the dimension surrogate id is usually an identity, and the PK on the dimension is usually the dimension surrogate id, which is already an index (probably clustered).

Having RI at this point is not a huge of overhead with the writes, since it can also help catch ETL defects during development. Also, having the PK of the fact table being a combination of all the FKs can also help trap potential data modeling problems and double-loading.

It can actually reduce overhead on selects if you like to make general-use flattened views or table-valued functions of your star models. Because extra inner joins to dimensions are guaranteed to produce one and only one row, so the optimizer can use these constraints very effectively to eliminate the need to look up into the table. Without FK constraints, these lookups may have to be done to eliminate facts where the dimension does not exist.

Solution 4

Using FK-constraints in a DW is like wearing a bicycle helmet. If the ETL is designed correctly, you technically don't need them. That said, if I had a million dollars for every time I've seen bug-free ETL, I'd have zero dollars.

Until you're at a point where FK-constraints are causing performance issues, I say leave'em. Cleaning up referential integrity problems can be much harder than adding them from the get-go ;-)

Solution 5

The quesiton is clear, but "good practice" seems the wrong question.

"Could have FK's" ?

Foreign keys are a mechanism to preserve integrity constraints during database modifications.

If your DW is read-only (accumulating data sources without writing back), there is no need for FK's.

If your DW supports writes, integrity constaints typically need to be coordinated across the participating data sources by the ETL (rather, it's Store equivalent). This process may or may not rely on FK's in the database.

So the right question would be: do you need them.

(The only other reason I can think of would be documentation of relationship - however, this can be done on paper / in a separate document, too.)

Share:
24,501

Related videos on Youtube

Lieven Cardoen
Author by

Lieven Cardoen

Minds to blow, places to go...

Updated on July 09, 2022

Comments

  • Lieven Cardoen
    Lieven Cardoen almost 2 years

    I think the question is clear enough. Some of the columns in my datawarehouse table could have a relationship to a primary key. But is it good practice? It is denormalized, so it should never be deleted again (data in datawarehouse). Hope question is somewhat clear enough.

  • Kent Pawar
    Kent Pawar over 10 years
    +1. "Foreign keys are a mechanism to preserve integrity constraints during database modifications. If your DW is read-only, there is no need for FK's..." - Bull's eye!
  • Chipmonkey
    Chipmonkey about 10 years
    Some databases have specific optimizations in places for star or snowflake structured data warehouses. In those cases, even on a read-only situation, the foreign keys can serve to alert the warehouse how the star is structured -- to tell it which are the fact and dimensions. Even in normalized databases foreign keys can affect the optimizer. I'm struggling to determine when and how much this matters myself right now, but it certainly does have SOME affect.
  • Eric Kramer
    Eric Kramer about 8 years
    My 20+ years of experience in data and data warehousing agree with you... Projects change/evolve and customers (and developers!) can easily introduce changes that break assumptions. Having FKs is indeed a great safety net--"bicycle helment" rocks as a simile! Failing that, I'd encourage a final "validate" phase of the load process that at least checks the constraints/uniqueness in the data. Great answer, Bill.
  • SherlockSpreadsheets
    SherlockSpreadsheets over 5 years
    A relationship with NOCHECK? That sounds like a good idea for a DW. You document the relationships as the system is designed, but you don't break ETL in the event there is an issue with the source system.
  • philipxy
    philipxy about 5 years
    Yes, read-only means enforcing constraints is unnecessary--if your warehouse is a snapshot of a constraint-enforced DB. But constraints allow the DBMS to optimize queries. So "no need" is wrong. As usual it's a tradeoff.