NULL permitted in Primary Key - why and in which DBMS?

38,550

Solution 1

Suppose you have a primary key containing a nullable column Kn.

If you want to have a second row rejected on the ground that in that second row, Kn is null and the table already contains a row with Kn null, then you are actually requiring that the system would treat the comparison "row1.Kn = row2.Kn" as giving TRUE (because you somehow want the system to detect that the key values in those rows are indeed equal). However, this comparison boils down to the comparison "null = null", and the standard already explicitly specifies that null doesn't compare equal to anything, including itself.

To allow for what you want, would thus amount to SQL deviating from its own principles regarding the treatment of null. There are innumerable inconsistencies in SQL, but this particular one never got past the committee.

Solution 2

I don't know whether older versions of MySQL differ on this, but as of modern versions a primary key must be on columns that are not null. See the manual page on CREATE TABLE: "A PRIMARY KEY is a unique index where all key columns must be defined as NOT NULL. If they are not explicitly declared as NOT NULL, MySQL declares them so implicitly (and silently)."

Solution 3

As far as relational database theory is concerned:

  • The primary key of a table is used to uniquely identify each and every row in the table
  • A NULL value in a column indicates that you don't konw what the value is
  • Therefore, you should never use the value of "I don't know" to uniquely identify a row in a table.

Depending upon the data you are modelling, a "made up" value can be used instead of NULL. I've used 0, "N/A", 'Jan 1, 1980', and similar values to represent dummy "known to be missing" data.

Most, if not all, DB engines do allow for a UNIQUE constraint or index, which does allow for NULL column values, though (ideally) only one row may be assigned the value null (otherwise it wouldn't be a unique value). This can be used to support the irritatingly pragmatic (but occasionally necessary) situations that don't fit neatly into relational theory.

Solution 4

Well, it could allow you to implement the Null Object Pattern natively within the database. So if you were using something similar in code, which interacted very closely with the DB, you could just look up the object corresponding to the key without having to special-case a null check.

Now whether this is worthwhile functionality I'm not sure, but it's really a question of whether the pros of disallowing null pkeys in absolutely all cases outweigh the cons of obstructing someone who (for better or worse) actually wants to use null keys. This would only be worth it if you could demonstrate some non-trivial improvement (such as faster key lookup) from being able to guarantee that keys are non-null. Some DB engines would show this, others might not. And if there aren't any real pros from forcing this, why artificially restrict your clients?

Solution 5

As discussed in other answers, NULL was intended to mean "the information that should go in this column is unknown". However, it is also frequently used to indicate an alternative meaning of "this attribute does not exist". This is a particularly useful interpretation when looking at timestamp fields that are interpreted as the time some particular event occurred, in which case NULL is often used to indicate that the event has not yet occurred.

It is a problem that SQL doesn't support this interpretation very well -- for this to work properly, it really needs to have a separate value (something like "never") that doesn't behave as null does ("never" should be equal to "never" and should compare as higher than all other values). But as SQL lacks this notion, and there is no convenient way to add it, using null for this purposes is often the best choice.

This leaves the problem that when a timestamp of an event that may have not occurred should be part of the primary key of a table (a common requirement perhaps being the use of a natural key along with a deletion timestamp when using soft deletion with a requirement for the ability to recreate the item after deletion) you really want the primary key to have a nullable column. Alas, this is not allowed in most databases, and instead you have to resort to an artificial primary key (e.g. a row sequence number) and a UNIQUE constraint for what should otherwise have been your actual primary key.

An example scenario, in order to clarify this: I have a users table. As I require each user to have a distinct username, I decide to use username as the primary key. I want to support user deletion, but as I need to track the existence of users historically for auditing purposes I use soft deletion (in the first version of the schema, I add a 'deleted' flag to the user, and ensure that the deleted flag is checked in all queries where only active users are expected).

An additional requirement, however, is that if a username is deleted, it should be available for new users to register. An attractive way to achieve this would be to have the deleted flag change to a nullable timestamp (where nulls indicate that the user has not been deleted) and put this in the primary key. Were primary keys to allow nullable columns, this would have the following effect:

  • Creating a new user with an existing username when that user's deleted column is null would be denied as a duplicate key entry
  • Deleting a user changes its key (which requires changes to cascade to foreign keys that reference the user, which is suboptimal but if deletions are rare is acceptable) so that the deleted column is a timestamp for the when the deletion occurred
  • Now a new user (which would have a null deleted timestamp) can be successfully created.

However, this cannot actually be achieved with standard SQL, so instead one must use a different primary key (probably a generated numeric user id in this case) and use a UNIQUE constraint to enforce the uniqueness of (username,deleted).

Share:
38,550
Gennady Vanin Геннадий Ванин
Author by

Gennady Vanin Геннадий Ванин

Business Analyst, ex- programmer, biz analyst, Q&A developer in Test, chemical cybernetics engineer, scientist, US patent attorney, translator, tourist guide, traveller, parachuter, tennis player, futebolista, military officer etc.

Updated on July 09, 2022

Comments

  • Gennady Vanin Геннадий Ванин
    Gennady Vanin Геннадий Ванин almost 2 years

    Further to my question "Why to use ´not null primary key´ in TSQL?"...

    As I understood from other discussions, some RDBMS (for example SQLite, MySQL) permit "unique" NULL in the primary key.

    Why is this allowed and how might it be useful?

    Background: I believe it is beneficial for communication with colleagues and database professionals to know the differences in fundamental concepts, approaches and their implementations in different DBMS.

    Notes

    • MySQL is rehabilitated and returned to the "NOT NULL PK" list.
    • SQLite has been added (thanks to Paul Hadfield) to "NULL PK" list:

    For the purposes of determining the uniqueness of primary key values, NULL values are considered distinct from all other values, including other NULLs.

    If an INSERT or UPDATE statement attempts to modify the table content so that two or more rows feature identical primary key values, it is a constraint violation. According to the SQL standard, PRIMARY KEY should always imply NOT NULL. Unfortunately, due to a long-standing coding oversight, this is not the case in SQLite.

    Unless the column is an INTEGER PRIMARY KEY SQLite allows NULL values in a PRIMARY KEY column. We could change SQLite to conform to the standard (and we might do so in the future), but by the time the oversight was discovered, SQLite was in such wide use that we feared breaking legacy code if we fixed the problem.

    So for now we have chosen to continue allowing NULLs in PRIMARY KEY columns. Developers should be aware, however, that we may change SQLite to conform to the SQL standard in future and should design new programs accordingly.

    SQL As Understood By SQLite: CREATE TABLE

  • Gennady Vanin Геннадий Ванин
    Gennady Vanin Геннадий Ванин over 13 years
    Thanks for putting me on the right track. Well, it seems I made stupid question because I read too much StackOverflow. It is because I believed comments in stackoverflow.com/questions/3876785/… and others. Well, I checked that PK in MySQL does not permit NULL.
  • Gennady Vanin Геннадий Ванин
    Gennady Vanin Геннадий Ванин over 13 years
    Plz exonerate me from [you should never use the value of "I don't know"]! I use SQL Server! I asked about others and other RDBMS
  • Philip Kelley
    Philip Kelley over 13 years
    This concept is not platform-specific, it is part of the specifications of a relational database system. SQL, Oracle, MySQL, Postgres, etc. are all just implementations of these specifications. The question of how correct and/or accurate they are has spawned any number of near-religious flame-wars across the internet.
  • Stephanie Page
    Stephanie Page over 13 years
    Using a bogus value that is well outside the range of valid values can screw with cardinality calculations and are a bad idea in general. NULL means NULL, JAN 1, 1990 does not mean NULL.
  • Philip Kelley
    Philip Kelley over 13 years
    @Stephanie, for dates, I agree. For things like status codes, having somthing like "0 = Status not yet assigned" is better than just leaving it null. (Cardinality would be unaffected, as you'd still have N rows whether it was null or 0.)
  • Stephanie Page
    Stephanie Page over 13 years
    @Philip, I didn't say cardinality is affected, I said cardinality calculations. Some RDBMS's store min and max column values and assume that there's an even distributions of values in between (in the absence of a histogram). If the rest of your keys are 1,2,3,4,5... you'll be fine. If they are 1000,1001,1002... the optimizer will assume that you have 1/1000 of the rows for a predicate of ID = n. In your example, that looks like a FK, in which case it won't matter since you'll be filtering on the Code = 'Status not yet assigned' as opposed to where Code_ID = 0. You'll only be joining on that 0
  • Gennady Vanin Геннадий Ванин
    Gennady Vanin Геннадий Ванин over 13 years
    "To allow for what you want" - you answered something I had not asked. I asked why some DBMS has and not why they should not
  • simpatico
    simpatico about 12 years
    summary: since null != null (by definition) all columns in PK must be non-NULL, for them to be comparable.
  • jave.web
    jave.web almost 8 years
    This answer made me think about one case when it could happen - suppose you are moving data from unreliable or hard-primary-key-detection source and must store them, for data parts you could uniquely identify you would specify a primary key, those you could not identify uniquely would stay there with NULL, until, you would decide how to generate primary key for them ...
  • jemartin80
    jemartin80 over 7 years
    Sorry to be so late to the party, but I just stumbled across this. I don't disagree with what you've said, but you can have a uniquely indexed field that allows nulls and it will not allow multiple rows to contain null in the field. This requires the comparison (null==null)=>true. In fact in MS SQL you can ignore null values from the comparison and then you are able to have multiple rows with null values in that column.
  • RS1
    RS1 over 5 years
    Whats your comment on MySQL, postgres and Oracle allowing unique key column to have multiple NULLs? I guess SQL standard allows only single NULL, right? But multiple NULLs are just not that good. Any more view on this?
  • Erwin Smout
    Erwin Smout over 5 years
    That comment won't fit in the space that is allowed for comments here. Those products you mention can allow that and still claim to be in line with the spirit of the SQL standard because a key is just a constraint and a constraint is never violated (i.e. always considered satisfied) if any NULL is involved, i.e. if the truth value of the constraint expression evaluates to UNKNOWN. Note that any such row-with-a-null will then always be accepted, and this clashes with my hypothesis "if you want such rows to be rejected on the grounds ...".