Using "varchar" as the primary key? bad idea? or ok?

43,915

Solution 1

It totally depends on the data. There are plenty of perfectly legitimate cases where you might use a VARCHAR primary key, but if there's even the most remote chance that someone might want to update the column in question at some point in the future, don't use it as a key.

Solution 2

If you are going to be joining to other tables, a varchar, particularly a wide varchar, can be slower than an int.

Additionally, if you have many child records and the varchar is something subject to change, cascade updates can cause blocking and delays for all users. A varchar like a car's VIN, which will rarely if ever change, is fine. A varchar like a name, which will change, can be a nightmare waiting to happen. PKs should be stable if at all possible.
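To make the cascade cost concrete, here is a minimal sketch using Python's built-in `sqlite3` module rather than any particular production engine. The `customer` and `orders` tables and the names in them are hypothetical, chosen only for illustration. It shows that changing one natural key value forces every child row that references it to be rewritten.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled

# Hypothetical schema: the parent's primary key is a name, which can change.
conn.executescript("""
CREATE TABLE customer (
    name TEXT PRIMARY KEY
);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL
        REFERENCES customer(name) ON UPDATE CASCADE
);
""")

conn.execute("INSERT INTO customer (name) VALUES ('Alice Smith')")
conn.executemany(
    "INSERT INTO orders (customer_name) VALUES (?)",
    [("Alice Smith",)] * 1000,
)

# One logical change to the parent key...
conn.execute("UPDATE customer SET name = 'Alice Jones' WHERE name = 'Alice Smith'")

# ...has to be propagated to every child row that referenced the old value.
child_rows_rewritten = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE customer_name = 'Alice Jones'"
).fetchone()[0]
print(child_rows_rewritten)
```

In a busy multi-user database those rewritten child rows are where the locking and delays described above would come from; the sketch only demonstrates the fan-out, not the blocking itself.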

Next, many candidate varchar PKs are not really unique. Some appear to be unique (like phone numbers) but can be reused (you give up the number, the phone company reassigns it), and then child records could end up attached to the wrong parent. So be sure you really have a unique, unchanging value before using it.

If you do decide to use a surrogate key, then create a unique index on the varchar field. This gets you the benefits of faster joins and fewer records to update if something changes, while maintaining the uniqueness that you want.
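A rough sketch of that layout, again using Python's `sqlite3` for convenience. The `document` and `page` tables, the `doc_code` column, and the index name are all made up for this example. The integer surrogate is what child rows reference, and the unique index keeps the text value from being duplicated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: integer surrogate key plus a unique index on the text value.
conn.executescript("""
CREATE TABLE document (
    id INTEGER PRIMARY KEY,
    doc_code TEXT NOT NULL
);
CREATE UNIQUE INDEX ux_document_doc_code ON document (doc_code);

CREATE TABLE page (
    id INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES document (id)
);
""")

conn.execute("INSERT INTO document (doc_code) VALUES ('DOC-001')")
doc_id = conn.execute(
    "SELECT id FROM document WHERE doc_code = 'DOC-001'"
).fetchone()[0]
conn.executemany("INSERT INTO page (document_id) VALUES (?)", [(doc_id,)] * 3)

# The unique index still rejects a duplicate text value.
try:
    conn.execute("INSERT INTO document (doc_code) VALUES ('DOC-001')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Changing the text value touches one parent row; child rows are untouched.
conn.execute("UPDATE document SET doc_code = 'DOC-001-R2' WHERE id = ?", (doc_id,))
pages_still_linked = conn.execute(
    "SELECT COUNT(*) FROM page WHERE document_id = ?", (doc_id,)
).fetchone()[0]
print(duplicate_rejected, pages_still_linked)
```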

Now, if you have no child tables and probably never will, most of this is moot, and adding an integer PK is just a waste of time and space.

Solution 3

I realize I'm a bit late to the party here, but thought it would be helpful to elaborate a bit on previous answers.

It is not always bad to use a VARCHAR() as a primary key, but it almost always is. So far, I have not encountered a time when I couldn't come up with a better fixed size primary key field.

VARCHAR requires more processing than an integer (INT) or a short fixed length char (CHAR) field does.

In addition to storing extra bytes which indicate the "actual" length of the data stored in this field for each record, the database engine must do extra work to calculate the position (in memory) of the starting and ending bytes of the field before each read.

Foreign keys must also use the same data type as the primary key of the referenced parent table, so this processing cost compounds when joining tables for output.

With a small amount of data, this additional processing is not likely to be noticeable, but as a database grows you will begin to see degradation.

You said you are using a GUID as your key, so you know ahead of time that the column has a fixed length. This is a good time to use a fixed length CHAR(36) field, which incurs far less processing overhead.
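As a quick sanity check of the fixed-length claim, the sketch below uses Python's standard `uuid` module (not the database itself) to confirm that the canonical text form of a GUID is always 36 characters. It also shows that the same value occupies 16 raw bytes, which is the size a native GUID type such as SQL Server's `uniqueidentifier` stores. The sample size of 1000 is arbitrary.

```python
import uuid

# Generate a batch of random GUIDs (1000 is an arbitrary sample size).
guids = [uuid.uuid4() for _ in range(1000)]

# Canonical text form: 32 hex digits plus 4 hyphens.
text_lengths = {len(str(g)) for g in guids}

# Raw binary form of the same values.
binary_lengths = {len(g.bytes) for g in guids}

print(text_lengths, binary_lengths)
```

So a `CHAR(36)` column never needs a variable length, and a binary or native GUID column would be less than half that size per key.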

Solution 4

I think int or bigint is often better.

  1. int can be compared with fewer CPU instructions (join queries...)
  2. an int sequence is ordered by default -> balanced index tree -> no reorganisation if you use the PK as the clustered index
  3. the index potentially needs less space
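Point 2 can be illustrated with a toy model. The sketch below is not a real B-tree and does not measure any database engine; it simply keeps keys in a sorted Python list and counts how many inserts land somewhere before the current end. The function name and the sample size of 5000 are assumptions made for the example.

```python
import bisect
import uuid


def count_mid_index_inserts(keys):
    """Count inserts that land before the current end of a sorted key list."""
    index = []
    mid_inserts = 0
    for key in keys:
        pos = bisect.bisect_left(index, key)
        if pos < len(index):
            # The new key belongs between existing keys, not at the end.
            mid_inserts += 1
        index.insert(pos, key)
    return mid_inserts


sequential_keys = list(range(1, 5001))                   # like an identity column
random_keys = [str(uuid.uuid4()) for _ in range(5000)]   # like random GUID strings

sequential_mid = count_mid_index_inserts(sequential_keys)
random_mid = count_mid_index_inserts(random_keys)
print(sequential_mid, random_mid)
```

Sequential keys always append at the end, while random keys mostly land in the middle of the existing order. In a clustered index, those mid-order inserts are the ones that can lead to page splits and later reorganisation.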

Author by

001

Updated on November 07, 2020

Comments

  • 001
    001 over 3 years

    Is it really that bad to use "varchar" as the primary key?

    (will be storing user documents, and yes it can exceed 2+ billion documents)

    • Martin Smith
      Martin Smith over 12 years
      What length varchar? Can you give a couple of examples of proposed keys also?
    • mmmmmm
      mmmmmm over 12 years
      How will the users choose a document for retrieval?
    • 001
      001 over 12 years
      the length of the varchar is 36, for guid. The Guid as varchar is used as the primary key.
    • Paul
      Paul over 3 years
      Just to further add to the pros and cons for this question: I would really carefully consider whether VARCHAR is the correct type to use. Perhaps NVARCHAR would be more appropriate if multiple cultures are involved. This has a knock-on effect, however, of doubling the number of bytes used by the key.
  • Martin Smith
    Martin Smith over 12 years
    Please explain how adding an id will come in handy to show only 50. These are not guaranteed sequential and rollbacks and deletes leave gaps. Also by setting a unique constraint on the column this makes it a candidate key anyway in which case it is arbitrary which one you select as the primary key. If you have multiple candidate keys enforced by any type of unique index any of them can participate in foreign key relationships.
  • JNDPNT
    JNDPNT over 12 years
    When you are going to display these file names you really don't want to select 2 billion records and then limit to 50. Using an id you can select all files where id > some_position_you_saved. This will improve execution speed by... well... much :-).
  • Martin Smith
    Martin Smith over 12 years
    You can do that with varchar too. It will still seek into the correct place in the index then retrieve the TOP 50 rows from that point.
  • Martin Smith
    Martin Smith over 12 years
    @Mark - Primary keys should be stable. If you update them the changes may need to propagate to other tables using them as an FK, also by default they become the clustering key in SQL Server so are duplicated to act as the row locator in non clustered indexes too, that would also need to be updated.
  • mmmmmm
    mmmmmm over 12 years
    so that applies to any type not just varchar
  • ninesided
    ninesided over 12 years
    @Mark - yes, I guess it does apply to all datatypes, as I mentioned before it totally depends on the data, and the intent. I've seen cases where the PK on one table is also used in many other related tables, but not enforced through FK constraints. This makes updates to the PK really nasty.
  • mmmmmm
    mmmmmm over 12 years
    Well yes not using FK constraints does make a database unreliable
  • ninesided
    ninesided over 12 years
    @Mark - I wouldn't say "unreliable", you have to be more careful, sure, but there are plenty of good reasons for not using FKs in certain circumstances.
  • Mike Sherrill 'Cat Recall'
    Mike Sherrill 'Cat Recall' over 12 years
    @ninesided: Betting on users, developers, and DBAs to enforce foreign key constraints manually--by exercising great care in their inserts, updates, and deletes--is pretty much the definition of unreliable.
  • 001
    001 over 12 years
    Never in any case, the key will need to be updated. Once the key is created it is final.
  • 001
    001 over 12 years
    Might have parentId, such as related documents would have an Id, doc1 relates to doc2 etc... Will use VarChar(36) as the primary and set it to unique index.
  • 001
    001 over 12 years
    What about the datatype as uniqueidentifier is it going to be the same as varchar(36)?
  • dburges
    dburges over 12 years
    GUIDs are a whole other issue. They are not varchar data; they have their own datatype, uniqueidentifier. You need to research the issues with using GUIDs as PKs, but the biggest one is to be sure to NEVER use them as the clustered index, which the PK is by default. There is a lot of information on this; just Google "SQL Server GUID PK problems".
  • ninesided
    ninesided over 12 years
    @001 - if that's the case, then it's a good candidate for a primary key
  • Gromer
    Gromer over 11 years
    I think the point is that getting a range is a lot easier with a numeric identifier. SELECT ... WHERE Id > 58 AND Id < 98, something along those lines. This also depends on how the varchar index is created (random or more of a sequential algorithm).
  • Milan Jaric
    Milan Jaric over 5 years
    First, there is a sequential version of GUID in SQL Server: NEWSEQUENTIALID(). NEWID() is not sequential. BUT indexes and PKs also depend on how data is updated. In most cases, if a row contains any column with variable-length data, fragmentation will happen on any update. This applies to indexes too. If an index contains or "includes" a variable-length column, any update where the new value is bigger will cause (a) a page split and (b) a break in sequence, since the row or index row will be moved to some other page. Then a reorganize, reorder, or reindex will improve performance. It is not just the PK's fault!
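
On the paging point raised in the comments above (seek past a saved position, then take the next 50 rows), here is a small sketch showing that the same pattern works with a text key. It uses Python's `sqlite3` with `LIMIT` standing in for SQL Server's `TOP`; the `document` table, the `next_page` helper, and the row count of 500 are assumptions for illustration only.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
# Hypothetical table whose primary key is a GUID stored as text.
conn.execute("CREATE TABLE document (id TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO document (id) VALUES (?)",
    [(str(uuid.uuid4()),) for _ in range(500)],
)


def next_page(connection, after_key, page_size=50):
    """Return the next page of keys that sort after the saved position."""
    rows = connection.execute(
        "SELECT id FROM document WHERE id > ? ORDER BY id LIMIT ?",
        (after_key, page_size),
    ).fetchall()
    return [row[0] for row in rows]


first_page = next_page(conn, "")               # the empty string sorts before any key
second_page = next_page(conn, first_page[-1])  # resume after the last key seen
print(len(first_page), len(second_page))
```

The query seeks on the primary key index either way; what a random text key lacks is a meaningful order, not the ability to page.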