VARCHAR as foreign key/primary key in database good or bad?

52,504

Solution 1

The problem with VARCHAR being used for any KEY is that they can hold WHITE SPACE. White space consists of ANY non-screen-readable character, like spaces tabs, carriage returns etc. Using a VARCHAR as a key can make your life difficult when you start to hunt down why tables aren't returning records with extra spaces at the end of their keys.

Sure, you CAN use VARCHAR, but you do have to be very careful with the input and output. They also take up more space and are likely slower when doing a Queries.

Integer types have a small list of 10 characters that are valid, 0,1,2,3,4,5,6,7,8,9. They are a much better solution to use as keys.

You could always use an integer-based key and use VARCHAR as a UNIQUE value if you wanted to have the advantages of faster lookups.

Solution 2

My 2 cents:

From a performance perspective, using CHAR or VARCHAR as primary key or index is a nightmare.

I've tested compound primary keys (INT + CHAR, INT + VARCHAR, INT + INT) and by far INT + INT was the best performance (loading a data warehouse). Lets say about twice more performance if you keep only numeric primary keys/indexes.

Solution 3

When I'm doing design work I ask myself: have I got anything in this data that I can guarantee is going to be non-NULL, unique, and unchanging? If so that's a candidate to be the primary key. If not, I know I have to generate a key value to use. Assuming, then, that my candidate key happens to be a VARCHAR I then look at the data. Is it reasonably short in length (meaning, say, 20 characters or less)? Or is the VARCHAR field rather long? If it's short it's usable as a key - if it's long, perhaps it's better to not use it as a key (although if it's in consideration for being the primary key I'm probably going to have to index it anyways). At least part of my concern is that the primary key is going to have to be indexed and will perhaps be used as a foreign key from some other table. Comparisons of VARCHAR fields tend to be slower than the comparison of numeric fields (particularly binary numeric fields such as integers) so using a long VARCHAR field as a key may result in slow performance. YMMV.

Solution 4

with an int you can store up to 2 billion in 4 bytes with varchars you cannot you need to have 10 bytes or so to store that, if you use varchars there is also a 2 byte overhead

so now you add up the 6 extra bytes in every PK and FK + the 2 byte varchar overhead

Solution 5

If you make the category name into the ID you will have a problem if you ever decide to rename a category.

Share:
52,504
Admin
Author by

Admin

Updated on July 09, 2022

Comments

  • Admin
    Admin almost 2 years

    Is it better if I use ID nr:s instead of VARCHARS as foreign keys? And is it better to use ID nr:s isntead of VARCHARS as Primary Keys? By ID nr I mean INT!

    This is what I have now:

    category table:
    cat_id ( INT ) (PK)
    cat_name (VARCHAR)
    
    category options table:
    option_id ( INT ) (PK)
    car_id ( INT ) (FK)
    option_name ( VARCHAR ) 
    

    I COULD HAVE THIS I THINK:

    category table:
    cat_name (VARCHAR) (PK)
    
    category options table:
    cat_name ( VARCHAR ) (FK)
    option_name ( VARCHAR ) ( PK )
    

    Or am I thinking completely wrong here?

    • Admin
      Admin over 14 years
      I will go with numbers then, thanks all!
    • Armstrongest
      Armstrongest over 14 years
      Good decision. VARCHAR could be used to support an inherited database, or legacy code. It's also more difficult ( though definitely not impossible ) to increment letters.
    • Tom H
      Tom H over 14 years
      Why would you need to increment letters?
    • Armstrongest
      Armstrongest over 14 years
      You wouldn't. That was a bit of tongue-in-cheek.
    • orokusaki
      orokusaki about 14 years
      Increment letters... FTL
    • Rok Kralj
      Rok Kralj about 11 years
      In second case, you can just drop the category table. Doesn't bring anything to the table, unless you mean to extend it in future.
  • Admin
    Admin over 14 years
    Thanks for the answer, please explain "CODE"... check this Q out to find out my intended use please: stackoverflow.com/questions/2100008/…
  • Tony
    Tony over 14 years
    Even if a key does change it shouldn't be a problem if your database is correctly set up to cascade changes.
  • Tony
    Tony over 14 years
    But the problem with numeric identifiers is they are hard to read and you always need to join to other tables to obtain their meaning when looking at your data. Imagine a table that relates three (or more) other tables with all IDs as integers? You have three columns of numbers which have no immediate meaning.
  • Matthew Farwell
    Matthew Farwell over 14 years
    @Tony I don't think I've ever come across a situation where I would want to cascade a change like that. In fact, I've never used a database where we've had cascade switched on :-)
  • Armstrongest
    Armstrongest over 14 years
    Yeah, I mean, when it comes down to it, each character in a VARCHAR is a binary number to a processor. It just has to do the work of translating it using an ASCII lookup table for each letter. "A" == 65 == 00000000 01000001. That's stored for each letter. In fact, if you're using words as Keys, you'd run out of English words LONG before you used the possible number combinations in just TWO BYTES.
  • OMG Ponies
    OMG Ponies over 14 years
    @Tony: Immediately readable data typically means denormalized data.
  • Tony
    Tony over 14 years
    I don't think the speed of queries is particularly affected by the use of VARCHAR. A database I work on uses CHAR primary keys on some tables with over 4 million records and queries execute in under a second. IU don't think a surrogate integer key would make a lot of difference.
  • Armstrongest
    Armstrongest over 14 years
    @Tony. I know what you mean, however, that's what front-ends are for. They are for reading the data. The Database Structure should probably be created for efficiency, ease of maintenance, robustness, and scalability. Readability can be done with some Quick Views if you need it while developing. While it can be tempting to give every key a readable name, in any sizeable database, you will end up with problems trying to make every key readable. Not everything that CAN be done, should be done.
  • SQLMenace
    SQLMenace over 14 years
    @Tony char doesn't have the 2 byte overhead of a varchar also my tables are in the billion row ranges so 2 bytes extra per key is 7 GB extra I have to pay for SAN space just for one table...then backups take longer etc etc when you have large table you need to optimize a lot more than with a couple of million rows
  • Tony
    Tony over 14 years
    @MatthieuF: True, keys (hopefully!) don't change but I guess you might have to deal with it in one way or another if they did. [putting my neck on the line here] I prefer to use natural keys, hence I've had to think about this possibility :)
  • Tony
    Tony over 14 years
    @OMG Ponies: I take your point, but it's not always an immediate indication of denormalized data. In the example @astander gave (Stock Exchange company codes) it makes perfect sense to use a CHAR key for the company. They are all unique but also quite readable.
  • Tony
    Tony over 14 years
    @Atomiton: Yes, front ends are there to make things easier but can you honestly say you have done all your database problem solving through a front end other than the one provided by the database? :)
  • Tony
    Tony over 14 years
    Fair point. The database I have at the moment is the largest I've worked with so yes, I can see those extra couple of bytes would cause a problem. Also, thinking about it I don't have any VARCHAR keys, only CHAR! :)
  • OMG Ponies
    OMG Ponies over 14 years
    @Tony: I said typically, not always. Using a VARCHAR should be because there is a natural key, while an integer would be an artificial key because as you pointed out - there's no implicit relationship between the number and the data.
  • David Thornley
    David Thornley over 14 years
    @Tony: Why would you look at the key columns? You know they exist, and you know you can join on them. My limited experience is that pure join tables like you've mentioned are rare, and it's easy to write a query to display them in a more understandable form.
  • Tony
    Tony over 14 years
    @OMG Ponies: I had noticed I'd missed the word "typically", sorry about that. I tried to edit my comment but it's locked now.
  • Tony
    Tony over 14 years
    @David Thornley: You look at them because they identify your data. If you were looking for info using a company stock code (to continue the example) why not use the code directly, rather than having to look up a number to then use that to find the data you actually want? As I predicted this is rapidly becoming a discussion about natural vs surrogate keys and it's all my fault... :)
  • OMG Ponies
    OMG Ponies over 14 years
    @Tony: No worries. My main point was, beyond what's already provided, why VARCHAR vs Int would be chosen when data modelling.
  • David Thornley
    David Thornley over 14 years
    @Tony: It's not your fault; the whole question is tied up with natural vs. surrogate keys. For a natural key, you use VARCHAR when approriate. For a surrogate key, you'd always use an integer or something. I just seem to be more on the surrogate-key side than you are.
  • Armstrongest
    Armstrongest over 14 years
    Overhead is still overhead. It may not be that significant, but unless there's a good reason, it still isn't a great idea, unless there's a good business reason.
  • Armstrongest
    Armstrongest over 14 years
    @All. Sure, a company stock code, for example, would make a great readable unique primary key. After all, it should ALWAYS be a unique way to represent a product, right? However, sometimes we can't predict how our data will be used in the future. One scenario ( there are many ) is this: You discontinue an old product and ( accidentally? ) reuse the stock code in the future. You now have no [easy] way to isolate the old from the new data. Note that the point isn't the example, it's that allowing the computer to automate the selection of unique keys future proofs your data better.
  • Armstrongest
    Armstrongest over 14 years
    Lol. Sometimes I wonder if, in the future, wars will be started over issues like these. You're right, though, it always depends on the data and how it will be used. Personally, I've been burned too many times using non-integer types as a PK, so I put up with the "non-human-readableness" of PKs that don't relate directly to the record.
  • TomTom
    TomTom over 11 years
    Nothing random here. Stock code change is not unheard of (SUN -> JAVA). Same company, how to deal with this? Unique - NO. Not the moment you get international. Or use not only stocks but also other financial instruments. Stock code - which one? There are multiple symbologies (reuters etc.) with standardized (but different) codes.
  • NobleUplift
    NobleUplift almost 10 years
    -1. Your second-to-last sentence is either incorrect, or very misleading. The 'characters' in an integer type have no meaning to the database whatsoever, it is simply a visual representation of a byte value. An INT is just as fast as a VARCHAR(4), and a BIGINT is just as fast as a VARCHAR(8) (assuming all the characters are one byte in length).
  • Alex S
    Alex S almost 10 years
    -1 You will never get a join mismatch due to whitespace differences. That's what foreign keys are designed to prevent, which is what this question is all about. At worst, you'll get a constraint violation, and perhaps have to scratch your head over why it isn't allowing a value that seems to match one that's present in the referenced table and column. But any sane application will constraint inserted values to be from the referenced table and column in the first place (e.g., populate a drop-down list for the user to choose from), so this normally won't be a problem.