Best practices for SQL varchar column length

273,153

Solution 1

No DBMS I know of has any "optimization" that will make a VARCHAR with a 2^n length perform better than one with a max length that is not a power of 2.

I think early SQL Server versions actually treated a VARCHAR with length 255 differently than one with a higher maximum length. I don't know if this is still the case.

For almost all DBMS, the actual storage that is required is only determined by the number of characters you put into it, not the max length you define. So from a storage point of view (and most probably a performance one as well), it does not make any difference whether you declare a column as VARCHAR(100) or VARCHAR(500).

You should see the max length provided for a VARCHAR column as a kind of constraint (or business rule) rather than a technical/physical thing.

For PostgreSQL the best setup is to use text without a length restriction and a CHECK CONSTRAINT that limits the number of characters to whatever your business requires.

If that requirement changes, altering the check constraint is much faster than altering the table (because the table does not need to be re-written)

The same can be applied for Oracle and others - in Oracle it would be VARCHAR(4000) instead of text though.

I don't know if there is a physical storage difference between VARCHAR(max) and e.g. VARCHAR(500) in SQL Server. But apparently there is a performance impact when using varchar(max) as compared to varchar(8000).

See this link (posted by Erwin Brandstetter as a comment)

Edit 2013-09-22

Regarding bigown's comment:

In Postgres versions before 9.2 (which was not available when I wrote the initial answer) a change to the column definition did rewrite the whole table, see e.g. here. Since 9.2 this is no longer the case and a quick test confirmed that increasing the column size for a table with 1.2 million rows indeed only took 0.5 seconds.

For Oracle this seems to be true as well, judging by the time it takes to alter a big table's varchar column. But I could not find any reference for that.

For MySQL the manual says "In most cases, ALTER TABLE makes a temporary copy of the original table". And my own tests confirm that: running an ALTER TABLE on a table with 1.2 million rows (the same as in my test with Postgres) to increase the size of a column took 1.5 minutes. In MySQL however you can not use the "workaround" to use a check constraint to limit the number of characters in a column.

For SQL Server I could not find a clear statement on this but the execution time to increase the size of a varchar column (again the 1.2 million rows table from above) indicates that no rewrite takes place.

Edit 2017-01-24

Seems I was (at least partially) wrong about SQL Server. See this answer from Aaron Bertrand that shows that the declared length of a nvarchar or varchar columns makes a huge difference for the performance.

Solution 2

VARCHAR(255) and VARCHAR(2) take exactly the same amount of space on disk! So the only reason to limit it is if you have a specific need for it to be smaller. Otherwise make them all 255.

Specifically, when doing sorting, larger column do take up more space, so if that hurts performance, then you need to worry about it and make them smaller. But if you only ever select 1 row from that table, then you can just make them all 255 and it won't matter.

See: What are the optimum varchar sizes for MySQL?

Solution 3

Whenever I set up a new SQL table I feel the same way about 2^n being more "even"... but to sum up the answers here, there is no significant impact on storage space simply by defining varchar(2^n) or even varchar(MAX).

That said, you should still anticipate the potential implications on storage and performance when setting a high varchar() limit. For example, let's say you create a varchar(MAX) column to hold product descriptions with full-text indexing. If 99% of descriptions are only 500 characters long, and then suddenly you get somebody who replaces said descriptions with wikipedia articles, you may notice unanticipated significant storage and performance hits.

Another thing to consider from Bill Karwin:

There's one possible performance impact: in MySQL, temporary tables and MEMORY tables store a VARCHAR column as a fixed-length column, padded out to its maximum length. If you design VARCHAR columns much larger than the greatest size you need, you will consume more memory than you have to. This affects cache efficiency, sorting speed, etc.

Basically, just come up with reasonable business constraints and error on a slightly larger size. As @onedaywhen pointed out, family names in UK are usually between 1-35 characters. If you decide to make it varchar(64), you're not really going to hurt anything... unless you're storing this guy's family name that's said to be up to 666 characters long. In that case, maybe varchar(1028) makes more sense.

And in case it's helpful, here's what varchar 2^5 through 2^10 might look like if filled:

varchar(32)     Lorem ipsum dolor sit amet amet.

varchar(64)     Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donecie

varchar(128)    Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donecie
                vestibulum massa. Nullam dignissim elementum molestie. Vehiculas

varchar(256)    Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donecie
                vestibulum massa. Nullam dignissim elementum molestie. Vehiculas
                velit metus, sit amet tristique purus condimentum eleifend. Quis
                que mollis magna vel massa malesuada bibendum. Proinde tincidunt

varchar(512)    Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donecie
                vestibulum massa. Nullam dignissim elementum molestie. Vehiculas
                velit metus, sit amet tristique purus condimentum eleifend. Quis
                que mollis magna vel massa malesuada bibendum. Proinde tincidunt
                dolor tellus, sit amet porta neque varius vitae. Seduse molestie
                lacus id lacinia tempus. Vestibulum accumsan facilisis lorem, et
                mollis diam pretium gravida. In facilisis vitae tortor id vulput
                ate. Proin ornare arcu in sollicitudin pharetra. Crasti molestie

varchar(1024)   Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donecie
                vestibulum massa. Nullam dignissim elementum molestie. Vehiculas
                velit metus, sit amet tristique purus condimentum eleifend. Quis
                que mollis magna vel massa malesuada bibendum. Proinde tincidunt
                dolor tellus, sit amet porta neque varius vitae. Seduse molestie
                lacus id lacinia tempus. Vestibulum accumsan facilisis lorem, et
                mollis diam pretium gravida. In facilisis vitae tortor id vulput
                ate. Proin ornare arcu in sollicitudin pharetra. Crasti molestie
                dapibus leo lobortis eleifend. Vivamus vitae diam turpis. Vivamu
                nec tristique magna, vel tincidunt diam. Maecenas elementum semi
                quam. In ut est porttitor, sagittis nulla id, fermentum turpist.
                Curabitur pretium nibh a imperdiet cursus. Sed at vulputate este
                proin fermentum pretium justo, ac malesuada eros et Pellentesque
                vulputate hendrerit molestie. Aenean imperdiet a enim at finibus
                fusce ut ullamcorper risus, a cursus massa. Nunc non dapibus vel
                Lorem ipsum dolor sit amet, consectetur Praesent ut ultrices sit

Solution 4

The best value is the one that is right for the data as defined in the underlying domain.

For some domains, VARCHAR(10) is right for the Name attribute, for other domains VARCHAR(255) might be the best choice.

Solution 5

Adding to a_horse_with_no_name's answer you might find the following of interest...

it does not make any difference whether you declare a column as VARCHAR(100) or VACHAR(500).

-- try to create a table with max varchar length
drop table if exists foo;
create table foo(name varchar(65535) not null)engine=innodb;

MySQL Database Error: Row size too large.

-- try to create a table with max varchar length - 2 bytes for the length
drop table if exists foo;
create table foo(name varchar(65533) not null)engine=innodb;

Executed Successfully

-- try to create a table with max varchar length with nullable field
drop table if exists foo;
create table foo(name varchar(65533))engine=innodb;

MySQL Database Error: Row size too large.

-- try to create a table with max varchar length with nullable field
drop table if exists foo;
create table foo(name varchar(65532))engine=innodb;

Executed Successfully

Dont forget the length byte(s) and the nullable byte so:

name varchar(100) not null will be 1 byte (length) + up to 100 chars (latin1)

name varchar(500) not null will be 2 bytes (length) + up to 500 chars (latin1)

name varchar(65533) not null will be 2 bytes (length) + up to 65533 chars (latin1)

name varchar(65532) will be 2 bytes (length) + up to 65532 chars (latin1) + 1 null byte

Hope this helps :)

Share:
273,153
esskar
Author by

esskar

creator of schmoose, passionate about software development and more ... far more ...

Updated on July 08, 2022

Comments

  • esskar
    esskar almost 2 years

    Every time is set up a new SQL table or add a new varchar column to an existing table, I am wondering one thing: what is the best value for the length.

    So, lets say, you have a column called name of type varchar. So, you have to choose the length. I cannot think of a name > 20 chars, but you will never know. But instead of using 20, I always round up to the next 2^n number. In this case, I would choose 32 as the length. I do that, because from an computer scientist point of view, a number 2^n looks more even to me than other numbers and I'm just assuming that the architecture underneath can handle those numbers slightly better than others.

    On the other hand, MSSQL server for example, sets the default length value to 50, when you choose to create a varchar column. That makes me thinking about it. Why 50? is it just a random number, or based on average column length, or what?

    It could also be - or probably is - that different SQL servers implementations (like MySQL, MSSQL, Postgres, ...) have different best column length values.