Why varbinary instead of varchar?


Solution 1

MediaWiki changed from varchar to varbinary in early 2011. From its release notes:

War on varchar. Changed all occurrences of varchar(N) and varchar(N) binary to varbinary(N). varchars cause problems ("Invalid mix of collations" errors) on MySQL databases with certain configs, most notably the default MySQL config.
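
To make the quoted problem concrete, here is a minimal MySQL sketch. The table names (left_names, right_names, left_bytes, right_bytes), the column name, and the two collations are assumptions chosen for illustration; they are not taken from MediaWiki's schema. Comparing two varchar columns that share a character set but carry different collations is a common trigger for MySQL's "Illegal mix of collations" error (error 1267), whereas varbinary columns have no collation and are compared byte by byte.

```sql
-- Hypothetical tables: same character set, different collations.
CREATE TABLE left_names  (name varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci);
CREATE TABLE right_names (name varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci);

-- Typically rejected with error 1267: MySQL cannot pick one collation
-- for the comparison, because both sides have equal precedence.
SELECT * FROM left_names l JOIN right_names r ON l.name = r.name;

-- Same layout with varbinary: no collation is involved, so the join is accepted.
CREATE TABLE left_bytes  (name varbinary(32));
CREATE TABLE right_bytes (name varbinary(32));
SELECT * FROM left_bytes l JOIN right_bytes r ON l.name = r.name;
```

The trade-off is that byte-wise comparison is always case-sensitive and accent-sensitive, so any case-insensitive matching has to be handled by the application.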

Solution 2

In MSSQL:

I think the only big difference in storage size is between nvarchar and varbinary, because nvarchar stores 2 bytes for each character instead of 1 byte.
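
A quick way to see the 2-bytes-per-character point is to compare DATALENGTH for the same literal in all three types. This is only a sketch: it assumes T-SQL on SQL Server, and the column aliases are made up for illustration.

```sql
-- Byte counts for the same 19-character ASCII string in three types.
SELECT
  DATALENGTH(CONVERT(varchar(255),   'This is an example.'))  AS varcharBytes,    -- 19
  DATALENGTH(CONVERT(nvarchar(255), N'This is an example.'))  AS nvarcharBytes,   -- 38
  DATALENGTH(CONVERT(varbinary(255), 'This is an example.'))  AS varbinaryBytes;  -- 19
```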

In terms of storage, varchar behaves the same as varbinary. From MSDN:

"The storage size is the actual length of the data entered + 2 bytes" (this applies to both types).

The one difference MSDN states for varbinary is that the data entered can be 0 bytes in length.

Here is a small example:

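-- One text column and one binary column with the same maximum length
CREATE TABLE Test (textData varchar(255), binaryData varbinary(255));

-- Store each string once as text and once as raw bytes
INSERT INTO Test (textData, binaryData)
VALUES ('This is an example.', CONVERT(varbinary(255), 'This is an example.', 0));
INSERT INTO Test (textData, binaryData)
VALUES ('ÜŰÚÁÉÍä', CONVERT(varbinary(255), 'ÜŰÚÁÉÍä', 0));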

What you can use here is the DATALENGTH function:

SELECT DATALENGTH(textData), DATALENGTH(binaryData) FROM Test;

The result is 19 and 19 for the first row, and 7 and 7 for the second row.

So in size they are the same, but there is another difference. If you check the column specifications, you can see that the varbinary column (of course) has no collation and no character set, so it can hold values from different encodings and character sets without conflicts.

SELECT 
  *
FROM   
  INFORMATION_SCHEMA.COLUMNS 
WHERE   
  TABLE_NAME = 'Test' 
ORDER BY 
  ORDINAL_POSITION ASC; 
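
As a sketch of what to look for (assuming T-SQL on SQL Server and the Test table from above), the first query narrows the metadata to the relevant columns; CHARACTER_SET_NAME and COLLATION_NAME should be NULL for binaryData. The second query shows the practical effect on comparisons. The collation Latin1_General_CI_AS and the column aliases are chosen only for illustration.

```sql
-- Only the columns relevant to the text-versus-binary difference.
SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_SET_NAME, COLLATION_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'Test'
ORDER BY ORDINAL_POSITION;

-- Text comparison follows the collation; binary comparison is byte by byte.
SELECT
  CASE WHEN 'abc' COLLATE Latin1_General_CI_AS = 'ABC'
       THEN 'equal' ELSE 'different' END AS textCompare,    -- equal (case-insensitive collation)
  CASE WHEN CONVERT(varbinary(10), 'abc') = CONVERT(varbinary(10), 'ABC')
       THEN 'equal' ELSE 'different' END AS binaryCompare;  -- different (bytes differ)
```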
Author: user1411084

Updated on July 20, 2022

Comments

  • user1411084 almost 2 years

    Please take a look at this table:

    http://www.mediawiki.org/wiki/Manual:Logging_table

    As you can see, Wikipedia uses varbinary instead of varchar:

    | log_type      | varbinary(32)    | NO   | MUL |                |
    | log_action    | varbinary(32)    | NO   |     |                |
    | log_timestamp | binary(14)       | NO   | MUL | 19700101000000 |
    | log_user      | int(10) unsigned | NO   | MUL | 0              |
    | log_user_text | varbinary(255)   |      |     |                |

    All of this information is text, so why do they store it as binary?

    They do this for all tables.

  • usr over 11 years
    It looks like they fixed the problem on the wrong side: they silenced the symptom instead of fixing the underlying collation issue.