SQL SELECT speed int vs varchar
Solution 1
Int comparisons are faster than varchar comparisons, for the simple fact that ints take up much less space than varchars.
This holds true both for unindexed and indexed access. The fastest way to go is an indexed int column.
As I see you've tagged the question postgresql, you might be interested in the space usage of different data types:
- int fields occupy between 2 and 8 bytes, with 4 usually being more than enough (-2147483648 to +2147483647)
- character types occupy the actual string plus a length header: 4 bytes in old versions, or 1 byte for strings under 126 bytes in modern versions (see the comments)
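The sizes above can be checked with a small sketch. This is illustrative Python, not PostgreSQL itself: the names `WIDTHS`, `signed_range` and `short_text_size` are hypothetical, and the string estimate assumes the 1-byte header for values under 126 bytes mentioned in the comments, ignoring alignment, compression and out-of-line storage.

```python
import struct

# Fixed-width signed integers of 2, 4 and 8 bytes, labelled here with the
# PostgreSQL type names of the same widths (an illustrative mapping).
WIDTHS = {"smallint": "<h", "integer": "<i", "bigint": "<q"}


def signed_range(fmt):
    """Return (size_in_bytes, minimum, maximum) for a signed struct format."""
    size = struct.calcsize(fmt)
    bits = size * 8
    return size, -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def short_text_size(value):
    """Rough storage estimate for a string: length header + UTF-8 bytes.

    Assumption taken from the comment thread: 1-byte header for values
    under 126 bytes, 4-byte header otherwise.
    """
    data = value.encode("utf-8")
    header = 1 if len(data) < 126 else 4
    return header + len(data)


if __name__ == "__main__":
    for name, fmt in WIDTHS.items():
        print(name, signed_range(fmt))
    print("'audi' as text:", short_text_size("audi"), "bytes")
```

Under these assumptions a short make name such as 'audi' is about the same size as a 4-byte integer, so the space argument mostly applies to longer strings.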
Solution 2
Some rough benchmarks:
4 million records in Postgres 9.x
Table A = base table with some columns
Table B = Table A + extra column id of type bigint with random numbers
Table C = Table A + extra column id of type text with random 16-char ASCII strings
Results on 8GB RAM, i7, SSD laptop:
Size on disk: A=261MB B=292MB C=322MB
Non-indexed by id: select count(*), select by id: 450ms same on all tables
Insert* one row per TX: B=9ms/record C=9ms/record
Bulk insert* in single TX: B=140usec/record C=180usec/record
Indexed by id, select by id: B=about 200usec C=about 200usec
* inserts to the table already containing 4M records
So it looks like for this setup, as long as your indexes fit in RAM, bigint vs 16-char text makes no difference in speed.
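A benchmark of this shape can be sketched as follows. This is not the setup used above: it uses an in-memory SQLite database from Python's standard library rather than Postgres, far fewer rows, and hypothetical names (`table_b`, `table_c`, `build`, `time_lookups`), so the absolute numbers will not be comparable.

```python
import random
import sqlite3
import string
import time


def build(conn, n_rows, seed=0):
    """Create table_b (integer id) and table_c (16-char text id), both indexed."""
    rng = random.Random(seed)
    alphabet = string.ascii_letters + string.digits
    int_ids = rng.sample(range(10**12), n_rows)
    text_ids = ["".join(rng.choices(alphabet, k=16)) for _ in range(n_rows)]
    conn.execute("CREATE TABLE table_b (id INTEGER, payload TEXT)")
    conn.execute("CREATE TABLE table_c (id TEXT, payload TEXT)")
    conn.executemany("INSERT INTO table_b VALUES (?, ?)", ((i, "x") for i in int_ids))
    conn.executemany("INSERT INTO table_c VALUES (?, ?)", ((t, "x") for t in text_ids))
    conn.execute("CREATE INDEX idx_b ON table_b (id)")
    conn.execute("CREATE INDEX idx_c ON table_c (id)")
    conn.commit()
    return int_ids, text_ids


def time_lookups(conn, table, ids):
    """Return (average seconds per indexed lookup, number of ids found)."""
    query = f"SELECT payload FROM {table} WHERE id = ?"
    start = time.perf_counter()
    hits = sum(1 for i in ids if conn.execute(query, (i,)).fetchone() is not None)
    elapsed = time.perf_counter() - start
    return elapsed / len(ids), hits


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    int_ids, text_ids = build(conn, 20_000)
    for table, ids in (("table_b", int_ids[:1000]), ("table_c", text_ids[:1000])):
        per_lookup, hits = time_lookups(conn, table, ids)
        print(f"{table}: {per_lookup * 1e6:.1f} usec/lookup, {hits} hits")
```

To reproduce the figures above you would need to run the equivalent statements against a real Postgres instance with a comparable row count.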
Solution 3
It will be a bit faster using an int instead of a varchar. More important for speed is to have an index on the field that the query can use to find the records.
There is another reason to use an int, and that is to normalise the database. Instead of having the text 'Mercedes-Benz' stored thousands of times in the table, you should store its id and have the brand name stored once in a separate table.
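A minimal sketch of that normalised layout, using Python's built-in `sqlite3` so it can be run as-is. The table and column names (`car_brands`, `cars`, `brand_id`) and the sample rows are hypothetical; the same SQL shape should carry over to Postgres with minor syntax changes.

```python
import sqlite3


def make_db():
    """Build an in-memory database with a brand lookup table and a cars table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE car_brands (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE cars (
            id       INTEGER PRIMARY KEY,
            model    TEXT NOT NULL,
            brand_id INTEGER NOT NULL REFERENCES car_brands (id)
        );
        -- The index the WHERE/JOIN on brand_id can use.
        CREATE INDEX idx_cars_brand ON cars (brand_id);

        INSERT INTO car_brands (id, name) VALUES (1, 'Mercedes-Benz'), (2, 'Audi');
        INSERT INTO cars (model, brand_id) VALUES ('C 220', 1), ('A4', 2), ('E 300', 1);
        """
    )
    return conn


def models_for_brand(conn, brand_name):
    """Look up cars by brand name through the integer key."""
    rows = conn.execute(
        "SELECT c.model FROM cars c "
        "INNER JOIN car_brands b ON b.id = c.brand_id "
        "WHERE b.name = ? ORDER BY c.id",
        (brand_name,),
    )
    return [model for (model,) in rows]


if __name__ == "__main__":
    print(models_for_brand(make_db(), "Mercedes-Benz"))
```

The brand name is stored once, and the large table only carries the small integer key.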
Solution 4
Breaking this down to the actual performance of string comparison versus integer (non-float) comparison: whether the integer is signed or unsigned, and of what size, does not matter here. Size is the true difference in performance: a string of 1 byte + (up to 126 bytes) versus a 1, 2, 4 or 8 byte integer comparison. Integers are smaller than strings and floats, and thus more CPU-friendly at the assembly level.
String to string comparison in all languages is slower than something that can be compared in 1 instruction by the CPU. Even comparing 8 bytes (64 bit) on a 32-bit CPU is still faster than comparing a VARCHAR(2) or larger. Again, look at the produced assembly (even by hand): it takes more instructions to compare char by char than to compare a 1 to 8 byte CPU numeric.
Now, how much faster? That also depends on the volume of data. If you are simply comparing 5 to 'audi', and that is all your DB has, the resulting difference is so minimal you would never see it. Depending on CPU and implementation (client/server, web/script, etc.) you probably will not see it until you hit a few hundred comparisons on the DB server (maybe even a couple of thousand comparisons before it is noticeable).
- To head off the incorrect argument about hash comparisons: most hashing algorithms are themselves slow, so you do not benefit from things like CRC64 and smaller. For over 12 years I developed search algorithms for multi-county search engines and 7 years for the credit bureaus. The more you can keep numeric, the faster it is. For example phone numbers, zip codes, even currency * 1000 (storage) and currency div 1000 (retrieval) is faster than DECIMAL for comparisons.
Ozz
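The "currency * 1000 (storage), currency div 1000 (retrieval)" idea from the answer above can be sketched like this. The function names and the choice to reject amounts with more than three decimal places are assumptions made for the illustration, not something the answer specifies.

```python
from decimal import Decimal

SCALE = 1000  # three implied decimal places, i.e. "currency * 1000"


def to_stored(amount):
    """Convert a decimal amount (given as a string) to a scaled integer."""
    scaled = Decimal(amount) * SCALE
    if scaled != scaled.to_integral_value():
        # Assumption: refuse to silently round extra decimal places.
        raise ValueError(f"{amount} has more than 3 decimal places")
    return int(scaled)


def from_stored(stored):
    """Convert a scaled integer back to a Decimal amount."""
    return Decimal(stored) / SCALE


if __name__ == "__main__":
    price = to_stored("19.99")
    print(price, from_stored(price))
    # Comparisons and sums now happen on plain integers.
    print(to_stored("0.1") + to_stored("0.2") == to_stored("0.3"))
```

The stored value would go into an integer column, so equality and range filters compare integers rather than DECIMAL values.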
Solution 5
Index or not, int is a lot faster (the longer the varchar, the slower it gets).
Another reason: an index on a varchar field will be much larger than one on an int. For larger tables that may mean hundreds of megabytes (and thousands of pages). That makes performance much worse, as reading the index alone requires many disk reads.
googletorp
Updated on July 24, 2022
Comments
-
googletorp almost 2 years
I'm in the process of creating a table and it made me wonder.
If I store, say, cars that have a make (e.g. BMW, Audi etc.), will it make any difference to the query speed if I store the make as an int or a varchar?
So is
SELECT * FROM table WHERE make = 5 AND ...;
faster/slower than
SELECT * FROM table WHERE make = 'audi' AND ...;
or will the speed be more or less the same?
-
googletorp about 14 years
Interesting. How will the speed difference be between ENUM and int?
-
Robert Munteanu about 14 years
Does PostgreSQL have an enum data type? I thought it was MySQL specific.
-
googletorp about 14 years
Postgres has ENUM, but I don't think it's implemented quite the same way as MySQL. postgresql.org/docs/current/static/datatype-enum.html
-
Magnus Hagander about 14 years
You are referring to pg 7.4. In modern versions, they take up 1 byte + length if you have <126 bytes. Also note that the reason strings are much slower is often that collation-sensitive comparison is hugely expensive - not that the string takes more space. But the end result is the same, of course.
-
Magnus Hagander about 14 years
Performance-wise, ENUM should perform more or less the same as int in the search field, but as varchar in the target list (because it has to transfer the whole string to the client for matched rows, not just the int).
-
Robert Munteanu about 14 years
@Magnus - thanks for the heads-up. Feel free to edit my answer as I see you have enough rep points.
-
Andris over 9 years
Could you explain more? Do you mean that instead of storing Mercedes-Benz thousands of times, you store the id 1? For example, a table car_brands with columns Brands and Id, and a row with Mercedes-Benz and 1. And in the main table, a column Brands with the value 1. And when you SELECT, you first get Id from table car_brands and then SELECT Something FROM main_table WHERE Brands = (SELECT Id FROM car_brands WHERE Brands = 'Mercedes-Benz'). Or some other approach?
-
Guffa over 9 years
@user2118559: Yes, that is how you would store it. To get the data you would generally use a join rather than a subquery:
select something from main_table c inner join car_brands b on b.Id = c.Brands where b.Brands = 'Mercedes-Benz'
-
Guffa about 8 years
Why the downvote? If you don't explain what it is that you think is wrong, it can't improve the answer.
-
MrMesees about 8 years
"not that the string takes more space"... strings of characters above minimal sizes take up a heck of a lot more space than even high-precision numbers, because a number (singular) has a fixed size, while strings are always aggregate types. 8 bytes for a 64-bit number; 4 bytes per character in a string, plus either a length byte or struct, or a terminator character for incredibly naive implementations...
-
Wilt almost 8 years
Here is an interesting read on why NOT to use enum in MySQL (just to add some fuel to the fire :D )
-
AiRiFiEd about 7 years
@RobertMunteanu Hey Robert, apologies, I know this is an old post, but can I kindly check the following: in order to query integers, I have to link each string column to another table (relationship). However, that means more join operations are required for each query. How do I determine if this trade-off is worth it? Thank you!
-
lulalala about 7 years
For example, with 5 million records of "audi", wouldn't the index hold only one copy of the string "audi" and 5 million integers of primary_key? Would the size difference really be that large, be it varchar or integer?
-
Chibueze Opata over 6 years
Very interesting. How come the difference is negligible?
-
Marcin Wojnarski about 5 years
"Int comparisons are faster than varchar comparisons, for the simple fact that ints take up much less space than varchars" - this is NOT true in general. Depending on the DBMS you use and the exact data types and strings you want to insert, it may turn out that your (say) 8-byte ints are longer than ASCII varchars holding some textual IDs of average length 3-4 chars. So, this answer - being imprecise and lacking any specific context or experimental results - doesn't really answer the question. Everyone knows that varchars are allowed to take much more space than ints, but they do NOT have to.
-
Awais fiaz almost 5 years
You are right lulalala, but for a column which is going to contain random strings the answer is fair enough.
-
Brettins over 4 years
Can you back up your claim here about indexed access? Every benchmark I've seen posted online says that varchar vs int are identical for indexed access, and you are posting no data nor reference to back up your claim. stackoverflow.com/a/48583244/834393
-
rulhaniam almost 4 years
One simple thing that suggests search would be faster with integers is the comparison between 2 integers vs 2 strings (varchar). The compare operation is constant, i.e. O(1), in the case of integers, but with strings it is not constant (it depends on the length of the string).
-
Nearoo over 2 years
@ChibuezeOpata Before data can be compared, it has to be loaded into caches closer to the CPU than RAM. The overall comparison time is likely completely dominated by that loading. i7 L1 & L2 cache lines are 64 bytes in size, and you can likely load 64 successive bytes from RAM at once. Hence, as long as your data is smaller than 32 bytes (comparing two 32-byte values to each other), the only difference is the cycles the CPU needs for the comparison, which is negligible. Just a guess, the actual answer is probably super complex.
-
Nearoo over 2 years
Computers are not so simple that you can deduce more memory => less performance. Indexed columns are highly processed data structures whose size might not be a function of the width of the columns: when columns are small, indexes are larger than the data in the columns to increase performance, and when they're large, they're smaller to increase performance. Further, data loading is completely parallel up to a certain degree. This answer should not be accepted.