SQL indexing on varchar

Solution 1

Keys on VARCHAR columns can be very long, which results in fewer records per page and more depth (more levels in the B-tree). Longer keys also increase the cache miss ratio.

How many strings map to each integer on average?

If there are relatively few, you can create an index on the integer column alone, and PostgreSQL will do the fine filtering on the matching records:

CREATE INDEX ix_mytable_assoc ON mytable (assoc);

SELECT  floatval
FROM    mytable
WHERE   assoc = givenint
        AND phrase = givenstring
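
One way to check whether the planner actually uses the integer index and filters on the string afterwards is to run the query under EXPLAIN ANALYZE. This is only a sketch: the values 42 and 'foo bar' are made-up placeholders, and the plan you get depends on your data and statistics.

```sql
-- 42 and 'foo bar' are placeholder values, not data from the question.
EXPLAIN ANALYZE
SELECT  floatval
FROM    mytable
WHERE   assoc = 42
        AND phrase = 'foo bar';
```

If the index is used, the plan should mention ix_mytable_assoc, with the phrase comparison applied as a filter on the rows it returns.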

You can also consider creating the index on the string hashes:

CREATE INDEX ix_mytable_md5 ON mytable (DECODE(MD5(phrase), 'HEX'));

SELECT  floatval
FROM    mytable
WHERE   DECODE(MD5(phrase), 'HEX') = DECODE(MD5('givenstring'), 'HEX')
        AND phrase = 'givenstring' -- recheck the string itself to rule out a hash collision

Each hash is only 16 bytes long, so the index keys will be much shorter while preserving selectivity almost perfectly.
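
The key length can be checked directly. The query below is a small sketch with a made-up sample string; it compares the size of the decoded MD5 value with the size of the string itself.

```sql
-- The sample phrase is a made-up value for illustration.
SELECT  OCTET_LENGTH(DECODE(MD5('some fairly long example phrase'), 'HEX')) AS hash_bytes,  -- 16 for any input
        OCTET_LENGTH('some fairly long example phrase')                     AS phrase_bytes;
```

Note that the hash only shortens the key when the strings are typically longer than 16 bytes; for very short phrases it would make the key longer.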

Solution 2

I'd simply recommend a hash index:

create index mytable_phrase_idx on mytable using hash(phrase);

This way queries like

select floatval from mytable where phrase='foo bar';

will be very quick. Test this:

create temporary table test ( k varchar(50), v float);
insert into test (k, v) select 'foo bar number '||generate_series(1,1000000), 1;
create index test_k_idx on test using hash (k);
analyze test;
explain analyze select v from test where k='foo bar number 634652';
                                                   QUERY PLAN                                                    
-----------------------------------------------------------------------------------------------------------------
 Index Scan using test_k_idx on test  (cost=0.00..8.45 rows=1 width=8) (actual time=0.201..0.206 rows=1 loops=1)
   Index Cond: ((k)::text = 'foo bar number 634652'::text)
 Total runtime: 0.265 ms
(3 rows)
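
To judge the trade-off against a default B-tree index on the same column, something like the following could be run in the same session. The index name test_k_btree_idx is invented for this sketch. No output is shown, because sizes depend on the PostgreSQL version and settings.

```sql
-- Hypothetical comparison index; the name test_k_btree_idx is not from the answer above.
create index test_k_btree_idx on test (k);

-- On-disk size of the hash index and of the B-tree index.
select pg_size_pretty(pg_relation_size('test_k_idx'))       as hash_index_size,
       pg_size_pretty(pg_relation_size('test_k_btree_idx')) as btree_index_size;
```

Drop one of the two indexes before re-running the explain analyze above, so it is clear which index the planner chose.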
Author by

alex

Updated on July 12, 2020

Comments

  • alex
    alex almost 4 years

    I have a table whose columns are a varchar(50) and a float. I need to look up (very quickly) the float associated with a given string. Even with indexing, this is rather slow.

    I know, however, that each string is associated with an integer, which I know at the time of lookup, so that each string maps to a unique integer, but each integer does not map to a unique string. One might think of it as a tree structure.

    Is there anything to be gained by adding this integer to the table, indexing on it, and using a query like:

    SELECT floatval FROM mytable WHERE phrase=givenstring AND assoc=givenint
    

    This is Postgres, and if you could not tell, I have very little experience with databases.