PostgreSQL case insensitive SELECT on array


Solution 1

One alternative not mentioned is to install the citext extension, which has shipped as a contrib module since PostgreSQL 8.4 (the CREATE EXTENSION command itself requires 9.1+), and use an array of citext:

regress=# CREATE EXTENSION citext;
regress=# SELECT 'foo' = ANY( '{"Foo","bar","bAz"}'::citext[] );
 ?column? 
----------
 t
(1 row)
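
Applied to a table column rather than a literal, the same idea might look like the sketch below. The CTE named tbl and its single row are stand-ins for the real table, and the cast assumes the column is text[] and that citext is installed.

```sql
-- Assumes the citext extension is available on the server
CREATE EXTENSION IF NOT EXISTS citext;

-- tbl is a hypothetical stand-in for the real table
WITH tbl AS (SELECT '{"Foo","bar","bAz"}'::text[] AS value)
SELECT value
FROM   tbl
WHERE  'foo' = ANY (value::citext[]);  -- comparison is done with citext's case-insensitive =
```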

If you want to be strictly correct about this and avoid extensions, you have to use some fairly ugly subqueries, because Pg has few rich array operations and, in particular, no functional mapping operations. Something like:

SELECT array_agg(lower(($1)[n])) FROM generate_subscripts($1,1) n;

... where $1 is the array parameter. In your case I think you can cheat a bit because you don't care about preserving the array's order, so you can do something like:

SELECT 'foo' IN (SELECT lower(x) FROM unnest('{"Foo","bar","bAz"}'::text[]) x);
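
For filtering rows of a table instead of testing a literal, the same unnest subquery can be wrapped in EXISTS. This is only a sketch: the CTE tbl, its id column, and the second row are made up for illustration.

```sql
-- tbl and its contents are hypothetical sample data
WITH tbl AS (
    SELECT 1 AS id, '{"Foo","bar","bAz"}'::text[] AS value
    UNION ALL
    SELECT 2, '{"qux"}'::text[]
)
SELECT id, value
FROM   tbl
-- keep a row if any of its array elements equals 'foo' once lower-cased
WHERE  EXISTS (SELECT 1 FROM unnest(value) AS x WHERE lower(x) = 'foo');
```

Only the row with id 1 should come back, and each matching row appears once regardless of how many elements match.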

Solution 2

This seems hackish to me, but I think it should work:

SELECT value FROM table WHERE 'foo' = ANY(lower(value::text)::text[])

ILIKE could have issues if your array elements can contain _ or %, since both are wildcards in LIKE patterns.

Note that what you are doing is converting the text array to a single text string, converting it to lower case, and then back to an array. This should be safe. If this is not sufficient you could use various combinations of string_to_array and array_to_string, but I think the standard textual representations should be safer.
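
Both points can be checked with a couple of standalone queries. This is a small sketch using the array literal from the question; the string 'fooXbar' and the element 'foo_bar' are made up for illustration.

```sql
-- Round trip: array -> text -> lower case -> back to array
SELECT lower('{"Foo","bar","bAz"}'::text[]::text)::text[] AS lowered;

-- Why ILIKE is risky: each element is treated as a pattern,
-- and _ matches any single character
SELECT 'fooXbar' ILIKE ANY ('{"foo_bar"}'::text[]) AS matches;
```

The second query should report a match even though the two strings are not equal ignoring case, which is the false positive to worry about.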

Update: building on the subquery solution below, one option would be a simple function:

CREATE OR REPLACE FUNCTION lower(text[]) RETURNS text[] LANGUAGE SQL IMMUTABLE AS
$$
SELECT array_agg(lower(value)) FROM unnest($1) value;
$$;

Then you could do:

SELECT value FROM table WHERE 'foo' = ANY(lower(value));

This might actually be the best approach. You could also create GIN indexes on the output of the function if you want.
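
A rough sketch of what the index could look like follows. The table items, the column tags, and the function name lower_array are all hypothetical; the function is the same idea as lower(text[]) above, with a COALESCE added because array_agg returns NULL when there are no elements to aggregate.

```sql
-- Hypothetical table for illustration
CREATE TABLE items (id serial PRIMARY KEY, tags text[]);

-- Must be IMMUTABLE to be usable in an index expression
CREATE OR REPLACE FUNCTION lower_array(text[]) RETURNS text[]
LANGUAGE sql IMMUTABLE AS
$$
-- COALESCE turns the NULL that array_agg yields for zero rows into an empty array
SELECT COALESCE(array_agg(lower(v)), '{}') FROM unnest($1) v;
$$;

-- Expression index on the lower-cased array
CREATE INDEX items_tags_lower_gin ON items USING gin (lower_array(tags));

-- Containment form of the lookup
SELECT id, tags FROM items WHERE lower_array(tags) @> ARRAY['foo'];
```

The query uses @> rather than = ANY(...) because the GIN array operator class supports the containment and overlap operators, not the = ANY construct, so only the former can be answered from the index.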

Solution 3

Another alternative would be with unnest():

WITH tbl AS (SELECT 1 AS id, '{"Foo","bar","bAz"}'::text[] AS value)

SELECT value
FROM  (SELECT id, value, unnest(value) AS val FROM tbl) x
WHERE  lower(val) = 'foo'
GROUP  BY id, value;

I added an id column to get exactly identical results - i.e. duplicate value if there are duplicates in the base table. Depending on your circumstances, you can probably omit the id from the query to collapse duplicates in the results or if there are no dupes to begin with. Also demonstrating a syntax alternative:

SELECT value
FROM  (SELECT value, lower(unnest(value)) AS val FROM tbl) x
WHERE  val = 'foo'
GROUP  BY value;

If array elements are unique within arrays in lower case, you don't even need the GROUP BY, since every value can only match once.

SELECT value
FROM  (SELECT value, lower(unnest(value)) AS val FROM tbl) x
WHERE  val = 'foo';

'foo' must be lower case, obviously.
Should be fast.

If you want that to be fast with a big table, I would create a functional GIN index, though.


PerryW
Author by

PerryW

I used to be a developer, many many years ago... These days I'm an IT manager to pay the bills and a farmer by choice. Still like to code a bit to keep my hand in and ward off senility. You'll mostly find me asking questions on StackOverflow and answering them on ELL

Updated on June 05, 2022

Comments

  • PerryW
    PerryW almost 2 years

    I'm having problems finding the answer here, on google or in the docs ...
    I need to do a case insensitive select against an array type.

    So if:

    value = {"Foo","bar","bAz"}
    

    I need

    SELECT value FROM table WHERE 'foo' = ANY(value)
    

    to match.

    I've tried lots of combinations of lower() with no success.

    ILIKE instead of = seems to work but I've always been nervous about LIKE - is that the best way?

    • PerryW
      PerryW about 11 years
      So ILIKE is ruled out as pointed out by @Chris Travers below - it's quite likely that a value could legitimately contain an underscore
    • PerryW
      PerryW about 11 years
      So it wasn't such a dumb question then @Erwin? :) (first time I've been edited - not complaining, just fascinated)
    • Erwin Brandstetter
      Erwin Brandstetter about 11 years
      On the contrary: it's a very interesting question, IMO, and it has attracted a number of interesting answers already. But primarily I edited that bit out, because we try to keep the noise ratio in questions and answers low on SO. Some noise can go into comments. :)
  • Craig Ringer
    Craig Ringer about 11 years
    Useful hack. I don't think there are any charsets/locales where lower will transform chars that the array syntax cares about. (BTW, I sometimes wish Pg had array map, filter and fold/foldl/foldr for those cases where PL/PgSQL is overkill but pure SQL is clumsy. Array sort too, actually.)
  • Erwin Brandstetter
    Erwin Brandstetter about 11 years
    +1 on citext. May or may not be practical for the OP, but it's the perfect opportunity to mention that extension.
  • PerryW
    PerryW about 11 years
    @ErwinBrandstetter is right - I can't install libraries sadly - it's a bit of a black-box instance
  • Chris Travers
    Chris Travers about 11 years
    @craig, agree. Actually sort could be easily done with a plain sql function.
  • Chris Travers
    Chris Travers about 11 years
    Actually, you can wrap the subquery in a function (I just added this to my answer) so that lower(array_of_text) will just work.
  • Chris Travers
    Chris Travers about 11 years
    Also note you can take the subquery and write a lower(text[]) function so it just works.
  • Craig Ringer
    Craig Ringer about 11 years
    @ChrisTravers Easily, but not necessarily efficiently. Array access at SQL level isn't super-efficient unfortunately. One day I'll get around to writing them, array access in C doesn't look too hard and things like sorts can be done with generic opclass actions.
  • Chris Travers
    Chris Travers about 11 years
    Actually, maybe it would be worth putting together an extension with these array functions. They could all be sql or pl/pgsql easily enough.
  • Erwin Brandstetter
    Erwin Brandstetter about 11 years
    While that works and simplifies the code, it would be slower. You would go back and forth between array and set representation. You could base the functional GIN Index I mentioned on it, though. Then you have fast & simple syntax.
  • Lee
    Lee over 3 years
    @ChrisTravers I actually tried your function, but it returns NULL for an empty array. I posted another answer that uses casting to text instead.