correct way to create a pivot table in postgresql using CASE WHEN

11,775

Solution 1

Use MAX() or MIN(), not FIRST(). In this scenario, you will have all NULLs in the column per each group value except for, at most, one with a not null value. By definition, this is both the MIN and the MAX of that set of values (all nulls are excluded).

Solution 2

PostgreSQL does have a couple of functions for pivot queries, see this article at Postgresonline. You can find these functions in the contrib.

Solution 3

As Matthew Wood pointed out, use MIN() or MAX(), not FIRST():

SELECT 
    an.acc, 
    MAX(
        CASE tn.rank 
            WHEN 'species' THEN tn.name 
            ELSE NULL 
        END
    ) AS species, 
    MAX(
        CASE tn.rank 
            WHEN 'phylum' THEN tn.name 
            ELSE NULL 
        END
    ) AS phylum 
FROM tax_node tn, 
    acc2tax_node an
WHERE tn.taxid = an.taxid 
    and an.acc = 'AJ012531' 
GROUP by an.acc;
Share:
11,775
mojones
Author by

mojones

Updated on June 24, 2022

Comments

  • mojones
    mojones about 2 years

    I am trying to create a pivot table type view in postgresql and am nearly there! Here is the basic query:

    select 
    acc2tax_node.acc, tax_node.name, tax_node.rank 
    from 
    tax_node, acc2tax_node 
    where 
    tax_node.taxid=acc2tax_node.taxid and acc2tax_node.acc='AJ012531';
    

    And the data:

       acc    |          name           |     rank     
    ----------+-------------------------+--------------
     AJ012531 | Paromalostomum fusculum | species
     AJ012531 | Paromalostomum          | genus
     AJ012531 | Macrostomidae           | family
     AJ012531 | Macrostomida            | order
     AJ012531 | Macrostomorpha          | no rank
     AJ012531 | Turbellaria             | class
     AJ012531 | Platyhelminthes         | phylum
     AJ012531 | Acoelomata              | no rank
     AJ012531 | Bilateria               | no rank
     AJ012531 | Eumetazoa               | no rank
     AJ012531 | Metazoa                 | kingdom
     AJ012531 | Fungi/Metazoa group     | no rank
     AJ012531 | Eukaryota               | superkingdom
     AJ012531 | cellular organisms      | no rank
    

    What I am trying to get is the following:

    acc      | species                  | phylum
    AJ012531 | Paromalostomum fusculum  | Platyhelminthes
    

    I am trying to do this with CASE WHEN, so I've got as far as the following:

    select 
    acc2tax_node.acc, 
    CASE tax_node.rank WHEN 'species' THEN tax_node.name ELSE NULL END as species, 
    CASE tax_node.rank WHEN 'phylum' THEN tax_node.name ELSE NULL END as phylum 
    from 
    tax_node, acc2tax_node 
    where 
    tax_node.taxid=acc2tax_node.taxid and acc2tax_node.acc='AJ012531';
    

    Which gives me the output:

       acc    |         species         |     phylum      
    ----------+-------------------------+-----------------
     AJ012531 | Paromalostomum fusculum | 
     AJ012531 |                         | 
     AJ012531 |                         | 
     AJ012531 |                         | 
     AJ012531 |                         | 
     AJ012531 |                         | 
     AJ012531 |                         | Platyhelminthes
     AJ012531 |                         | 
     AJ012531 |                         | 
     AJ012531 |                         | 
     AJ012531 |                         | 
     AJ012531 |                         | 
     AJ012531 |                         | 
     AJ012531 |                         | 
    

    Now I know that I have to group by acc at some point, so I try

    select 
    acc2tax_node.acc, 
    CASE tax_node.rank WHEN 'species' THEN tax_node.name ELSE NULL END as sp, 
    CASE tax_node.rank WHEN 'phylum' THEN tax_node.name ELSE NULL END as ph 
    from 
    tax_node, acc2tax_node 
    where 
    tax_node.taxid=acc2tax_node.taxid and acc2tax_node.acc='AJ012531' 
    group by acc2tax_node.acc;
    

    But I get the dreaded

    ERROR:  column "tax_node.rank" must appear in the GROUP BY clause or be used in an aggregate function
    

    All the previous examples I've been able to find use something like SUM() around the CASE statements, so I guess that is the aggregate function. I have tried using FIRST():

    select 
    acc2tax_node.acc, 
    FIRST(CASE tax_node.rank WHEN 'species' THEN tax_node.name ELSE NULL END) as sp, 
    FIRST(CASE tax_node.rank WHEN 'phylum' THEN tax_node.name ELSE NULL END) as ph 
    from tax_node, acc2tax_node where tax_node.taxid=acc2tax_node.taxid and acc2tax_node.acc='AJ012531' group by acc2tax_node.acc;
    

    but get the error:

    ERROR:  function first(character varying) does not exist
    

    Can anyone offer any hints?