correct way to create a pivot table in postgresql using CASE WHEN
Solution 1
Use MAX() or MIN(), not FIRST(). In this scenario, you will have all NULLs in the column per each group value except for, at most, one with a not null value. By definition, this is both the MIN and the MAX of that set of values (all nulls are excluded).
Solution 2
PostgreSQL does have a couple of functions for pivot queries, see this article at Postgresonline. You can find these functions in the contrib.
Solution 3
As Matthew Wood pointed out, use MIN() or MAX(), not FIRST():
SELECT
an.acc,
MAX(
CASE tn.rank
WHEN 'species' THEN tn.name
ELSE NULL
END
) AS species,
MAX(
CASE tn.rank
WHEN 'phylum' THEN tn.name
ELSE NULL
END
) AS phylum
FROM tax_node tn,
acc2tax_node an
WHERE tn.taxid = an.taxid
and an.acc = 'AJ012531'
GROUP by an.acc;
mojones
Updated on June 24, 2022Comments
-
mojones about 2 years
I am trying to create a pivot table type view in postgresql and am nearly there! Here is the basic query:
select acc2tax_node.acc, tax_node.name, tax_node.rank from tax_node, acc2tax_node where tax_node.taxid=acc2tax_node.taxid and acc2tax_node.acc='AJ012531';
And the data:
acc | name | rank ----------+-------------------------+-------------- AJ012531 | Paromalostomum fusculum | species AJ012531 | Paromalostomum | genus AJ012531 | Macrostomidae | family AJ012531 | Macrostomida | order AJ012531 | Macrostomorpha | no rank AJ012531 | Turbellaria | class AJ012531 | Platyhelminthes | phylum AJ012531 | Acoelomata | no rank AJ012531 | Bilateria | no rank AJ012531 | Eumetazoa | no rank AJ012531 | Metazoa | kingdom AJ012531 | Fungi/Metazoa group | no rank AJ012531 | Eukaryota | superkingdom AJ012531 | cellular organisms | no rank
What I am trying to get is the following:
acc | species | phylum AJ012531 | Paromalostomum fusculum | Platyhelminthes
I am trying to do this with CASE WHEN, so I've got as far as the following:
select acc2tax_node.acc, CASE tax_node.rank WHEN 'species' THEN tax_node.name ELSE NULL END as species, CASE tax_node.rank WHEN 'phylum' THEN tax_node.name ELSE NULL END as phylum from tax_node, acc2tax_node where tax_node.taxid=acc2tax_node.taxid and acc2tax_node.acc='AJ012531';
Which gives me the output:
acc | species | phylum ----------+-------------------------+----------------- AJ012531 | Paromalostomum fusculum | AJ012531 | | AJ012531 | | AJ012531 | | AJ012531 | | AJ012531 | | AJ012531 | | Platyhelminthes AJ012531 | | AJ012531 | | AJ012531 | | AJ012531 | | AJ012531 | | AJ012531 | | AJ012531 | |
Now I know that I have to group by acc at some point, so I try
select acc2tax_node.acc, CASE tax_node.rank WHEN 'species' THEN tax_node.name ELSE NULL END as sp, CASE tax_node.rank WHEN 'phylum' THEN tax_node.name ELSE NULL END as ph from tax_node, acc2tax_node where tax_node.taxid=acc2tax_node.taxid and acc2tax_node.acc='AJ012531' group by acc2tax_node.acc;
But I get the dreaded
ERROR: column "tax_node.rank" must appear in the GROUP BY clause or be used in an aggregate function
All the previous examples I've been able to find use something like SUM() around the CASE statements, so I guess that is the aggregate function. I have tried using FIRST():
select acc2tax_node.acc, FIRST(CASE tax_node.rank WHEN 'species' THEN tax_node.name ELSE NULL END) as sp, FIRST(CASE tax_node.rank WHEN 'phylum' THEN tax_node.name ELSE NULL END) as ph from tax_node, acc2tax_node where tax_node.taxid=acc2tax_node.taxid and acc2tax_node.acc='AJ012531' group by acc2tax_node.acc;
but get the error:
ERROR: function first(character varying) does not exist
Can anyone offer any hints?