Calculate 95th percentile of values with grouping variable
Solution 1
This can be achieved using the plyr library. We specify the grouping variable Watershed and ask for the 95% quantile of WQ.
library(plyr)
#Random seed
set.seed(42)
#Sample data
dat <- data.frame(Watershed = sample(letters[1:2], 100, TRUE), WQ = rnorm(100))
#plyr call
ddply(dat, "Watershed", summarise, WQ95 = quantile(WQ, .95))
The results:
  Watershed     WQ95
1         a 1.353993
2         b 1.461711
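The plyr package has since been retired in favour of dplyr, so the same grouped summary can also be sketched with dplyr. This is a swapped-in alternative rather than part of the original answer; it assumes dplyr is installed and reuses the sample data from above.

```r
library(dplyr)

# Same sample data as in the plyr example
set.seed(42)
dat <- data.frame(Watershed = sample(letters[1:2], 100, TRUE), WQ = rnorm(100))

# One row per Watershed, holding the 95th percentile of WQ
res <- dat %>%
  group_by(Watershed) %>%
  summarise(WQ95 = quantile(WQ, 0.95))

print(res)
```

The result is a tibble with one row per group, which can be joined back to the original data if needed.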
Solution 2
I hope I understand your question correctly. Is this what you're looking for?
my.df <- data.frame(group = gl(3, 5), var = runif(15))
aggregate(my.df$var, by = list(my.df$group), FUN = function(x) quantile(x, probs = 0.95))
  Group.1         x
1       1 0.6913747
2       2 0.8067847
3       3 0.9643744
EDIT
Based on Vincent's answer,
aggregate(my.df$var, by = list(my.df$group), FUN = quantile, probs = 0.95)
also works (you can skin a cat 1001 ways, I've been told). As a side note, you can pass a vector of desired quantiles as probs, say c(0.1, 0.2, 0.3...) for deciles. Or you can try the summary function for some predefined statistics.
aggregate(my.df$var, by = list(my.df$group), FUN = summary)
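To illustrate the side note about passing several probabilities at once, here is a minimal sketch using the formula interface of aggregate. The seed and the three probabilities are assumptions chosen only to make the example reproducible and short.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible
my.df <- data.frame(group = gl(3, 5), var = runif(15))

# Formula interface: summarise var within each level of group.
# Extra arguments (here probs) are passed through to quantile().
res <- aggregate(var ~ group, data = my.df, FUN = quantile,
                 probs = c(0.05, 0.50, 0.95))

# The quantiles come back as a single matrix column;
# flatten it into ordinary columns, one per probability.
res <- do.call(data.frame, res)
print(res)
```

When FUN returns more than one value, aggregate stores the values in a matrix column, which is why the flattening step is useful before further processing.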
Solution 3
Use a combination of the tapply and quantile functions. For example, if your dataset looks like this:
DF <- data.frame(watershed = sample(c('a','b','c','d'), 1000, replace = TRUE), wq = rnorm(1000))
Use this:
with(DF, tapply(wq, watershed, quantile, probs=0.95))
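As a sketch of how the tapply approach applies to the data in the question, the example below uses the five sample rows posted there and converts the named result into a two-column data frame. The names wq_data, p95 and out are illustrative, and storing the watershed IDs as character strings is an assumption.

```r
# The five sample rows from the question (IDs stored as strings by assumption)
wq_data <- data.frame(
  Watershed = c("50500101", "50500101", "50500101", "50500105", "50500105"),
  WQ = c(62.370661, 65.505046, 58.741477, 71.220034, 57.917249)
)

# 95th percentile of WQ within each watershed; the result is named by watershed
p95 <- with(wq_data, tapply(WQ, Watershed, quantile, probs = 0.95))

# Turn the named result into a data frame with one row per watershed
out <- data.frame(Watershed = names(p95), WQ95 = as.numeric(p95))
print(out)
```

With so few observations per group, quantile interpolates between the two largest values (its default method), so the result generally falls just below the group maximum.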
Solution 4
In Excel, you're going to want to use an array formula to make this easy. I suggest the following:
{=PERCENTILE(IF($A$2:$A$6 = Watershed ID, $B$2:$B$6), 0.95)}
Column A would be the Watershed ids, and Column B would be the WQ values.
Also, be sure to enter the formula as an array formula. Do so by pressing Ctrl+Shift+Enter when entering the formula.
Christine Mazzarella
Updated on January 21, 2021
Comments
-
Christine Mazzarella almost 3 years
I'm trying to calculate the 95th percentile for multiple water quality values grouped by watershed, for example:
Watershed   WQ
50500101    62.370661
50500101    65.505046
50500101    58.741477
50500105    71.220034
50500105    57.917249
I reviewed a previously posted question, Percentile for Each Observation w/r/t Grouping Variable. It seems very close to what I want to do, but it computes a percentile for EACH observation. I need one value for each level of the grouping variable, so ideally:
Watershed   WQ - 95th
50500101    x
50500105    y
-
Vincent over 12 years
and I had never used gl before... :)
-
Vincent over 12 years
Richie: is that 'with' edit really an improvement? I don't mind it, but I'm just wondering if you just find it more elegant that way or if there's an actual technical benefit.
-
Richie Cotton over 12 years
I'd be tempted to use daply, since the results nicely condense to an array, e.g., daply(dat, .(Watershed), function(x) quantile(x$WQ, 0.95)).
-
Excellll over 12 years
Plug in the value for Watershed ID. That was just a placeholder. For instance, {=PERCENTILE(IF($A$2:$A$6 = 50500101, $B$2:$B$6), 0.95)}
-
Excellll over 12 years
If you use a cell reference for the Watershed ID, you can fill down the formula for all IDs in the table.
-
hadley over 12 years
Data frames are usually easier to work with in terms of future aggregations and joining back to the original data.
-
Roman Luštrik over 12 years
I find it a matter of taste, although it may have its advantages if you want it a bit more dynamic.