Count number of rows within each group
Solution 1
Current best practice (tidyverse) is:
library(dplyr)
df1 %>% count(Year, Month)
Solution 2
Following @Joshua's suggestion, here's one way you might count the number of observations in your df data frame where Year = 2007 and Month = Nov (assuming they are columns):
nrow(df[df$Year == 2007 & df$Month == "Nov", ])
and with aggregate, following @GregSnow:
aggregate(x ~ Year + Month, data = df, FUN = length)
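As a quick check, both approaches can be run against the toy data posted with the question. This is only a sketch: the names counts and n_2012_jan are made up for illustration, and the Year/Month combination picked for the single-group count is arbitrary.

```r
# Toy data from the question
set.seed(2)
df1 <- data.frame(x = 1:20,
                  Year = sample(2012:2014, 20, replace = TRUE),
                  Month = sample(month.abb[1:3], 20, replace = TRUE))

# Rows per Year/Month combination; the count ends up in column x
counts <- aggregate(x ~ Year + Month, data = df1, FUN = length)

# Rows for one combination, via logical row indexing (note the trailing comma)
n_2012_jan <- nrow(df1[df1$Year == 2012 & df1$Month == "Jan", ])

# Sanity check: the group counts add up to the number of rows
sum(counts$x) == nrow(df1)
```

Combinations that never occur in the data do not appear in the aggregate result at all, rather than showing up with a count of zero.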
Solution 3
The dplyr package does this with the count/tally commands, or the n() function:
First, some data:
df <- data.frame(x = rep(1:6, rep(c(1, 2, 3), 2)), year = 1993:2004, month = c(1, 1:11))
Now the count:
library(dplyr)
count(df, year, month)
#piping
df %>% count(year, month)
We can also use a slightly longer version with piping and the n() function:
df %>%
  group_by(year, month) %>%
  summarise(number = n())
or the tally function:
df %>%
  group_by(year, month) %>%
  tally()
Solution 4
An old question without a data.table solution. So here goes...
Using .N:
library(data.table)
DT <- data.table(df)
DT[, .N, by = list(year, month)]
Solution 5
The simplest option with aggregate is the length function, which gives the length of the vector in each subset. A sometimes more robust choice is function(x) sum(!is.na(x)), which counts only the non-missing values.
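To make the difference concrete, here is a minimal sketch on a made-up data frame (the name toy and its values are assumptions, not taken from the question). One caveat: the formula interface of aggregate drops rows containing NA by default, so na.action = na.pass is set here to let the missing value reach FUN.

```r
# Hypothetical toy data: one missing value in group "a"
toy <- data.frame(g = c("a", "a", "a", "b", "b"),
                  x = c(1, NA, 3, 4, 5))

# Count all rows per group, including the row where x is NA
n_rows <- aggregate(x ~ g, data = toy, FUN = length,
                    na.action = na.pass)

# Count only the non-missing values per group
n_obs <- aggregate(x ~ g, data = toy,
                   FUN = function(x) sum(!is.na(x)),
                   na.action = na.pass)
```

With these values, n_rows reports 3 rows for group a while n_obs reports 2; both report 2 for group b.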
MikeTP
Energy commodity trader learning R to better analyze big high frequency time series data.
Updated on October 29, 2021
Comments
-
MikeTP over 2 years
I have a data frame and I would like to count the number of rows within each group. I regularly use the aggregate function to sum data as follows:
df2 <- aggregate(x ~ Year + Month, data = df1, sum)
Now, I would like to count observations but can't seem to find the proper argument for FUN. Intuitively, I thought it would be as follows:
df2 <- aggregate(x ~ Year + Month, data = df1, count)
But, no such luck.
Any ideas?
Some toy data:
set.seed(2)
df1 <- data.frame(x = 1:20,
                  Year = sample(2012:2014, 20, replace = TRUE),
                  Month = sample(month.abb[1:3], 20, replace = TRUE))
-
Joshua Ulrich about 12 years
nrow, NROW, length...
-
Hong Ooi about 12 years
I keep reading this question as asking for a fun way to count things (as opposed to the many unfun ways, I guess).
-
Prolix over 8 years
@JoshuaUlrich: nrow did not work for me but NROW and length worked fine. +1
-
-
sop almost 9 years
Is there a way to aggregate a variable and count at the same time (i.e. two functions in one aggregation: mean + count)? I need the mean of one column and the number of rows for each value of another column.
-
geotheory almost 9 years
I'd cbind the results of aggregate(Sepal.Length ~ Species, iris, mean) and aggregate(Sepal.Length ~ Species, iris, length)
-
sop almost 9 years
I have done it, but it seems that every column except the aggregated one appears twice, so I merged the two results instead and that seems to be OK.
-
Manoj Kumar over 7 years
I don't know but this could be useful as well...
df %>% group_by(group, variable) %>% mutate(count = n())
-
geotheory over 7 years
Yes, dplyr is best practice now.
-
thelatemail almost 5 years
Just to note that if you are using the default, non-formula method for aggregate, there is no need to rename each variable in by= like list(year=df1$year) etc. A data.frame is a list already, so aggregate(df1[c("Count")], by=df1[c("Year", "Month")], FUN=sum, na.rm=TRUE) will work.
-
sindri_baldur over 4 years
It is standard nowadays to use .() instead of list(), and setDT() to convert a data.frame to a data.table. So in one step: setDT(df)[, .N, by = .(year, month)].
-
camille over 2 years
I'm a daily dplyr user but still wouldn't necessarily call it best practice, more like a common personal preference.
-
geotheory over 2 years
You are perfectly right: dplyr isn't best for all cases, e.g. data.table or poorman might be preferable. And what does 'best practice' mean anyway?