Count number of rows within each group

r dataframe aggregate r-faq

326,052

Solution 1

Current best practice (tidyverse) is:

library(dplyr)
df1 %>% count(Year, Month)
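A minimal sketch of Solution 1 on the toy data from the question. It assumes a reasonably recent dplyr, where `count()` accepts a `name` argument. The default column name is `n`; `"n_rows"` is only an illustrative choice:

```r
library(dplyr)

# Toy data from the question
set.seed(2)
df1 <- data.frame(x = 1:20,
                  Year = sample(2012:2014, 20, replace = TRUE),
                  Month = sample(month.abb[1:3], 20, replace = TRUE))

# One row per observed (Year, Month) pair, with the number of rows in each
counts <- df1 %>% count(Year, Month, name = "n_rows")
print(counts)

# Each original row belongs to exactly one group, so the counts add up to nrow(df1)
stopifnot(sum(counts$n_rows) == nrow(df1))
```

Only combinations that actually occur in the data are returned. Pairs that never occur do not show up with a zero count.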

Solution 2

Following @Joshua's suggestion, here's one way you might count the number of observations in your df dataframe where Year = 2007 and Month = Nov (assuming they are columns):

nrow(df[df$Year == 2007 & df$Month == "Nov", ])
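The row condition goes before the comma in `[`, which is the row position. A small sketch with a hypothetical data frame (the values are made up) shows the subset count alongside an equivalent `sum()` over the logical vector:

```r
# Hypothetical data: three of these six rows are November 2007
df <- data.frame(x = 1:6,
                 Year = c(2007, 2007, 2007, 2008, 2008, 2007),
                 Month = c("Nov", "Nov", "Dec", "Nov", "Jan", "Nov"))

# One TRUE/FALSE per row
keep <- df$Year == 2007 & df$Month == "Nov"

n_subset <- nrow(df[keep, ])  # subset the matching rows, then count them
n_sum <- sum(keep)            # count the TRUEs directly, without building a subset
print(c(n_subset, n_sum))
```

If `Year` or `Month` can be `NA`, `keep` will contain `NA`s. In that case `sum(keep, na.rm = TRUE)` counts only the definite matches, whereas logical subsetting returns an all-`NA` row for each `NA` index.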

and with aggregate, following @GregSnow:

aggregate(x ~ Year + Month, data = df, FUN = length)
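Applied to the question's toy data, `FUN = length` returns one row per observed (Year, Month) pair. As a sanity check, this sketch compares the result with `table()`, which builds the full cross-tabulation of the same two columns (including zero cells for pairs that never occur):

```r
set.seed(2)
df1 <- data.frame(x = 1:20,
                  Year = sample(2012:2014, 20, replace = TRUE),
                  Month = sample(month.abb[1:3], 20, replace = TRUE))

# length() of each group's x vector is the number of rows in that group
counts <- aggregate(x ~ Year + Month, data = df1, FUN = length)
names(counts)[names(counts) == "x"] <- "n"
print(counts)

# Cross-tabulation of the same two columns
tab <- with(df1, table(Year, Month))

# Look up each aggregate() group in the table by its dimnames (character matrix index)
idx <- cbind(as.character(counts$Year), as.character(counts$Month))
stopifnot(all(counts$n == tab[idx]))
```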

Solution 3

The dplyr package does this with the count() and tally() functions, or with n():

First, some data:

df <- data.frame(x = rep(1:6, rep(c(1, 2, 3), 2)), year = 1993:2004, month = c(1, 1:11))

Now the count:

library(dplyr)
count(df, year, month)
# or, with the pipe:
df %>% count(year, month)

We can also use a slightly longer version with piping and the n() function:

df %>% 
  group_by(year, month) %>%
  summarise(number = n())

or the tally function:

df %>% 
  group_by(year, month) %>%
  tally()
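A sketch checking that the three dplyr forms above agree on the example data. It assumes dplyr 1.0 or later for the `.groups` argument of `summarise()`. In this data every year appears once, so each count is 1. The point is only that the three results match:

```r
library(dplyr)

# Same example data as above
df <- data.frame(x = rep(1:6, rep(c(1, 2, 3), 2)), year = 1993:2004, month = c(1, 1:11))

by_count <- df %>% count(year, month)

# n = n() instead of number = n(), so all three results share the column name "n"
by_summarise <- df %>%
  group_by(year, month) %>%
  summarise(n = n(), .groups = "drop")

# tally() leaves the result grouped by year; ungroup() removes that
by_tally <- df %>%
  group_by(year, month) %>%
  tally() %>%
  ungroup()

# All three are ordered by year, then month, so the columns can be compared directly
stopifnot(identical(by_count$n, by_summarise$n),
          identical(by_count$n, by_tally$n),
          identical(by_count$year, by_tally$year))
```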

Solution 4

An old question without a data.table solution. So here goes...

Using .N

library(data.table)
DT <- data.table(df)
DT[, .N, by = list(year, month)]
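A sketch of the same idea on the question's toy data `df1`. With `by =`, groups come out in order of first appearance; `keyby =` (used here) sorts the result by the grouping columns instead:

```r
library(data.table)

set.seed(2)
df1 <- data.frame(x = 1:20,
                  Year = sample(2012:2014, 20, replace = TRUE),
                  Month = sample(month.abb[1:3], 20, replace = TRUE))

DT <- data.table(df1)  # copies df1 into a new data.table

# .N is the number of rows in each group
counts <- DT[, .N, keyby = list(Year, Month)]
print(counts)

stopifnot(sum(counts$N) == nrow(DT))
```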

Solution 5

The simplest option with aggregate is FUN = length, which returns the length of the vector in each subset, i.e. the number of rows in each group. A slightly more robust choice is function(x) sum(!is.na(x)), which counts only the non-missing values.
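The two choices only differ when the counted column contains `NA`. The formula interface of `aggregate()` drops rows with missing values by default (`na.action = na.omit`), so this sketch uses the data-frame interface, where the `NA`s reach `FUN`. The data frame is made up for illustration:

```r
# Hypothetical data: group "a" has one missing x value
df <- data.frame(x = c(1, NA, 3, 4), g = c("a", "a", "b", "b"))

# length() counts every row in the group, NA or not
all_rows <- aggregate(df["x"], by = df["g"], FUN = length)

# sum(!is.na(x)) counts only the non-missing values
non_missing <- aggregate(df["x"], by = df["g"], FUN = function(x) sum(!is.na(x)))

print(all_rows)     # a: 2, b: 2
print(non_missing)  # a: 1, b: 2
```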



Author

MikeTP

Energy commodity trader learning R to better analyze big high frequency time series data.

Updated on October 29, 2021

Comments

MikeTP over 2 years
I have a dataframe and I would like to count the number of rows within each group. I regularly use the aggregate function to sum data as follows:
```
df2 <- aggregate(x ~ Year + Month, data = df1, sum)
```
Now, I would like to count observations but can't seem to find the proper argument for FUN. Intuitively, I thought it would be as follows:
```
df2 <- aggregate(x ~ Year + Month, data = df1, count)
```
But, no such luck.

Any ideas?

Some toy data:
```
set.seed(2)
df1 <- data.frame(x = 1:20,
                  Year = sample(2012:2014, 20, replace = TRUE),
                  Month = sample(month.abb[1:3], 20, replace = TRUE))
```
- Joshua Ulrich about 12 years
  
  nrow, NROW, length...
- Hong Ooi about 12 years
  
  I keep reading this question as asking for a fun way to count things (as opposed to the many unfun ways, I guess).
- Prolix over 8 years
  
  @JoshuaUlrich: nrow did not work for me, but NROW and length worked fine. +1
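Why `nrow` fails as `FUN` while `NROW` and `length` work: `aggregate()` passes each group's values to `FUN` as a plain vector, and `nrow()` returns `NULL` for an object without a `dim` attribute. A minimal illustration, followed by `NROW` on the question's toy data:

```r
v <- c(10, 20, 30)
nrow(v)    # NULL: a plain vector has no dim attribute
NROW(v)    # 3: treats a vector as a one-column matrix
length(v)  # 3

set.seed(2)
df1 <- data.frame(x = 1:20,
                  Year = sample(2012:2014, 20, replace = TRUE),
                  Month = sample(month.abb[1:3], 20, replace = TRUE))

# NROW works as a counting function here, just like length
counts <- aggregate(x ~ Year + Month, data = df1, FUN = NROW)
print(counts)
```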
sop almost 9 years

Is there a way to aggregate a variable and count rows at the same time (i.e. two aggregation functions, mean + count)? I need the mean of one column and the number of rows for each value of another column.
geotheory almost 9 years

I'd cbind the results of aggregate(Sepal.Length ~ Species, iris, mean) and aggregate(Sepal.Length ~ Species, iris, length)
sop almost 9 years

I did that, but every column except the aggregated one appears twice in the result, so I merged the two results instead, and that seems to work.
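A sketch of the merge approach, using the built-in `iris` data from the comment above. The value column names `mean_length` and `n` are illustrative choices; renaming them before `merge()` keeps the result readable, and joining on `Species` means it appears only once:

```r
# Mean and count of Sepal.Length per Species
means  <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
counts <- aggregate(Sepal.Length ~ Species, data = iris, FUN = length)

# Give the two value columns distinct names
names(means)[2]  <- "mean_length"
names(counts)[2] <- "n"

# Join on the grouping column
stats <- merge(means, counts, by = "Species")
print(stats)
```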
Manoj Kumar over 7 years

I'm not sure, but this could be useful as well: df %>% group_by(group, variable) %>% mutate(count = n())
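Unlike `count()`, `mutate(count = n())` does not collapse the data: it keeps every row and attaches the size of that row's group. A small sketch with a hypothetical data frame (only the `group` column from the comment is used). dplyr's `add_count()` is a shortcut for the grouped `mutate()`:

```r
library(dplyr)

# Hypothetical data: group "a" has two rows, group "b" has one
df <- data.frame(group = c("a", "a", "b"), value = c(1, 2, 3))

# Every row is kept; count holds the size of that row's group
with_counts <- df %>%
  group_by(group) %>%
  mutate(count = n()) %>%
  ungroup()

# Same per-row counts without explicit grouping
with_add_count <- df %>% add_count(group, name = "count")

# One row per group instead
collapsed <- df %>% count(group)
```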
geotheory over 7 years

Yes, dplyr is best practice now.
thelatemail almost 5 years

Just to note that if you are using the default, non-formula method for aggregate, there is no need to rename each variable in by= like list(year=df1$year) etc. A data.frame is a list already so aggregate(df1[c("Count")], by=df1[c("Year", "Month")], FUN=sum, na.rm=TRUE) will work.
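A sketch of the same non-formula pattern used for counting on the question's toy data. The `Count` column in the comment is not part of that data, so `x` is counted instead. Because `df1[c("Year", "Month")]` is already a named list, the grouping columns keep their names:

```r
set.seed(2)
df1 <- data.frame(x = 1:20,
                  Year = sample(2012:2014, 20, replace = TRUE),
                  Month = sample(month.abb[1:3], 20, replace = TRUE))

# Data-frame (non-formula) interface: by= takes the grouping columns directly
counts <- aggregate(df1["x"], by = df1[c("Year", "Month")], FUN = length)
names(counts)  # "Year" "Month" "x"
```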
sindri_baldur over 4 years

It is now standard to use .() instead of list(), and setDT() to convert a data.frame to a data.table. So in one step: setDT(df)[, .N, by = .(year, month)].
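A sketch of the one-step version on the example data from Solution 3. `setDT()` converts the data.frame to a data.table by reference, so nothing is copied and no reassignment is needed:

```r
library(data.table)

df <- data.frame(x = rep(1:6, rep(c(1, 2, 3), 2)), year = 1993:2004, month = c(1, 1:11))

# Converts df in place and returns it, so the grouping can be chained directly
counts <- setDT(df)[, .N, by = .(year, month)]

is.data.table(df)  # TRUE: df itself is now a data.table
```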
camille over 2 years

I'm a daily dplyr user but still wouldn't necessarily call it best practice; it's more a common personal preference.
geotheory over 2 years

You are perfectly right - dplyr isn't best for all cases, e.g. data.table or poorman might be preferable. And what does 'best practice' mean anyway?