How to select columns conditionally in a data frame in R
Solution 1
You're looking for aggregate. Here is a formula that returns the median age and weight by sex:
aggregate(cbind(age, weight) ~ sex, data=jalal, FUN=median)
## sex age weight
## 1 F 20.5 189.9
## 2 M 21.0 198.1
To get a data frame containing just the women, here is the syntax for [:
jalal[jalal$sex == 'F',]
Note the quotes around 'F'. A bare F means FALSE, so comparing the sex column to it never matches. That's why your second subset expression fails. With the quotes, it works:
subset(jalal, subset=(sex =='F'))
## age sex weight eye.color hair.color
## 1 23 F 93.8 blue black
## 3 22 F 196.5 hazel gray
## 6 16 F 152.1 blue gray
...
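To see why the unquoted version returns nothing, here is a minimal sketch. It assumes a small stand-in data frame built from the first six rows of the question's data, not the full jalal.csv; the names bare, quoted, and women are only for illustration.

```r
# Stand-in for jalal: the first six rows of the data in the question (assumption)
jalal <- data.frame(
  age    = c(23, 21, 22, 22, 21, 16),
  sex    = c("F", "M", "F", "M", "M", "F"),
  weight = c(93.8, 180.8, 196.5, 256.2, 219.6, 152.1),
  stringsAsFactors = FALSE
)

# A bare F is shorthand for the logical FALSE, so each string in the
# column is compared with "FALSE" and nothing matches
bare <- jalal$sex == F

# Quoting 'F' compares against the one-character string instead
quoted <- jalal$sex == 'F'

print(which(bare))    # integer(0)
print(which(quoted))  # 1 3 6

# Row subsetting with the logical vector keeps only the women
women <- jalal[quoted, ]
print(women)
```

The same logical vector works inside subset(), since subset() evaluates its condition against the columns of the data frame.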
A comment asks for a way to get the mean values for women with blue eyes. One approach is to filter the data frame to just blue-eyed people first:
aggregate(cbind(age, weight) ~ sex, data=jalal[jalal$eye.color == 'blue',], FUN=mean)
## sex age weight
## 1 F 19.66667 151.7667
## 2 M 18.00000 212.8500
But this seems hackish: after all, we're filtering the data frame on eye color but not on sex. So here is a formula that gives the mean age and weight by sex and eye color. From this, you can find the means for blue-eyed women, green-eyed men, etc.:
aggregate(cbind(age, weight) ~ sex + eye.color, data=jalal, FUN=mean)
## sex eye.color age weight
## 1 M amber 21.50000 218.5000
## 2 F blue 19.66667 151.7667
## 3 M blue 18.00000 212.8500
## 4 M brown 19.33333 194.9000
## 5 F gray 19.00000 194.6333
## 6 M gray 23.00000 198.2000
## 7 F green 18.50000 221.0500
## 8 M green 21.50000 183.5500
## 9 F hazel 21.50000 176.9500
Note rows 2 and 3 here match the results in the prior expression.
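If only one group is wanted, its row can be picked out of the aggregate result, or the data frame can be filtered on both conditions directly. The sketch below rebuilds the data from the CSV in the question (without hair.color) so it runs on its own; by_group, blue_women, and direct are hypothetical names.

```r
# Data copied from the jalal.csv shown in the question (hair.color omitted)
jalal <- data.frame(
  age = c(23, 21, 22, 22, 21, 16, 21, 18, 15, 19,
          20, 21, 22, 21, 15, 20, 16, 23, 19, 24),
  sex = c("F", "M", "F", "M", "M", "F", "F", "M", "M", "M",
          "F", "M", "F", "F", "F", "F", "F", "M", "M", "M"),
  weight = c(93.8, 180.8, 196.5, 256.2, 219.6, 152.1, 183.3, 179.1, 206.1, 211.6,
             209.4, 194.0, 204.1, 157.4, 238.0, 154.8, 245.8, 198.2, 169.1, 198.0),
  eye.color = c("blue", "amber", "hazel", "amber", "blue", "blue", "gray",
                "brown", "blue", "brown", "blue", "brown", "green", "hazel",
                "green", "gray", "gray", "gray", "green", "green")
)

# Option 1: aggregate by both factors, then keep the one row of interest
by_group <- aggregate(cbind(age, weight) ~ sex + eye.color, data = jalal, FUN = mean)
blue_women <- by_group[by_group$sex == "F" & by_group$eye.color == "blue", ]
print(blue_women)

# Option 2: filter rows on both conditions, then take column means
direct <- colMeans(jalal[jalal$sex == "F" & jalal$eye.color == "blue",
                         c("age", "weight")])
print(direct)
```

Both options give the same numbers as the blue-eyed F row above; the second avoids computing every group when only one is needed.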
Solution 2
Here's an alternative solution using the data.table package:
require(data.table)
jalal <- as.data.table(jalal)
To subset on females:
jalal[sex == "F"]
To calculate the mean, median, etc:
> jalal[sex == "F", mean(weight)]
[1] 183.52
> jalal[sex == "F", list(mean(weight), median(age))]
V1 V2
1: 183.52 20.5
Mona Jalal
Updated on June 04, 2022
Comments
-
Mona Jalal almost 2 years
How can I find the mean/median (or any other such statistic) for the women? I have tried a few pieces of code to access the women's data in particular, but was unsuccessful. Any help is really appreciated.
> jalal <- read.csv("jalal.csv", header=TRUE,sep=",")
> which(jalal$sex==F)
integer(0)
> jalal
   age sex weight eye.color hair.color
1   23   F   93.8      blue      black
2   21   M  180.8     amber       gray
3   22   F  196.5     hazel       gray
4   22   M  256.2     amber      black
5   21   M  219.6      blue       gray
6   16   F  152.1      blue       gray
7   21   F  183.3      gray   chestnut
8   18   M  179.1     brown      blond
9   15   M  206.1      blue      white
10  19   M  211.6     brown      blond
11  20   F  209.4      blue      white
12  21   M  194.0     brown     auburn
13  22   F  204.1     green      black
14  21   F  157.4     hazel        red
15  15   F  238.0     green       gray
16  20   F  154.8      gray       gray
17  16   F  245.8      gray       gray
18  23   M  198.2      gray        red
19  19   M  169.1     green      brown
20  24   M  198.0     green       gray
> subset(jalal, subset=(sex =F)) -> females
> females
[1] age sex weight eye.color hair.color
<0 rows> (or 0-length row.names)
> subset(jalal, subset=(sex ==F)) -> females
> females
[1] age sex weight eye.color hair.color
<0 rows> (or 0-length row.names)
Here's what's in jalal.csv:
"age","sex","weight","eye.color","hair.color"
23,"F",93.8,"blue","black"
21,"M",180.8,"amber","gray"
22,"F",196.5,"hazel","gray"
22,"M",256.2,"amber","black"
21,"M",219.6,"blue","gray"
16,"F",152.1,"blue","gray"
21,"F",183.3,"gray","chestnut"
18,"M",179.1,"brown","blond"
15,"M",206.1,"blue","white"
19,"M",211.6,"brown","blond"
20,"F",209.4,"blue","white"
21,"M",194,"brown","auburn"
22,"F",204.1,"green","black"
21,"F",157.4,"hazel","red"
15,"F",238,"green","gray"
20,"F",154.8,"gray","gray"
16,"F",245.8,"gray","gray"
23,"M",198.2,"gray","red"
19,"M",169.1,"green","brown"
24,"M",198,"green","gray"
-
Mona Jalal over 10 years
Also, I was wondering whether FUN can count instead of just computing the mean, median, or weighted mean. For example, how can I use aggregate to count the number of people who have brown or black eyes? I couldn't find a function for counting in ?aggregate. Basically, I want to know how to find a list of functions that can be passed as FUN to aggregate.
-
Matthew Lundberg over 10 years
A count is a vector length in R. Pass FUN=length for this. It's easiest to create a column of 1's (jalal$count <- 1) and use count in place of cbind(age, weight) in the formula.
-
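The FUN=length suggestion in Matthew Lundberg's comment above can be sketched as follows. The eye colors are copied from the jalal.csv in the question; counts and brown_or_black are hypothetical names. Note that nobody in this data has black eyes, so the "brown or black" count equals the brown count.

```r
# Eye colors copied from the jalal.csv shown in the question
jalal <- data.frame(
  eye.color = c("blue", "amber", "hazel", "amber", "blue", "blue", "gray",
                "brown", "blue", "brown", "blue", "brown", "green", "hazel",
                "green", "gray", "gray", "gray", "green", "green")
)

# Helper column of 1's, as suggested in the comment
jalal$count <- 1

# FUN = length counts the rows in each eye.color group
counts <- aggregate(count ~ eye.color, data = jalal, FUN = length)
print(counts)

# For a single combined count, summing a logical vector is enough
brown_or_black <- sum(jalal$eye.color %in% c("brown", "black"))
print(brown_or_black)
```

In general, FUN can be any function that accepts a vector and returns a summary of it, such as length, sum, min, max, or a function you define yourself.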
Matthew Lundberg over 10 years
You can name the columns: list(MeanWeight=mean(weight), MedianAge=median(age))
-
Scott Ritchie over 10 years
Thanks! I'm still in the process of learning the data.table syntax.
-
Mona Jalal over 10 years
@Matthew Lundberg: Can I find out how old the third-heaviest person is using the aggregate function? I tried this, but it wasn't helpful:
> aggregate( age~weight, data=jalal, FUN=rank)