Converting a factor with 2 levels to binary values 0/1 in R

Solution 1

As an addition to @Dason's answer, note that...

test <- c("male","female")

as.factor(test)
#[1] male   female
#Levels: female male

...will return female as the reference group (level 1) and male as the comparison group (level 2), because the levels are sorted alphabetically by default.

To reverse the order, set the levels explicitly:

factor(test,levels=c("male","female"))
#[1] male   female
#Levels: male female

As @marius notes, using contrasts will show you how it will work in the regression model:

contrasts(as.factor(test))
#       male
#female    0
#male      1

contrasts(factor(test,levels=c("male","female")))
#       female
#male        0
#female      1
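
If plain 0/1 integers are needed (as the question asks), the level order above explains why as.integer() on a factor gives 1/2 rather than 0/1. A minimal sketch, assuming "female" should map to 1 as in the question; the names codes and gender_int are hypothetical:

```r
test <- c("male", "female")

# Internal integer codes follow the level order (alphabetical by default),
# so "female" is coded 1 and "male" is coded 2.
codes <- as.integer(factor(test))

# Compare against the value that should become 1;
# the logical result coerces to 1 (TRUE) and 0 (FALSE).
gender_int <- as.integer(test == "female")

print(codes)
print(gender_int)
```

Comparing against a named value rather than subtracting 1 from the codes keeps the mapping explicit and independent of the level order.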

Solution 2

Convert to a factor and let R take care of the rest. Modelling functions such as lm() build the dummy variables from a factor automatically, so you should rarely need to create them yourself.
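
A minimal sketch of what "let R take care of the rest" looks like in practice. The data frame and the height column are made up for illustration and are not from the original question:

```r
# Made-up data for illustration only.
dat <- data.frame(
  gender = factor(c("male", "female", "male", "female", "male", "female")),
  height = c(180, 165, 175, 160, 185, 170)
)

# lm() expands the factor into a 0/1 indicator column internally;
# no manual recoding is needed.
fit <- lm(height ~ gender, data = dat)

# The slope is named after the non-reference level ("male" here,
# because "female" sorts first and becomes the reference).
print(names(coef(fit)))
```

The coefficient on the indicator is the difference in mean height between the comparison level and the reference level.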

Solution 3

If you're doing this for real, you should absolutely follow @Dason's advice. I'm going to assume that you're teaching a class and want to demonstrate indicator variables (with thanks to this question):

dat <- data.frame(gender=sample(c("male", "female"), 10, replace=TRUE))

model.matrix(~gender, data=dat)

   (Intercept) gendermale
1            1          1
2            1          0
3            1          1
4            1          0
5            1          1
6            1          1
7            1          1
8            1          0
9            1          0
10           1          1
attr(,"assign")
[1] 0 1
attr(,"contrasts")
attr(,"contrasts")$gender
[1] "contr.treatment"

If you don't want the intercept, use model.matrix(~gender - 1, data=dat) instead; this gives one indicator column per level.
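
A short sketch of the no-intercept form, using fixed data instead of sample() so the result is reproducible. The object name mm is hypothetical:

```r
# Fixed data (rather than sample()) so the result is reproducible.
dat <- data.frame(gender = factor(c("male", "female", "male", "male")))

# Dropping the intercept yields one 0/1 indicator column per level.
mm <- model.matrix(~ gender - 1, data = dat)

print(colnames(mm))
print(mm[, "genderfemale"])
```

Each row has exactly one 1 across the two columns, so either column alone can serve as the 0/1 indicator.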

Author by

Admin

Updated on July 09, 2022

Comments

  • Admin almost 2 years

    I have a variable called gender with the binary categorical values "female"/"male". I want to change its type to integer 0/1 so that I can use it in a regression analysis, i.e., I want the values "female" and "male" to be mapped to 1 and 0.

    > str(gender)
    gender : Factor w/ 2 levels "female","male":  1 1 1 2 2 2 2 1 1 2 ...
    > gender[1]
    [1] female
    

    I would like to convert the gender variable's type so that I get the integer value 1 when I query an element, e.g.:

    > gender[1]
    [1] 1
    
  • mnel over 11 years
    +1 far better to address the real issue, not the exact problem!
  • Marius over 11 years
    Or, to see even more explicitly how the levels will be treated in a regression model, use contrasts(factor(test)).
  • Kevin T about 3 years
    @Dason, what if you wanted to include gender in a correlation matrix? That will not work if gender is a factor.