Removing Specific factor level from factor variable

str(
  as.data.frame(
    lapply(
      df2, 
      function(x) factor(as.character(x), levels=levels(x)[levels(x) != "e"])
) ) )
# 'data.frame':  10 obs. of  3 variables:
# $ var1: Factor w/ 4 levels "a","b","c","d": 1 2 3 4 NA 1 2 3 4 NA
# $ var2: Factor w/ 4 levels "a","b","c","d": NA 4 3 2 1 NA 4 3 2 1
# $ var3: Factor w/ 4 levels "a","c","d","b": 1 2 3 NA 1 2 3 NA 1 2
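The same idea can be wrapped in a small reusable function. The following is only a sketch: `drop_level` is a hypothetical helper name, not a base R function, and the data frame is rebuilt here with `stringsAsFactors = TRUE` on the assumption that the columns are meant to be factors, as in the question.

```r
# Hypothetical helper (not part of base R): remove the named level(s) from one
# factor while keeping every other level, including levels with no observations.
drop_level <- function(x, level) {
  if (!is.factor(x)) return(x)  # leave non-factor columns untouched
  factor(x, levels = setdiff(levels(x), level))
}

# Rebuild the example data from the question, with factors made explicit.
df <- data.frame(
  var1 = rep(letters[1:5], 2),
  var2 = rep(letters[5:1], 2),
  var3 = c("a", "c", "d", "e", "a", "c", "d", "e", "a", "c"),
  stringsAsFactors = TRUE
)
levels(df$var3) <- c("a", "c", "d", "e", "b")  # adds the empty level "b"

# Assigning into df2[] keeps the data frame structure and the column names.
df2 <- df
df2[] <- lapply(df2, drop_level, level = "e")

levels(df2$var3)  # "e" is gone, the empty level "b" is still there
```

Values that belonged to the removed level become `NA`, so no separate `replace()` step is needed before calling the helper.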
Author by

user2460499

Updated on July 11, 2022

Comments

  • user2460499
    user2460499 almost 2 years

    I have a data frame with several variables that each have 5 factor levels. I want to delete only one of those levels. First I set all instances of that level to NA, and then used the droplevels command to get rid of the empty levels.

    However, for one variable in my data frame, one of the levels I don't want dropped has no observations in it. Is there a way to remove only a specific factor level, and not just the empty ones?

    Here is a reproducible example

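    # stringsAsFactors=TRUE is needed in R >= 4.0, where strings are no longer converted to factors by default
    df <- data.frame(var1=rep(letters[1:5],2),var2=rep(letters[5:1],2),var3=c("a","c","d","e","a","c","d","e","a","c"),stringsAsFactors=TRUE)
    levels(df$var3)<-c("a","c","d","e","b")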
    

    This sets up a data frame like mine. Now I want to remove all instances of the level e, and then drop it as a possible level. I do this with the code below.

    df2<-replace(df, df=="e",NA)
    df2<-droplevels(df2)
    

    The problem is that droplevels also drops level b from var3. I don't want to remove level b, just level e, from all of the variables. I have looked for a way to remove just a specific level, but have not found the answer. Can anyone show me how to remove just a specific factor level? What I would ideally like is a droplevels command that I can tell to remove only level e. Does such a function exist?

  • BrodieG
    BrodieG over 10 years
    I think he wishes to drop the e level from all columns
  • IRTFM
    IRTFM over 10 years
    I do not think the as.character is needed.
  • BrodieG
    BrodieG over 10 years
    You're right, but I'm always super wary of factors all of a sudden behaving like their underlying numbers as opposed to their "values". Clearly within the factor function expecting normal behavior is reasonable.
  • PatrickT
    PatrickT over 6 years
    as.data.frame messes with variable names, but check.names = FALSE helps apparently.