List all factor levels of a data.frame

Solution 1

Here are some options. We can loop through the columns of 'data' with sapply and get the levels of each one (assuming that all the columns are of class factor; levels() returns NULL for a non-factor column).

sapply(data, levels)
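To make the shape of the result concrete, here is a minimal sketch on a toy data frame. The data frame and its column names (fac1, fac2, val) are made up for illustration and only mirror the names used in the question. Because the columns have different numbers of levels, sapply cannot simplify the result and returns a named list; the numeric column shows up as NULL.

```r
# Hypothetical toy data: two factor columns and one numeric column
data <- data.frame(
  fac1 = factor(c("a", "b", "a")),
  fac2 = factor(c("x", "y", "z")),
  val  = c(1.5, 2.5, 3.5)
)

# One entry per column; non-factor columns yield NULL
lv <- sapply(data, levels)

print(lv$fac1)
print(lv$fac2)
print(is.null(lv$val))
```

If every factor happened to have the same number of levels, sapply would simplify the result to a matrix, so lapply(data, levels) may be the safer choice when a list is always wanted.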

Or if we need to pipe (%>%) it, this can be done as

library(dplyr)
data %>% 
     sapply(levels)

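Another option is summarise_each from dplyr, where we specify levels within funs.

 data %>%
      summarise_each(funs(list(levels(.))))

Both summarise_each and funs are deprecated in current dplyr; the same result can be written with summarise and across:

 data %>%
      summarise(across(everything(), ~ list(levels(.x))))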

Solution 2

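If your problem is specifically to output the levels of a single factor, I have found a simple solution using unique(). Note that unique() returns only the values that actually occur in the column, whereas levels(df$x) also returns declared levels that are unused: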

unique(df$x)

For instance, for the infamous iris dataset:

unique(iris$Species)
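For iris the two approaches agree, because every level of Species occurs in the data. A small sketch of where they can differ, using a made-up factor x with one unused level:

```r
# Hypothetical factor with a declared but unused level "c"
x <- factor(c("a", "b", "a"), levels = c("a", "b", "c"))

all_levels <- levels(x)               # every declared level
seen       <- as.character(unique(x)) # only values present in the data

print(all_levels)
print(seen)
```

So unique() is a reasonable shortcut when only observed values matter, and levels() is the one to use when unused levels should be listed too.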

Solution 3

Or using purrr:

data %>% purrr::map(levels)

Or to first factorize everything:

data %>% dplyr::mutate_all(as.factor) %>% purrr::map(levels)

And answering the question about how to get the lengths:

data %>% purrr::map(levels) %>% purrr::map(length)
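If only the counts are needed, base R's nlevels() may be a shorter route that avoids the extra packages. A minimal sketch, again on a made-up data frame (the column names are assumptions for illustration):

```r
# Hypothetical toy data: two factor columns and one numeric column
data <- data.frame(
  fac1 = factor(c("a", "b", "a")),
  fac2 = factor(c("x", "y", "z")),
  val  = c(1.5, 2.5, 3.5)
)

# Named integer vector with the number of levels per column;
# non-factor columns count as 0
n_levels <- sapply(data, nlevels)
print(n_levels)
```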

Solution 4

Another method is to use the sqldf package with a SELECT DISTINCT statement. This makes it easy to get the names of the factor levels programmatically and then apply them as levels to other columns or variables.

A generic code snippet:

library(sqldf)
array_name = sqldf("select DISTINCT *colname1* as '*column_title*' from *table_name*")

Sample code using iris dataset:

df1 = iris
factor1 <- sqldf("select distinct Species as 'flower_type' from df1")
factor1    ## to print the names of factors

Output:

  flower_type
1      setosa
2  versicolor
3   virginica

Solution 5

In case you want to display factor levels only for those columns that are factors, you can use:

lapply(df[sapply(df, is.factor)], levels)
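An equivalent way to write the same filter, shown here as a sketch on a made-up data frame, is base R's Filter(), which keeps only the columns for which is.factor returns TRUE:

```r
# Hypothetical toy data: one factor column and one numeric column
df <- data.frame(
  fac1 = factor(c("a", "b", "a")),
  val  = c(1, 2, 3)
)

# Keep only factor columns, then collect their levels in a named list
factor_levels <- lapply(Filter(is.factor, df), levels)
print(factor_levels)
```

The numeric column is dropped before levels() is called, so the result contains no NULL entries.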

Author: ckluss

Updated on April 01, 2021

Comments

  • ckluss about 3 years

    With str(data) I get only the head of the levels (1-2 values):

    fac1: Factor w/ 2  levels ... :
    fac2: Factor w/ 5  levels ... :
    fac3: Factor w/ 20 levels ... :
    val: num ...
    

    With dplyr::glimpse(data) I get more values, but no information about the number or values of the factor levels. Is there an automatic way to get all level information for all factor variables in a data.frame? A short form with more info for

    levels(data$fac1)
    levels(data$fac2)
    levels(data$fac3)
    

    or, more precisely, an elegant version of something like

    for (n in names(data))
      if (is.factor(data[[n]])) {
        print(n)
        print(levels(data[[n]]))
      }
    

    thx Christof

  • BigDataScientist about 8 years
    How do we get the length of all of those levels?
  • G. Grothendieck almost 8 years
    If you indent each code line by 4 spaces it will format itself properly.
  • Amit Kohli over 6 years
    @BigDataScientist check my answer
  • igorkf over 4 years
    Nice approach. I like it.