Coerce multiple columns to factors at once
Solution 1
Choose some columns to coerce to factors:
cols <- c("A", "C", "D", "H")
Use lapply() to coerce and replace the chosen columns:
data[cols] <- lapply(data[cols], factor) ## as.factor() could also be used
Check the result:
sapply(data, class)
# A B C D E F G
# "factor" "integer" "factor" "factor" "integer" "integer" "integer"
# H I J
# "factor" "integer" "integer"
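Here is a self-contained run of Solution 1. It assumes `data` is the question's sample data frame. That data is rebuilt here with a `set.seed()` call we added so the result is reproducible:

```r
# Rebuild the sample data from the question (seed added for reproducibility)
set.seed(1)
data <- data.frame(matrix(sample(1:40), 4, 10,
                          dimnames = list(1:4, LETTERS[1:10])))

cols <- c("A", "C", "D", "H")

# lapply() returns a list of factors; assigning it to data[cols]
# replaces exactly those columns and leaves the others untouched
data[cols] <- lapply(data[cols], factor)

sapply(data, class)
```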
Solution 2
Here is an option using dplyr. The %<>% operator from magrittr updates the lhs object with the resulting value.
library(magrittr)
library(dplyr)
cols <- c("A", "C", "D", "H")
# Note: mutate_each_() and funs() are deprecated in current dplyr versions
data %<>%
  mutate_each_(funs(factor(.)), cols)
str(data)
#'data.frame': 4 obs. of 10 variables:
# $ A: Factor w/ 4 levels "23","24","26",..: 1 2 3 4
# $ B: int 15 13 39 16
# $ C: Factor w/ 4 levels "3","5","18","37": 2 1 3 4
# $ D: Factor w/ 4 levels "2","6","28","38": 3 1 4 2
# $ E: int 14 4 22 20
# $ F: int 7 19 36 27
# $ G: int 35 40 21 10
# $ H: Factor w/ 4 levels "11","29","32",..: 1 4 3 2
# $ I: int 17 1 9 25
# $ J: int 12 30 8 33
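`mutate_each_()` and `funs()` are deprecated, so the same conversion written with `across()` may be more future-proof. The sketch below assumes dplyr 1.0.0 or later. It rebuilds the question's sample data with a seed we added. `all_of()` is the tidyselect helper that dplyr re-exports for selecting columns named in a character vector:

```r
library(dplyr)

set.seed(1)
data <- data.frame(matrix(sample(1:40), 4, 10,
                          dimnames = list(1:4, LETTERS[1:10])))
cols <- c("A", "C", "D", "H")

# across() applies factor() to every column selected by all_of(cols);
# all_of() raises an error if a name in cols is missing from data
data <- data %>% mutate(across(all_of(cols), factor))

str(data)
```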
Or, if we are using data.table, either use a for loop with set():
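library(data.table)
setDT(data)  # convert the data.frame to a data.table by reference
for (j in cols) {
  # replace column j in place with its factor version
  set(data, i = NULL, j = j, value = factor(data[[j]]))
}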
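The `set()` loop above can be run as follows. This sketch assumes the data.table package is installed and rebuilds the question's data with a seed we added. `set()` modifies the table in place, so no copy of the whole table is made on each iteration:

```r
library(data.table)

set.seed(1)
data <- data.frame(matrix(sample(1:40), 4, 10,
                          dimnames = list(1:4, LETTERS[1:10])))
cols <- c("A", "C", "D", "H")

# Convert to data.table by reference, then overwrite each chosen column
setDT(data)
for (j in cols) {
  set(data, j = j, value = factor(data[[j]]))
}

sapply(data, class)
```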
Or we can specify the 'cols' in .SDcols and assign (:=) the rhs to 'cols':
setDT(data)[, (cols):= lapply(.SD, factor), .SDcols=cols]
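Here is a runnable sketch of the `.SDcols` form. It assumes the data.table package is installed and rebuilds the question's data with a seed we added. The parentheses around `cols` on the left of `:=` make data.table use the names stored in `cols`, rather than create a column literally named `cols`:

```r
library(data.table)

set.seed(1)
data <- data.frame(matrix(sample(1:40), 4, 10,
                          dimnames = list(1:4, LETTERS[1:10])))
cols <- c("A", "C", "D", "H")

# .SD holds only the columns listed in .SDcols; lapply() converts each,
# and := writes the results back under the names in cols
setDT(data)[, (cols) := lapply(.SD, factor), .SDcols = cols]

sapply(data, class)
```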
Solution 3
The more recent tidyverse way is to use the mutate_at function:
library(tidyverse)
library(magrittr)
set.seed(88)
data <- data.frame(matrix(sample(1:40), 4, 10, dimnames = list(1:4, LETTERS[1:10])))
cols <- c("A", "C", "D", "H")
data %<>% mutate_at(cols, factor)
str(data)
# $ A: Factor w/ 4 levels "5","17","18",..: 2 1 4 3
# $ B: int 36 35 2 26
# $ C: Factor w/ 4 levels "22","31","32",..: 1 2 4 3
# $ D: Factor w/ 4 levels "1","9","16","39": 3 4 1 2
# $ E: int 3 14 30 38
# $ F: int 27 15 28 37
# $ G: int 19 11 6 21
# $ H: Factor w/ 4 levels "7","12","20",..: 1 3 4 2
# $ I: int 23 24 13 8
# $ J: int 10 25 4 33
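`mutate_at()` does not require a character vector of names. Its `.vars` argument also accepts a `vars()` selection or numeric column positions. The small sketch below assumes dplyr is installed and uses a seed of our own, so the values differ from the output above:

```r
library(dplyr)

set.seed(1)
data <- data.frame(matrix(sample(1:40), 4, 10,
                          dimnames = list(1:4, LETTERS[1:10])))

# Bare column names wrapped in vars()
by_name <- data %>% mutate_at(vars(A, C, D, H), factor)

# Integer positions of the same columns (A = 1, C = 3, D = 4, H = 8)
by_position <- data %>% mutate_at(c(1, 3, 4, 8), factor)

identical(by_name, by_position)
```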
Solution 4
You can use mutate_if() (dplyr).
For example, coerce integer columns to factors:
mydata <- structure(list(a = 1:10, b = 1:10,
                         c = c("a", "a", "b", "b", "c", "c", "c", "c", "c", "c")),
                    row.names = c(NA, -10L),
                    class = c("tbl_df", "tbl", "data.frame"))
# A tibble: 10 x 3
a b c
<int> <int> <chr>
1 1 1 a
2 2 2 a
3 3 3 b
4 4 4 b
5 5 5 c
6 6 6 c
7 7 7 c
8 8 8 c
9 9 9 c
10 10 10 c
Use the function:
library(dplyr)
mydata %>%
  mutate_if(is.integer, as.factor)
# A tibble: 10 x 3
a b c
<fct> <fct> <chr>
1 1 1 a
2 2 2 a
3 3 3 b
4 4 4 b
5 5 5 c
6 6 6 c
7 7 7 c
8 8 8 c
9 9 9 c
10 10 10 c
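The same integer-to-factor conversion can be sketched in two other ways. The first uses `across(where(...))`, which supersedes `mutate_if()` in dplyr 1.0.0 and later. The second is base R with a logical column mask. The data is the answer's `mydata`, rebuilt with `tibble()`; `is_int` and `int_cols` are our own helper names:

```r
library(dplyr)

mydata <- tibble(a = 1:10, b = 1:10,
                 c = rep(c("a", "b", "c"), times = c(2, 2, 6)))

# dplyr >= 1.0.0: where() selects columns for which the predicate is TRUE
out_dplyr <- mydata %>% mutate(across(where(is.integer), as.factor))

# Base R: find the integer columns by name, then replace them with lapply()
is_int <- vapply(mydata, is.integer, logical(1))
int_cols <- names(mydata)[is_int]
out_base <- mydata
out_base[int_cols] <- lapply(out_base[int_cols], as.factor)

sapply(out_dplyr, class)
```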
Solution 5
And, for completeness, and with regard to the related question about converting only string columns, there's mutate_if():
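library(dplyr)
# 100 values fill the 10 x 10 matrix without recycling
data <- cbind(stringVar = sample(c("foo", "bar"), 10, replace = TRUE),
              data.frame(matrix(sample(1:100), 10, 10, dimnames = list(1:10, LETTERS[1:10]))),
              stringsAsFactors = FALSE)
# funs() is deprecated; passing the function directly is enough
factoredData <- data %>% mutate_if(is.character, factor)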
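For comparison, here is a base-R sketch of the same character-only conversion, with no dplyr dependency. It rebuilds the answer's example data with a seed we added, using 100 values so the 10 x 10 matrix is filled without recycling. `is_chr` is our own helper variable. With dplyr 1.0.0 or later, `mutate(across(where(is.character), factor))` is the equivalent tidyverse form.

```r
set.seed(1)
data <- cbind(stringVar = sample(c("foo", "bar"), 10, replace = TRUE),
              data.frame(matrix(sample(1:100), 10, 10,
                                dimnames = list(1:10, LETTERS[1:10]))),
              stringsAsFactors = FALSE)

# Logical mask of the character columns; only those are converted
is_chr <- vapply(data, is.character, logical(1))
data[is_chr] <- lapply(data[is_chr], factor)

sapply(data, class)
```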
wsda
Updated on December 03, 2021
Comments
wsda over 2 years
I have a sample data frame like below:
data <- data.frame(matrix(sample(1:40), 4, 10, dimnames = list(1:4, LETTERS[1:10])))
I want to know how I can select multiple columns and convert them to factors together. I usually do it like
data$A = as.factor(data$A)
but when the data frame is very large and contains lots of columns, this way is very time-consuming. Does anyone know of a better way to do it?
Zheyuan Li almost 6 years
All answers here use the function factor, not as.factor (as you did). In fact, using as.factor is preferred: Why use as.factor() instead of just factor()
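Here is a quick check of one difference relevant to Zheyuan Li's comment, using a made-up factor `f`. `factor()` recomputes the levels from the values present, while `as.factor()` returns an input that is already a factor unchanged:

```r
# Made-up factor with an unused level "mid"
f <- factor(c("low", "high"), levels = c("low", "mid", "high"))

# factor() rebuilds the levels, so the unused "mid" is dropped
levels(factor(f))     # "low" "high"

# as.factor() returns an existing factor as-is, keeping "mid"
levels(as.factor(f))  # "low" "mid" "high"
```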
TayTay over 8 years
Wouldn't it need to be
data[,cols] <- lapply(data[,cols], factor)
(with the leading comma for columns)?
Rich Scriven over 8 years
@Tgsmith61591 - It could be either. With the comma, it is a matrix-type subset; without the comma, it is a list subset. Data frames can be subsetted either way, so both would work.
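Here is a small sketch of Rich Scriven's point, on the question's sample data with a seed we added. The two forms agree when several columns are selected. With a single column, the matrix-style form drops to a plain vector by default. That matters if the result is passed to `lapply()`, which would then iterate over the vector's elements:

```r
set.seed(1)
data <- data.frame(matrix(sample(1:40), 4, 10,
                          dimnames = list(1:4, LETTERS[1:10])))

# Several columns: list-style and matrix-style subsets give the same data frame
identical(data[c("A", "C")], data[, c("A", "C")])

# One column: list-style keeps a data frame, matrix-style drops to a vector
class(data["A"])     # "data.frame"
class(data[, "A"])   # "integer"
```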
cbrnr about 6 years
You don't even need to use funs if you only perform one transformation; mutate_at(cols, factor) is sufficient.
Ben almost 6 years
How can this solution be expanded to include factor levels and labels?
Rich Scriven over 5 years
@Ben - It's probably best to ask a new question.
Microscone almost 5 years
This is a great solution, and it is now my go-to code for changing column classes. However, I think using sapply to view the classes is ugly and hard to read; str(data) works better for me.
Tan Naidu almost 5 years
To add to Rich Scriven's answer, I had too many columns and didn't want to name all of them. I ended up using indices, as in the sample below:
cols <- c(2, 5, 7, 14:16)
data[cols] <- lapply(data[cols], factor)
Brian D almost 5 years
@Ben you can specify labels and levels by extending the answer:
data[cols] <- lapply(data[cols], factor, levels=c("val1", "val2", ...), labels=c("label1", "label2", ...))
Be careful with this, though: all of the variables will use the same levels and labels you provide.
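Brian D's caveat can be worked around by giving each column its own levels. The sketch below uses a small made-up data frame and a hypothetical `level_list` with one element per column. `Map()` pairs each column with its own level vector, and a `labels` list could be passed the same way:

```r
# Made-up example data with character columns
data <- data.frame(
  A = c("lo", "hi", "lo", "mid"),
  B = c("n", "y", "y", "n"),
  stringsAsFactors = FALSE
)
cols <- c("A", "B")

# Hypothetical per-column level orders, one list element per column in cols
level_list <- list(A = c("lo", "mid", "hi"), B = c("n", "y"))

# Map() walks data[cols] and level_list in parallel, calling
# factor(column, levels = its own levels) for each pair
data[cols] <- Map(factor, data[cols], levels = level_list)

levels(data$A)  # "lo" "mid" "hi"
levels(data$B)  # "n" "y"
```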
Casey Jayne over 2 years
Can you add your citation for why we need/should use 'across'? I don't see it in R4DS or on the ?dplyr page.
GuedesBF over 2 years
dplyr.tidyverse.org/reference/across.html "across() supersedes the family of "scoped variants" like summarise_at(), summarise_if(), and summarise_all()."