Converting multiple columns from character to numeric format in R
Solution 1
You could try
DF <- data.frame("a" = as.character(0:5),
                 "b" = paste(0:5, ".1", sep = ""),
                 "c" = letters[1:6],
                 stringsAsFactors = FALSE)
# Check columns classes
sapply(DF, class)
# a b c
# "character" "character" "character"
cols.num <- c("a","b")
DF[cols.num] <- sapply(DF[cols.num],as.numeric)
sapply(DF, class)
# a b c
# "numeric" "numeric" "character"
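A variant of the same idea, as a minimal sketch: lapply() returns a list of columns rather than the matrix that sapply() builds, so each column is assigned back as its own vector. DF and cols.num are the same as above and are repeated here only so the snippet runs on its own.

```r
DF <- data.frame(a = as.character(0:5),
                 b = paste0(0:5, ".1"),
                 c = letters[1:6],
                 stringsAsFactors = FALSE)

cols.num <- c("a", "b")

# lapply() gives a list with one numeric vector per selected column;
# assigning that list to DF[cols.num] replaces those columns in place
DF[cols.num] <- lapply(DF[cols.num], as.numeric)

sapply(DF, class)
```

DF is still a data.frame afterwards, because only the selected columns are replaced.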
Solution 2
If you're already using the tidyverse, there are a few solutions, depending on the exact situation.
Basic case, if you know every character column holds only numbers and has no NAs:
library(dplyr)
# solution
dataset %>% mutate_if(is.character,as.numeric)
Test cases
df <- data.frame(
  x1 = c('1','2','3'),
  x2 = c('4','5','6'),
  x3 = c('1','a','x'), # vector with alpha characters
  x4 = c('1',NA,'6'),  # numeric and NA
  x5 = c('1',NA,'x'),  # alpha and NA
  stringsAsFactors = FALSE)
# display starting structure
df %>% str()
Convert all character vectors to numeric (values that are not numbers become NA with a warning):
df %>%
  select(-x3) %>% # drops the alpha column, for the case where every remaining character column should become numeric
  mutate_if(is.character, as.numeric) %>%
  str()
Check whether each column can be converted. This can be an anonymous function. It returns FALSE if any non-NA value cannot be parsed as a number. It also checks that the column is a character vector, so factors are ignored. na.omit removes the original NAs first, so that only the "bad" NAs created by a failed conversion are detected.
is_all_numeric <- function(x) {
  # && short-circuits, so non-character columns are never passed to as.numeric
  is.character(x) && !any(is.na(suppressWarnings(as.numeric(na.omit(x)))))
}
df %>%
  mutate_if(is_all_numeric, as.numeric) %>%
  str()
If you want to convert specific named columns, then mutate_at is better.
df %>% mutate_at('x1', as.numeric) %>% str()
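For dplyr 1.0.0 or later, the mutate_if and mutate_at calls above can also be written with across(). This is a rough sketch under that version assumption; the data frame is a trimmed copy of the test cases, and converted and one_col are names made up for the illustration.

```r
library(dplyr)  # across(), where() and all_of() assume dplyr >= 1.0.0

df <- data.frame(
  x1 = c("1", "2", "3"),
  x4 = c("1", NA, "6"),  # numeric and NA
  x5 = c("1", NA, "x"),  # alpha and NA
  stringsAsFactors = FALSE
)

# TRUE only for character columns whose non-NA values all parse as numbers
is_all_numeric <- function(x) {
  is.character(x) && !any(is.na(suppressWarnings(as.numeric(na.omit(x)))))
}

# counterpart of mutate_if(is_all_numeric, as.numeric)
converted <- df %>% mutate(across(where(is_all_numeric), as.numeric))
str(converted)

# counterpart of mutate_at('x1', as.numeric)
one_col <- df %>% mutate(across(all_of("x1"), as.numeric))
str(one_col)
```

Here x1 and x4 should come out numeric, while x5 stays character because of the 'x'.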
Solution 3
You can use column indices:
dataset[, 1:9] <- sapply(dataset[, 1:9], as.numeric)
Solution 4
I used this code to convert all columns to numeric except the first one:
library(dplyr)
# check structure, row and column number with: glimpse(df)
# convert to numeric e.g. from 2nd column to 10th column
df <- df %>%
mutate_at(c(2:10), as.numeric)
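Without dplyr, "everything except the first column" can be expressed with a negative index. A minimal base R sketch; the data frame and its column names (id, v1, v2) are invented for the example.

```r
# hypothetical data: an id column followed by numbers stored as text
df <- data.frame(id = c("a", "b", "c"),
                 v1 = c("1", "2", "3"),
                 v2 = c("4.5", NA, "6"),
                 stringsAsFactors = FALSE)

# df[-1] selects every column except the first one
df[-1] <- lapply(df[-1], as.numeric)

str(df)
```

Existing NAs stay NA, and the first column is left untouched.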
Solution 5
Using the across() function from dplyr 1.0:
df <- df %>% mutate(across(everything(), ~ as.numeric(.x)))
ec0n0micus
Updated on November 28, 2021
Comments
-
ec0n0micus over 2 years
What is the most efficient way to convert multiple columns in a data frame from character to numeric format?
I have a dataframe called DF with all character variables.
I would like to do something like
for (i in names(DF)) { DF[[i]] <- as.numeric(DF[[i]]) }
Thank you
-
Aaron left Stack Overflow about 10 years
Why are they all character format in the first place? I'd suspect there's something that can be done earlier in the process to make them numeric from the start.
-
zsad512 about 6 years
Error in `[.data.table`(data, nums) : When i is a data.table (or character vector), the columns to join by must be specified either using 'on=' argument (see ?data.table) or by keying x (i.e. sorted, and, marked as sorted, see ?setkey). Keyed joins might have further speed benefits on very large data due to x being sorted in RAM.
-
FilipeTeixeira about 6 years
I'm not sure how you can directly convert from character to numeric. You'd have to work with factors and then to numeric. Unless I'm missing something.
-
ARobertson about 6 years
@FilipeTeixeira I believe you're thinking about converting from factors to numeric. You have to convert factors to characters and then to numeric, unless you truly want the numeric factor level, which in my experience has been rare. If you run the test code above, you can see that it works. It will fail if they aren't actually numbers, but then you'll have to deal with that anyway.
-
Idiot Tom over 3 years
Use of retype() from the hablar library was cleaner and easier than all the other solutions here, and worked for my use case. Thanks!
-
Falnésio over 3 years
retype() worked great! Used it on 3000+ columns that had NAs splattered about.
-
coding_is_fun about 2 years
This approach keeps your data.frame as a data.frame, while lapply converts your dataframe into a list.
-
Ian almost 2 years
mutate_if() has been deprecated. Use across() now, e.g. mutate(across(where(is.character), ~ as.numeric(.x)))