Converting multiple columns from character to numeric format in R

207,213

Solution 1

You could try

DF <- data.frame("a" = as.character(0:5),
                 "b" = paste(0:5, ".1", sep = ""),
                 "c" = letters[1:6],
                 stringsAsFactors = FALSE)

# Check columns classes
sapply(DF, class)

#           a           b           c 
# "character" "character" "character" 

cols.num <- c("a","b")
DF[cols.num] <- sapply(DF[cols.num],as.numeric)
sapply(DF, class)

#          a           b           c 
#  "numeric"   "numeric" "character"
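Solution 1 uses sapply(), which simplifies the converted columns into a matrix before the assignment splits them back into columns. A common variant is sketched below; it is not the answer author's code. It uses lapply() instead, with the same example data and column names as above.

```r
DF <- data.frame(a = as.character(0:5),
                 b = paste(0:5, ".1", sep = ""),
                 c = letters[1:6],
                 stringsAsFactors = FALSE)
cols.num <- c("a", "b")

# lapply() on its own returns a plain list, not a data.frame...
converted <- lapply(DF[cols.num], as.numeric)

# ...but assigning that list into a column subset of DF writes each element
# back as a column, so DF keeps its class, names and row names.
DF[cols.num] <- converted
sapply(DF, class)
```

To convert every column, `DF[] <- lapply(DF, as.numeric)` works the same way. The empty `[]` is what keeps DF a data.frame.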

Solution 2

If you're already using the tidyverse, there are a few solutions depending on the exact situation.

Basic case, if you know the character columns contain only numbers and no NAs:

library(dplyr)

# solution (`dataset` stands for your data frame)
dataset %>% mutate_if(is.character, as.numeric)
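The is.character() predicate in that call skips factor columns. That matters because as.numeric() on a factor returns the internal level codes, not the printed values. Here is a small base R illustration with a made-up factor `f`:

```r
f <- factor(c("10", "5", "10"))

# Levels are sorted as strings, so "10" is level 1 and "5" is level 2.
codes  <- as.numeric(f)                 # level codes: 1 2 1
values <- as.numeric(as.character(f))   # printed values: 10 5 10

# Alternative from the R FAQ (7.10): convert the levels once, then index by the factor.
values2 <- as.numeric(levels(f))[f]     # 10 5 10
```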

Test cases

df <- data.frame(
  x1 = c('1','2','3'),
  x2 = c('4','5','6'),
  x3 = c('1','a','x'), # vector with alpha characters
  x4 = c('1',NA,'6'), # numeric and NA
  x5 = c('1',NA,'x'), # alpha and NA
  stringsAsFactors = FALSE)

# display starting structure
df %>% str()

Convert all character vectors to numeric (values that aren't numbers become NA, with a warning):

df %>%
  select(-x3) %>% # drop the alpha column x3; note that x5's 'x' still becomes NA (with a warning)
  mutate_if(is.character,as.numeric) %>%
  str()
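When a column is only partly numeric, like x5 above, it can help to see exactly which values would turn into NA before converting. The helper `find_unparseable()` below is hypothetical, a base R sketch using columns from the test data:

```r
# Hypothetical helper: for each column, list the non-missing strings that
# as.numeric() cannot parse (the values that would silently become NA).
find_unparseable <- function(df) {
  lapply(df, function(x) {
    num <- suppressWarnings(as.numeric(x))  # silence the coercion warning
    unique(x[is.na(num) & !is.na(x)])       # ignore values that were NA already
  })
}

df <- data.frame(
  x1 = c('1', '2', '3'),
  x3 = c('1', 'a', 'x'),
  x5 = c('1', NA, 'x'),
  stringsAsFactors = FALSE)

bad <- find_unparseable(df)
str(bad)
```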

Check whether each column can be converted (the check could also be written as an anonymous function). It returns FALSE if any non-NA value cannot be parsed as a number, and it also requires the column to be a character vector, so factors are left alone. na.omit() drops the original NAs first, so any NA produced by as.numeric() signals a value that failed to parse.

is_all_numeric <- function(x) {
  # && short-circuits: non-character columns (e.g. factors) return FALSE before parsing
  is.character(x) && !anyNA(suppressWarnings(as.numeric(na.omit(x))))
}
df %>% 
  mutate_if(is_all_numeric,as.numeric) %>%
  str()
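The same selection can be done without dplyr, with the check written as an anonymous function as the paragraph above suggests. Here is a base R sketch using vapply() on the test data (the name `ok` is illustrative):

```r
df <- data.frame(
  x1 = c('1', '2', '3'),
  x2 = c('4', '5', '6'),
  x3 = c('1', 'a', 'x'),
  x4 = c('1', NA, '6'),
  x5 = c('1', NA, 'x'),
  stringsAsFactors = FALSE)

# TRUE for character columns whose non-NA values all parse as numbers.
ok <- vapply(df, function(x) {
  is.character(x) && !anyNA(suppressWarnings(as.numeric(na.omit(x))))
}, logical(1))

# Convert only the columns that passed; x3 and x5 stay character.
df[ok] <- lapply(df[ok], as.numeric)
str(df)
```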

If you want to convert specific named columns, then mutate_at is better.

df %>% mutate_at('x1', as.numeric) %>% str()
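In dplyr 1.0 and later, mutate_at() is superseded by across(). Below is a sketch of the across() equivalent for named columns, assuming dplyr >= 1.0 is installed. `cols` is a hypothetical vector of column names:

```r
library(dplyr)  # assumes dplyr >= 1.0, where across() was introduced

df <- data.frame(
  x1 = c('1', '2', '3'),
  x2 = c('4', '5', '6'),
  x3 = c('1', 'a', 'x'),
  stringsAsFactors = FALSE)

cols <- c('x1', 'x2')  # hypothetical list of columns to convert

# all_of() makes the selection strict: it errors if a named column is missing.
df2 <- df %>% mutate(across(all_of(cols), as.numeric))
str(df2)
```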

Solution 3

You can use column indices: dataset[, 1:9] <- sapply(dataset[, 1:9], as.numeric)
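The indexed assignment above is written for a data.frame. With a data.table, `DT[cols]` (where `cols` is a character vector) is interpreted as a join on the table's key and fails. The usual data.table idiom is to assign by reference over `.SD` instead. The sketch below assumes the data.table package is installed; `DT` and `cols` are illustrative names:

```r
library(data.table)

DT <- data.table(a = c('1', '2'), b = c('3.5', '4'), c = c('x', 'y'))
cols <- c('a', 'b')  # hypothetical columns to convert

# Apply as.numeric to each column in .SD (restricted by .SDcols) and
# write the results back into the same columns by reference with :=.
DT[, (cols) := lapply(.SD, as.numeric), .SDcols = cols]
str(DT)
```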

Solution 4

I used this code to convert all columns to numeric except the first one:

    library(dplyr)
    # check structure, row and column number with: glimpse(df)
    # convert to numeric e.g. from 2nd column to 10th column
    df <- df %>% 
     mutate_at(c(2:10), as.numeric)
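Here is the same "everything except the first column" conversion in base R, using a negative column index. It is a sketch with a hypothetical `id` column in first position:

```r
df <- data.frame(
  id = c('a', 'b'),   # hypothetical identifier column to keep as-is
  v1 = c('1.5', '2'),
  v2 = c('3', '4'),
  stringsAsFactors = FALSE)

# df[-1] selects every column except the first; lapply() returns a list,
# which `[<-` writes back column by column, so df stays a data.frame.
df[-1] <- lapply(df[-1], as.numeric)
sapply(df, class)
```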

Solution 5

Using the across() function from dplyr 1.0

    df <- df %>% mutate(across(everything(), ~ as.numeric(.)))
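across(everything(), ...) converts every column, so any column containing text becomes NA with a warning. Base R's type.convert() takes a different approach: it guesses a type per column and leaves text alone. This is a swapped-in alternative, not part of this answer. The sketch assumes a recent R version where type.convert() accepts a data frame directly:

```r
df <- data.frame(
  x1 = c('1', '2', '3'),
  x3 = c('1', 'a', 'x'),
  x4 = c('1.5', NA, '6'),
  stringsAsFactors = FALSE)

# type.convert() picks a type per column; as.is = TRUE keeps non-numeric
# text as character instead of turning it into a factor.
converted <- type.convert(df, as.is = TRUE)
sapply(converted, class)
```

Whole-number columns such as x1 come back as integer rather than double. Wrap them in as.numeric() if you specifically need doubles.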

Author: ec0n0micus

Updated on November 28, 2021

Comments

  • ec0n0micus, over 2 years ago

    What is the most efficient way to convert multiple columns in a data frame from character to numeric format?

    I have a data frame called DF in which every variable is character.

    I would like to do something like

    for (i in names(DF)) {
        DF[[i]] <- as.numeric(DF[[i]])  # DF$i would look for a column literally named "i"
    }
    

    Thank you

  • Aaron left Stack Overflow, about 10 years ago
      Why are they all character format in the first place? I'd suspect there's something that can be done earlier in the process to make them numeric from the start.
  • zsad512, about 6 years ago
    Error in [.data.table(data, nums) : When i is a data.table (or character vector), the columns to join by must be specified either using 'on=' argument (see ?data.table) or by keying x (i.e. sorted, and, marked as sorted, see ?setkey). Keyed joins might have further speed benefits on very large data due to x being sorted in RAM.
  • FilipeTeixeira, about 6 years ago
    I'm not sure how you can directly convert from character to numeric. You'd have to work with factors and then to numeric. Unless I'm missing something.
  • ARobertson, about 6 years ago
    @FilipeTeixeira I believe you're thinking about converting from factors to numeric. You have to convert factors to characters to numeric, unless you truly want the numeric factor level, which in my experience has been rare. If you run the test code above, you can see that it works. It will fail if they aren't actually numbers though, but then you'll have to deal with that anyway.
  • Idiot Tom, over 3 years ago
    Using retype() from the hablar package was cleaner and easier than all the other solutions here, and it worked for my use case. Thanks!
  • Falnésio, over 3 years ago
    retype() worked great! Used it on 3000+ columns that had NAs splattered about.
  • coding_is_fun, about 2 years ago
    This approach keeps your data.frame as a data.frame, while lapply() turns your data frame into a list.
  • Ian, almost 2 years ago
    mutate_if() has been superseded. Use across() now, e.g. mutate(across(where(is.character), ~ as.numeric(.x)))