Remove thousand's separator

11,348

Solution 1

I think I just found another solution:

It's necessary to use stringsAsFactors = FALSE.

Like this:

df2 <- as.data.frame(apply(df1, 2, gsub, pattern = "([0-9])\\.([0-9])", replacement= "\\1\\2"), stringsAsFactors = FALSE)

df3 <- as.data.frame(data.matrix(df2))

Solution 2

You can use this :

sapply(df, function(v) {as.numeric(gsub("\\.","", as.character(v)))})

Which gives :

        A       B    C
[1,] 1100    7800  200
[2,] 2300     500 3100
[3,] 5400 1000000 4500

This will give you a matrix object, but you can wrap it into data.frame() if you wish.

Note that the columns in you original data are not characters but factors.


Edit: Alternatively, instead of wrapping it with data.frame(), you can do this to get the result directly as a data.frame:

# the as.character(.) is just in case it's loaded as a factor
df[] <- lapply(df, function(x) as.numeric(gsub("\\.", "", as.character(x))))
Share:
11,348
speendo
Author by

speendo

Updated on June 09, 2022

Comments

  • speendo
    speendo almost 2 years

    I imported an Excel file and got a data frame like this

    structure(list(A = structure(1:3, .Label = c("1.100", "2.300", 
    "5.400"), class = "factor"), B = structure(c(3L, 2L, 1L), .Label = c("1.000.000", 
    "500", "7.800"), class = "factor"), C = structure(1:3, .Label = c("200", 
    "3.100", "4.500"), class = "factor")), .Names = c("A", "B", "C"
    ), row.names = c(NA, -3L), class = "data.frame")
    

    I would now like to convert these chars to numeric or even integer. However, the dot character (.) is not a decimal sign but a "thousand's separator" (it's German).

    How would I convert the data frame properly?

    I tried this:

    df2 <- as.data.frame(apply(df1, 2, gsub, pattern = "([0-9])\\.([0-9])", replacement= "\\1\\2"))
    
    df3 <- as.data.frame(data.matrix(df2))
    

    however, apply seems to convert each column to a list of factors. Can I maybe prevent apply from doing so?

  • speendo
    speendo about 11 years
    oh you are right - bad minimal example. In the "real" data, they are characters.
  • Arun
    Arun about 11 years
    I guess this'll just replace 2 dots?
  • speendo
    speendo about 11 years
    why do you think just 2 dots? just tried it with structure(list(A = c("800.000.000.000", "2.034.312.421", "321.325.123.234" ), B = c("800.000.000.000", "2.034.312.421", "321.325.123.234" ), C = c("800.000.000.000", "2.034.312.421", "321.325.123.234" )), .Names = c("A", "B", "C"), row.names = c(NA, -3L), class = "data.frame") - all dots were replaced.
  • Arun
    Arun about 11 years
    Yes indeed, sorry, I don't know why I said that. However, this wouldn't work if the number was ".578", right?
  • speendo
    speendo about 11 years
    I think it should also work then. In principle this runs the command gsub("([0-9])\\.([0-9])", "\\1\\2", x) for every x in the data frame. In other words, the function searches for all patterns <digit1>.<digit2> and replaces them with <digit1><digit2>. This should work with all digits - problems could occur with patterns like <digit1>.<digit2>.<digit3> but here the dot wouldn't be a thousand's seperator anyway.
  • speendo
    speendo about 11 years
    oh, if the number was just .578 it wouldn't work, right. But in this case the dot is also not a thousand's seperator.
  • Arun
    Arun about 11 years
    Yes, right. Your answer is very well defined to the OP's question.
  • speendo
    speendo about 11 years
    maybe because of the personal union ;-)
  • Arun
    Arun about 11 years
    hahaha... just noticed!! my God! I need a nap!!! ahahaha this is too hilarious!