Remove thousand's separator
Solution 1
I think I just found another solution:
It's necessary to use stringsAsFactors = FALSE
.
Like this:
df2 <- as.data.frame(apply(df1, 2, gsub, pattern = "([0-9])\\.([0-9])", replacement= "\\1\\2"), stringsAsFactors = FALSE)
df3 <- as.data.frame(data.matrix(df2))
Solution 2
You can use this :
sapply(df, function(v) {as.numeric(gsub("\\.","", as.character(v)))})
Which gives :
A B C
[1,] 1100 7800 200
[2,] 2300 500 3100
[3,] 5400 1000000 4500
This will give you a matrix
object, but you can wrap it into data.frame()
if you wish.
Note that the columns in you original data are not characters but factors.
Edit: Alternatively, instead of wrapping it with data.frame()
, you can do this to get the result directly as a data.frame
:
# the as.character(.) is just in case it's loaded as a factor
df[] <- lapply(df, function(x) as.numeric(gsub("\\.", "", as.character(x))))
speendo
Updated on June 09, 2022Comments
-
speendo almost 2 years
I imported an Excel file and got a data frame like this
structure(list(A = structure(1:3, .Label = c("1.100", "2.300", "5.400"), class = "factor"), B = structure(c(3L, 2L, 1L), .Label = c("1.000.000", "500", "7.800"), class = "factor"), C = structure(1:3, .Label = c("200", "3.100", "4.500"), class = "factor")), .Names = c("A", "B", "C" ), row.names = c(NA, -3L), class = "data.frame")
I would now like to convert these
chars
tonumeric
or eveninteger
. However, the dot character (.
) is not a decimal sign but a "thousand's separator" (it's German).How would I convert the data frame properly?
I tried this:
df2 <- as.data.frame(apply(df1, 2, gsub, pattern = "([0-9])\\.([0-9])", replacement= "\\1\\2")) df3 <- as.data.frame(data.matrix(df2))
however,
apply
seems to convert each column to a list of factors. Can I maybe preventapply
from doing so? -
speendo about 11 yearsoh you are right - bad minimal example. In the "real" data, they are characters.
-
Arun about 11 yearsI guess this'll just replace 2 dots?
-
speendo about 11 yearswhy do you think just 2 dots? just tried it with
structure(list(A = c("800.000.000.000", "2.034.312.421", "321.325.123.234" ), B = c("800.000.000.000", "2.034.312.421", "321.325.123.234" ), C = c("800.000.000.000", "2.034.312.421", "321.325.123.234" )), .Names = c("A", "B", "C"), row.names = c(NA, -3L), class = "data.frame")
- all dots were replaced. -
Arun about 11 yearsYes indeed, sorry, I don't know why I said that. However, this wouldn't work if the number was ".578", right?
-
speendo about 11 yearsI think it should also work then. In principle this runs the command
gsub("([0-9])\\.([0-9])", "\\1\\2", x)
for everyx
in the data frame. In other words, the function searches for all patterns<digit1>.<digit2>
and replaces them with<digit1><digit2>
. This should work with all digits - problems could occur with patterns like<digit1>.<digit2>.<digit3>
but here the dot wouldn't be a thousand's seperator anyway. -
speendo about 11 yearsoh, if the number was just
.578
it wouldn't work, right. But in this case the dot is also not a thousand's seperator. -
Arun about 11 yearsYes, right. Your answer is very well defined to the OP's question.
-
speendo about 11 yearsmaybe because of the personal union ;-)
-
Arun about 11 yearshahaha... just noticed!! my God! I need a nap!!! ahahaha this is too hilarious!