In R: remove commas from a field AND have the modified field remain part of the dataframe
gsub() will return a character vector, not a numeric vector (which is what it sounds like you want). as.numeric() will convert the character vector into a numeric vector:
> df <- data.frame(numbers = c("123,456,789", "1,234,567", "1,234", "1"))
> df
numbers
1 123,456,789
2 1,234,567
3 1,234
4 1
> df$numbers <- as.numeric(gsub(",","",df$numbers))
> df
numbers
1 123456789
2 1234567
3 1234
4 1
The result is still a data.frame:
> class(df)
[1] "data.frame"
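Applied to a column like the question's TOT_EMP, the same pattern can be sketched as follows. The data frame x here is a hypothetical stand-in built from the sample values shown in the question, not the actual CSV file, and fixed = TRUE is an optional choice that makes gsub() treat the comma as a literal string instead of a regular expression.

```r
# Hypothetical stand-in for the data read from the CSV in the question
x <- data.frame(
  TOT_EMP = c("132,588,810", "6,542,950", "2,278,260", "248,760"),
  stringsAsFactors = FALSE
)

# Pull the column out as a plain vector with [[ ]], strip the commas,
# convert to numeric, and assign the result back into the same column
x[["TOT_EMP"]] <- as.numeric(gsub(",", "", x[["TOT_EMP"]], fixed = TRUE))

is.numeric(x$TOT_EMP)  # check that the column is now numeric
class(x)               # check that x is still a data.frame
```

Assigning back into x[["TOT_EMP"]] replaces the column in place, so the rest of the data frame is left untouched.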
Author by
mark stevenson
Updated on July 11, 2022
Comments
-
mark stevenson almost 2 years
I need to remove commas from a field in an R dataframe. Technically I have managed to do this, but the result seems to be neither a vector nor a matrix, and I cannot get it back into the dataframe in a usable format. So is there a way to remove the commas from a field AND have that field remain part of the dataframe?
Here is a sample of the field that needs commas removed, and the results generated by my code:
> print(x['TOT_EMP'])
TOT_EMP
1 132,588,810
2 6,542,950
3 2,278,260
4 248,760
> y
[1] "c(\"132588810\" \"6542950\" \"2278260\" \"248760\...)"
The desired result is a numeric field:
TOT_EMP
1 132588810
2 6542950
3 2278260
4 248760

x<-read.csv("/home/mark/Desktop/national_M2013_dl.csv",header=TRUE,colClasses="character")
y=(gsub(",","",x['TOT_EMP']))
print(y)
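For what it's worth, the single string stored in y can be reproduced with a small sketch. It assumes a hypothetical two-row data frame in place of the real CSV. Single-bracket indexing by name, x['TOT_EMP'], returns a one-column data frame, not a vector. gsub() coerces that data frame with as.character(), which deparses the whole column into one string before the commas are removed.

```r
# Hypothetical two-row stand-in for the CSV data
x <- data.frame(TOT_EMP = c("132,588,810", "6,542,950"),
                stringsAsFactors = FALSE)

# x["TOT_EMP"] is a data frame, so gsub() sees the whole column as one
# deparsed string and returns a character vector of length 1
y <- gsub(",", "", x["TOT_EMP"])
length(y)

# x[["TOT_EMP"]] (or x[, "TOT_EMP"]) is a character vector with one
# element per row, so gsub() works element by element
z <- gsub(",", "", x[["TOT_EMP"]])
length(z)
```

Because the commas that separate the elements of the deparsed c(...) expression are removed along with the thousands separators, the resulting string looks space delimited.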
-
mark stevenson over 9 years
As an aside, the commas aren't even in the original CSV file. They are somehow added during the read-in.
-
David Arenburg over 9 years
Try x[,'TOT_EMP'] <- gsub(",","",x[,'TOT_EMP'])
-
mark stevenson over 9 years
It looks like add drop can be used to add or drop columns, but I'm not sure about using it to remove commas within a field.
-
mark stevenson over 9 years
David, that's perfect. Thank you.
-
mark stevenson over 9 years
I tried using data.frame, but it didn't work, possibly because the vector elements were not comma delimited. Instead of a comma delimited vector such as c("123,456,789", "1,234,567", "1,234", "1"), I had "c(\"132588810\" \"6542950\" \"2278260\" \"248760\...)" which looks like it is tab delimited.
-
Richard Border over 9 years
This appears to be an entirely different question. Were the values of x['TOT_EMP'] not what you started with?
-
Richard Border over 9 years
What does head(read.csv("/home/mark/Desktop/national_M2013_dl.csv")) return?