How do I remove question marks (?) from a data set in R?
Solution 1
Look at gsub:
census$x <- gsub("?",NA,census$x, fixed = TRUE)
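Edit: I forgot to add fixed = TRUE, which makes gsub treat "?" as a literal character instead of a regular-expression metacharacter.
As Richard pointed out, this will catch all occurrences of a ?, so any element that contains a ? anywhere is set to NA, not only the elements that are exactly "?".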
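To illustrate the caveat, here is a minimal sketch on a made-up vector (the values in x are hypothetical, not taken from the census data). It compares the gsub call with a plain equality test, which only touches elements that are exactly "?".

```r
# Toy vector standing in for one column (hypothetical values).
x <- c("Private", "?", "Self-emp?", "State-gov")

# gsub with replacement NA: every element that contains "?" becomes NA.
by_gsub <- gsub("?", NA, x, fixed = TRUE)

# Exact comparison: only elements equal to "?" become NA.
by_exact <- x
by_exact[by_exact == "?"] <- NA

print(by_gsub)
print(by_exact)
```

If the column can hold legitimate values containing a question mark, the exact comparison is the safer of the two.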
Solution 2
Here's an easy way to replace " ?" with NA in all columns.
# find elements
idx <- census == " ?"
# replace elements with NA
is.na(census) <- idx
How does it work?
The command idx <- census == " ?" creates a logical matrix with the same number of rows and columns as the data frame census. This matrix idx contains TRUE where census contains " ?" and FALSE at all other positions.
The matrix idx is used as an index. The command is.na(census) <- idx replaces the values in census with NA at the positions where idx is TRUE.
Note that the replacement function is.na<- is used here. It is not the same as the is.na function.
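The steps above can be tried on a small stand-in data frame. This is only a sketch: census_toy and its values are invented for illustration, and the leading space in " ?" mirrors the pattern used in this answer.

```r
# Small stand-in for the census data frame (hypothetical values).
census_toy <- data.frame(
  Workclass  = c(" Private", " ?", " State-gov"),
  Occupation = c(" ?", " Sales", " Tech-support"),
  stringsAsFactors = FALSE
)

# Logical matrix: TRUE wherever a cell is exactly " ?".
idx <- census_toy == " ?"
print(idx)

# Replacement form of is.na: set the flagged cells to NA.
is.na(census_toy) <- idx
print(census_toy)
```

Because idx has one entry per cell, the same two lines work regardless of how many columns the data frame has.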
Learner27
Updated on June 29, 2022
Comments
Learner27, almost 2 years ago:
Hello everyone, I am analysing the UCI adult census data. The data has question marks (?) for every missing value. I want to replace all the question marks with NA. I tried:
library(XML)
census <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data", header=F, na.strings="?")
names(census) <- c("Age", "Workclass", "Fnlwght", "Education", "EducationNum", "MaritalStatus", "Occupation",
                   "Relationship", "Race", "Gender", "CapitalGain", "CapitalLoss", "HoursPerWeek", "NativeCountry", "Salary")
table(census$Workclass)

                ?       Federal-gov         Local-gov      Never-worked           Private      Self-emp-inc
             1836               960              2093                 7             22696              1116
 Self-emp-not-inc         State-gov       Without-pay
             2541              1298                14

x <- ifelse(census$Workclass=="?", NA, census$Workclass)
table(x)
x
    1     2     3     4     5     6     7     8     9
 1836   960  2093     7 22696  1116  2541  1298    14
But it did not work.
Please help.
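One likely reason the attempt fails, assuming the file separates fields with a comma followed by a space (as the " ?" pattern in Solution 2 suggests): each field is read with a leading space, so na.strings="?" never matches " ?". The integer labels 1 to 9 in table(x) also suggest that Workclass was read as a factor, in which case ifelse returns the underlying integer codes. The sketch below uses three made-up rows in that assumed layout and the strip.white argument of read.csv, which trims fields before they are compared with na.strings.

```r
# Three made-up rows in the assumed "comma + space" layout.
raw <- "39, State-gov, 77516\n50, ?, 83311\n38, Private, 215646"

# As in the question: the field keeps its leading space (" ?"), so "?" does not match.
as_asked <- read.csv(text = raw, header = FALSE, na.strings = "?")

# strip.white = TRUE removes the surrounding white space before the NA test.
stripped <- read.csv(text = raw, header = FALSE, na.strings = "?",
                     strip.white = TRUE)

print(anyNA(as_asked$V2))
print(is.na(stripped$V2))
```

Under that assumption, adding strip.white = TRUE to the original read.csv call should be enough, and the ifelse step becomes unnecessary.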