How do I remove question marks (?) from a data set in R?

Tags: r, na

Solution 1

Look at gsub:

# x is the column to clean
census$x <- gsub("?", NA, census$x, fixed = TRUE)

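Edit: added fixed = TRUE. ? is a regular-expression metacharacter, and fixed = TRUE makes gsub treat the pattern as a literal string.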

As Richard pointed out, this catches every occurrence of ?: any value that contains a ? anywhere becomes NA, not only values that consist of ? alone.
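A minimal sketch of this behaviour on a made-up vector (x and the value "Self-emp?" are hypothetical, chosen only to show a value that merely contains a ?). It relies on gsub's documented rule that a replacement of NA turns every matching element into NA, shows that escaping the metacharacter gives the same result as fixed = TRUE, and contrasts both with an exact comparison that only touches values equal to "?".

```r
# Hypothetical stand-in for census$x; "Self-emp?" only contains a "?"
x <- c("Private", "?", "State-gov", "Self-emp?")

# fixed = TRUE matches "?" literally; with replacement = NA,
# every element that contains a match becomes NA
fixed_version <- gsub("?", NA, x, fixed = TRUE)

# Same result with a regular expression: escape the metacharacter instead
regex_version <- gsub("\\?", NA, x)

# Exact comparison: only elements equal to "?" become NA
exact_version <- x
exact_version[exact_version == "?"] <- NA

print(fixed_version)   # "Self-emp?" is NA here as well
print(exact_version)   # "Self-emp?" is kept
```

If only whole-value question marks should be treated as missing, the exact comparison is the safer choice.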

Solution 2

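Here's an easy way to replace " ?" with NA in all columns. Note the leading space: in adult.data every value follows a comma and a space, so a missing value is read as " ?" rather than "?".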

# find the elements equal to " ?" (a logical matrix, one entry per cell)
idx <- census == " ?"
# set those elements to NA
is.na(census) <- idx

How does it work?

The command idx <- census == " ?" creates a logical matrix with the same number of rows and columns as the data frame census. idx is TRUE wherever census contains " ?" and FALSE everywhere else.

idx is then used as an index: the command is.na(census) <- idx sets every value of census at a TRUE position of idx to NA.
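The two steps can be tried on a small data frame first. This is a sketch with made-up rows; only the column names Age, Workclass and Occupation come from the question, and the values imitate the leading space that adult.data puts before each field.

```r
# Small hypothetical stand-in for census; note the leading space in " ?"
census <- data.frame(
  Age        = c(39, 50, 38),
  Workclass  = c(" State-gov", " ?", " Private"),
  Occupation = c(" Adm-clerical", " Exec-managerial", " ?"),
  stringsAsFactors = FALSE
)

# Logical matrix with the same dimensions as census (3 x 3);
# the numeric Age column simply compares as FALSE
idx <- census == " ?"
print(idx)

# Set the TRUE positions to NA
is.na(census) <- idx
print(census)
```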

Note that this uses the replacement function is.na<-, which is not the same as the query function is.na.
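A short sketch of the difference on a plain numeric vector (v is a hypothetical example): is.na only reports which elements are missing, while an assignment of the form is.na(v) <- i calls is.na<- and marks the elements selected by i as NA. The default method behaves like v[i] <- NA.

```r
v <- c(10, 20, 30)

# Query function: reports missing values, v is unchanged
before <- is.na(v)

# Replacement function is.na<-: sets v[2] to NA
is.na(v) <- 2

print(before)  # no element was missing before the assignment
print(v)       # the second element is now NA
```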

Author: Learner27
Updated on June 29, 2022

Comments

  • Learner27

    Hello everyone, I am analysing the UCI Adult census data. The data uses a question mark (?) for every missing value.

    I want to replace all the question marks with NA.

    I tried:

    library(XML)
    census<-read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",header=F,na.strings="?")
    names(census)<-c("Age","Workclass","Fnlwght","Education","EducationNum","MaritalStatus","Occupation"   
      ,"Relationship" , "Race","Gender","CapitalGain","CapitalLoss","HoursPerWeek","NativeCountry","Salary"  )
    
    table(census$Workclass)
    
                    ?       Federal-gov         Local-gov      Never-worked           Private      Self-emp-inc 
                 1836               960              2093                 7             22696              1116 
     Self-emp-not-inc         State-gov       Without-pay 
                 2541              1298                14 
    
    x <- ifelse(census$Workclass == "?", NA, census$Workclass)
    table(x)
    x
        1     2     3     4     5     6     7     8     9 
     1836   960  2093     7 22696  1116  2541  1298    14
    

    but it did not work: the ? entries were not turned into NA, and x contains numbers instead of the Workclass names.

    Please help.
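Why the attempt above most likely failed: in adult.data each value follows a comma and a space, so the missing marker is read as " ?", which neither na.strings = "?" nor == "?" matches. Workclass was also read as a factor (the read.csv default before R 4.0.0), and ifelse on a factor returns its integer codes, which would explain the 1 to 9 in table(x). A sketch of fixing this at read time with strip.white = TRUE; the two sample lines and the name sample_lines are made up for illustration and only imitate the file's layout.

```r
# Two hypothetical lines in the same ", "-separated layout as adult.data
sample_lines <- c("39, State-gov, 77516",
                  "50, ?, 83311")

# strip.white = TRUE removes the space after each comma,
# so " ?" becomes "?" and is then matched by na.strings = "?"
census <- read.csv(text = sample_lines, header = FALSE,
                   na.strings = "?", strip.white = TRUE)
print(census)

# For the real file, the same arguments would apply:
# census <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",
#                    header = FALSE, na.strings = "?", strip.white = TRUE)
```

Handling the marker while reading avoids a separate clean-up pass over every column.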