Find columns with all missing values

Solution 1

This is easy enough to do with sapply and a small anonymous function:

sapply(test1, function(x)all(is.na(x)))
   X1    X2    X3 
FALSE FALSE FALSE 

sapply(test2, function(x)all(is.na(x)))
   X1    X2    X3 
FALSE  TRUE FALSE 

And inside a function:

na.test <-  function (x) {
  w <- sapply(x, function(x)all(is.na(x)))
  if (any(w)) {
    stop(paste("All NA in columns", paste(which(w), collapse=", ")))
  }
}

na.test(test1)

na.test(test2)
Error in na.test(test2) : All NA in columns 2
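
If the column names are more useful than the positions, the same check can return them instead of stopping. The following is a minimal sketch, not part of the original answer: the helper name all_na_cols is hypothetical, and vapply is swapped in for sapply so the result is always a logical vector, even for a data frame with no columns.

```r
# Example data from the question.
test1 <- data.frame(matrix(c(1, 2, 3, NA, 2, 3, NA, NA, 2), 3, 3))
test2 <- data.frame(matrix(c(1, 2, 3, NA, NA, NA, NA, NA, 2), 3, 3))

# Hypothetical helper: names of the columns whose values are all NA.
all_na_cols <- function(df) {
  # One TRUE/FALSE per column; logical(1) fixes the type of each result.
  w <- vapply(df, function(col) all(is.na(col)), logical(1))
  names(df)[w]
}

all_na_cols(test1)  # no column is entirely NA
all_na_cols(test2)  # only X2 is entirely NA
```

Returning the names leaves it to the caller to decide whether to stop, warn, or drop those columns.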

Solution 2

Using dplyr (load it first with library(dplyr)), with a small helper function:

ColNums_NotAllMissing <- function(df){ # helper function
  as.vector(which(colSums(is.na(df)) != nrow(df)))
}

df %>%
select(ColNums_NotAllMissing(.))

Example:
x <- data.frame(x = c(NA, NA, NA), y = c(1, 2, NA), z = c(5, 6, 7))

x %>%
select(ColNums_NotAllMissing(.))

Or, the other way around, drop the columns that are all missing:

Cols_AllMissing <- function(df){ # helper function
  as.vector(which(colSums(is.na(df)) == nrow(df)))
}


x %>%
  select(-Cols_AllMissing(.))
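
In dplyr 1.0.0 and later, the helper can probably be avoided by using the where() selection helper. This is a sketch under that version assumption, not something the original answer uses; the names kept and dropped_names are illustrative only.

```r
library(dplyr)

x <- data.frame(x = c(NA, NA, NA), y = c(1, 2, NA), z = c(5, 6, 7))

# Keep the columns that have at least one non-missing value.
kept <- x %>%
  select(where(~ !all(is.na(.x))))

# Names of the columns that are entirely NA.
dropped_names <- x %>%
  select(where(~ all(is.na(.x)))) %>%
  names()
```

Here kept holds the columns y and z, and dropped_names is "x".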

Solution 3

To find the columns with all values missing:

allmisscols <- apply(dataset, 2, function(x) all(is.na(x)))
colswithallmiss <- names(allmisscols[allmisscols])
print("The columns with all values missing:")
print(colswithallmiss)

Solution 4

This one will generate the column names that are full of NAs:

library(purrr)
df %>% keep(~all(is.na(.x))) %>% names

Solution 5

To test whether columns have all missing values:

apply(test1,2,function(x) {all(is.na(x))})

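To get which columns have all missing values:

which(colSums(is.na(test1)) == nrow(test1))

To keep only the columns that have no missing values at all:

test1.nona <- test1[ , colSums(is.na(test1)) == 0]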
Author: SHRram

Updated on February 21, 2021

Comments

  • SHRram about 3 years

    I am writing a function that needs to check whether (and which!) columns (variables) have all missing values (NA, <NA>). The following is a fragment of the function:

    test1 <- data.frame (matrix(c(1,2,3,NA,2,3,NA,NA,2), 3,3))
    test2 <- data.frame (matrix(c(1,2,3,NA,NA,NA,NA,NA,2), 3,3))
    
    na.test <-  function (data) {
      if (colSums(!is.na(data) == 0)){
          stop ("The some variable in the dataset has all missing value,
         remove the column to proceed")
          }
          }
    na.test (test1)
    
    Warning message:
    In if (colSums(!is.na(data) == 0)) { :
      the condition has length > 1 and only the first element will be used
    

    Q1: Why does the above warning occur, and how can it be fixed?

    Q2: Is there any way to find which columns have all NA, for example to output a list of variable names or column numbers?
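
On Q1, a likely explanation: if() expects a single TRUE or FALSE, but colSums() returns one value per column, so the condition has length 3 (recent R versions, 4.2.0 and later, treat this as an error rather than a warning). There is also an operator-precedence issue: in R, ! binds more loosely than ==, so !is.na(data) == 0 is parsed as !(is.na(data) == 0), which is the same as is.na(data). Below is a minimal sketch of a corrected check, assuming the intent is to stop when any column is entirely NA. The message wording and the invisible(TRUE) return value are illustrative choices, not from the original.

```r
test1 <- data.frame(matrix(c(1, 2, 3, NA, 2, 3, NA, NA, 2), 3, 3))
test2 <- data.frame(matrix(c(1, 2, 3, NA, NA, NA, NA, NA, 2), 3, 3))

# Sketch of a corrected version of the question's function.
na.test <- function(data) {
  # Count the non-missing values per column; zero means the column is all NA.
  all_na <- colSums(!is.na(data)) == 0
  # any() collapses the per-column vector into the single value if() needs.
  if (any(all_na)) {
    stop("All values are missing in column(s): ",
         paste(names(data)[all_na], collapse = ", "))
  }
  invisible(TRUE)
}

na.test(test1)  # passes silently

# Capture the error message instead of halting the script.
msg <- tryCatch(na.test(test2), error = function(e) conditionMessage(e))
```

Putting the closing parenthesis of colSums() before the == 0 comparison is what makes the comparison apply to the per-column counts rather than to each cell.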