Error in drop && !has.j : invalid 'x' type in 'x && y' when using sum(complete.cases) (Windows 7, R 3.2.1)

r function csv dataframe


Solution 1

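complete <- function(directory, id = 1:332) {
  # All files in the directory; indexing by id assumes they sort as 001.csv, 002.csv, ...
  files_list <- list.files(directory, full.names=TRUE)
  nobs <- c()
  for (i in id) {
    dat <- read.csv(files_list[i])             # read one monitor's file
    nobs <- c(nobs, sum(complete.cases(dat)))  # append its count of complete rows
  }
  data.frame(id,nobs)
}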

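You were close, but you shouldn't rbind all of the files into one data frame and then count the complete cases: sum(complete.cases(dat)) returns a single total for everything you read, so the results are not separated by id. Instead, I edited your code slightly so that each file is read and counted on its own inside the loop.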
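To see the difference on a small scale, here is a sketch with a made-up two-monitor data frame (the sulfate and nitrate values are invented stand-ins, not read from specdata). Summing complete.cases() over the combined frame yields one total, while grouping the same logical vector by ID with tapply() yields one count per monitor.

```r
# Made-up data for two monitors; NA marks a missing reading.
dat <- data.frame(
  ID      = c(1, 1, 1, 2, 2),
  sulfate = c(1.2, NA, 3.4, 5.6, 7.8),
  nitrate = c(0.1, 0.2, NA, 0.4, 0.5)
)

# One number for the whole combined data frame, not broken down by monitor.
total <- sum(complete.cases(dat))
print(total)  # 3

# complete.cases() gives one TRUE/FALSE per row; tapply() sums them within each ID.
per_id <- tapply(complete.cases(dat), dat$ID, sum)

result <- data.frame(id = as.integer(names(per_id)), nobs = as.vector(per_id))
print(result)
```

Counting per file inside the loop, as in the edited function, avoids the grouping step entirely because each file already holds exactly one monitor.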

Test

complete("specdata", c(2,4,8,10,12))
  id nobs
1  2 1041
2  4  474
3  8  192
4 10  148
5 12   96

Solution 2

I'm not sure exactly what is throwing the error, but I would recommend avoiding the process that leads up to it. Your situation would benefit from applying one function to each file with lapply() and vapply() instead of growing one big data frame with rbind() in a loop. I don't think this code will work out of the box, but it should put you on the right path:

#* Get the file names of the CSV files to read
files <- list.files(getwd(), pattern = "\\d{3}[.]csv$")

#* Read in all of the CSV files into a list of data frames
DataFrames <- lapply(files, read.csv)

#* Calculate the number of complete cases in each file
CompleteCases <- vapply(DataFrames, 
                        function(df) sum(complete.cases(df)),
                        numeric(1))

#* Produce a data frame with the file name, and the number of complete cases in the file.
data.frame(file = basename(files),
           nobs = CompleteCases)
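The code above reads every matching CSV in the working directory and ignores the directory and id arguments from the assignment. A minimal sketch of the same vapply() idea wrapped in the complete(directory, id) signature, assuming each file is named by its zero-padded three-digit monitor id (001.csv = monitor 1, as described in the question). The demo directory and its two files are made up purely to exercise the function.

```r
complete <- function(directory, id = 1:332) {
  # Assumption: monitor 1 lives in 001.csv, monitor 2 in 002.csv, and so on.
  files <- file.path(directory, sprintf("%03d.csv", id))

  # One count of complete rows per requested file; vapply() checks
  # that every call returns exactly one number.
  nobs <- vapply(files,
                 function(f) sum(complete.cases(read.csv(f))),
                 numeric(1),
                 USE.NAMES = FALSE)

  data.frame(id = id, nobs = nobs)
}

# Usage with a throwaway directory of made-up files:
demo_dir <- file.path(tempdir(), "specdata_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(sulfate = c(1, NA, 3), nitrate = c(1, 2, 3), ID = 1),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = c(NA, NA), nitrate = c(1, 2), ID = 2),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

result <- complete(demo_dir, c(2, 1))
print(result)
```

Building the file names from id, rather than indexing into list.files(), keeps the function correct even if the directory holds other files or the ids are requested out of order.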

Solution 3

The error comes from the last line of your function:

dat_subset[, "ID", "nobs"] # incorrect code and will generate the error
#Error in drop && length(x) == 1L : invalid 'x' type in 'x && y'

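Data frame indexing in base R takes the form x[i, j, drop], so "nobs" in the third position is passed as the drop argument rather than as a second column name. drop must be TRUE or FALSE, so the internal check drop && ... fails with this error (the exact wording depends on the R version; in your R 3.2.1 it reads drop && !has.j). To select several columns, combine the names into one character vector and pass it as the single j argument: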

 dat_subset[, c("ID", "nobs")]

This is the correct way to subset multiple columns.
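A minimal reproduction with a made-up three-column data frame, catching the error with tryCatch() so the script keeps running. Because the message text differs between R versions, only the common part is worth matching.

```r
df <- data.frame(ID = 1:3, nobs = c(117, 1041, 474), other = c("a", "b", "c"))

# "nobs" lands in the drop position, and a string is not a valid operand for &&.
msg <- tryCatch(df[, "ID", "nobs"], error = conditionMessage)
print(msg)  # exact wording varies by R version

# Both names in one character vector select the two columns.
picked <- df[, c("ID", "nobs")]
print(picked)
```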


Author: MSUTangles1984

Updated on June 06, 2022

Comments

MSUTangles1984
I am very new to programming, both in R and in general.

Here is my goal for writing this script:
1. I have 332 CSV files. The task is to “Write a function that reads a directory full of files and reports the number of completely observed cases in each data file. The function should return a data frame where the first column is the name of the file and the second column is the number of complete cases.”
2. The outline of the function is as follows:

```
complete <- function(directory, id = 1:332) {
  ## 'directory' is a character vector of length 1 indicating
  ## the location of the CSV files

  ## 'id' is an integer vector indicating the monitor ID numbers
  ## to be used

  ## Return a data frame of the form:
  ## id nobs
  ## 1  117
  ## 2  1041
  ## ...
  ## where 'id' is the monitor ID number and 'nobs' is the
  ## number of complete cases
}
```
Example output would look like this:
```
source("complete.R")
complete("specdata", 1)
##   id nobs
## 1  1  117

complete("specdata", c(2, 4, 8, 10, 12))
##   id nobs
## 1  2 1041
## 2  4  474
## 3  8  192
## 4 10  148
## 5 12   96
```
3. My script so far looks like this:

```
setwd("C:/users/beachlb/Desktop/R_Programming/specdata") #this is the local directory on my computer where all 332 csv files are stored

complete <- function(directory, id = 1:332) {
  files_list <- list.files(directory, full.names=TRUE) #creates a list of files from within the specified directory
  dat <- data.frame() #creates an empty data frame that we can use to add data to
  for (i in id) {
    dat <- rbind(dat, read.csv(files_list[i]))  #loops through the 332 csv files, rbinding them together into one data frame called dat
  }
  dat$nobs <- sum(complete.cases(dat)) #add the column nobs to dat, populated with number of rows of complete cases in the dataframe
  dat_subset <- dat[which(dat[, "ID"] %in% id),] #subsets dat so that only the desired cases are included in output when function is run
  dat_subset[, "ID", "nobs"] #prints all rows of the desired data frame for the named columns
}
```
4. When I run my function as is, I get this error: “Error in drop && !has.j : invalid 'x' type in 'x && y'”. I am not sure what is causing it. I would appreciate any advice on what could be causing this error and how to resolve it. Pointers to literature or tutorials that would help me strengthen the coding skills needed to avoid this error would also be appreciated.
5. Preface: I am not sure whether I should ask this in a separate thread. Right now, my function fills every row of nobs with the total number of complete cases across all rows (for all 332 files), instead of calculating the number of complete cases for a given monitor id and putting that into the nobs column for that ID only. (Each file is named after its monitor id and contains only cases from that monitor: 001.csv is output from monitor 1, 002.csv from monitor 2, and so on.) So I am hoping someone can point me to a resource on how to subset dat so that each row of the nobs column gives the number of complete cases for its id.