Using apply to find max in a data frame with missing values and strings
One of the points of using a data frame is that everything in a column must have the same class. If you want to treat your data as numeric, then run as.numeric() on each column and the strings, like "SON", will be converted to NA.
Data frames are also focused on column-wise operations. If you want to go row-wise, a matrix probably makes more sense:
mat = sapply(df, function(x) as.numeric(as.character(x)))
# as.numeric(as.character()) is necessary when starting with a factor
mat
# ID N1 N2 N3 N4
# [1,] 1 2 3 4 5
# [2,] 11 NA -12 14 55
# [3,] 21 12 NA 34 14
apply(mat, 1, max, na.rm = TRUE)
# [1] 5 55 34
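For anyone who wants to run the matrix approach above end to end, here is a minimal sketch. It assumes the data from the question and sets stringsAsFactors = TRUE to mimic the factor column that older R versions produced by default. The helper name to_num and the choice to leave the ID column out of the maximum are my own assumptions, not part of the original code.

```r
# Rebuild the example data; stringsAsFactors = TRUE is an assumption that
# mimics the pre-4.0 default, so N2 starts out as a factor.
df <- read.table(header = TRUE, stringsAsFactors = TRUE, text = "
ID N1 N2 N3 N4
1 2 3 4 5
11 NA -12 14 55
21 12 SON 34 14")

# Hypothetical helper: convert one column to numeric. Going through
# as.character() first is what makes this safe for factors. "SON" becomes NA;
# the coercion warning is expected here, so it is silenced.
to_num <- function(x) suppressWarnings(as.numeric(as.character(x)))

mat <- sapply(df, to_num)

# Row-wise maximum over the value columns only (ID is dropped by assumption).
row_max <- apply(mat[, -1, drop = FALSE], 1, max, na.rm = TRUE)
row_max
```

One caveat: if a row has no non-missing values at all, max(..., na.rm = TRUE) returns -Inf with a warning, so you may want to handle that case separately.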
Why does R reject na.rm = TRUE when I use checkd but not when I use max in apply?
After the first three arguments (X, MARGIN, FUN), apply just passes arguments on through to the function you pass to FUN. If you look at the help for ?max, you'll see that it is defined to take an argument called na.rm. Your definition for checkd has no such argument. If you want to add an na.rm argument to your function, you could do it like this:
checkd <- function(x, na.rm = TRUE) if(is.integer(x)) max(x, na.rm = na.rm)
# or even this
checkd <- function(x, ...) if(is.integer(x)) max(x, ...)
Note that this function probably doesn't do what you want. It checks whether the vector you give it (a whole row in your example) consists only of integers, and only then returns the max. Since a vector can only have one type, if you have any non-integer in there, is.integer(x) will be false and the max won't be calculated.
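To make the point about is.integer concrete, here is a small sketch. The example vectors are made up for illustration; checkd is the function from the question without the == TRUE.

```r
# is.integer() tests the storage type of the whole vector, not each element.
is.integer(c(1L, 2L))     # TRUE
is.integer(c(1L, 2.5))    # FALSE: the vector is coerced to double
is.integer(c(1, 2))       # FALSE: whole-valued doubles are still doubles
is.integer(c(1L, "SON"))  # FALSE: the vector is coerced to character

checkd <- function(x) if (is.integer(x)) max(x)

# An if() without else yields NULL when the condition is FALSE, so a
# character row produces no maximum at all.
res_chr <- checkd(c("21", "12", "SON"))
is.null(res_chr)          # TRUE

res_int <- checkd(c(21L, 12L))
res_int                   # 21
```

Because checkd silently returns NULL for anything that is not stored as integer, a row that came out of a character matrix never reaches max.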
I also deleted your == TRUE, which doesn't do anything.
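The argument forwarding described above can be checked on a toy matrix. This is only a sketch: the matrix m and the function names f_no_narm and f_dots are hypothetical and stand in for your checkd.

```r
# Hypothetical functions: one without and one with a way to accept na.rm.
f_no_narm <- function(x) max(x)
f_dots    <- function(x, ...) max(x, ...)

# Toy 2 x 2 matrix; rows are (1, 3) and (NA, 4).
m <- matrix(c(1, NA, 3, 4), nrow = 2)

# apply() forwards na.rm = TRUE to FUN. f_no_narm has no such argument,
# so the call fails; the error message is captured instead of stopping.
res_err <- tryCatch(
  apply(m, 1, f_no_narm, na.rm = TRUE),
  error = function(e) conditionMessage(e)
)
res_err

# f_dots hands the extra argument on to max(), so this works.
res_ok <- apply(m, 1, f_dots, na.rm = TRUE)
res_ok  # 3 4
```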
DomB
Updated on July 16, 2022
Comments
-
DomB almost 2 years
I have the following data set:
df <- data.frame(read.table(header = TRUE, text = "
ID N1 N2 N3 N4
1 2 3 4 5
11 NA -12 14 55
21 12 SON 34 14"))
I want to find out what is the max entry in each row. This would be, for example, 5 in the first row. Obviously, the situation is more complicated because of missing values ('NA') and a string ('SON').
I first tried the following command:
df$Result<-apply(df,1, max, na.rm= TRUE)
The results are [5,55, SON]! Not what I wanted. I therefore then tried:
checkd <- function(x) if(is.integer(x) == TRUE) max(x)
df$Result <- apply(df, 1, checkd)
Funnily, it removed the last column, df$Result. Does anyone know what I did wrong? Also, what would be the solution to my problem? Also, if I try the following code:
checkd <- function(x) if(is.integer(x) == TRUE) max(x)
df$Result <- apply(df, 1, checkd, na.rm = TRUE)
it gives me
Error in FUN(newX[, i], ...) : unused argument (na.rm = TRUE)
Why is that? My function checkd does not otherwise seem to cause R any problems. Why does R reject na.rm = TRUE when I use checkd but not when I use max in apply?
Thanks,
Dom
-
DomB over 8 years
Thanks! That is really useful! Just a quick follow-up question. I played around with as.numeric and happened to notice that it turns the third column (3, -12, SON) into the numerical values 2, 1, 3. Why does it translate 3 into 2 rather than 3, and -12 into 1 rather than -12, and so on? If I wanted to get the original numbers, what would be the way? Anyway, thanks so much for your explanations. Super useful!
-
Gregor Thomas over 8 years
You probably have a factor to start with. See edits. And see here for more details.
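A short sketch of the factor behaviour mentioned in this comment, using the values from the question's third column. The levels are spelled out explicitly here (an assumption, to keep the ordering independent of locale); the variable names are made up for illustration.

```r
# The third column as a factor, which is what older R versions created
# by default when a column contained a string such as "SON".
f <- factor(c("3", "-12", "SON"), levels = c("-12", "3", "SON"))

# as.numeric() on a factor returns the internal level codes, not the labels:
# "3" is the 2nd level, "-12" the 1st, "SON" the 3rd.
codes <- as.numeric(f)
codes   # 2 1 3

# Converting to character first recovers the numbers; "SON" becomes NA.
# The coercion warning is expected, so it is silenced.
values <- suppressWarnings(as.numeric(as.character(f)))
values  # 3 -12 NA
```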