How to remove the negative values from a data frame in R

36,716

Solution 1

I want to add that it's not necessary to write a for loop; you can just assign:

dframe[dframe < 0] <- NA

Here dframe < 0 gives a logical matrix that is TRUE wherever a value in dframe is less than zero. It can be used to index dframe and replace those values with NA.

@MrFlick explained the use of NA instead of NULL, and how to ignore NA values when calculating means of each row:

rowMeans(dframe, na.rm=TRUE) 
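As a minimal sketch of the two steps above, assuming a small made-up data frame (the column names a, b, c and all values are hypothetical, chosen only for illustration):

```r
# Hypothetical 3 x 3 data frame with some negative values
dframe <- data.frame(a = c(1, -2, 3),
                     b = c(-4, 5, 6),
                     c = c(7, 8, -9))

# dframe < 0 is a logical matrix with the same shape as dframe
neg <- dframe < 0

# Replace every negative value with NA
dframe[neg] <- NA

# Row means over the values that remain in each row
row_means <- rowMeans(dframe, na.rm = TRUE)
print(row_means)
```

Each row here keeps two values, so every row mean is defined; a row made up entirely of negative values would behave differently.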

Edited to answer question re: rowMeans producing NaNs and how to remove:

NA is "not available" and is a missing value indicator, while NaN is "not a number" which can be produced when the result of an arithmetic operation can't be defined numerically, e.g. 0/0. I can't see your dframe values, but I would guess that this is the result of taking the row means when all row values are NA, while setting na.rm=TRUE. See the difference between mean(c(NA, NA, NA), na.rm=TRUE) vs. mean(c(NA, NA, NA), na.rm=FALSE). You can leave NaN or decide how to define row means when all row values are negative and have been replaced by NA.
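The difference described above can be checked directly. This short sketch uses only base R; the variable names are illustrative.

```r
all_na <- c(NA, NA, NA)

# With na.rm = TRUE nothing is left to average, so the result is NaN
m_rm <- mean(all_na, na.rm = TRUE)

# With na.rm = FALSE the missing values propagate, so the result is NA
m_keep <- mean(all_na, na.rm = FALSE)

print(is.nan(m_rm))    # TRUE
print(is.nan(m_keep))  # FALSE
print(is.na(m_keep))   # TRUE

# is.na() is also TRUE for NaN, so use is.nan() to tell the two apart
print(is.na(m_rm))     # TRUE
```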

To consider only non-NaN values, you can subset with !is.nan. See this example:

mea <- c(2, 4, NaN, 6)
mea
# [1]   2   4 NaN   6
!is.nan(mea) # not NaN, output logical
# [1]  TRUE  TRUE FALSE  TRUE 
mea <- mea[!is.nan(mea)] # keep only the non-NaN values
mea
# [1] 2 4 6

Or you can replace NaN values with some desired value by setting mea[is.nan(mea)] <- ??
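As a sketch of the replacement option: the right-hand side is a choice that depends on your analysis. The example below assumes, purely for illustration, that an undefined row mean should be reported as NA; 0 or any other value could be used instead.

```r
mea <- c(2, 4, NaN, 6)

# Assumption for illustration: treat undefined row means as missing (NA)
mea[is.nan(mea)] <- NA

print(mea)

# Later summaries can then skip these entries with na.rm = TRUE
overall <- mean(mea, na.rm = TRUE)
print(overall)
```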

Solution 2

Another way, if you want to remove every row of your data frame that contains a negative value, would be:

df <- df[rowSums(df < 0, na.rm = TRUE) == 0, , drop = FALSE]

That way any row with a negative value would cease to be in your data frame.
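When filtering rows, it can help to verify the result on a tiny data frame first. The sketch below uses a hypothetical data frame df (made-up column names x and y) and contrasts indexing with a logical matrix, which returns a plain vector of the matching values, with a rowSums-based filter that drops every row containing a negative value.

```r
# Hypothetical data frame; only the second row contains a negative value
df <- data.frame(x = c(1, -2, 3),
                 y = c(4, 5, 6))

# Indexing with a logical matrix returns a vector, not a data frame
vals <- df[df > 0]
print(vals)

# Count the negatives in each row and keep rows where the count is zero
keep <- rowSums(df < 0, na.rm = TRUE) == 0
df_clean <- df[keep, , drop = FALSE]
print(df_clean)
```

Note that dropping whole rows also discards the positive values in those rows, so it is not equivalent to replacing negatives with NA before taking row means.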

Solution 3

Here is another way that might help someone.

I had the same problem before and decided to use dplyr for it.

    library("dplyr")

    data <- data %>%
        filter(column > 0)

    rowMeans(data, na.rm = TRUE)

I would also advise keeping both subsets (negative and positive). Sometimes the negative rows are needed later for further clarification, for example to find out why they are negative.

resultPos2 <- result2 %>% # the rows where periodBudget is positive
    filter(periodBudget > 0)

resultNeg2 <- result2 %>% # the rows where periodBudget is negative
    filter(periodBudget < 0)

This makes it easier to hand the data out to other people and, if required, to check for errors or for the reasons why values are negative.

It is handy for financial cases or for data that has been manipulated for other employees.
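For readers without dplyr, the same positive/negative split can be sketched in base R with subset(). The names result2 and periodBudget follow the answer above, but the values and the account column are invented for illustration.

```r
# Hypothetical data; only the column name periodBudget comes from the answer
result2 <- data.frame(account = c("A", "B", "C", "D"),
                      periodBudget = c(100, -50, 0, 25))

# Rows with a positive budget
resultPos2 <- subset(result2, periodBudget > 0)

# Rows with a negative budget
resultNeg2 <- subset(result2, periodBudget < 0)

# Rows equal to zero fall into neither subset
print(nrow(resultPos2))
print(nrow(resultNeg2))
```

Whether zero belongs with the positive rows is a decision for your data; switching to >= would include it.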

Author by

Admin

Updated on July 09, 2022

Comments

  • Admin
    Admin almost 2 years

    I want to remove the negative values from a data frame and then calculate the mean of each row separately (the mean of the positive values in each row). I wrote this to remove the negative values, but it didn't work. I get an error like this:

    Error in `[<-.data.frame`(`*tmp*`, i, j, value = NULL) : replacement has length zero

    How can I fix this problem?

    for (i in 1:1000) {
      for(j in 1:20){
         if (dframe[i,j]<=0) dframe[i,j]<-NULL
         j=j+1
      }
      i=i+1
    }
    
  • neilfws
    neilfws about 7 years
    I think you both mean rowMeans, not rowSums.
  • Djork
    Djork about 7 years
    Ha! yes thanks for catching that, editing the answer
  • Admin
    Admin about 7 years
    Thank you so much. I have written this: dframe[dframe < 0] <- NA; mea <- rowMeans(dframe, na.rm=TRUE), but there are NaN values in mea. How can I remove them? What does NaN mean? Is it the same as NA?
  • Djork
    Djork about 7 years
    NA is "not available" and is a missing value indicator, while NaN is "not a number", which can be produced when the result of an arithmetic operation can't be defined numerically, e.g. 0/0. I can't see your dframe values, but I would guess that this is the result of taking the row means when all values are NA, while setting na.rm=TRUE. See the difference between mean(c(NA, NA, NA), na.rm=TRUE) vs. mean(c(NA, NA, NA), na.rm=FALSE). You can leave NaN or decide how to define row means when all row values are negative.
  • Djork
    Djork about 7 years
    I edited my answer on how to deal with NaNs produced by rowMeans.