Convert numeric values into binary (0/1)

10,523

Solution 1

Here's your data.frame:

x <- structure(list(apple = c(3L, 0L, 1L), banana = 0:2, orange = c(2L, 
1L, 2L)), .Names = c("apple", "banana", "orange"), class = "data.frame", row.names = c("Tim", 
"Tom", "Bob"))

And your matrix:

as.matrix((x > 0) + 0)
    apple banana orange
Tim     1      0      1
Tom     0      1      1
Bob     1      1      1

Update

I had no idea that a quick pre-bedtime posting would generate any discussion, but the discussions themselves are quite interesting, so I wanted to summarize here:

My instinct was to rely on the fact that, underneath, TRUE and FALSE in R are the numbers 1 and 0. If you check for equivalence with 1 == TRUE or 0 == FALSE (not a great test), you'll get TRUE. My shortcut (which turns out to take more time than the correct, or at least more conceptually correct, way) was to just add 0 to my TRUEs and FALSEs, since I knew that R would coerce the logical values to numeric.
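
As a minimal sketch of that coercion, using the fruit counts from the question (the object names `counts` and `binary` are illustrative only, not part of the original answer):

```r
# Logical values behave as 1 and 0 in arithmetic.
TRUE + 0
FALSE + 0

# Fruit counts from the question, filled column by column.
counts <- matrix(c(3, 0, 1, 0, 1, 2, 2, 1, 2), nrow = 3,
                 dimnames = list(c("Tim", "Tom", "Bob"),
                                 c("apple", "banana", "orange")))

# The comparison yields a logical matrix; adding 0 coerces it to numeric
# while keeping the dimensions and dimnames.
binary <- (counts > 0) + 0
binary
```

Because arithmetic on a matrix preserves its attributes, no reshaping step is needed with this approach.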

The correct, or at least more appropriate, way would be to convert the output using as.numeric (I think that's what @JoshO'Brien intended to write). Unfortunately, that drops the dim attribute of the input, so you need to convert the resulting vector back into a matrix, which, as it turns out, is still faster than adding 0 as I did in my answer.
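
The loss of dimensions can be seen in a small sketch like the one below. The names `m`, `v`, `rebuilt` and `kept` are made up for illustration, and the `storage.mode<-` variant is an alternative that was not discussed in the thread:

```r
m <- matrix(c(3, 0, 1, 0, 1, 2, 2, 1, 2), nrow = 3,
            dimnames = list(c("Tim", "Tom", "Bob"),
                            c("apple", "banana", "orange")))

# as.numeric() returns a plain vector: dim and dimnames are dropped.
v <- as.numeric(m > 0)
dim(v)

# Option A: rebuild the matrix, reusing the original shape and names.
rebuilt <- matrix(v, nrow = nrow(m), dimnames = dimnames(m))

# Option B (not from the thread): change the storage mode in place,
# which keeps the existing attributes.
kept <- m > 0
storage.mode(kept) <- "double"

all.equal(rebuilt, kept)
```

Option A relies on as.numeric() and matrix() both working in column-major order, so the values land back in their original positions as long as nrow matches the input.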

Having read the comments and criticisms, I thought I would add one more option: using apply to loop through the columns with the as.numeric approach. That is slower than manually re-creating the matrix, but slightly faster than adding 0 to the logical comparison.

x <- data.frame(replicate(1e4,sample(0:1e3)))
library(rbenchmark)
benchmark(X1 = {
            x1 <- as.matrix((x > 0) + 0)
          },
          X2 = {
            x2 <- apply(x, 2, function(y) as.numeric(y > 0))
          },
          X3 = {
            x3 <- as.numeric(as.matrix(x) > 0)
            x3 <- matrix(x3, nrow = 1001)
          },
          X4 = {
            x4 <- ifelse(x > 0, 1, 0)
          },
          columns = c("test", "replications", "elapsed", 
                      "relative", "user.self"))
#   test replications elapsed relative user.self
# 1   X1          100 116.618    1.985   110.711
# 2   X2          100 105.026    1.788    94.070
# 3   X3          100  58.750    1.000    46.007
# 4   X4          100 382.410    6.509   311.567

all.equal(x1, x2, check.attributes=FALSE)
# [1] TRUE
all.equal(x1, x3, check.attributes=FALSE)
# [1] TRUE
all.equal(x1, x4, check.attributes=FALSE)
# [1] TRUE

Thanks for the discussion y'all!

Solution 2

I usually use this approach:

df[df > 0] = 1
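
A short sketch of how this behaves on the question's data, assuming the counts are non-negative (negative values would be left unchanged). The names `binary_df` and `bin_mat` are illustrative only:

```r
df <- data.frame(apple = c(3L, 0L, 1L), banana = 0:2, orange = c(2L, 1L, 2L),
                 row.names = c("Tim", "Tom", "Bob"))

# Work on a copy so the original counts are preserved.
binary_df <- df
binary_df[binary_df > 0] <- 1

# The result is still a data.frame; convert it if a matrix is needed.
class(binary_df)
bin_mat <- as.matrix(binary_df)
bin_mat
```

Unlike the other approaches, this one overwrites values in place, which is why the sketch copies the data frame first.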

Solution 3

You can use ifelse. It works on both matrices and data frames; however, the result will be a matrix.

> df <- cbind(apple = c(3, 0 , 1), banana = c(0, 1, 2), orange = c(2, 1, 2))
> df
     apple banana orange
[1,]     3      0      2
[2,]     0      1      1
[3,]     1      2      2

> ifelse(df>0, 1, 0)
     apple banana orange
[1,]     1      0      1
[2,]     0      1      1
[3,]     1      1      1

Solution 4

Just use a comparison:

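d = t(matrix(c(3,0,2,0,1,1,1,2,2), 3))
# logical matrix: TRUE wherever the count is positive
d > 0
# as.numeric() drops the dimensions, so rebuild column-wise using nrow(d)
matrix(as.numeric(d > 0), nrow(d))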
Author by

lolibility

Updated on June 26, 2022

Comments

  • lolibility
    lolibility about 2 years

    I have a data frame with counts of different kinds of fruits of different people. Like below

        apple  banana  orange
    Tim     3       0       2
    Tom     0       1       1
    Bob     1       2       2
    

    How can I change it into a binary matrix, i.e. if a person has at least one of a fruit, no matter how many, then I record 1; if not, I record 0. Like below

        apple  banana  orange
    Tim     1       0       1
    Tom     0       1       1
    Bob     1       1       1
    
  • CHP
    CHP over 11 years
    Why is this better than ifelse ? (not asking because I suggested that, just curious)
  • Josh O'Brien
    Josh O'Brien over 11 years
    @ChinmayPatil -- For one thing, Ananda's solution runs 3-4 times faster than the ifelse() version. (FWIW as.logical(as.matrix(x) > 0) is twice again as fast as his solution.) Here's the data.frame I used to run a few time trials: x <- data.frame(replicate(1e4,sample(0:1e3))).
  • CHP
    CHP over 11 years
    @JoshO'Brien thanks for reply. I also checked same thing. It indeed is the case. :)
  • A5C1D2H2I1M1N2O1R2T1
    A5C1D2H2I1M1N2O1R2T1 over 11 years
    @JoshO'Brien, thanks. You did mean as.numeric though, right? I'll be updating my answer soon.
  • A5C1D2H2I1M1N2O1R2T1
    A5C1D2H2I1M1N2O1R2T1 over 11 years
    @ChinmayPatil, I've updated my answer with some benchmarks and further reflection.
  • Josh O'Brien
    Josh O'Brien over 11 years
    @AnandaMahto -- Yes. Good catch. Thanks for posting those benchmarks!
  • Henrik
    Henrik almost 4 years
    Hi @A5C1D2H2I1M1N2O1R2T1 I noted some timings on as.numeric vs as.integer here. Perhaps worth mention in your nice post? In addition, difference in memory required. Cheers
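
The memory point in the last comment can be checked with a sketch like the following. The test matrix and the names `as_double` and `as_integer` are made up for illustration, and no timings are claimed here:

```r
set.seed(1)
m <- matrix(sample(0:5, 1e6, replace = TRUE), nrow = 1000)

# Adding a double 0 gives double storage (8 bytes per element);
# adding an integer 0L gives integer storage (4 bytes per element).
as_double  <- (m > 0) + 0
as_integer <- (m > 0) + 0L

typeof(as_double)
typeof(as_integer)

# Compare the memory footprint of the two results.
object.size(as_double)
object.size(as_integer)
```

Both results hold the same 0/1 values; only the storage type differs, so the integer version may be preferable for large matrices when memory matters.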