R Apply() function on specific dataframe columns


Solution 1

Using an example data.frame and an example function (which just adds 1 to every value):

A <- function(x) x + 1
wifi <- data.frame(replicate(9,1:4))
wifi

#  X1 X2 X3 X4 X5 X6 X7 X8 X9
#1  1  1  1  1  1  1  1  1  1
#2  2  2  2  2  2  2  2  2  2
#3  3  3  3  3  3  3  3  3  3
#4  4  4  4  4  4  4  4  4  4

data.frame(wifi[1:3], apply(wifi[4:9],2, A) )
#or
cbind(wifi[1:3], apply(wifi[4:9],2, A) )

#  X1 X2 X3 X4 X5 X6 X7 X8 X9
#1  1  1  1  2  2  2  2  2  2
#2  2  2  2  3  3  3  3  3  3
#3  3  3  3  4  4  4  4  4  4
#4  4  4  4  5  5  5  5  5  5

Or even:

data.frame(wifi[1:3], lapply(wifi[4:9], A) )
#or
cbind(wifi[1:3], lapply(wifi[4:9], A) )

#  X1 X2 X3 X4 X5 X6 X7 X8 X9
#1  1  1  1  2  2  2  2  2  2
#2  2  2  2  3  3  3  3  3  3
#3  3  3  3  4  4  4  4  4  4
#4  4  4  4  5  5  5  5  5  5

Solution 2

lapply is probably a better choice than apply here, because apply first coerces your data.frame to a matrix (via as.matrix), which forces all the columns to a single common type. If the selected columns mix types (for example numbers and strings), everything becomes character, which can have unintended consequences.
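To make the coercion concrete, here is a minimal sketch. The data frame `df` and its column names are made up for illustration and do not come from the question.

```r
# Hypothetical mixed-type data frame (an assumption for illustration)
df <- data.frame(id = c("a", "b", "c"),
                 x  = 1:3,
                 y  = c(0.5, 1.5, 2.5),
                 stringsAsFactors = FALSE)

# apply() converts the data frame with as.matrix() first, so the
# character column drags every other column to character as well
via_apply <- apply(df, 2, function(col) col)
is.character(via_apply)

# lapply() hands each column over as its own vector, so types survive
via_lapply <- lapply(df, function(col) col)
sapply(via_lapply, class)
```

When every selected column is numeric, as in wifi[4:9], both approaches should give the same values; the difference only shows up once types are mixed.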

The pattern is:

df[cols] <- lapply(df[cols], FUN)

The 'cols' vector can be variable names or indices. I prefer to use names whenever possible (it's robust to column reordering). So in your case this might be:

wifi[4:9] <- lapply(wifi[4:9], A)

An example of using column names:

wifi <- data.frame(A=1:4, B=runif(4), C=5:8)
wifi[c("B", "C")] <- lapply(wifi[c("B", "C")], function(x) -1 * x)
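As a sketch of how the cols vector might be built, the example below stores the names in a variable and also derives them from the column types. The `scores` data frame and its columns are hypothetical.

```r
# Hypothetical data (an assumption for illustration)
scores <- data.frame(name = c("ann", "bob"),
                     math = c(90, 80),
                     art  = c(70, 60),
                     stringsAsFactors = FALSE)

# Option 1: list the column names by hand
cols <- c("math", "art")
scores[cols] <- lapply(scores[cols], function(x) x / 100)

# Option 2: pick every numeric column programmatically
num_cols <- names(scores)[vapply(scores, is.numeric, logical(1))]
num_cols
```

Either way, the result is a plain character vector that can be used as df[cols] on both sides of the assignment.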

Solution 3

This task is easily achieved with the dplyr package's across() function (available in dplyr 1.0.0 and later).

Borrowing the data structure suggested by thelatemail:

A <- function(x) x + 1
wifi <- data.frame(replicate(9,1:4))

We can indicate the columns we wish to apply the function to either by index like this:

library(dplyr)
wifi %>% 
   mutate(across(4:9, A))
#  X1 X2 X3 X4 X5 X6 X7 X8 X9
#1  1  1  1  2  2  2  2  2  2
#2  2  2  2  3  3  3  3  3  3
#3  3  3  3  4  4  4  4  4  4
#4  4  4  4  5  5  5  5  5  5

Or by name:

wifi %>% 
   mutate(across(X4:X9, A))
#  X1 X2 X3 X4 X5 X6 X7 X8 X9
#1  1  1  1  2  2  2  2  2  2
#2  2  2  2  3  3  3  3  3  3
#3  3  3  3  4  4  4  4  4  4
#4  4  4  4  5  5  5  5  5  5
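If the column names are held in a character vector rather than typed into the call, one option is to wrap the vector in all_of(). This is a sketch that assumes dplyr 1.0.0 or later is installed; the `cols` and `result` variables are introduced here for illustration.

```r
library(dplyr)

A <- function(x) x + 1
wifi <- data.frame(replicate(9, 1:4))

# Column names stored in a variable (illustrative)
cols <- paste0("X", 4:9)

# all_of() tells across() to treat cols as a vector of column names
result <- wifi %>%
  mutate(across(all_of(cols), A))
```

Unlike the base R assignment forms, this returns a new data frame and leaves wifi itself unchanged unless the result is assigned back.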

Solution 4

As mentioned, you simply want the standard R apply function applied to columns (MARGIN=2):

wifi[,4:9] <- apply(wifi[,4:9], MARGIN=2, FUN=A)

Or, for short:

wifi[,4:9] <- apply(wifi[,4:9], 2, A)

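This updates columns 4:9 in place using the A() function. Now, let's assume that A() accepts an na.rm argument, which it probably should (the one-line A() defined in the other answers does not, so it would need to be extended first). Any extra arguments given to apply are passed on to FUN, so we can pass na.rm=TRUE to remove NA values from the computation like so:

wifi[,4:9] <- apply(wifi[,4:9], MARGIN=2, FUN=A, na.rm=TRUE)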

The same is true for any other arguments you want to pass to your custom function.
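A minimal sketch of that argument passing follows. The function `center()` and the data frame `df` are hypothetical stand-ins, since the A() used elsewhere on this page takes no na.rm argument.

```r
# Hypothetical function with an na.rm argument (an assumption for illustration)
center <- function(x, na.rm = FALSE) x - mean(x, na.rm = na.rm)

df <- data.frame(id = 1:3,
                 a  = c(1, NA, 3),
                 b  = c(2, 4, 6))

# Extra named arguments after FUN are forwarded to center() for every column
df[, 2:3] <- apply(df[, 2:3], MARGIN = 2, FUN = center, na.rm = TRUE)
df
```

lapply forwards additional arguments in the same way, so df[2:3] <- lapply(df[2:3], center, na.rm = TRUE) should behave equivalently here.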

Author by

skmathur

Updated on January 26, 2022

Comments

  • skmathur
    skmathur over 2 years

    I want to use the apply function on a dataframe, but only apply the function to the last 5 columns.

    B<- by(wifi,(wifi$Room),FUN=function(y){apply(y, 2, A)})
    

    This applies A to all the columns of y

    B<- by(wifi,(wifi$Room),FUN=function(y){apply(y[4:9], 2, A)})
    

    This applies A only to columns 4-9 of y, but the total return of B strips off the first 3 columns... I still want those, I just don't want A applied to them.

    wifi[,1:3]+B 
    

    also does not do what I expected/wanted.

  • jcfaria
    jcfaria over 10 years
    A small correction: wifi <- data.frame(A=1:4, B=runif(4), C=5:8)
  • santeko
    santeko about 9 years
    Is there a way to do this using $ to index a certain column by name instead of using [ : ] to index by column number? I tried adding colnames: colnames(wifi) = c("a", "b", "c", "d", "e", "f", "g", "h" ,"i") but any attempt at using lapply(wifi$e, 2, X) wasn't happening.
  • thelatemail
    thelatemail about 9 years
    @skotturi - you can do this like wifi[c("a","b","c")] to index multiple columns by name.
  • Mox
    Mox about 6 years
    Could you be more explicit about how you created the [cols] vector?
  • cparmstrong
    cparmstrong about 6 years
    @Mox you can just do cols <- c("var1", "var2")
  • Agile Bean
    Agile Bean over 5 years
    As an alternative that avoids repeating the column specification, you could use magrittr's assignment pipe with purrr: wifi[4:9] %<>% map(A)
  • kittygirl
    kittygirl almost 4 years
    @thelatemail, in apply(wifi[4:9], 2, A), wifi[4:9] is a data.frame, and apply can only be used on an array or matrix. Why does your answer work?
  • thelatemail
    thelatemail almost 4 years
    @kittygirl - that's because apply can be used on a data.frame. The data.frame will be coerced to a matrix as part of the function when apply is used.
  • kittygirl
    kittygirl almost 4 years
    @thelatemail, will it lose the row name or column name information?
  • Kay
    Kay about 3 years
    @AgileBean: map is a nice alternative, but I would advise against using the %<>% operator. Please scroll to the end of r4ds.had.co.nz/pipes.html