Apply a function to a subset of data.table columns, by column indices instead of names


The idiomatic approach is to use .SD and .SDcols.

Wrapping the LHS of := in () forces it to be evaluated in the parent frame, so b is treated as a variable holding the column names rather than as a literal column called b:

a[, (b) := lapply(.SD, as.numeric), .SDcols = b]
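To make the effect of the parentheses concrete, here is a minimal sketch on a toy table. The names dt and target are hypothetical and chosen only for illustration; they do not appear in the question.

```r
library(data.table)

# Toy table with two character columns (hypothetical example data)
dt <- data.table(x = c("1", "2"), y = c("3", "4"))

# A variable holding the name of the column we want to update
target <- "y"

# Without parentheses, the LHS is taken literally:
# this adds a new column that is itself called "target"
dt[, target := 0]

# With parentheses, the LHS is evaluated as a variable,
# so the column named "y" is the one that gets replaced
dt[, (target) := lapply(.SD, as.numeric), .SDcols = target]

names(dt)     # the table now has columns x, y and target
class(dt$y)   # y has been converted to numeric
```

The same rule applies whether the variable holds column names or column positions.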

For columns 2:3, by position:

a[, 2:3 := lapply(.SD, as.numeric), .SDcols = 2:3]

or, with the positions stored in a variable:

mysubset <- 2:3
a[, (mysubset) := lapply(.SD, as.numeric), .SDcols = mysubset]
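As a self-contained check, the following sketch rebuilds the table from the question and confirms that only the selected columns change type. The variable name cols is an assumption made for this example.

```r
library(data.table)

# Rebuild the question's table: four columns of numbers stored as character
a <- data.table(
  a = as.character(rnorm(5)),
  b = as.character(rnorm(5)),
  c = as.character(rnorm(5)),
  d = as.character(rnorm(5))
)

# Column positions to convert (hypothetical variable name)
cols <- 2:3

# .SDcols restricts .SD to the chosen columns; (cols) on the LHS
# assigns the converted values back to those same columns by reference
a[, (cols) := lapply(.SD, as.numeric), .SDcols = cols]

# Inspect the resulting column classes:
# b and c are now numeric, a and d remain character
sapply(a, class)
```

Because := modifies the table by reference, there is no need to assign the result back to a.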
Author by

Tahnoon Pasha

Updated on June 03, 2022

Comments

  • Tahnoon Pasha
    Tahnoon Pasha almost 2 years

    I'm trying to apply a function to a group of columns in a large data.table without referring to each one individually.

    library(data.table)
    a <- data.table(
      a=as.character(rnorm(5)),
      b=as.character(rnorm(5)),
      c=as.character(rnorm(5)),
      d=as.character(rnorm(5))
    )
    b <- c('a','b','c','d')
    

    With the MWE above, this:

    a[,b=as.numeric(b),with=F]
    

    works, but this:

    a[,b[2:3]:=data.table(as.numeric(b[2:3])),with=F]
    

    doesn't work. What is the correct way to apply the as.numeric function to just columns 2 and 3 of a without referring to them individually?

    (In the actual data set there are tens of columns, so naming each one would be impractical.)