Subset by column index in R - Data.Table vs. dataframe

r dataframe data.table subset

13,845

For data.table, you need to include with=FALSE in your column subset statement, so that j is treated as a vector of column positions rather than evaluated as an expression.

data[, 3:11, with=FALSE]
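As a rough illustration of the above, here is a minimal sketch on a small made-up table (the `dt` object and its columns are hypothetical, not the hsb2 data from the question). It shows with=FALSE used with literal positions and with positions stored in a variable, plus `.SDcols` as an alternative way to select by position.

```r
library(data.table)

# Small illustrative table (hypothetical data, not the file from the question)
dt <- data.table(id = 1:3, a = 4:6, b = 7:9, c = 10:12)

# with = FALSE makes j a vector of column positions instead of an expression
by_position <- dt[, c(1, 3), with = FALSE]

# The same works when the positions are stored in a variable
cols <- 2:4
by_variable <- dt[, cols, with = FALSE]

# .SDcols is another way to select columns by position
by_sdcols <- dt[, .SD, .SDcols = c(1, 3)]

names(by_position)  # "id" "b"
names(by_variable)  # "a" "b" "c"
names(by_sdcols)    # "id" "b"
```

Newer data.table releases may also return columns for a literal such as data[, 3:11] without with=FALSE; that depends on the installed version, so with=FALSE is the safer choice when the version is unknown.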


Author by

ben_says

Updated on June 26, 2022

Comments

install.packages('data.table')
library(data.table)

data <- read.csv("http://www.ats.ucla.edu/stat/data/hsb2_small.csv")
head(data, 10)

   >      id female race ses schtyp prog read write math science socst
   >  1:  70      0    4   1      1    1   57    52   41      47    57
   >  2: 121      1    4   2      1    3   68    59   53      63    61
   >  3:  86      0    4   3      1    1   44    33   54      58    31
   >  4: 141      0    4   3      1    3   63    44   47      53    56
   >  5: 172      0    4   2      1    2   47    52   57      53    61
   >  6: 113      0    4   2      1    2   44    52   51      63    61
   >  7:  50      0    3   2      1    1   50    59   42      53    61
   >  8:  11      0    1   2      1    2   34    46   45      39    36
   >  9:  84      0    4   2      1    1   63    57   54      58    51
   > 10:  48      0    3   2      1    2   57    55   52      50    51

and we see it is a

class(data)

   > [1] "data.frame"

so we can snag specific columns (only showing 10 rows for this page's example...)

data[ , c(1, 7, 8)]

   >     id read write
   > 1   70   57    52
   > 2  121   68    59
   > 3   86   44    33
   > 4  141   63    44
   > 5  172   47    52
   > 6  113   44    52
   > 7   50   50    59
   > 8   11   34    46
   > 9   84   63    57
   > 10  48   57    55

or a range (helpful if you have many variables)

data[ , 3:11]

   >    race ses schtyp prog read write math science socst
   > 1     4   1      1    1   57    52   41      47    57
   > 2     4   2      1    3   68    59   53      63    61
   > 3     4   3      1    1   44    33   54      58    31
   > 4     4   3      1    3   63    44   47      53    56
   > 5     4   2      1    2   47    52   57      53    61
   > 6     4   2      1    2   44    52   51      63    61
   > 7     3   2      1    1   50    59   42      53    61
   > 8     1   2      1    2   34    46   45      39    36
   > 9     4   2      1    1   63    57   54      58    51
   > 10    3   2      1    2   57    55   52      50    51

Everything works well until I start using data.table.

setDT(data)
class(data)

    > [1] "data.table" "data.frame"

How do I accomplish similar subsetting with data.table? The same code as above yields...

data[ , c(1, 7, 8)]

    > [1] 1 7 8

data[ , 3:11]

    > [1]  3  4  5  6  7  8  9 10 11

I am aware of dplyr's select(), but I am looking for a solution that doesn't involve typing the column names, and I would greatly appreciate a clear method for subsetting a data.table by column number. I have occasionally used subset(), and have even gone so far as constructing a character vector J for use in data[I, J, by = K]. I must be missing something. Code-masters would consider this trivial and could easily show a flexible solution that allows one to, for example, select columns 1, 3, 5, 10 through 30, and 97.
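The mixed selection described above (columns 1, 3, 5, 10 through 30, and 97) could be sketched roughly as follows. The `wide` table here is a hypothetical 100-column stand-in built from a matrix, so its column names V1 to V100 are an assumption of the example, not part of the original data.

```r
library(data.table)

# Hypothetical wide table: 5 rows, 100 columns named V1..V100
wide <- as.data.table(matrix(seq_len(500), nrow = 5, ncol = 100))

# Mix single positions and a range in one index vector
idx <- c(1, 3, 5, 10:30, 97)

# with = FALSE tells data.table to treat idx as column positions
picked <- wide[, idx, with = FALSE]

ncol(picked)            # 25
head(names(picked), 4)  # "V1" "V3" "V5" "V10"
```

Building the index with c() keeps the selection in one place, so ranges and single positions can be combined without typing any column names.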