Subset by column index in R - Data.Table vs. dataframe

13,845

For a data.table, add with=FALSE to the column subset so that the numbers in j are treated as column positions rather than evaluated as an expression.

data[, 3:11, with=FALSE]
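
To see the idea on something small, here is a minimal sketch. The table `dt` and its columns are hypothetical stand-ins, not the hsb2 data from the question. It shows `with = FALSE` next to `.SDcols`, which is another way to pick columns by position. As far as I recall, data.table releases from 1.9.8 onward also accept `dt[, 2:3]` directly, but treat that version number as an assumption and check your installed version.

```r
library(data.table)

# Small stand-in table (hypothetical data, not the hsb2 file from the question)
dt <- data.table(id = 1:3, a = 4:6, b = 7:9, c = 10:12)

# with = FALSE: treat j as column positions (or names)
# instead of an expression to evaluate
sub1 <- dt[, 2:3, with = FALSE]

# .SDcols: select the same columns by position via .SD
sub2 <- dt[, .SD, .SDcols = 2:3]

names(sub1)
names(sub2)
```

Both calls return a data.table holding only columns `a` and `b`.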
Author by

ben_says

Updated on June 26, 2022

Comments

  • ben_says
    ben_says almost 2 years
    install.packages('data.table')
    library(data.table)
    
    data <- read.csv("http://www.ats.ucla.edu/stat/data/hsb2_small.csv")
    head(data, 10)
    
       >      id female race ses schtyp prog read write math science socst
       >  1:  70      0    4   1      1    1   57    52   41      47    57
       >  2: 121      1    4   2      1    3   68    59   53      63    61
       >  3:  86      0    4   3      1    1   44    33   54      58    31
       >  4: 141      0    4   3      1    3   63    44   47      53    56
       >  5: 172      0    4   2      1    2   47    52   57      53    61
       >  6: 113      0    4   2      1    2   44    52   51      63    61
       >  7:  50      0    3   2      1    1   50    59   42      53    61
       >  8:  11      0    1   2      1    2   34    46   45      39    36
       >  9:  84      0    4   2      1    1   63    57   54      58    51
       > 10:  48      0    3   2      1    2   57    55   52      50    51
    

    and we see it is a

    class(data)
    
       > [1] "data.frame"
    

    so we can snag specific columns (only showing 10 rows for this page's example...)

    data[ , c(1, 7, 8)]
    
       >     id read write
       > 1   70   57    52
       > 2  121   68    59
       > 3   86   44    33
       > 4  141   63    44
       > 5  172   47    52
       > 6  113   44    52
       > 7   50   50    59
       > 8   11   34    46
       > 9   84   63    57
       > 10  48   57    55
    

    or a range (helpful if you have many variables)

    data[ , 3:11]
    
       >    race ses schtyp prog read write math science socst
       > 1     4   1      1    1   57    52   41      47    57
       > 2     4   2      1    3   68    59   53      63    61
       > 3     4   3      1    1   44    33   54      58    31
       > 4     4   3      1    3   63    44   47      53    56
       > 5     4   2      1    2   47    52   57      53    61
       > 6     4   2      1    2   44    52   51      63    61
       > 7     3   2      1    1   50    59   42      53    61
       > 8     1   2      1    2   34    46   45      39    36
       > 9     4   2      1    1   63    57   54      58    51
       > 10    3   2      1    2   57    55   52      50    51
    

    Everything works well until I start using data.table.

    setDT(data)
    class(data)
    
        > [1] "data.table" "data.frame"
    

    How do I accomplish similar subsetting with data.table? The same code as above yields...

    data[ , c(1, 7, 8)]
    
        > [1] 1 7 8
    
    data[ , 3:11]
    
        > [1]  3  4  5  6  7  8  9 10 11
    

    I am aware of dplyr's select(), but I am looking for a solution that doesn't involve typing the column names, and I would greatly appreciate a clear method for subsetting a data.table by column number. I have occasionally used subset(), and have even gone so far as constructing a character vector J for use in data[I, J, by = K]. I must be missing something. Code-masters would consider this trivial and could easily show a flexible solution that allows one to, for example, select columns 1, 3, 5, 10 through 30, and 97.

    • A5C1D2H2I1M1N2O1R2T1
      A5C1D2H2I1M1N2O1R2T1 over 8 years
      Add with = FALSE in there.
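
For the mixed selection the question asks about (columns 1, 3, 5, 10 through 30, and 97), one possible sketch is to build the positions with c() and pass that vector with `with = FALSE`. The table `wide` below is a made-up 100-column example, used only so the code runs on its own.

```r
library(data.table)

# Hypothetical wide table: 2 rows, 100 columns named V1..V100
wide <- as.data.table(matrix(seq_len(200), nrow = 2, ncol = 100))

# Combine single positions and a range into one vector of column numbers
cols <- c(1, 3, 5, 10:30, 97)

# with = FALSE makes data.table read `cols` as column positions
picked <- wide[, cols, with = FALSE]

dim(picked)
```

Keeping the positions in a variable such as `cols` makes the selection easy to reuse or adjust without typing any column names.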