install.packages('data.table')
library(data.table)
data <- read.csv("http://www.ats.ucla.edu/stat/data/hsb2_small.csv")
head(data, 10)
> id female race ses schtyp prog read write math science socst
> 1: 70 0 4 1 1 1 57 52 41 47 57
> 2: 121 1 4 2 1 3 68 59 53 63 61
> 3: 86 0 4 3 1 1 44 33 54 58 31
> 4: 141 0 4 3 1 3 63 44 47 53 56
> 5: 172 0 4 2 1 2 47 52 57 53 61
> 6: 113 0 4 2 1 2 44 52 51 63 61
> 7: 50 0 3 2 1 1 50 59 42 53 61
> 8: 11 0 1 2 1 2 34 46 45 39 36
> 9: 84 0 4 2 1 1 63 57 54 58 51
> 10: 48 0 3 2 1 2 57 55 52 50 51
and we see it is a
class(data)
> [1] "data.frame"
so we can select specific columns by number (only 10 rows are shown for this example)
data[ , c(1, 7, 8)]
> id read write
> 1 70 57 52
> 2 121 68 59
> 3 86 44 33
> 4 141 63 44
> 5 172 47 52
> 6 113 44 52
> 7 50 50 59
> 8 11 34 46
> 9 84 63 57
> 10 48 57 55
or a range (helpful if you have many variables)
data[ , 3:11]
> race ses schtyp prog read write math science socst
> 1 4 1 1 1 57 52 41 47 57
> 2 4 2 1 3 68 59 53 63 61
> 3 4 3 1 1 44 33 54 58 31
> 4 4 3 1 3 63 44 47 53 56
> 5 4 2 1 2 47 52 57 53 61
> 6 4 2 1 2 44 52 51 63 61
> 7 3 2 1 1 50 59 42 53 61
> 8 1 2 1 2 34 46 45 39 36
> 9 4 2 1 1 63 57 54 58 51
> 10 3 2 1 2 57 55 52 50 51
Everything works well until I start using data.table.
setDT(data)
class(data)
> [1] "data.table" "data.frame"
How do I accomplish similar subsetting with data.table? The same code as above yields...
data[ , c(1, 7, 8)]
> [1] 1 7 8
data[ , 3:11]
> [1] 3 4 5 6 7 8 9 10 11
I am aware of dplyr's select(), but I am looking for a solution that doesn't involve typing the column names, and I would greatly appreciate a clear method for subsetting a data.table by column number. I have occasionally used subset(), and have even gone so far as constructing a character vector J for use in data[I, J, by = K]. I must be missing something. I suspect this is trivial for experienced users: is there a flexible way to select, for example, columns 1, 3, 5, 10 through 30, and 97?
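To make the goal concrete, here is a minimal sketch of two approaches that appear to select columns by number in data.table: passing `with = FALSE`, and using `.SD` with `.SDcols`. The table `dt` and the vector `cols` are hypothetical stand-ins (a 12-column toy table rather than the hsb2 data), and exact behaviour may depend on the installed data.table version.

```r
library(data.table)

# Hypothetical toy table standing in for the real data: 3 rows, 12 columns (V1..V12)
dt <- as.data.table(matrix(1:36, nrow = 3))

# Column numbers to keep; c() splices single positions and ranges together
cols <- c(1, 3, 5, 10:12)

# Option 1: with = FALSE makes j be read as column positions (or names)
# instead of being evaluated as an expression
a <- dt[, cols, with = FALSE]

# Option 2: .SDcols accepts column numbers as well as column names
b <- dt[, .SD, .SDcols = cols]

names(a)
names(b)
```

The same pattern should extend to something like `c(1, 3, 5, 10:30, 97)` on a wider table, since the column numbers are just an ordinary integer vector.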