Extract columns from data table by numeric indices stored in a vector

Question

I want to extract the 4th, 5th, and 6th column from a data table named dt

the following method works:

    dt[, c(4,5,6)]

but the following doesn't:

    a = c(4,5,6)
    dt[, a]

In fact, the second method gives me a reult of:

    4 5 6

Can someone tell me why this is happening? The two method looks equivalent to me.

please read the `data.table` FAQ: https://cran.r-project.org/web/packages/data.table/vignettes/datatable-faq.html — MichaelChirico, Mar 12 '18 at 02:52
Multiple solutions which will work discussed in this question also: https://stackoverflow.com/questions/13383840/select-multiple-columns-in-data-table-by-their-numeric-indices — thelatemail, Mar 12 '18 at 03:29

score 7 · Accepted Answer · answered Mar 12 '18 at 02:30

7

We can use double dots (..) before the object 'a' to extract the columns

dt[, ..a]
#   col4 col5 col6
#1:    4    5    6
#2:    5    6    7
#3:    6    7    8
#4:    7    8    9

Or another option is with = FALSE

dt[, a, with = FALSE]

data

dt <- data.table(col1 = 1:4, col2 = 2:5, col3 = 3:6, col4 = 4:7, col5 = 5:8, col6 = 6:9)

answered Mar 12 '18 at 02:30

akrun

874,273
37
540
662

thanks! the second method works but the first doesn't. I get the following error for using ..a: "Error in eval(expr, envir, enclos) : object '..a' not found". Can you also explain what was wrong with my old code? – Amazonian Mar 12 '18 at 02:39
2

@Amazonian Could you tell me the packageVersion of data.table.. Regarding your code, if you check `?data.table`, `By default with=TRUE and j is evaluated within the frame of x; column names can be used as variables. When with=FALSE j is a character vector of column names, a numeric/logical vector of column positions to select or of the form startcol:endcol, and the value returned is always a data.table. with=FALSE is often useful in data.table to select columns dynamically. Note that x[, cols, with=FALSE] is equivalent to x[, .SD, .SDcols=cols].` – akrun Mar 12 '18 at 02:40
@Amazonian My data.table version is `data.table_1.10.5` – akrun Mar 12 '18 at 02:41
mine is version 1.10.0. I guess my question is how come when I assign the indices to a variable a=c(4,5,6) and use the variable it doesn't work but when I just put in the indices as a vector c(4,5,6) everything works? – Amazonian Mar 12 '18 at 02:46
@Amazonian Please update your version to recent one. The explanation for that one is given in the previous comment – akrun Mar 12 '18 at 02:50
This is the correct way to do this in data.table. Upvoted! – De Novo Mar 12 '18 at 02:57

De Novo · Answer 2 · 2018-03-12T03:14:45.263

@akrun's answer gives you the correct alternative. If you want to know why you need it, here's the more detailed explanation:

The way the data.table subset operation works, in most cases the j expression in dt[i, j, by] with no i or by, is evaluated in the frame of the data table, and returned as is, whether or not it has anything to do with the data table outside the brackets. In versions earlier than 1.9.8, your first code snippet: dt[,c(4, 5, 6)] evaluates to the numeric vector c(4, 5, 6), not the 4th, 5th, and 6th columns. This changed as of data.table v1.9.8 (released November 2016) ( scroll down to v.1.9.8 potentially breaking changes), because people, unsurprisingly, expected dt[,c(4, 5, 6)] to give the 4th 5th and 6th columns. Now, if the j expression is the variable names or numbers, with is automatically set to FALSE. This effectively produces behavior similar to subsetting a data frame (not exactly the same, but similar).

So your second code snippet (where dt[, a] evaluates to a, rather than uses a to subset the columns) is actually the default, and the first is a special case.

To illustrate the odd but standard behavior here, try:

dt[, diag(5)]
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    1    0    0    0    0
# [2,]    0    1    0    0    0
# [3,]    0    0    1    0    0
# [4,]    0    0    0    1    0
# [5,]    0    0    0    0    1

No matter what your dt is, so long as it is a data.table, it will evaluate to the 5*5 identity matrix

Extract columns from data table by numeric indices stored in a vector

2 Answers2

data

Linked

Related