Suppose we have a large data.table containing multiple columns, some numeric and others character. For each sub-group and each column, we want to find the first non-NA value. For example, if two rows represent one sub-group:
Group V1 V2 V3 V4 V5 V6
1 3 NA 5 NA NA ab
1 7 fn 0 2 NA NA
The expected result is:
Group V1 V2 V3 V4 V5 V6
1 3 fn 5 2 NA ab
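To make the example reproducible, here is a minimal sketch of the toy data and one straightforward way to get the expected row. The helper name `first_non_na` is hypothetical, and V5 is assumed to be numeric since the example shows only NA for it. This only illustrates the desired output and is not claimed to be fast enough for the full-size data.

```r
library(data.table)

# Toy data mirroring the two-row example above. NA types are spelled out so
# each column keeps a single type (V5 is assumed to be numeric here).
dt <- data.table(
  Group = c(1L, 1L),
  V1 = c(3, 7),
  V2 = c(NA_character_, "fn"),
  V3 = c(5, 0),
  V4 = c(NA, 2),
  V5 = c(NA_real_, NA_real_),
  V6 = c("ab", NA_character_)
)

# Hypothetical helper: first non-NA element of x, or a typed NA when every
# element is NA (indexing an empty vector with 1L yields NA of the same type).
first_non_na <- function(x) x[!is.na(x)][1L]

# Apply the helper to every non-grouping column within each group.
res <- dt[, lapply(.SD, first_non_na), by = Group]
print(res)
```

Because the helper works on any atomic vector, numeric and character columns are handled by the same call, and each column keeps its original type in the result.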
Suppose the data.table has about 40 million rows, 10 million groups, and 60 columns. The expected result will contain 10 million rows (one record for each sub-group) and 60 columns.
Other solutions assume only one column with missing values, or only numeric columns with NAs. The data.table function nafill accepts only double and integer data types, and na.locf and na.locf0 from package zoo can run for hours before completing.