Questions tagged [model.matrix]

113 questions
20
votes
2 answers

Big matrix to run glmnet()

I am having trouble running a glmnet lasso on a wide data set. My data has N=50, but p > 49000, all factors. To run glmnet I have to create a model.matrix, but I run out of memory when I call model.matrix(formula, data), where formula =…
Flavio Barros
8
votes
1 answer

Warning message - dummy from dummies package

I am using the dummies package to generate dummy variables for categorical variables, some with more than two categories. testdf<- data.frame( "A" = as.factor(c(1,2,2,3,3,1)), "B" = c('A','B','A','B','C','C'), "C"=…
Max_IT
5
votes
2 answers

Speed up this loop to create dummy columns with data.table and set in R

I have a data table and I want to create a new column for each unique day, and then assign a 1 in each row where the day matches the column name. I have done this using a for loop, but I was wondering if there was any way to optimise it using…
5
votes
1 answer

use model.matrix through rpy2?

I prefer Python over R for my work. From time to time I need to use R functions, and I have started trying rpy2 for that purpose. I tried but failed to find out how to replicate the following with rpy2: design <- model.matrix(~Subject+Treat) I have gone as…
xyliu00
4
votes
1 answer

How does model.matrix select levels for interaction terms

model.matrix returns fewer levels if lower-order terms are included with interaction terms. Suppose two factor variables have na and nb levels, respectively. In a complete model.matrix with interaction terms, model.matrix(~ A + B + A:B), shouldn't I have…
Shubham Gupta
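
A note on the column count this question asks about. The following is a minimal sketch, not the accepted answer: it assumes R's default treatment contrasts and two hypothetical factors, A with na = 3 levels and B with nb = 2 levels (the data frame name d and the level labels are made up for illustration). Under those assumptions, each main effect drops its reference level and the interaction contributes (na - 1) * (nb - 1) columns.

```r
# Hypothetical example data: every combination of A (3 levels) and B (2 levels).
d <- expand.grid(A = c("a1", "a2", "a3"),
                 B = c("b1", "b2"),
                 stringsAsFactors = TRUE)

# Full model: intercept, main effects, and the A:B interaction.
mm <- model.matrix(~ A + B + A:B, data = d)

# Inspect which level combinations received a column.
colnames(mm)

# Column count: 1 + (na - 1) + (nb - 1) + (na - 1) * (nb - 1) = na * nb
ncol(mm)
```

With treatment contrasts the reference levels are absorbed into the intercept, which is why the matrix has na * nb columns in total rather than a separate column for every level of every term.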
4
votes
1 answer

Force model.matrix to follow the order of the terms in the formula in R

Let's create a matrix with fake data: data_ex <- data.frame(y = runif(5,0,1), a1 = runif(5,0,1), b2 = runif(5,0,1), c3 = runif(5,0,1), d4 = runif(5,0,1)) > data_ex y a1 b2 c3 d4 1 0.162 0.221 0.483 0.989…
FR_
4
votes
0 answers

Variable order in interaction terms

I'm trying to fit a number of linear models as shown below. It is important that all interaction terms are sorted lexicographically. Note that the second model is missing the main effect for x. x = rnorm(100) y = rnorm(100) z = x + y +…
rimorob
4
votes
1 answer

Is there a function to return the matching response vector to model.matrix?

In glmnet() I have to specify the raw X matrix and response vector Y (different than lm where you can specify the model formula). model.matrix() will correctly remove incomplete observations from the X matrix, but it doesn't include the response in…
Robert Kubrick
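
One possible approach to the question above, sketched under assumptions: build the model frame once with model.frame(), then derive both the design matrix and the response from it so the same incomplete rows are dropped from each. The data frame df and its columns y, x1, and g are hypothetical, and the sketch relies on the default na.action (na.omit).

```r
# Hypothetical data with missing values in both the response and a predictor.
df <- data.frame(
  y  = c(1, 2, NA, 4, 5),
  x1 = c(0.5, NA, 1.5, 2.0, 2.5),
  g  = factor(c("a", "b", "a", "b", "a"))
)

# model.frame() applies the NA handling once, for all variables in the formula.
mf <- model.frame(y ~ x1 + g, data = df)

# Design matrix and response are both taken from the same filtered frame.
X <- model.matrix(attr(mf, "terms"), data = mf)
y <- model.response(mf)

# The two objects now have matching rows.
nrow(X) == length(y)
```

Because X and y come from the same model frame, they stay aligned row for row, which is what glmnet() needs when it is given a raw matrix and response vector.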
4
votes
1 answer

Melt a dummy matrix to a column

If I have a factor variable, say x = factor(c(1, 2, 3)), then I can use the model.matrix function to generate a dummy matrix: model.matrix(~x + 0) and I will get a matrix like: x1 x2 x3 1 1 0 0 2 0 1 0 3 0 0 1 My question is that, if I…
Bayes
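
A minimal sketch of going back from the dummy matrix to a single column, assuming that is what the truncated question asks for. It assumes each row has exactly one 1 and that the column names are the variable name x followed by the level label, as in the question's example. The names m, idx, labels, and recovered are illustrative.

```r
x <- factor(c(1, 2, 3))
m <- model.matrix(~ x + 0)   # one indicator column per level: x1, x2, x3

# For each row, find the position of the column that holds the 1.
idx <- apply(m, 1, which.max)

# Strip the variable-name prefix from the column names to get level labels.
labels <- sub("^x", "", colnames(m))

# Rebuild the factor from the column positions.
recovered <- factor(labels[idx], levels = labels)
recovered
```

This only inverts a full indicator coding (no intercept); with the default treatment contrasts the reference level has no column, so a row of all zeros would need separate handling.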
4
votes
1 answer

Rownames for data.table in R for model.matrix

I have a data.table DT and I want to run model.matrix on it. Each row has a string ID, which is stored in the ID column of DT. When I run model.matrix on DT, my formula excludes the ID column. The problem is, model.matrix drops some rows because…
DavidR
4
votes
2 answers

model.matrix using multiple columns

I'm trying to use multiple columns from a data.frame in a model.matrix. The data frame looks like this: df1 <- data.frame(id=seq(1,10,1), zip1=(round(runif(10)*100000,0)), zip2=(round(runif(10)*100000,0)) …
screechOwl
3
votes
1 answer

How can I obtain a minimal data frame of only the variables used in a statistical model in R?

Take the following example: fit <- lm(Sepal.Length ~ log(Sepal.Width), data = iris) I would like a copy of iris that only includes the variables that were involved in making fit. I think model.matrix() or model.frame() don't quite do it because of…
cgmil
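
One way this could be approached, as a sketch rather than a definitive answer: all.vars() extracts the raw variable names from the fitted model's formula, ignoring transformations such as log(), and those names can then be used to subset the original data. The names vars and iris_used are illustrative.

```r
fit <- lm(Sepal.Length ~ log(Sepal.Width), data = iris)

# Raw variable names referenced by the formula, with log() stripped away.
vars <- all.vars(formula(fit))

# Copy of iris restricted to the untransformed variables used in the model.
iris_used <- iris[, vars, drop = FALSE]
head(iris_used)
```

Unlike model.frame(), which would return a column named log(Sepal.Width) holding transformed values, this keeps the original Sepal.Width column.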
3
votes
0 answers

Make a model matrix if missing the response variable and where matrix multiplication recreates the predict function

I want to create a model matrix for a test dataset which is missing the response variable, and where I can perfectly replicate the results of calling predict() on the model if building predictions using matrix multiplication. See code below for…
jruf003
3
votes
1 answer

model.matrix Error: $ operator is invalid for atomic vectors

I ran into this error when using 'model.matrix'. data_A <- data.frame(X1 = c("Y","N"), X2 = c(20,24), Y = c("N","Y")) data_A model.matrix("Y ~ X1 + X2", data_A) Error: $ operator is invalid for atomic vectors What's causing the problem?
LeGeniusII
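
A likely cause, offered as a hedged sketch: the formula is passed as a character string, so model.matrix() treats it as an arbitrary object rather than a formula and fails while looking for its terms. Converting the string with as.formula() (or writing the formula unquoted) avoids this. The data below is the question's own; the name mm is illustrative.

```r
data_A <- data.frame(X1 = c("Y", "N"), X2 = c(20, 24), Y = c("N", "Y"))

# Convert the string to a formula object before building the design matrix.
mm <- model.matrix(as.formula("Y ~ X1 + X2"), data_A)
mm

# Equivalent: pass an unquoted formula directly.
mm2 <- model.matrix(Y ~ X1 + X2, data_A)
```

The response Y does not appear in the resulting matrix; model.matrix() only returns columns for the right-hand side of the formula.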
3
votes
2 answers

Check that model has only one factor covariate

I am writing an R package, where the main function takes a model, which may only have a single factor covariate (offsets are allowed). To make sure the user complies with this rule I need to check this. As an example, let's look at the following…
Heidi