
I have a dataset that contains 79 explanatory variables, of which 43 are factors.

Some of the factor variables are just generic labels. For those I intend to use dummy variables for numeric representation.

Some other subset of the factor variables contain ordered levels, for example:

BsmtQual: Evaluates the height of the basement

       Ex   Excellent (100+ inches) 
       Gd   Good (90-99 inches)
       TA   Typical (80-89 inches)
       Fa   Fair (70-79 inches)
       Po   Poor (<70 inches)
       NA   No Basement

I want to convert such a factor variable to numeric values that preserve the order of the levels from lowest to highest, meaning that after the operation I want to get something like:

BsmtQual: Evaluates the height of the basement

       Ex records will be replaced with: 6  
       Gd records will be replaced with: 5
       TA records will be replaced with: 4
       Fa records will be replaced with: 3
       Po records will be replaced with: 2
       NA records will be replaced with: 1

(Not sure if I can replace NA with 0, as NA doesn't actually refer to missing data for this variable but to a record with a low basement score.)

How can I code this replacement?

Adiel
  • Do you mean `factor`, or do you really mean `factorial!`? – r2evans Feb 25 '17 at 14:59
  • Probably `factor` :) Sorry, English isn't my native language – Adiel Feb 25 '17 at 15:00
  • I understand *what* you want to replace the factors with, but perhaps I'm missing something: *"convert such factor variable to a numeric"*, factors are natively stored as integers, and the order is preserved in the `levels` attribute. If you just want to change the order, then use `x <- factor(x, levels = c("Ex", "Gd", "TA", "Fa", "Po"))` (assigning to `levels(x)` directly would rename the levels rather than reorder them). – r2evans Feb 25 '17 at 15:05
  • I need to run linear regression on the data. Some factor variables actually have logical (or so I think) numeric values, like in my example. So instead of creating dummy variables for them, I would like to convert them to integers (a one-to-one mapping, instead of turning one variable with n levels into n-1 dummy variables) – Adiel Feb 25 '17 at 18:35
  • It's becoming difficult to talk around this without seeing sample data. Please produce a [reproducible question](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and edit your question. – r2evans Feb 25 '17 at 19:53

1 Answer

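library(plyr)  # revalue() comes from the plyr package
req_var$ExterQual <- as.numeric(as.character(revalue(req_var$ExterQual, c("Ex"="5", "Gd"="4", "TA"="3", "Fa"="2", "Po"="1"))))  # revalue() only relabels the levels, so convert the result to numeric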

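Here I do not consider NA in this dataset. If you want to map NA to 0, add "NA"=0 to the above command. Note that this only works if "NA" is stored as a literal factor level; if the values were read in as true missing values (`NA`), they have to be replaced separately.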
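
For comparison, here is a minimal base R sketch of the same idea that does not need `plyr`. It relies on the fact, mentioned in the comments, that a factor is stored as integer codes in the order of its levels. The vector `bsmt_qual` is made-up sample data, and scoring "No Basement" as 0 is an assumption, not something the dataset prescribes.

```r
# Hypothetical sample data; a true missing value (NA) stands for "No Basement".
bsmt_qual <- c("Ex", "Gd", "TA", "Fa", "Po", NA, "Gd")

# List the levels from lowest to highest so the integer codes follow that order.
quality_levels <- c("Po", "Fa", "TA", "Gd", "Ex")
bsmt_factor <- factor(bsmt_qual, levels = quality_levels)

# as.integer() on a factor returns the position of each value's level:
# Po = 1, Fa = 2, TA = 3, Gd = 4, Ex = 5; missing values stay NA.
bsmt_score <- as.integer(bsmt_factor)

# Assumption: "No Basement" should rank below "Po", so it gets 0.
bsmt_score[is.na(bsmt_score)] <- 0L

print(bsmt_score)
```

If the coding from the question is preferred (NA = 1 up to Ex = 6), `bsmt_score + 1L` gives it. Whether treating the levels as equally spaced integers is appropriate for the regression is a separate modelling decision.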