
I am using R to estimate the hourly wages of employees. My dataset is called Total1. It contains a variable called "company size" with the following categories:

Company size 1 2 3 4 5 6 7

Where:

  • 1 = up to 4 employees
  • 2 = from 5 to 19 employees
  • 3 = from 20 to 49 employees
  • 4 = from 50 to 99 employees
  • 5 = from 100 to 499 employees
  • 6 = over 500 employees
  • 7 = not applicable, public employees

I would like to transform the "company size" variable into a continuous variable by using the median value of each size class, that is:

Company size 2 10 25 50 250 1000 ?

But what value could I use for "not applicable, public employees", so that it does not create inconsistencies in the estimate of the hourly wage?

Vero
  • Why do you want to do this? Is the data set very small so you are looking to get more degrees of freedom? – G. Grothendieck Jan 10 '21 at 13:48
  • https://stackoverflow.com/questions/50898623/how-to-replace-multiple-values-at-once – maydin Jan 10 '21 at 13:49
  • It sounds like you want to turn category 7 into an `NA` value, so that you can exclude it when you compute summary statistics like the mean and median. Imputation does not seem to be indicated here, although I'd ask yourself why there are 7's in your dataset. What would make a company have a category of 7? Is this on purpose or is it a data entry error? – latlio Jan 10 '21 at 13:55
  • For category 7, you can use `NA_integer_` – Ritz735 Jan 10 '21 at 18:01
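
The recoding suggested in the comments could be sketched roughly as follows. This is only an illustration under assumptions: the toy data frame stands in for the real Total1, and the column names `company_size`, `company_size_num` and `public` are made up for the example. The values for categories 1 to 6 are the ones given in the question; category 7 is mapped to `NA` rather than to an invented number, and a separate 0/1 indicator keeps track of the public employees.

```r
# Toy stand-in for Total1; the column name company_size is an assumption.
Total1 <- data.frame(company_size = c(1, 2, 3, 4, 5, 6, 7, 3))

# Values from the question for categories 1 to 6.
# Category 7 (public employees) has no company size, so it maps to NA.
size_values <- c(2, 10, 25, 50, 250, 1000, NA_real_)

# Use the category code as an index into the lookup vector.
Total1$company_size_num <- size_values[Total1$company_size]

# Keep the public-sector information in a separate 0/1 indicator.
Total1$public <- as.integer(Total1$company_size == 7)

Total1
```

Note that with R's default `na.action`, `lm()` drops rows containing `NA` in a model variable, so a wage regression that includes `company_size_num` would be fitted on the non-public employees only. Whether that is acceptable depends on the question being asked.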

0 Answers