
Given a data frame column that holds a series of integers (age), I want to convert ranges of those integers into the levels of an ordinal variable.

My current code doesn't work. How do I do this?

df <- read.table("http://dl.dropbox.com/u/822467/df.csv", header = TRUE, sep = ",")

df[(df >= 0)  & (df <= 14)] <- "Age1"
df[(df >= 15) & (df <= 44)] <- "Age2"
df[(df >= 45) & (df <= 64)] <- "Age3"
df[(df > 64)] <- "Age4"

table(df)
smci
RJ-

1 Answer


Use cut to do this in one step:

dfc <- cut(df$x, breaks=c(0, 15, 45, 56, Inf))
str(dfc)
 Factor w/ 4 levels "(0,15]","(15,45]",..: 3 4 3 2 2 4 2 2 4 4 ...

Once you are satisfied that the breaks are correctly specified, you can then also use the labels argument to relabel the levels:

dfc <- cut(df$x, breaks=c(0, 15, 45, 56, Inf), labels=paste("Age", 1:4, sep=""))
str(dfc)
 Factor w/ 4 levels "Age1","Age2",..: 3 4 3 2 2 4 2 2 4 4 ...
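
Note that with the default `right = TRUE` the intervals are closed on the right, so an age of 0 falls outside `(0,15]` and becomes `NA`, and 15 lands in the first group. The third break above is also 56, whereas the question puts that boundary between 64 and 65. The following is a minimal sketch of how the question's exact ranges (0 to 14, 15 to 44, 45 to 64, over 64) could be reproduced. The `age` vector is hypothetical sample data standing in for `df$x`, and `age_group` is just an illustrative name.

```r
# Hypothetical ages standing in for df$x (the real data is not shown here).
age <- c(0, 14, 15, 44, 45, 64, 65, 90)

# right = FALSE closes each interval on the left: [0,15), [15,45), [45,65), [65,Inf)
# ordered_result = TRUE returns an ordered factor, so Age1 < Age2 < Age3 < Age4.
age_group <- cut(
  age,
  breaks = c(0, 15, 45, 65, Inf),
  labels = paste0("Age", 1:4),
  right = FALSE,
  ordered_result = TRUE
)

table(age_group)
```

With integer ages, `[0,15)` covers exactly 0 to 14, so the boundary values in the sample fall into the groups the question asks for.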
Andrie
  • Thanks, it works. Do you know what was wrong with what I was originally trying to do? – RJ- Apr 19 '12 at 06:33
  • @RJ -- Try this (and then compare it to line 5 of your code) to see what went wrong: `c(65, 99, 100, 104, "Age3", "Age2") > 64`. – Josh O'Brien Apr 19 '12 at 06:59
  • To obtain an ordered factor (which was mentioned in the OP), include `ordered_result = TRUE` in `cut()`. – BenBarnes Apr 19 '12 at 07:30
  • @Josh O'Brien I see, I understand now. The integers and the alphanumeric strings lie on a single continuum: 1 - 10, then A - Z. – RJ- Apr 19 '12 at 15:18
  • @RJ- Yep, that's it. And since even the integers are converted to character strings, they are sorted in dictionary order (rather than by magnitude), e.g. `sort(c(1:12, 100, letters[1:5]))`. – Josh O'Brien Apr 19 '12 at 16:35
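
The coercion described in the comments above can be reproduced in a few lines. This is only a sketch: `age` is a hypothetical plain vector standing in for the data frame column, and `remaining` is an illustrative name.

```r
# Hypothetical ages; a plain vector is used here in place of the data frame column.
age <- c(5, 30, 50, 70, 100)

# The first assignment writes a string into a numeric vector,
# so R coerces every element to character.
age[age >= 0 & age <= 14] <- "Age1"
class(age)

# Later comparisons coerce 64 to "64" and compare strings character by character,
# so "100" is not greater than "64" because "1" sorts before "6".
remaining <- age[-1]   # the elements that still look like numbers
remaining > 64
```

Here `"70" > 64` is `TRUE` but `"100" > 64` is `FALSE`, which is why the later assignments in the question pick the wrong elements.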