
I'm trying to divide a column of numbers into 6 equal categories, but only for values larger than 0.

This is what I have tried:

if (nost13$actsum > 0) nost13$actclass2 <- as.factor( as.numeric( cut(nost13$actsum ,6)))
else 0

Not working though...

What is wrong?

Vestlink
  • The first thing that is wrong is the use of `if`. In R it does not act on vectors, as the warning message should have told you. – IRTFM May 15 '16 at 16:00
  • OK :-) How would it be done, then? – Vestlink May 15 '16 at 16:02
  • (1) See `?ifelse`; (2) if that doesn't answer your question, can you please include data that will provide us with a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? – Ben Bolker May 15 '16 at 16:03
  • Possible duplicate of [Best way to replace a lengthy ifelse structure in R](http://stackoverflow.com/questions/36390967/best-way-to-replace-a-lengthy-ifelse-structure-in-r) – MichaelChirico May 15 '16 at 16:04
  • I'll look into ifelse – Vestlink May 15 '16 at 16:04
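
The point made in the comments can be illustrated with a minimal sketch. The vector `x` is made up here and merely stands in for `nost13$actsum`. How `if` reacts to a condition longer than one element depends on the R version (a warning in older versions, an error from R 4.2.0), so the sketch catches both cases.

```r
# Hypothetical values standing in for nost13$actsum
x <- c(-2, 0, 3, 7)

# `if` wants a single TRUE/FALSE. Given a longer condition, older R versions
# warn and use only the first element; R >= 4.2.0 stops with an error.
if_result <- tryCatch(
  if (x > 0) "positive" else "not positive",
  warning = function(w) "warning",
  error   = function(e) "error"
)

# ifelse() is vectorised: it returns one answer per element
ifelse_result <- ifelse(x > 0, "positive", "not positive")
```

Either way, `if` never evaluates the condition row by row, which is why the original attempt cannot work on a whole column.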

1 Answer

Here is an untested stab at an answer; tested solutions can be provided if you first supply a data object. There's also ambiguity about what an "equal category" might be: equal counts, or equal span? This answers the equal-span reading, which is what cut would deliver.

 nost13$actclass2 <- ifelse(nost13$actsum > 0,
                            cut(nost13$actsum, 6), 0)

I suspect the coercion to numeric will occur inside ifelse. Your code would have attempted to append 0's to factors, which would have ended in tears. If you want this to be a factor with levels "0"-"6", then wrap the entire ifelse(....) in factor(.).
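
A small check of that suspicion, as a sketch on the test vector `-100:100` rather than on the real data. It also shows one caveat: `factor()` on its own only creates levels for values that actually occur, so the `levels = 0:6` argument used below is an addition for illustration, not part of the original suggestion.

```r
vals <- -100:100   # test vector; the real data would be nost13$actsum

# cut() returns a factor; inside ifelse() only its integer codes survive
codes <- ifelse(vals > 0, cut(vals, 6), 0)
is.numeric(codes)   # checks that plain numbers, not factor labels, came back

# factor() alone keeps only the values that actually occur ...
f_seen <- factor(codes)
# ... so list the levels explicitly to always get "0" through "6"
f_all <- factor(codes, levels = 0:6)
```

Comparing `levels(f_seen)` with `levels(f_all)` shows which classes are empty for a given input.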

Here's some lightweight testing:

 actclass2 <- ifelse(-100:100 > 0,
                     cut(-100:100, 6), 0)
 table(actclass2)
#------------
actclass2
  0   4   5   6 
101  33  33  34 

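So depending on the distribution of values, you might not get exactly what you wanted: cut divides the whole range, non-positive values included, so here the positive values only land in bins 4 to 6. This shows a modification of that strategy that will probably be more pleasing: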

> vals <- -100:100
> splits <- seq(min(vals[vals>0]),max(vals[vals>0]), length=8)[-8]
> actclass2 <- ifelse(vals > 0,
+                           cut(vals ,breaks=splits ), 0)
> table(actclass2)
actclass2
  0   1   2   3   4   5   6 
101  14  14  14  14  14  14 

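cut needs 7 boundaries to generate 6 intervals, so the sequence is built with length 8 and its last element (the maximum) is discarded. Be aware that this leaves some positive values unclassified: cut's intervals are open on the left, so the minimum (1) falls outside the first interval, and 86 to 100 lie above the last break. Those values become NA, which table() omits by default; the counts above sum to 185, not 201. Using 7 boundaries from seq(..., length = 7) together with include.lowest = TRUE would cover the whole positive range. After going through this I'm thinking that the findInterval function would produce a clearer path to success.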
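
One possible variant that classifies every positive value, offered as a sketch and not as the method used in the transcript above: build exactly 7 boundaries across the positive range and pass `include.lowest = TRUE` so the minimum is kept. The helper variable `pos` is introduced here for readability only.

```r
vals <- -100:100
pos  <- vals > 0

# 7 boundaries from the smallest to the largest positive value give 6 intervals
splits <- seq(min(vals[pos]), max(vals[pos]), length.out = 7)

# include.lowest = TRUE closes the first interval on the left,
# so the minimum positive value is not turned into NA
actclass2 <- ifelse(pos, cut(vals, breaks = splits, include.lowest = TRUE), 0)

# useNA = "ifany" would expose any value that still fell outside the breaks
table(actclass2, useNA = "ifany")
```

With this test vector the six positive classes hold 16 or 17 values each, since 100 values cannot be split evenly into 6 equal-width bins.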

> table( findInterval( vals, c(-Inf, 0, splits[-1], Inf) ))

  1   2   3   4   5   6   7   8 
100  16  14  14  14  14  14  15 

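findInterval uses intervals closed on the left, [a, b), whereas cut by default uses intervals closed on the right, (a, b]. That is why 0 is counted in the second bin here rather than with the negative values.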
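
The difference in closure is easiest to see with values that sit exactly on a boundary. A minimal sketch with made-up breaks; `left.open` (for `findInterval`) and `right` (for `cut`) are the arguments that switch each function to the other convention.

```r
breaks <- c(0, 10, 20, 30)   # made-up boundaries for illustration
x <- c(10, 20)               # values sitting exactly on a boundary

# findInterval(): intervals are [a, b), so a boundary value opens the next bin
fi_default <- findInterval(x, breaks)

# cut(): intervals are (a, b] by default, so a boundary value closes its bin
cut_default <- as.integer(cut(x, breaks))

# Either function can be switched to the other convention
fi_left_open    <- findInterval(x, breaks, left.open = TRUE)
cut_left_closed <- as.integer(cut(x, breaks, right = FALSE))
```

Here `fi_default` and `cut_left_closed` agree with each other, as do `cut_default` and `fi_left_open`.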

IRTFM