I am using by() to apply a function to a range of columns of a data frame, grouped by a factor. Everything works perfectly well if I use mean() as the function, but if I use median() I get an error of the type "Error in median.default(x) : need numeric data", even though there are no NAs in the data frame.
The line that works using mean():
> by(iris[,1:3], iris$Species, function(x) mean(x,na.rm=T))
iris$Species: setosa
Sepal.Length Sepal.Width Petal.Length
5.006 3.428 1.462
------------------------------------------------------------
iris$Species: versicolor
Sepal.Length Sepal.Width Petal.Length
5.936 2.770 4.260
------------------------------------------------------------
iris$Species: virginica
Sepal.Length Sepal.Width Petal.Length
6.588 2.974 5.552
Warning messages:
1: mean(<data.frame>) is deprecated.
Use colMeans() or sapply(*, mean) instead.
2: mean(<data.frame>) is deprecated.
Use colMeans() or sapply(*, mean) instead.
3: mean(<data.frame>) is deprecated.
Use colMeans() or sapply(*, mean) instead.
But if I use median() (note the na.rm=T option):
> by(iris[,1:3], iris$Species, function(x) median(x,na.rm=T))
Error in median.default(x, na.rm = T) : need numeric data
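As far as I can tell, by() hands each group to the function as a data frame rather than as a numeric vector, which may be what median() is complaining about. A minimal sketch that reproduces this without by(), assuming the built-in iris data set (the names setosa and res are just placeholders of mine):

```r
# Take the rows for one species and the same three columns as above
setosa <- iris[iris$Species == "setosa", 1:3]

# The subset is a data frame, not a plain numeric vector
class(setosa)
is.numeric(setosa)

# Call median() on the whole subset; try() captures the error
# instead of stopping the script
res <- try(median(setosa, na.rm = TRUE), silent = TRUE)
inherits(res, "try-error")
```

So the NAs do not seem to be involved at all; the type of the object passed to median() looks like the relevant part.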
However, if instead of the column range [,1:3] I choose only one of the columns, it works:
> by(iris[,1], iris$Species, function(x) median(x,na.rm=T))
iris$Species: setosa
[1] 5
------------------------------------------------------------
iris$Species: versicolor
[1] 5.9
------------------------------------------------------------
iris$Species: virginica
[1] 6.5
How can I achieve this behaviour while selecting a range of columns?
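The warning messages above suggest sapply(*, mean) as a replacement for mean() on a data frame, so one thing I could imagine is doing the same with median(), column by column. A minimal sketch, assuming the built-in iris data set (the name res is just a placeholder):

```r
# Within each species, apply median() to every selected column separately
res <- by(iris[, 1:3], iris$Species,
          function(x) sapply(x, median, na.rm = TRUE))

# Print the per-species medians
res
```

I am not sure whether this is the intended idiom or whether there is a cleaner way.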