
I am trying to plot a discrete variable on the x-axis against a continuous one on the y-axis. Using mtcars as an example, imagine plotting cyl vs. disp. What if some of the values of disp were NA? I would like to know how many NAs there are for each value of cyl, and to display this in a simple table, possibly right below the legend (or within the legend itself). Is there a simple (or a complicated) way to do this?

A related question I asked: R - looking at means by subgroup and overall on a line graph

Thanks!

garson
  • Yes, there is a complicated way. But note that `ggplot2` is a visualization tool. It is not a good idea to do everything with ggplot2. If you want a table, you can make the table by, say, `knitr::kable()`, `table()`, `tables::tabular()` etc. – kohske Oct 20 '14 at 06:59
  • If you can live with the table being in the plot area, then this Q is a dupe of http://stackoverflow.com/questions/12318120/adding-table-within-the-plotting-region-of-a-ggplot-in-r#12318578 - just make the table with `table(is.na(foo$X, foo$Y))` or similar. `annotation_custom` gets cropped to the plot area, so until we find a way to add annotations to the legend area.... – Spacedman Oct 20 '14 at 07:07
  • Oops. Table construction should be `table(is.na(foo$X),foo$Y)` to get a 2d table. – Spacedman Oct 20 '14 at 07:14
  • How about adding the number of NAs to the legend itself (instead of "right below" it)? i.e, "4 Cyl (3 NAs)" – etov Oct 20 '14 at 14:35
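
Along the lines of the table-based suggestions in these comments, here is a minimal sketch of the NA-count table in base R. It re-creates the `mycars` data used in the answer (three `disp` values set to NA) so it runs on its own; `na_table` and `na_per_cyl` are illustrative names, not part of any package.

```r
# Copy mtcars and blank out three disp values, as in the answer
mycars <- mtcars
mycars$disp[c(1, 2, 3)] <- NA

# 2-d table: rows are is.na(disp) (FALSE/TRUE), columns are the cyl values
na_table <- table(is.na(mycars$disp), mycars$cyl)
print(na_table)

# Just the NA counts for each cyl value (a vector named "4", "6", "8")
na_per_cyl <- na_table["TRUE", ]
print(na_per_cyl)
```

Rows 1 to 3 of mtcars are two 6-cylinder cars and one 4-cylinder car, so the TRUE row should show one NA for cyl 4, two for cyl 6 and none for cyl 8.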

1 Answer


This answer does not meet all of the question's requirements, but since the details of how exactly the data should be presented are a little vague, I'm posting it anyway.

So here's a way to add NA counts to the legend itself:

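```r
library(ggplot2)
library(datasets)

# Copy mtcars and set three disp values to NA
mycars <- mtcars
mycars$disp[c(1, 2, 3)] <- NA

# Count the NAs in disp for each value of cyl and build the legend labels
lvls <- levels(as.factor(mycars$cyl))
nacounts <- by(mycars, mycars$cyl, function(x) sum(is.na(x$disp)))
labels <- paste(lvls, " (NA=", as.integer(nacounts), ")", sep = "")

# Boxplots of disp by cyl; the NA counts appear in the fill legend labels
ggplot(data = mycars) +
  geom_boxplot(aes(x = cyl, y = disp, fill = as.factor(cyl))) +
  scale_fill_discrete(name = "Cyl", labels = labels)
```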
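
For reference, the same per-`cyl` NA counts can also be computed with base R's `tapply()`, which returns a named vector directly instead of a `by` object. A sketch, with the `mycars` setup repeated so it runs on its own:

```r
# Same setup as above: three disp values set to NA
mycars <- mtcars
mycars$disp[c(1, 2, 3)] <- NA

# sum(is.na(disp)) within each cyl group; the names are the cyl levels
nacounts <- tapply(is.na(mycars$disp), mycars$cyl, sum)

# Legend labels of the form "4 (NA=1)"
labels <- paste0(names(nacounts), " (NA=", nacounts, ")")
print(labels)
```

The resulting `labels` vector can be passed to `scale_fill_discrete(labels = labels)` in the same way as in the answer's code.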

[Image: result of the code above]

EDIT

Regarding the stat_summary graph referred to in the question: the labels describing line types can be changed with the scale_linetype_* functions.
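
A minimal sketch of relabeling the linetype legend with `scale_linetype_manual()`. It assumes ggplot2 3.3.0 or later (for `fun =`; older versions use `fun.y =`). The label text is only illustrative, taken from the mtcars help page, where vs = 0 means a V-shaped engine and vs = 1 a straight one.

```r
library(ggplot2)

# Same setup as in the answer: three disp values set to NA
mycars <- mtcars
mycars$disp[c(1, 2, 3)] <- NA

# One mean line per vs group; na.rm = TRUE drops the NA rows without a warning
p <- ggplot(mycars, aes(cyl, disp)) +
  stat_summary(aes(linetype = factor(vs)), fun = mean, geom = "line",
               na.rm = TRUE) +
  # values picks the line types, labels replaces the legend text
  scale_linetype_manual(values = c(1, 2),
                        labels = c("V-shaped (vs = 0)", "straight (vs = 1)"))
print(p)
```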

If you'd like to have the same legend as in the image above, I think you'll have to add graph elements describing cyl, e.g.:

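```r
# Continues from the code above (uses mycars and labels).
# fun = needs ggplot2 >= 3.3.0 (older: fun.y =); linewidth needs >= 3.4.0 (older: size =)
ggplot(mycars, aes(cyl, disp)) +
  stat_summary(fun = mean, geom = "line", linewidth = 1.5) +
  stat_summary(aes(lty = factor(vs)), fun = mean, geom = "line") +
  stat_summary(aes(color = factor(cyl)), fun = mean, geom = "point", size = 5) +
  scale_x_continuous(breaks = c(4, 6, 8), labels = c("four", "6", "8")) +
  scale_color_discrete(labels = labels)
```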

[Image: plot with point geometry overlay]

etov
  • Thanks for responding! Yes, including the NAs in the legend is even better. I am having trouble adapting it for the line graph created here; any suggestions? http://stackoverflow.com/questions/26452364/r-looking-at-means-by-subgroup-and-overall-on-a-line-graph/26452529#26452529 – garson Oct 20 '14 at 14:54
  • In the example you're referring to, the legend describes the line type, derived from the "vs" parameter. To alter the line-type labels, try adding scale_linetype_manual (e.g. scale_linetype_manual(values=c(1,2), labels=c("1","2"))). I'll update the answer – etov Oct 20 '14 at 15:22
  • Thank you! Is there any way to change the title of the legend from factor(vs) to something else? I figured out how to change the factor(cyl) title using name=. – garson Oct 20 '14 at 16:30
  • since the factor(vs) legend refers to line types (generated by the lty= part of stat_summary), you should set the name in a scale_linetype_* function. e.g. try adding to the plot: scale_linetype_manual(values=c(1,2), labels=c("1","2"), name="vs") – etov Oct 20 '14 at 17:04
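
Putting the comments together, a sketch that sets both legend titles through `name =` in the corresponding scales and keeps the NA counts in the cyl legend. Assumptions: ggplot2 3.3.0 or later for `fun =`, and `cyl_labels` is an illustrative name. `labs(linetype = "vs", colour = "Cyl")` should be an equivalent way to set the titles.

```r
library(ggplot2)

# Same setup as in the answer: three disp values set to NA
mycars <- mtcars
mycars$disp[c(1, 2, 3)] <- NA

# NA-annotated labels for the colour legend, e.g. "4 (NA=1)"
nacounts <- tapply(is.na(mycars$disp), mycars$cyl, sum)
cyl_labels <- paste0(names(nacounts), " (NA=", nacounts, ")")

p <- ggplot(mycars, aes(cyl, disp)) +
  # mean lines per vs group, drawn with different line types
  stat_summary(aes(linetype = factor(vs)), fun = mean, geom = "line",
               na.rm = TRUE) +
  # one mean point per cyl value, coloured by cyl
  stat_summary(aes(colour = factor(cyl)), fun = mean, geom = "point",
               size = 5, na.rm = TRUE) +
  # name = sets each legend title instead of "factor(vs)" / "factor(cyl)"
  scale_linetype_manual(name = "vs", values = c(1, 2)) +
  scale_colour_discrete(name = "Cyl", labels = cyl_labels)
print(p)
```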