
I'm not sure whether this is really a programming question or a data visualization question. If you think another site would be more appropriate, please let me know. Anyway, I have the following data frame:

GRID FLOW NITER    tau
   0  100     1 0.6152
   0  100 35001 0.6152
   0  100 69001 0.6152
   1  100     1 0.6105
   1  100 35001 0.6106
   1  100 69001 0.6106
   2  100     1 0.6147
   2  100 35001 0.6147
   2  100 69001 0.6147
   3  100     1 0.6151
   3  100 35001 0.6153
   3  100 69001 0.6153
   4  100     1 0.6105
   4  100 35001 0.6105
   4  100 69001 0.6105
   5  100     1 0.6140
   5  100 35001 0.6142
   5  100 69001 0.6142
   6  100     1 0.6130
   6  100 35001 0.6129
   6  100 69001 0.6129
   7  100     1 0.6152
   7  100 35001 0.6152
   7  100 69001 0.6152
   8  100     1 0.6098
   8  100 35001 0.6097
   8  100 69001 0.6097
   9  100     1 0.6143
   9  100 35001 0.6143
   9  100 69001 0.6143
  10  100     1 0.6123
  10  100 35001 0.6123
  10  100 69001 0.6123
  11  100     1 0.6148
  11  100 35001 0.6150
  11  100 69001 0.6150
  12  100     1 0.6155
  12  100 35001 0.6154
  12  100 69001 0.6154
  13  100     1 0.6152
  13  100 35001 0.6154
  13  100 69001 0.6154
  14  100     1 0.6154
  14  100 35001 0.6154
  14  100 69001 0.6154
  15  100     1 0.6152
  15  100 35001 0.6153
  15  100 69001 0.6153
  16  100     1 0.6162
  16  100 35001 0.6162
  16  100 69001 0.6162
  17  100     1 0.6150
  17  100 35001 0.6152
  17  100 69001 0.6152
  18  100     1 0.6160
  18  100 35001 0.6160
  18  100 69001 0.6160
  19  100     1 0.6150
  19  100 35001 0.6152
  19  100 69001 0.6152
  20  100     1 0.6150
  20  100 35001 0.6149
  20  100 69001 0.6149
  21  100     1 0.6150
  21  100 35001 0.6150
  21  100 69001 0.6150
  22  100     1 0.6150
  22  100 35001 0.6150
  22  100 69001 0.6150
  23  100     1 0.6152
  23  100 35001 0.6153
  23  100 69001 0.6153
  24  100     1 0.6150
  24  100 35001 0.6152
  24  100 69001 0.6152
  25  100     1 0.6156
  25  100 35001 0.6156
  25  100 69001 0.6156
  26  100     1 0.6152
  26  100 35001 0.6154
  26  100 69001 0.6154
  27  100     1 0.6158
  27  100 35001 0.6159
  27  100 69001 0.6159
  28  100     1 0.6151
  28  100 35001 0.6153
  28  100 69001 0.6153
  29  100     1 0.6160
  29  100 35001 0.6159
  29  100 69001 0.6159
  30  100     1 0.6146
  30  100 35001 0.6148
  30  100 69001 0.6147
  31  100     1 0.6143
  31  100 35001 0.6145
  31  100 69001 0.6145
  32  100     1 0.6151
  32  100 35001 0.6153
  32  100 69001 0.6153
  33  100     1 0.6151
  33  100 35001 0.6153
  33  100 69001 0.6153
  34  100     1 0.6164
  34  100 35001 0.6163
  34  100 69001 0.6163
  35  100     1 0.6172
  35  100 35001 0.6172
  35  100 69001 0.6172
  36  100     1 0.6162
  36  100 35001 0.6163
  36  100 69001 0.6163
  37  100     1 0.6164
  37  100 35001 0.6165
  37  100 69001 0.6165
  38  100     1 0.6157
  38  100 35001 0.6157
  38  100 69001 0.6157
  39  100     1 0.6157
  39  100 35001 0.6157
  39  100 69001 0.6157
  40  100     1 0.6197
  40  100 35001 0.6197
  40  100 69001 0.6197
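To reproduce the problem, you can read the printed table back into R with `read.table()`. The sketch below uses only the first four grids of the table above. Pasting the full table the same way gives all 41 grids.

```r
# Read a printed data frame back into R.
# Only GRID 0-3 from the table above are included, to keep the example short.
df <- read.table(header = TRUE, text = "
GRID FLOW NITER    tau
   0  100     1 0.6152
   0  100 35001 0.6152
   0  100 69001 0.6152
   1  100     1 0.6105
   1  100 35001 0.6106
   1  100 69001 0.6106
   2  100     1 0.6147
   2  100 35001 0.6147
   2  100 69001 0.6147
   3  100     1 0.6151
   3  100 35001 0.6153
   3  100 69001 0.6153
")
str(df)
```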

I need to make a plot that lets the reader see the difference (or lack thereof) between the curve for GRID=0 and the curves for all the other grids. The intended audience is technical (it's a scientific paper). I tried the following in ggplot:

```r
library(ggplot2)

# treat GRID as a discrete variable: one curve per grid
df$GRID <- as.factor(df$GRID)
p <- ggplot(df, aes(x = NITER, y = tau, linetype = GRID))
p + geom_line() +
    ggtitle("tau") +
    labs(x = "", y = "")
```
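With 41 levels of `GRID` (0 to 40), mapping `linetype = GRID` asks for more linetypes than ggplot2's default palette can supply. Here is a quick check. It assumes a reasonably recent version of the scales package, which supplies ggplot2's default palettes.

```r
library(scales)

# ggplot2's default discrete linetype scale takes its values from this palette.
# Request one linetype per GRID level: GRID = 0..40 gives 41 levels.
n_levels <- 41
types <- suppressWarnings(linetype_pal()(n_levels))

n_available <- sum(!is.na(types))  # distinct linetypes the palette can supply
n_missing   <- sum(is.na(types))   # levels that receive NA instead of a linetype
cat("levels with a linetype:   ", n_available, "\n")
cat("levels without a linetype:", n_missing, "\n")
```

In the versions I am aware of, the palette holds 13 linetypes, so the remaining 28 levels get `NA`. As far as I can tell, ggplot2's default linetype scale draws `NA` as a blank line. That would explain why some curves are not visible at all. Mapping `colour` instead avoids this limit.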

It didn't turn out so well:

[plot: tau vs. NITER, one linetype per GRID]

First, the legend only shows 12 linetypes, and the remaining 28 curves aren't plotted at all. Second, the text is extremely small. Switching to

```r
p <- ggplot(df, aes(x = NITER, y = tau, color = GRID))
```

now at least R plots all curves:

[plot: tau vs. NITER, one colour per GRID]

However, it's extremely difficult to distinguish the different curves. Worse, before it was easy to spot the GRID=0 curve (it was the only solid line), whereas now it's very difficult to find. What can I do? Another option might be to plot all curves with the same color and line type, but write the curve name (0, 1, 2, ..., 40) directly on each curve instead of using a legend. I have no idea how to do that. I would also need some way to keep the labels of two curves that share the same endpoint from overwriting each other.

DeltaIV
  • Any plot with that many different lines is going to be difficult to understand, whether you use color, linetype, or both. However, to set linetypes and colors manually, you can use `scale_colour_manual` and/or `scale_linetype_manual`. For an example, see [this SO answer](http://stackoverflow.com/a/34713257/496488). To make the `GRID=0` line stand out, you can also make it thicker than the others by mapping `GRID` to a size aesthetic and then using `scale_size_manual` to set the `GRID=0` line to be thicker than the others. – eipi10 Jan 19 '16 at 18:34
  • @jenesaisquoi, color can be included. Journals often charge a publication fee for color figures, but I asked the editor and he told me I wouldn't have to pay. Maybe that's because it's just a colored line plot, with much less color than, say, a filled contour plot. Or maybe they're being understanding because I didn't want to include the plot in the first place; a reviewer asked for it. – DeltaIV Jan 19 '16 at 20:24
  • @eipi10, you're absolutely right, and if it were up to me, I would never have included such a plot in the paper. It's not only hard to read, but also not very useful (I won't go into the details of the scientific problem, but trust me, it's not an indispensable plot). Unfortunately, one of the reviewers wants the plot to be added. – DeltaIV Jan 19 '16 at 20:25
  • I don't know why the question got downvoted, and I don't think downvoting without explaining why is very helpful. I can improve the question if I receive constructive criticism, but a downvote with no explanation doesn't tell me what to fix. – DeltaIV Jan 19 '16 at 20:40
  • I asked [a similar question on crossvalidated](http://stats.stackexchange.com/questions/190152/visualising-many-variables-in-one-plot) a few days ago, maybe the answers are helpful for you, too! – erc Jan 20 '16 at 08:26
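Here is a minimal sketch of eipi10's suggestion. The `GRID = 0` line is thick and black, and every other grid is thin and grey, set through manual colour and width scales with one value per level. It assumes ggplot2 3.4.0 or later, where line thickness is the `linewidth` aesthetic. Older versions map `size` and use `scale_size_manual()`, as in the comment. The data are GRID 0-3 from the question's table.

```r
library(ggplot2)

# Small subset of the question's data (GRID 0-3 only).
df <- data.frame(
  GRID  = factor(rep(0:3, each = 3)),
  NITER = rep(c(1, 35001, 69001), times = 4),
  tau   = c(0.6152, 0.6152, 0.6152,
            0.6105, 0.6106, 0.6106,
            0.6147, 0.6147, 0.6147,
            0.6151, 0.6153, 0.6153)
)

lev <- levels(df$GRID)
# Named vectors with one value per GRID level: "0" is highlighted, the rest recede.
line_colours <- setNames(ifelse(lev == "0", "black", "grey70"), lev)
line_widths  <- setNames(ifelse(lev == "0", 1.2, 0.4), lev)

p <- ggplot(df, aes(NITER, tau, group = GRID, colour = GRID, linewidth = GRID)) +
  geom_line() +
  scale_colour_manual(values = line_colours, guide = "none") +
  scale_linewidth_manual(values = line_widths, guide = "none") +
  labs(x = "NITER", y = "tau", caption = "Black: GRID 0; grey: all other grids")
print(p)
```

Because the named vectors are built from `levels(df$GRID)`, the same code works unchanged for all 41 grids.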

2 Answers


Given your data, you need to think about whether different aesthetics (axes) and geometries (dots) would be more appropriate.

One approach would be to put GRID on the x-axis and tau on the y-axis, and to plot points coloured by NITER. A slight horizontal position shift (dodging) keeps points with the same tau but different NITER from overlapping:

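```r
library(ggplot2)

# one point per (GRID, NITER); dodging separates points that share the same tau
ggplot(df, aes(x = GRID, y = tau, colour = factor(NITER))) +
    geom_point(position = position_dodge(width = 0.3))
```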

[plot: tau vs. GRID, points coloured by NITER]

mtoto
  • This is a much better step in the right direction (+1). Not sure what exactly `NITER` is, but given that there are only three values in this dataset, you might want to also show an example where `colour = factor(NITER)` and maybe even add some jitter to show the overlapping points. – JasonAizkalns Jan 19 '16 at 19:06
  • On second thought, `position = position_dodge(w = 0.3)` is probably a better call here. – JasonAizkalns Jan 19 '16 at 19:22
  • @mtoto, I don't think that's going to work. The actual data frames each contain about 100 `NITER` values for every `GRID`! In my example I included just three (1, 35001, 69001) to give an idea of the challenge without posting a huge sample data frame. And I couldn't reduce the number of `GRID`s in my example, because the issue only shows up with a large enough number of curves to plot (40 in my case). – DeltaIV Jan 19 '16 at 20:38
  • Try without factorizing `NITER` in colour – mtoto Jan 19 '16 at 20:43
  • As mentioned above, I think it's more of a conceptual problem. Do you have examples of how the same sort of data has been plotted before? – mtoto Jan 19 '16 at 20:46
  • I need to leave the office now; tomorrow I'll try without factorising NITER. As for examples, I'll try to find one, but usually people just don't make this kind of plot, and even if they do, I've never seen it done for ALL the grids. – DeltaIV Jan 19 '16 at 21:01
  • Without factorizing, the result is quite unreadable because there are many `NITER`s for each `GRID`. But I could combine two plots. I use my colored plot to show that, for each `GRID`, the value of `tau` stabilizes as `NITER` approaches 69001. Then, to show the distribution of the final values of `tau` across `GRID`s, I use your plot, filtered to the last iteration only: `ggplot(subset(df, NITER == 69001), aes(x = GRID, y = tau)) + geom_point(position = position_dodge(width = 0.3))`. I may not even need the dodge. I'll run a few tests. – DeltaIV Jan 20 '16 at 17:20
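Here is a sketch of the final-iteration plot described in the last comment, with one addition that is my own suggestion: a dashed reference line at the final `tau` of `GRID = 0`. Every other grid can then be compared with it directly. The sketch keeps the last `NITER` of each grid rather than hard-coding 69001, and uses GRID 0-3 from the question's table as data.

```r
library(ggplot2)

# Small subset of the question's data (GRID 0-3 only).
df <- data.frame(
  GRID  = rep(0:3, each = 3),
  NITER = rep(c(1, 35001, 69001), times = 4),
  tau   = c(0.6152, 0.6152, 0.6152,
            0.6105, 0.6106, 0.6106,
            0.6147, 0.6147, 0.6147,
            0.6151, 0.6153, 0.6153)
)

# Keep each grid's last iteration (works even if grids stop at different NITER).
final <- df[df$NITER == ave(df$NITER, df$GRID, FUN = max), ]
tau0  <- final$tau[final$GRID == 0]   # reference value: final tau of GRID 0

p <- ggplot(final, aes(x = GRID, y = tau)) +
  geom_hline(yintercept = tau0, linetype = "dashed", colour = "red") +
  geom_point() +
  labs(x = "GRID", y = "tau at final iteration")
print(p)
```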

As an alternative, if you are just trying to bring out the comparison of grid 0 vs. all the rest, you could use a variation of something like this. I'm not sure it looks great, but playing with the various aesthetics might improve it.

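```r
library(ggplot2)
library(ggthemes)  # theme_tufte

## dat is the question's data frame (called df there)
## Make a variable: one vs the rest ("2" = grid 0, "1" = every other grid)
dat$grp <- factor(1L + (dat$GRID == 0))

lbls <- c('0', 'Other')
## ggplot2 >= 3.4.0 uses linewidth for line thickness;
## in older versions use size= and scale_size_manual() instead
ggplot(dat) +
    geom_line(aes(NITER, tau, group = GRID, color = grp, alpha = grp, linewidth = grp)) +
    theme_tufte() +
    scale_linewidth_manual('Grid', breaks = c('2', '1'), values = c(.9, 1.1), labels = lbls) +
    scale_color_manual('Grid', breaks = c('2', '1'), values = c('grey0', 'red'), labels = lbls) +
    scale_alpha_manual('Grid', breaks = c('2', '1'), values = c(0.1, 1), labels = lbls)
```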

[plot: tau vs. NITER, GRID 0 in red, all other grids in faint grey]

Rorschach
  • This is also nice, but I solved the problem by combining my colored plot with the other answer. However, in case I need to do this again in the future, I'll keep your suggestion in mind; I like the look of it. – DeltaIV Jan 21 '16 at 19:49
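The question also asked how to replace the legend with a label written next to each curve, so that labels of curves ending at the same value don't overwrite each other. Neither answer covers this. One option is `geom_text_repel()` from the ggrepel package, which pushes overlapping labels apart. Here is a minimal sketch, assuming ggrepel is installed and using only GRID 0-3 from the question's table:

```r
library(ggplot2)
library(ggrepel)  # geom_text_repel(); separate package, not part of ggplot2

# Small subset of the question's data (GRID 0-3 only).
df <- data.frame(
  GRID  = factor(rep(0:3, each = 3)),
  NITER = rep(c(1, 35001, 69001), times = 4),
  tau   = c(0.6152, 0.6152, 0.6152,
            0.6105, 0.6106, 0.6106,
            0.6147, 0.6147, 0.6147,
            0.6151, 0.6153, 0.6153)
)

# One label per curve, placed at that curve's last iteration.
ends <- df[df$NITER == ave(df$NITER, df$GRID, FUN = max), ]

p <- ggplot(df, aes(NITER, tau, group = GRID)) +
  geom_line(colour = "grey40") +
  geom_text_repel(
    data = ends,
    aes(label = GRID),
    direction = "y",  # move labels only vertically, so they stay by the line ends
    nudge_x = 3000,   # start labels slightly to the right of the last point
    hjust = 0,
    size = 3
  ) +
  # extra room on the right-hand side for the labels
  scale_x_continuous(expand = expansion(mult = c(0.02, 0.15))) +
  labs(x = "NITER", y = "tau")
print(p)
```

With all 41 grids, many curves end at nearly the same `tau`. Expect some labels to be pushed far from their lines. In recent ggrepel versions, labels may also be dropped with a warning about too many overlaps, which the `max.overlaps` argument controls.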