I have data with two dimensions (2 treatments x 3 days), and my goal is to draw a line graph with 2 lines, one per treatment, showing the means with error bars. However, when I use plot to draw the means, I get a 3 x 3 grid of squares instead (I don't know what it is).

data:

 no. ,Treatment, D1,  D2,  D3
 1, A , 4 , 5 , 5
 2, A , 6 , 6 , 4
 3, A , 5 , 7 , 8
 4, B , 2 , 1 , 3
 5, B , 3 , 2 , 2
 6, B , 3 , 2 , 3

I used aggregate to compute the means and standard error. However, when I use plot, the result is weird.

dta <- read.table(file = 'dta.csv', header = TRUE, sep = ',')

dta.mean <- aggregate(dta[, -1:-2], list(dta$Treatment), mean)

plot(dta.mean[1, 2:4])

I expected the line graph consisting of 2 lines (one is treatment A, the other is treatment B), and the y values are the means with error bars. Please help me :( Thanks a lot!

image1: this is the wrong result

image2: the expected one

A. Suliman
Leah

1 Answer

Let's take this step by step.

First, let's see what the aggregate call returns:

  Group.1       D1       D2       D3
1       A 5.000000 6.000000 5.666667
2       B 2.666667 1.666667 2.666667

Then, your call to plot includes dta.mean[1,2:4] so that is selecting row 1 and columns 2 through 4 for plotting (as a data frame):

  D1 D2       D3
1  5  6 5.666667

Note that this only includes group "A" (row 1) and 3 numeric variables (for D1, D2, and D3).

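When you call plot on a data frame with more than two columns, you get a scatterplot matrix, which is the 3 x 3 grid you provided in your question. The diagonal cells hold the variable names, and the remaining 6 cells are scatterplots (D1 vs. D2, D2 vs. D1, D1 vs. D3, D3 vs. D1, D2 vs. D3, D3 vs. D2). Each of those 6 plots has only one point, because only one row was selected. For example, the D1 vs. D2 panel shows the single point D1 = 5, D2 = 6.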
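If you would rather stay in base R, one possible sketch is to turn the means into a matrix and hand it to matplot, which draws one line per column. The means are typed in by hand here so the snippet runs on its own, and the axis labels are my own choice, not something from your code:

```r
# Group means, as returned by aggregate() above (entered by hand for this sketch)
dta.mean <- data.frame(
  Group.1 = c("A", "B"),
  D1 = c(5, 8 / 3),
  D2 = c(6, 5 / 3),
  D3 = c(17 / 3, 8 / 3)
)

# Drop the grouping column and transpose:
# rows become days, columns become treatments
m <- t(as.matrix(dta.mean[, -1]))
colnames(m) <- dta.mean$Group.1

# matplot() draws one line per matrix column
matplot(m, type = "b", pch = 1, lty = 1, col = 1:2,
        xaxt = "n", xlab = "Day", ylab = "Mean")
axis(1, at = seq_len(nrow(m)), labels = rownames(m))
legend("right", legend = colnames(m), col = 1:2, lty = 1)
```

The transpose is the important step: plot and matplot treat columns as separate series, so the treatments have to end up in the columns.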

Here is the approach I would take:

First, I would melt the data (reshape2 package):

library(reshape2)
dta.m <- melt(dta[-1], id = "Treatment")

   Treatment variable value
1          A       D1     4
2          A       D1     6
3          A       D1     5
4          B       D1     2
5          B       D1     3
6          B       D1     3
7          A       D2     5
8          A       D2     6
...

This puts your data in a long format (as opposed to wide): variable now is D1, D2, or D3, and value includes the value for those variables. This is very helpful and tidy for ggplot.
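If you cannot install reshape2, roughly the same long layout can be built with base R's stack. This is only a sketch: the data frame is typed in by hand (without the "no." column) so that it runs on its own, and the name long is made up for the example:

```r
# Data from the question, entered by hand so the sketch is self-contained
dta <- data.frame(
  Treatment = rep(c("A", "B"), each = 3),
  D1 = c(4, 6, 5, 2, 3, 3),
  D2 = c(5, 6, 7, 1, 2, 2),
  D3 = c(5, 4, 8, 3, 2, 3)
)

# stack() puts the day columns on top of each other;
# it returns the columns "values" and "ind"
long <- stack(dta[, c("D1", "D2", "D3")])
names(long) <- c("value", "variable")

# Repeat the treatment labels once per stacked column
long$Treatment <- rep(dta$Treatment, times = 3)
long <- long[, c("Treatment", "variable", "value")]

head(long, 8)
```

The first 8 rows should match the melt output shown above.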

Next I would aggregate:

dta.mean = aggregate(value~Treatment+variable, dta.m, mean)

Which should give you this for plotting:

  Treatment variable    value
1         A       D1 5.000000
2         B       D1 2.666667
3         A       D2 6.000000
4         B       D2 1.666667
5         A       D3 5.666667
6         B       D3 2.666667

Using ggplot2:

library(ggplot2)
ggplot(dta.mean, aes(x = variable, y = value, group = Treatment, col = Treatment)) +
  geom_line()

plot of melted mean data

To add error bars, you will need to aggregate the data again, this time computing the standard error (similar to what was done for the means), and then add a layer such as geom_errorbar.
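Here is a sketch of how that could look. I am assuming the standard error is sd / sqrt(n); the helper std_err and the names dta.se and dta.sum are made up for this example, and the long data is rebuilt by hand so that the snippet runs on its own:

```r
library(ggplot2)

# Data from the question, entered by hand
dta <- data.frame(
  Treatment = rep(c("A", "B"), each = 3),
  D1 = c(4, 6, 5, 2, 3, 3),
  D2 = c(5, 6, 7, 1, 2, 2),
  D3 = c(5, 4, 8, 3, 2, 3)
)

# Long format, same shape that melt() produces
dta.m <- data.frame(
  Treatment = rep(dta$Treatment, times = 3),
  variable  = rep(c("D1", "D2", "D3"), each = nrow(dta)),
  value     = c(dta$D1, dta$D2, dta$D3)
)

# Assumed definition: standard error of the mean = sd / sqrt(n)
std_err <- function(x) sd(x) / sqrt(length(x))

# One aggregate for the means, one for the standard errors
dta.mean <- aggregate(value ~ Treatment + variable, dta.m, mean)
dta.se   <- aggregate(value ~ Treatment + variable, dta.m, std_err)
names(dta.se)[3] <- "se"

# Combine them into one summary table for plotting
dta.sum <- merge(dta.mean, dta.se, by = c("Treatment", "variable"))

p <- ggplot(dta.sum, aes(x = variable, y = value,
                         group = Treatment, colour = Treatment)) +
  geom_line() +
  geom_point() +
  geom_errorbar(aes(ymin = value - se, ymax = value + se), width = 0.1)
p
```

The bars here span mean ± 1 standard error; if you want something else (for example a confidence interval), change ymin and ymax accordingly.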

I hope this will be helpful for you.

Ben