I have a data set with scores for questions asked over two years. Each question has a 2015 value and a 2016 value. I would like to plot both and then show the difference between the 2015 value and the 2016 value: did the score go up, go down, or stay the same? I was thinking it might be useful to connect each pair of points with a line (or an arrow) to show the direction of change, but I'm having a hard time getting ggplot to do this. Here is my code example:

df <- read.table(text = "question y2015 y2016
q1 90 50
q2 80 60
q3 70 90
q4 90 60
q5 30 20", header = TRUE)

library(ggplot2)

g1 <- ggplot(df, aes(x=question))
g1 <- g1 + geom_point(aes(y=y2015, color="y2015"), size=4)
g1 <- g1 + geom_point(aes(y=y2016, color="y2016"), size=4)
g1

Different approaches to visualizing this are welcome.

oneself
  • I don't have time to write up an answer right now, but if you have a fair number of questions (you mention ~100 in a comment below) I would do a scatterplot of 2015 scores (x) vs 2016 scores (y). Add in a 45-degree line and dots above the line are improvements, and the correlation between the two years is clearly visible (and outliers should stand out as well). – Gregor Thomas Jun 29 '16 at 20:09
  • @onself; Could be interesting for you http://stackoverflow.com/questions/38109623/remove-legend-elements-of-one-specific-geom-show-legend-false-does-not-do-t/38110017#38110017 – Alex Jun 29 '16 at 21:15
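
The scatterplot suggested in the first comment could be sketched roughly as follows. This is only a minimal sketch, assuming the question's `df` with columns `y2015` and `y2016`; the `change` column and the 0 to 100 axis limits are illustrative assumptions and not part of the original post.

```r
library(ggplot2)

df <- read.table(text = "question y2015 y2016
q1 90 50
q2 80 60
q3 70 90
q4 90 60
q5 30 20", header = TRUE)

# Hypothetical helper column: classify each question by the sign of its change
df$change <- ifelse(df$y2016 > df$y2015, "Up",
             ifelse(df$y2016 < df$y2015, "Down", "Same"))

p <- ggplot(df, aes(x = y2015, y = y2016, colour = change)) +
  # 45-degree reference line: points on it did not change
  geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
  geom_point(size = 3) +
  # Equal scaling on both axes so the reference line really sits at 45 degrees
  coord_equal(xlim = c(0, 100), ylim = c(0, 100)) +
  labs(x = "2015 score", y = "2016 score", colour = "Change")
p
```

Points above the dashed line improved from 2015 to 2016 and points below it declined, so this layout should stay readable even with many questions and long question texts.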

5 Answers

I think a "dumbbell" chart would work, too. Here I've reshaped your data to long format.

library(ggplot2)

df <- read.table(text = "question y2015 y2016
q1 90 50
q2 80 60
q3 70 90
q4 90 60
q5 30 20", header = TRUE)

df.long <- 
  reshape(df, varying = names(df)[2:3],
        direction = 'long',
        #ids = 'question',
        times = 2015:2016,
        v.names = 'perc',
        timevar = 'year'
        )

ggplot(df.long, aes(x = perc, y = question))+
  geom_line(aes(group = question))+
  geom_point(aes(colour = factor(year)), size = 2)+
  theme_bw()+
  scale_color_brewer(palette = 'Set1', name = 'Year')

bouncyball

If you facet by question and put year on the x-axis, you can highlight the trend direction with color and use the x-axis to show the passage of time.

library(ggplot2)
library(reshape2)
library(dplyr)
library(ggthemes)

ggplot(df %>% melt(id.var="question") %>% 
         group_by(question) %>% 
         mutate(Direction=ifelse(diff(value)>0,"Up","Down")), 
       aes(x=gsub("y","",variable), y=value, color=Direction, group=question)) + 
  geom_point(size=2) + 
  geom_path(arrow=arrow(length=unit(0.1,"in")), show.legend=FALSE) +
  facet_grid(. ~ question) +
  theme_tufte() +
  theme(strip.text.x=element_text(size=15)) +
  guides(color=guide_legend(reverse=TRUE)) +
  scale_y_continuous(limits=c(0,100)) +
  labs(x="Year", y="Value")

With this encoding of aesthetics, you probably don't need the legend, and adding arrows to the line segments may be superfluous as well, but I've left them in for illustration.
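
Note that `ifelse(diff(value)>0,"Up","Down")` labels an unchanged score as "Down", while the question also asks about scores that stay the same. A minimal sketch of a three-way classification follows, assuming the question's wide `df`; the extra `q6` row is hypothetical and added only to exercise the "Same" case, and the sketch uses `geom_segment()` on the wide data instead of the faceted layout above.

```r
library(ggplot2)

# Question's data plus a hypothetical unchanged row (q6) for the "Same" case
df <- read.table(text = "question y2015 y2016
q1 90 50
q2 80 60
q3 70 90
q4 90 60
q5 30 20
q6 40 40", header = TRUE)

# sign() returns -1, 0 or 1; adding 2 turns that into an index into the labels
labels <- c("Down", "Same", "Up")
df$Direction <- factor(labels[sign(df$y2016 - df$y2015) + 2], levels = labels)

p <- ggplot(df, aes(x = question, colour = Direction)) +
  # Arrows run from the 2015 value to the 2016 value; unchanged rows get no arrow
  geom_segment(data = subset(df, Direction != "Same"),
               aes(xend = question, y = y2015, yend = y2016),
               arrow = arrow(length = unit(0.1, "in"))) +
  # Points mark the 2015 starting value for every question
  geom_point(aes(y = y2015), size = 2) +
  scale_y_continuous(limits = c(0, 100)) +
  labs(x = "Question", y = "Value")
p
```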

eipi10

It is still a bit ugly and needs fine-tuning, but it has arrows ;)

library(ggplot2)
library(reshape2)
library(dplyr)

ggplot2df <- read.table(text = "question y2015 y2016
q1 90 50
q2 80 60
q3 70 90
q4 90 60
q5 30 20", header = TRUE)


df <- ggplot2df %>% 
  mutate(direction = ifelse(y2016 - y2015 > 0, "Up", "Down"))%>%
  melt(id = c("question", "direction"))


g1 <- ggplot(df, aes(x=question, y = value, color = variable, group = question )) + 
  geom_point(size=4) + 
  geom_path(aes(color = direction), arrow=arrow())
g1

Alex

Maybe something like this? Some reshaping of the data is needed and taken care of with the function gather from the very useful library tidyr.

library(tidyr)
library(ggplot2)

g1 <- df %>% gather(year, value, y2015:y2016) %>%
ggplot(aes(x = year, y = value, color= question)) + 
    geom_point() + 
    geom_line(aes(group=interaction(question)))
g1
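
`gather()` still works, but the tidyr documentation marks it as superseded by `pivot_longer()`. A minimal sketch of the same plot with the newer function, assuming the question's `df`; the name `df_long` is introduced here for illustration only:

```r
library(tidyr)
library(ggplot2)

df <- read.table(text = "question y2015 y2016
q1 90 50
q2 80 60
q3 70 90
q4 90 60
q5 30 20", header = TRUE)

# Reshape to long format; names_prefix strips the leading "y" from the year labels
df_long <- pivot_longer(df, cols = c(y2015, y2016),
                        names_to = "year", names_prefix = "y",
                        values_to = "value")

# One line per question connecting its 2015 and 2016 values
p <- ggplot(df_long, aes(x = year, y = value, colour = question, group = question)) +
  geom_point() +
  geom_line()
p
```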

thepule
  • This is a very nice viz. It's easy to identify the one question which had an increase. – bouncyball Jun 29 '16 at 16:48
  • This is nice. However, in the real data, I have many more questions (~100) and the text for each is longer than "q1". So, I think that visualizing it this way will be too cluttered. – oneself Jun 29 '16 at 16:51
This website seems to have the solution you're looking for (it's a handy site):

https://www.r-graph-gallery.com/connected_scatterplot_ggplot2.html

Excerpt:

# Libraries
library(ggplot2)
library(dplyr)
library(babynames)
library(ggrepel)
library(tidyr)
library(hrbrthemes)  # provides theme_ipsum()

# data
data <- babynames %>% 
  filter(name %in% c("Ashley", "Amanda")) %>%
  filter(sex=="F") %>%
  filter(year>1970) %>%
  select(year, name, n) %>%
  spread(key = name, value=n, -1)

# Select a few dates to label on the chart
tmp_date <- data %>% sample_frac(0.3)

# plot 
data %>% 
  ggplot(aes(x=Amanda, y=Ashley, label=year)) +
     geom_point(color="#69b3a2") +
     geom_text_repel(data=tmp_date) +
     geom_segment(color="#69b3a2", 
                  aes(
                    xend=c(tail(Amanda, n=-1), NA), 
                    yend=c(tail(Ashley, n=-1), NA)
                  ),
                  arrow=arrow(length=unit(0.3,"cm"))
      ) +
      theme_ipsum()

treedm