I have a dataframe (survey data) over time with an outcome of interest (0 or 1) for two groups (C for control and T for treatment), like the following:

set.seed(3546)
Data <- data.frame(
    date = sample((as.Date(as.Date("2011-12-30"):as.Date("2012-01-04"), 
                           origin="1970-01-01")),
                   1000, replace = TRUE),
    treatment_group = sample(c("C", "T"), 1000, replace = TRUE),
    outcome = sample(c("1", "0"), 1000, replace = TRUE)
    )

For this data, I plot the proportion of respondents with outcome 1 over time, separately for each group, using the following code:

library(dplyr)
library(ggplot2)

Data %>%
    mutate(treatment_group = factor(treatment_group, levels = c("T", "C")), 
           date = as.POSIXct(date)) %>%
    group_by(treatment_group, date) %>%
    summarise(prop = sum(outcome=="1")/n()) %>%       #calculate proportion 
ggplot() +
theme_classic() +
xlab("Date") +
ylab('Proportion outcome mentioned')+ 
scale_color_manual(values = c('C' = 'black', 'T' = 'darkgrey'),
                   labels = c('C' = 'Remaining sample',
                              'T' = 'Treated Group'),
                   name = "Legend") +
geom_smooth(aes(x = date, y = prop, color = treatment_group),
            se = FALSE, method = 'loess') +
geom_point(aes(x = date, y = prop, color = treatment_group))

and I get the following plot: [Figure: Proportion of the outcome "1" by group]

What I would like, but can't figure out how to produce, is a single line showing the difference in proportions between the two groups at each time point, together with the confidence interval for the point estimate of that difference, roughly like this (the style will stay the same; this is just to give an idea): [Figure: desired sample plot]

The line should indicate the difference between the proportions of outcome 1 on that particular day. Thanks a lot in advance for helping. :)

divibisan
Ivo

1 Answer

How do you expect to calculate CIs if you don't have any measure of the uncertainty in prop?

That aside, you can reshape the data in the following way to plot the difference of proportions:

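library(dplyr)
library(tidyr)
library(ggplot2)

Data %>%
    mutate(
        treatment_group = factor(treatment_group, levels = c("T", "C")),
        date = as.POSIXct(date)) %>% # convert Date to POSIXct
    group_by(treatment_group, date) %>% # one row per group and day
    summarise(
        prop = sum(outcome == "1") / n()) %>% # proportion with outcome 1
    ungroup() %>% # drop the remaining grouping before reshaping
    spread(treatment_group, prop) %>% # long to wide: one column per group
    mutate(propdiff = T - C) %>% # T and C are the spread columns here
    ggplot(aes(date, propdiff)) +
    geom_line() + 
    geom_point()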

Explanation: Following summarise, we convert data from long to wide, and calculate propdiff as prop(T) - prop(C).
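
On the confidence-interval part of the question, one possible approach (not part of the answer above) is to use the per-day group sizes: each daily proportion comes from a known number of binary responses, so a normal-approximation (Wald) interval for the difference of two independent proportions can be computed from them. The following is a minimal sketch under that assumption. The column names n_T, n_C, p_T, p_C, se, lower and upper are illustrative, and the data is simulated in the style of the question.

```r
library(dplyr)
library(ggplot2)

# Simulated data in the style of the question
set.seed(3546)
Data <- data.frame(
    date = sample(seq(as.Date("2011-12-30"), as.Date("2012-01-04"), by = "day"),
                  1000, replace = TRUE),
    treatment_group = sample(c("C", "T"), 1000, replace = TRUE),
    outcome = sample(c("1", "0"), 1000, replace = TRUE)
)

# Per-day group sizes and proportions, then the difference with a
# 95% Wald interval (assumption: independent groups, normal approximation)
diff_by_day <- Data %>%
    group_by(date) %>%
    summarise(
        n_T = sum(treatment_group == "T"),
        n_C = sum(treatment_group == "C"),
        p_T = mean(outcome[treatment_group == "T"] == "1"),
        p_C = mean(outcome[treatment_group == "C"] == "1")
    ) %>%
    mutate(
        propdiff = p_T - p_C,
        se = sqrt(p_T * (1 - p_T) / n_T + p_C * (1 - p_C) / n_C),
        lower = propdiff - qnorm(0.975) * se,
        upper = propdiff + qnorm(0.975) * se
    )

# One line for the difference, with a shaded band for the interval
p <- ggplot(diff_by_day, aes(date, propdiff)) +
    geom_hline(yintercept = 0, linetype = "dashed") +
    geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2) +
    geom_line() +
    geom_point() +
    theme_classic() +
    xlab("Date") +
    ylab("Difference in proportions (T - C)")
p
```

The Wald interval is the simplest choice and can behave poorly when a day has few respondents or proportions near 0 or 1; prop.test() in base R could be applied per day as an alternative.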

Maurits Evers