0

I want to create a graph that shows the busy times in a hostal I am working in. Ideally, I would be able to create two density curves (or bars), one for the regular weekdays and one for weekends and holidays. I have the check-in and check-out times of all our clients and not much more.

I am picturing getting something similar to this type of graph (although, bear in mind people here stay overnight).

Popular times - Google

Any ideas on how to tackle this?

1 Answers1

0

First, let's generate some check-in/check-out data:

library(tidyverse)
library(lubridate)

nclients <- 40
startinterval <- '2021/11/04'
endinterval <- '2021/11/07'


set.seed(1236)
data <- tibble(client = 1:nclients,
               checkin = ymd_hms(sample(seq(as.POSIXct(startinterval),
                                            as.POSIXct(endinterval),
                                            by="sec"),
                                        nclients,
                                        replace = TRUE))
)


data <- data %>%
  mutate(inte = floor(sample(x = 1800:28800, size = nrow(data), replace = TRUE)),
         checkout = checkin + inte) %>%
  select(-inte)

head(data)

# # A tibble: 6 x 3
# client checkin             checkout
# <int> <dttm>              <dttm>
# 1      1 2021-11-06 23:21:22 2021-11-07 06:14:03
# 2      2 2021-11-04 19:22:20 2021-11-04 22:46:54
# 3      3 2021-11-06 21:44:56 2021-11-07 04:22:11
# 4      4 2021-11-05 04:32:33 2021-11-05 09:32:05
# 5      5 2021-11-05 13:27:55 2021-11-05 15:34:22
# 6      6 2021-11-04 15:31:23 2021-11-04 22:41:26

We then need a function to convert that data into one that specifies the hours the client were registered (i.e. between check-in and check-out). Modifying part of Bas's answer here, we get:

whathours <- function(start_time, end_time) {
  time_interval <- interval(start_time, end_time)
  
  start_hour <- floor_date(start_time, unit = "hour")
  end_hour <- ceiling_date(end_time, unit = "hour")
  diff_hours <- as.double(difftime(end_hour, start_hour, "hours"))
  
  hours <- start_hour + hours(0:diff_hours)
  hour_intervals <- int_diff(hours)
  
  hours <- hours[1:(length(hours)-1)]
  tibble(Day = date(hours),
         HourOfDay = hour(hours))

}

Mapping that function, we then generate a dataset that groups how many clients were registered in the hostal for every hour, grouping for type of day as requested:

data2 <- data %>%  
  mutate(start_time = as_datetime(checkin),
         end_time = as_datetime(checkout)) %>% 
  as_tibble() %>% 
  mutate(infoperhour = purrr::map2(start_time, end_time, whathours)) %>% 
  unnest(infoperhour) %>% 
  group_by(Day, HourOfDay) %>% 
  summarise(day = wday(Day),
            typeofday = ifelse(day %in% c(1:5), "weekday", "weekend")) %>%
  group_by(typeofday) %>% count(HourOfDay, sort = TRUE) %>%
  ungroup() %>%
  arrange(typeofday, HourOfDay)
head(data2)

# # A tibble: 6 x 3
# typeofday HourOfDay     n
# <chr>         <int> <int>
# 1 weekday           0     3
# 2 weekday           1     3
# 3 weekday           2     5
# 4 weekday           3     5
# 5 weekday           4     5
# 6 weekday           5     4

Finally, we plot the data:

data2 %>%
  ggplot(.,aes(x = HourOfDay, color = typeofday))+
  geom_density()+
  scale_x_continuous(limits = c(0, 23), breaks = seq(0, 23, by = 1))+
  theme_classic()

Final plot

monoazul
  • 38
  • 8