0

I have a logfile with lots of information. Here is the sample:

event_type       | video                    |id
------------------------------------------------ 
load_video"      | Video -math              | 21
load_video"      | Video -math              | 21
load_video"      | Video - Math and Speed   | 22
play_video"      | Video -math              | 21
seek_video"      | Video -math              | 21
pause_video"     | Video -math              | 21
seek_video"      | Video -math              | 21
play_video"      | Video -math              | 21
pause_video"     | Video -math              | 21
play_video"      | Video - Math and Speed   | 22
pause_video"     | Video - Math and Speed   | 22
stop_video"      | Video - Math and Speed   | 22 

I want to make a transformation by users to get this table.

  id    Video -math                             Video - Math and Speed              
     |  load | play   |  seek  |pause|  stop  | load | play   | seek  | pause | stop
 21  |   2   |    2   |   2    |  2  |    0   |  na  |   na   |   na  |  na   |  na
 22  |   na  |    na  |   na   |  na |    na  |   1  |   1    |    0  |   1   |   1

I have started to use reshape package, but I don't know how can I use it with 3 column.

Edit -> I don't want to make 2 headers. Just I want to show what is my goal.

KorG
  • 96
  • 7
  • 4
    Related / possible duplicate: [*How to reshape data from long to wide format?*](https://stackoverflow.com/q/5890584/2204410) – Jaap Jun 06 '18 at 09:34
  • your desired output seems to have 2 headers, which is not possible. Probably you can paste col1 and col2 of your input data and reshape based on that or you create a list according to video and reshape each subdata inside the list – Cath Jun 06 '18 at 09:37
  • Sorry. I don't want to make 2 headers. I know it is impossible. I just want to show want is my goal. – KorG Jun 06 '18 at 09:50

1 Answers1

1

With dplyr and tidyr:

library(tidyr)
library(dplyr)
library(stringr)

dat %>% 
  mutate_at(1, str_extract, "load|play|seek|pause|stop") %>% 
  unite(video_event_type, video, event_type) %>% 
  count(id, video_event_type) %>% 
  spread(video_event_type, n)

# # A tibble: 2 x 9
#      id `Video - Math and Speed_load` `Video - Math and Speed_pause` `Video - Math and Speed_play` `Video - Math and Speed_stop` `Video -math_load` `Video -math_pause` `Video -math_play` `Video -math_seek`
#   <int>                         <int>                          <int>                         <int>                         <int>              <int>               <int>              <int>              <int>
# 1    21                            NA                             NA                            NA                            NA                  2                   2                  2                  2
# 2    22                             1                              1                             1                             1                 NA                  NA                 NA                 NA

Edit: a slightly more complicated solution using complete to get the expected zeroes:

dat %>% 
  mutate_at(1, str_extract, "load|play|seek|pause|stop") %>% 
  count(id, video, event_type) %>% 
  complete(nesting(id, video), event_type, fill = list(n = 0L)) %>% 
  unite(video_event_type, video, event_type, sep = ".") %>% 
  spread(video_event_type, n)

# # A tibble: 2 x 11
#      id `Video - Math and Speed.load` `Video - Math and Speed.pause` `Video - Math and Speed.play` `Video - Math and Speed.seek` `Video - Math and Speed.stop` `Video -math.load` `Video -math.pause` `Video -math.play` `Video -math.seek` `Video -math.stop`
#   <int>                         <int>                          <int>                         <int>                         <int>                         <int>              <int>               <int>              <int>              <int>              <int>
# 1    21                            NA                             NA                            NA                            NA                            NA                  2                   2                  2                  2                  0
# 2    22                             1                              1                             1                             0                             1                 NA                  NA                 NA                 NA                 NA

(Where dat is:

dat <- read.table(text =
'event_type       | video                    |id
load_video"      | Video -math              | 21
load_video"      | Video -math              | 21
load_video"      | Video - Math and Speed   | 22
play_video"      | Video -math              | 21
seek_video"      | Video -math              | 21
pause_video"     | Video -math              | 21
seek_video"      | Video -math              | 21
play_video"      | Video -math              | 21
pause_video"     | Video -math              | 21
play_video"      | Video - Math and Speed   | 22
pause_video"     | Video - Math and Speed   | 22
stop_video"      | Video - Math and Speed   | 22 
', header = TRUE, sep = "|", quote = "", 
strip.white = TRUE, stringsAsFactors = FALSE)
Aurèle
  • 12,545
  • 1
  • 31
  • 49