I'm attempting to re-create Google Analytics "Last Non-Direct Click" marketing attribution methodology on top of Amplitude data set. In basic terms, this means if a user has a session where the channel is direct, it looks backwards in time and finds the most recent non-direct channel value.
I have a table that looks roughly like this. I also have the timestamp of the session start. Session ID is in order with which the session occurred, ASC.
Session ID | Marketing Channel |
---|---|
1 | direct |
2 | paid |
3 | direct |
4 | organic search |
5 | direct |
I want to create a third column where the values would look like this
Session ID | Marketing Channel | Last Non-Direct Click Channel |
---|---|---|
1 | direct | direct |
2 | paid | paid |
3 | direct | paid |
4 | organic search | organic search |
5 | direct | organic search |
6 | direct | organic search |
7 | direct | organic search |
I need to take any value in the Channel Tactic column where = to "Direct" and replace it with the previous Channel Tactic value that is not <> direct. It has to be the most recent one historically, it cannot be just any that is not direct from the past. So in the above example, row 5 "direct" cannot be either paid or organic search, it must be organic search.
I was thinking of leveraging a window function join but I can't think of way to make sure I'm only grabbing the most recent non-direct channel, not just any random non-direct channel. Keep in mind my data set is huge and most users have many sessions with multiple different marketing channels.
I'm looking for a solution that will be compatible in Snowflake, thank you!