1

Good morning,

I'm trying to transpose some data in big query. I've looked at a few other people who have asked this on stackoverflow but the way to do this seems to be to use legacy sql (using group_concat_unquoted) rather than standard sql. I would use legacy but I've had issues with nested data in the past so have since used standard only.

Here's my example, to give some context I'm trying to map out some customer journeys which I have below:

uniqueid | page_flag | order_of_pages
A        | Collection|   1
A        | Product   |   2
A        | Product   |   3
A        | Login     |   4
A        | Delivery  |   5
B        | Clearance |   1
B        | Search    |   2
B        | Product   |   3
C        | Search    |   1
C        | Collection|   2
C        | Product   |   3

However I'd like to transpose the data so it looks like this:

uniqueid | 1          | 2          | 3       | 4     | 5 
A        | Collection | Product    | Product | Login | Delivery
B        | Clearance  | Search     | Product | NULL  | NULL
C        | Search     | Collection | Product | NULL  | NULL

I've tried using multiple left joins but get the following error:

select a.uniqueid, 
b.page_flag as page1,
c.page_flag as page2,
d.page_flag as page3,
e.page_flag as page4,
f.page_flag as page5

from

(select distinct uniqueid, 
(case when uniqueid is not null then 1 end) as page_hit1,
(case when uniqueid is not null then 2 end) as page_hit2,
(case when uniqueid is not null then 3 end) as page_hit3,
(case when uniqueid is not null then 4 end) as page_hit4,
(case when uniqueid is not null then 5 end) as page_hit5
from `mytable`) a

LEFT JOIN (
SELECT *
from `mytable`) b on a.uniqueid = b.uniqueid
and a.page_hit1 = b.order_of_pages


LEFT JOIN (
SELECT *
from `mytable`) c on a.uniqueid = c.uniqueid
and a.page_hit2 = c.order_of_pages


LEFT JOIN (
SELECT *
from `mytable`) d on a.uniqueid = d.uniqueid
and a.page_hit3 = d.order_of_pages


LEFT JOIN (
SELECT *
from `mytable`) e on a.uniqueid = e.uniqueid
and a.page_hit4 = e.order_of_pages


LEFT JOIN (
SELECT *
from `mytable`) f on a.uniqueid = f.uniqueid
and a.page_hit5 = f.order_of_pages



Error: Query exceeded resource limits for tier 1. Tier 13 or higher required.

I've looked at using Array function as well but I've never used this before and I'm not sure if this is just for transposing the other way around. Any advice would be grand.

Thank you

Thomas Chamberlain
  • 127
  • 2
  • 2
  • 8
  • Check out a new easy way to transpose and pivot: https://medium.com/@hoffa/easy-pivot-in-bigquery-one-step-5a1f13c6c710 – Felipe Hoffa Jun 24 '20 at 17:47

1 Answers1

6

for BigQuery Standard SQL

#standardSQL
SELECT 
  uniqueid,
  MAX(IF(order_of_pages = 1, page_flag, NULL)) AS p1,
  MAX(IF(order_of_pages = 2, page_flag, NULL)) AS p2,
  MAX(IF(order_of_pages = 3, page_flag, NULL)) AS p3,
  MAX(IF(order_of_pages = 4, page_flag, NULL)) AS p4,
  MAX(IF(order_of_pages = 5, page_flag, NULL)) AS p5
FROM `mytable`
GROUP BY uniqueid 

You can play/test with below dummy data from your question

#standardSQL
WITH `mytable` AS (
  SELECT 'A' AS uniqueid, 'Collection' AS page_flag, 1 AS order_of_pages UNION ALL
  SELECT 'A', 'Product', 2 UNION ALL
  SELECT 'A', 'Product', 3 UNION ALL
  SELECT 'A', 'Login', 4 UNION ALL
  SELECT 'A', 'Delivery', 5 UNION ALL
  SELECT 'B', 'Clearance', 1 UNION ALL
  SELECT 'B', 'Search', 2 UNION ALL
  SELECT 'B', 'Product', 3 UNION ALL
  SELECT 'C', 'Search', 1 UNION ALL
  SELECT 'C', 'Collection', 2 UNION ALL
  SELECT 'C', 'Product', 3 
)
SELECT 
  uniqueid,
  MAX(IF(order_of_pages = 1, page_flag, NULL)) AS p1,
  MAX(IF(order_of_pages = 2, page_flag, NULL)) AS p2,
  MAX(IF(order_of_pages = 3, page_flag, NULL)) AS p3,
  MAX(IF(order_of_pages = 4, page_flag, NULL)) AS p4,
  MAX(IF(order_of_pages = 5, page_flag, NULL)) AS p5
FROM `mytable`
GROUP BY uniqueid 
ORDER BY uniqueid   

result is

uniqueid    p1          p2          p3      p4      p5   
A           Collection  Product     Product Login   Delivery     
B           Clearance   Search      Product null    null     
C           Search      Collection  Product null    null

Depends on your needs you can also consider below approach (not pivot though)

#standardSQL
SELECT uniqueid,
   STRING_AGG(page_flag, '>' ORDER BY order_of_pages) AS journey
FROM `mytable`
GROUP BY uniqueid
ORDER BY uniqueid   

if to run with same dummy data as above - result is

uniqueid    journey  
A           Collection>Product>Product>Login>Delivery    
B           Clearance>Search>Product     
C           Search>Collection>Product    
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
  • Excellent, thank you again Mikhail. Both methods worked perfectly. – Thomas Chamberlain Aug 02 '17 at 10:09
  • In a case where the category too many(for example they go from 1 to 500) and too be tedious to type out individual, how would this be executed? – Sade Apr 28 '20 at 08:27
  • Check out a new easy way to transpose and pivot: https://medium.com/@hoffa/easy-pivot-in-bigquery-one-step-5a1f13c6c710 – Felipe Hoffa Jun 24 '20 at 17:47
  • @FelipeHoffa - believe me - I already did same second you updated one of your posts! Nice! Very nice! Didn't have chance yet to test / play with, but definitely do shortly and hopefully will use here on SO. Thank you!!! – Mikhail Berlyant Jun 24 '20 at 17:52
  • Should I say in the post that I was inspired by the countless times you have posted the SQL generated string solution? Because I was :) – Felipe Hoffa Jun 24 '20 at 17:53
  • sure. I will be honored by being mentioned in any posts of yours. You are the legend and face of BigQuery (at some extend) - so definitely yes :o) – Mikhail Berlyant Jun 24 '20 at 17:55
  • 1
    Article updated with proper credits! If you'd like to introduce changes to the code, I left a copy on https://github.com/fhoffa/code_snippets/blob/master/util/pivot.sql. Open for improvements (but CLA needed) – Felipe Hoffa Jun 24 '20 at 23:40
  • Thank you, @FelipeHoffa! – Mikhail Berlyant Jun 24 '20 at 23:43