0

Hi I want to aggregate group by and change it from tall to wide data in bigquery, how do I do so? I have a lot of sources and here's a sample data.

Here's the table

date source price id
2022-01-01 A 2 1
2022-01-02 A 2 1
2022-01-03 A 4 1
2022-01-04 A 4 1
2022-01-01 B 1 1
2022-01-02 B 1 1
2022-01-03 B 3 1
2022-01-04 B 3 1
2022-01-01 A 2 2
2022-01-02 A 2 2
2022-01-03 A 4 2
2022-01-04 A 4 2
2022-01-01 B 1 2
2022-01-02 B 1 2
2022-01-03 B 3 2
2022-01-04 B 3 2

into fields of min from price from all source for group by id and min price per source group by id

id minPrice minPriceSourceA minPriceSourceB
1 2.5 3 2
2 2.5 3 2

Here's my current code

with Amin as 
(
select 
id,source,
min(price) min price
from table
where source ="A"
group by 1,2
),
Bmin as
(
select 
id,source,
min(price) min price
from table
where source ="B"
group by 1,2
),


select
t1.id,t1.minprice,
Amin.minprice minPriceSourceA,
Bmin.minprice minPriceSourceB
from(
select 
id,source,
min(price) minprice
from table
group by 1,2) t1
left join Amin on t1.id=Amin.id
left join Bmin on t1.id=Bmin.id

The problem is I have over 100 sources and id, if I do query manually the code will be very long. Is there an efficient way to do this?

2 Answers2

0

You can use PIVOT to transpose rows into columns and get the MIN of a list of columns at once:

with sample as (
  select "2022-01-01" as date, "A" as source, 2 as price, "1" as id
  UNION ALL
  select "2022-01-02" as date, "A" as source, 1 as price, "1" as id
  UNION ALL
  select "2022-01-04" as date, "B" as source, 1 as price, "1" as id
  UNION ALL
  select "2022-01-04" as date, "A" as source, 2 as price, "2" as id
  UNION ALL
  select "2022-01-04" as date, "A" as source, 4 as price, "2" as id
  UNION ALL
  select "2022-01-04" as date, "B" as source, 3 as price, "2" as id
),
min_by_source as (
  SELECT * FROM 
    (SELECT id, source, price FROM sample)
    PIVOT(MIN(price) AS minPrice FOR source IN ('A', 'B')) -- add here the others sources
),
min_global as (
  SELECT id, MIN(price) AS minPrice
  FROM sample
  GROUP BY id
)
SELECT * 
FROM min_global
JOIN min_by_source USING (id)

Output:

id  minPrice    minPrice_A  minPrice_B
1   1           1           1
2   2           2           3
Damião Martins
  • 1,402
  • 10
  • 17
  • hi this works, but the if source PIVOT(MIN(price) AS minPrice FOR source IN (items), for items here I tried using select distinct source from sample it won't work, do I have to type it manually and what if the source have spaces in between for example source A item – Evans Gunawan Jun 11 '22 at 17:54
  • The `IN` clause in `PIVOT` do not allow dynamic values, so you cannot use a `SELECT` statement. An way to resolve this is use `EXECUTE IMMEDIATE` to generate a dynamic query, this answer has an example: https://stackoverflow.com/a/67480414/3010548 – Damião Martins Jun 11 '22 at 21:52
0

Consider below option

select * from (
  select * except(date), 
    avg(price) over(partition by id) avgPrice, 
    min(price) over(partition by id) minPrice 
  from your_table)
pivot (min(price) minPriceSource for source in ('A', 'B'))    

if applied to sample data in your question - output is

enter image description here

The problem is I have over 100 sources and id, if I do query manually the code will be very long. Is there an efficient way to do this?

Use below dynamic version

execute immediate (select '''
select * from (
  select * except(date), 
    avg(price) over(partition by id) avgPrice, 
    min(price) over(partition by id) minPrice 
  from your_table)
pivot (min(price) minPriceSource for source in (''' || string_agg(distinct '"' || source || '"') || '''))
'''
from your_table
)
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230