0

I want to get sum of columns values and group them by half year window, but number of columns too many, which I want to automate such painful things with sql logic. since we can do count() for table, can we do sum() for getting sum of each columns values in big query standard sql? based on my intuition, I came up my sql but it didn't produce expected output. Can anyone suggest possible workaround ont this?

objective:

I just want to get sum of each column values then group by id, year, and half-year.

my attempt and reproducible data

here is my input table schema and values (Note that for making this minimal example, I just used sql server), here is dbfiddle

here is my attempt:

with tb1(
    select res.*
    from (
        select distinct
            txn.user_id,
            extract(year from txn.purchase_date) as purch_yr,
            CASE 
                WHEN EXTRACT(QUARTER FROM txn.purchase_date) IN (1, 2) THEN 1 
                WHEN EXTRACT(QUARTER FROM txn.purchase_date) IN (3, 4) THEN 2
            END AS purch_period,
    
        from trans txn
        group by user_id, purch_yr, purch_period
    ) as res
)

tb2 as (
    select 
        *,
        sum(*)   -- intend to get sum of each columns
    from trans
    group by user_id, purch_yr, purch_period
)
select tb1.*, tb2.*
from tb1 left join tb2 on tb2.user_id=tb1.user_id

I intend to sum of each columns using sum(*) and group them by user_id, purch_yr, purch_period but above attempt didn't fetch any data though. The reason of doing this because I have so many columns and I need to do sum(col1), ..., sum(coln). Above attempt didn't work.

update- new attempt:

this is my another attempt but I got null values as sum of columns values:

select res.*
from (
  select distinct
    purchase_dateuser_id,
    extract(year from purchase_datepurchase_date) as purch_yr,
    CASE
        WHEN EXTRACT(QUARTER FROM purchase_datepurchase_date) IN (1, 2) THEN 1 
        WHEN EXTRACT(QUARTER FROM purchase_datepurchase_date) IN (3, 4) THEN 2
    END AS purch_period,
    SUM(prod_ab_amt),
    SUM(prod_bf_amt),
    SUM(prod_gm_amt),
    SUM(catg_cg_amt),
    SUM(catg_dh_amt),
    SUM(catg_xy_amt),
    SUM(catg_mk_amt),
    SUM(brand_hk_amt),
    SUM(brand_hp_amt),
    SUM(brand_vm_amt)
  from trans
  group by user_id, purch_yr, purch_period
) as res

in this attempt, I end up with null values as sum of columns values, any better ideas on this?

sql server solution from SO

I found this solution but it is specific for sql server though:

select ',SUM(' + QUOTENAME(name) + ') AS ' + QUOTENAME(name)
from sys.columns sc
WHERE sc.object_id = OBJECT_ID('trans')
AND sc.name NOT IN ('user_id', 'purchase_date')

Can anyone suggest possible workaround on this? How can I do this either in big query standard sql or similar one?

output:

here is my final output that I want to produce in this dbfiddle:

Is there any way I can get my expected output table? any suggestion on above big query sql?

Hamilton
  • 620
  • 2
  • 14
  • 32
  • SQL can't do this automatically. You can write a stored procedure that queries `INFORMATION_SCHEMA.COLUMNS` to get all the column names and generate all the `SUM()` expressions for them. – Barmar Mar 19 '23 at 04:04
  • 2
    If you have so many similar columns it's probably a bad database schema. You should normalize it so they're in separate rows, not lots of columns. – Barmar Mar 19 '23 at 04:05
  • Is your question about MySQL or Google BigQuery? There's no `sys.columns` in MySQL. Don't spam irrelevant tags. – Barmar Mar 19 '23 at 04:06
  • @Barmar thanks for your heads up. my question is in google bigquery, I was mimicking sql server solution to bigquery that why sys.columns was defined in sql server. How should I do normalization? Could you elaborate your thoughts from first two comments above and do you mind to show possible workaround in bigquery? How should I get my expected outputs in [expected outputs](https://dbfiddle.uk/KYJ9yR0b)? thank you – Hamilton Mar 19 '23 at 04:16
  • 1
    Instead of separate `prod_ab_amt`, `prod_bf_amt`, etc. columns, have two columns `type` and `amt`. `type` contains strings like `prod_ab`, `prod_bf`. Then you can use `SUM(amt) ... GROUP BY user_id, purchase_date, type` – Barmar Mar 19 '23 at 04:19
  • @Barmar will you mind making solution (with comments) that produce my expected output so SO community might get benefits from your input? I am curious how should we do normalization and the points you mentioned above in bigquery standard sql. thank you – Hamilton Mar 19 '23 at 04:23
  • I don't know BigQuery, I can't write an answer for it. I'm suggesting an alternate approach that doesn't require dynamic SQL. – Barmar Mar 19 '23 at 04:29
  • @Barmar I can cope with bigquery sql logic by myself. Could you suggest mysql or postgres or any sql solution that support your points above? – Hamilton Mar 19 '23 at 04:35

1 Answers1

2

Change your table so each type is in its own row, instead of distinct columns.

CREATE TABLE trans (
    user_id varchar(10),
    purchase_date DATE,
    amt_type VARCHAR(10), -- prod_ab, prod_bf, catg_c, etc.
    amt FLOAT
);

Then you can use a query like:

SELECT user_id, purchase_date, amt_type, SUM(amt) AS total_amt
FROM trans
GROUP BY user_id, purchase_date, amt_type

If you want to get all the sums in a single row for a user+date, you can use PIVOT. See How to Pivot table in BigQuery

Barmar
  • 741,623
  • 53
  • 500
  • 612
  • It’s not clear for me how to transform input table, you mentioned do normalization but in the answer seems redefine table schema; I can’t control input table schema. Could you help me with working solution if possible? I stuck in this question a while. – Hamilton Mar 19 '23 at 04:53
  • 1
    That is what normalization is, it's a redefinition of the schema. – Barmar Mar 19 '23 at 21:55
  • thanks, [dbffile](https://dbfiddle.uk/79TYB_UR) – Hamilton Mar 21 '23 at 15:20