1

How would you approach this via SQL? Let's take this example

| id | type | score_a | score_b | score_c | label_a | label_b | label_c |
|----|------|---------|---------|---------|---------|---------|---------|
| 1  | A    | 0.9     |         |         | L1      |         |         |
| 1  | B    |         | 0.7     |         |         | L2      |         |
| 1  | B    |         | 0.2     |         |         | L3      |         |
| 1  | C    |         |         | 0.2     |         |         | L4      |
| 1  | C    |         |         | 0.18    |         |         | L5      |
| 1  | C    |         |         | 0.12    |         |         | L6      |
| 2  | A    | 0.6     |         |         | L1      |         |         |
| 2  | A    | 0.3     |         |         | L2      |         |         |

I want to return the max score per type in conjunction with the label_X, Almost, like a pivot table but with these custom column names. So the outcome of the above will be like:

| id | type | score_a | label_a | score_b | label_b | score_c | label_c |
|----|------|---------|---------|---------|---------|---------|---------|
| 1  | A    | 0.9     | L1      | 0.7     | L2      | 0.2     | L4      |
| 2  | A    | 0.6     | L1      | NULL    | NULL    | NULL    | NULL    |

Something like this is wrong as it yields both results per type per label

SELECT id,
  MAX(score_a) as score_a,
  label_a,
  MAX(score_b) as score_b,
  label_b as label_b,
  MAX(score_c) as score_c,
  label_c
FROM sample_table 
GROUP BY id, label_a, label_b, label_c

Is there an easy way to do this via SQL, I'm doing it right now from BigQuery and tried also pivot table as described here but still no luck on how to flatten these into one big row with several columns

Any other ideas?

UPDATE

Expanding on what BGM mentioned about design; the source of this data is a table with the following form:

| id | type | label | score |
|----|------|-------|-------|
| 1  | A    | L1    | 0.9   |
| 1  | B    | L2    | 0.7   |
| 1  | B    | L3    | 0.2   |
| 1  | C    | L4    | 0.6   |
| 1  | C    | L5    | 0.2   |

That gets converted to a flattened state as depicted at the top of this question using a query like

 SELECT id,
      type,
      MAX(CASE WHEN type = 'A' THEN score ELSE 0 END) as score_a,
      MAX(CASE WHEN type = 'B' THEN score ELSE 0 END) as score_b,
      MAX(CASE WHEN type = 'C' THEN score ELSE 0 END) as score_c,
      MAX(CASE WHEN model_type = 'theme' THEN label_score ELSE 0 END) as 
      -- labels
      (CASE WHEN type = 'A' THEN label ELSE '' END) as label_a,
      (CASE WHEN type = 'B' THEN label ELSE '' END) as label_b,
      (CASE WHEN type = 'C' THEN label ELSE '' END) as label_c,
    FROM table
    GROUP id, label_a, label_b, label_c

Do you think the intermediate step is unnecessary to get to the final solution?

maverick
  • 2,185
  • 2
  • 16
  • 22

1 Answers1

1

You can do conditional aggregation. In Big Query, arrays come handy for this:

select
    id,
    max(score_a) score_a,
    array_agg(label_a order by score_a desc limit 1)[offset(0)] label_a,
    max(score_b) score_b,
    array_agg(label_b order by score_b desc limit 1)[offset(0)] label_b,
    max(score_c) score_c,
    array_agg(label_c order by score_c desc limit 1)[offset(0)] label_c
from mytable
group by id

Note: in terms of design, you should not have multiple columns to store the scores and labels per types; you already have a column that represents the types, so you should have just two columns for the store and type.

GMB
  • 216,147
  • 25
  • 84
  • 135
  • yes, we do have a table that stores the scores and labels per types, what I pasted on this question is a projection of that table that expands on top over that. Do you think using that as a base instead of this expanded version somehow eases the queries? – maverick Jul 29 '20 at 23:31
  • @maerick . . . You might ask a *new* question with the data that you are are starting with. It is probably simpler to use. – Gordon Linoff Jul 30 '20 at 00:41