Insert data and fill columns with rank

Question

original data: orginal_table

MID STATE CALL_TIME         RECORD_RANK
a   1    2020-12-18 09:00:00        1
a   2    2020-12-19 09:00:00        2
b   1    2020-12-18 09:00:02        1
c   1    2020-12-18 09:00:03        1
c   1    2020-12-19 09:00:03        2
c   1    2020-12-20 09:00:03        3
d   1    2020-12-19 09:00:00        1

The data I wanted to insert: insert_table

   MID  STATE      CALL_TIME  
   a    2     2020-12-30 09:00:00      
   b    2     2020-12-19 09:00:02   
   c    1     2020-12-21 09:00:03 
   e    1     2020-12-30 09:00:00 
   f    1     2020-12-30 09:00:00 
   f    2     2020-12-31 09:00:00

Goal

The original data will be inserted from the second data.
For original and inserted data, the pair MID and CALL_TIME is unique.
There is no RECORD_RANK column in the inserted data but RECORD_RANK will be calculated based on MID and CALL_TIME columns when inserted. When duplicated MID with different CALL_TIME, the value of RECORD_RANK with MID will be added by 1. The initial value is 1.
The earliest row in in insert_table is always later than the latest row in orginal_table with the same MID.

The expected example result as below:

MID  STATE      CALL_TIME         RECORD_RANK
a    1    2020-12-18 09:00:00        1
a    2    2020-12-19 09:00:00        2
b    1    2020-12-18 09:00:02        1
c    1    2020-12-18 09:00:03        1
c    1    2020-12-19 09:00:03        2
c    1    2020-12-20 09:00:03        3
d    1    2020-12-19 09:00:00        1
a    2    2020-12-30 09:00:00        3
b    2    2020-12-19 09:00:02        2  
c    1    2020-12-21 09:00:03        4
e    1    2020-12-30 09:00:00        1
f    1    2020-12-30 09:00:00        1 
f    2    2020-12-31 09:00:00        2

Note

mysql version: 5.5.47-log

If call_times are always increasing, I see no need to store record_rank. Just calculate it when you need it — Strawberry, Dec 21 '20 at 08:52
Incidentally, by convention, the term 'id' implies a surrogate PRIMARY KEY. That's not the case here so it might be an idea to rename that column. — Strawberry, Dec 21 '20 at 08:54
Does the ealiest row in `insert_table` is always later than the latest row in `orginal_table` with the same `MID`? i.e. does the `RECORD_RANK` values already present in `orginal_table` wil be recalculated never? — Akina, Dec 21 '20 at 09:10
@Akina yes, the earliest row in in insert_table is always later than the latest row in orginal_table with the same MID. — Jack, Dec 21 '20 at 23:58
Perhaps more importantly, Akina's trigger provides a way to set `MID` for _future_ `INSERTs`. (It does not take care of the rank changing due to deletes or updates.) — Rick James, Dec 23 '20 at 17:53

Akina · Answer 1 · 2020-12-21T09:55:40.510

3

If the ealiest row in insert_table is always later than the latest row in orginal_table with the same MID then you may use BEFORE INSERT trigger:

CREATE TRIGGER tr_bi_original
BEFORE INSERT
ON orginal_table
FOR EACH ROW
SET NEW.RECORD_RANK = (SELECT COALESCE(COUNT(*), 0) + 1
                       FROM orginal_table
                       WHERE NEW.MID = orginal_table.MID)

After trigger creation you may simply add new rows by

INSERT INTO orginal_table
SELECT *, NULL FROM insert_table;

New values for RECORD_RANK will be added by the trigger.

fiddle

If the rows from both tables will mix and RECORD_RANK for some rows which are already present in orginal_table must be altered then the operation cannot be performed using one query (because both insert for new rows and update for existing ones needed). In this case I recommend to insert the rows with any (NULL) value for RECORD_RANK column, then recalculate the column value for all rows in a table.

edited Dec 21 '20 at 09:55

answered Dec 21 '20 at 09:16

Akina

39,301
5
14
25

I guess we just see it differently – Strawberry Dec 21 '20 at 10:00
Why? The OP will be operating under the principle of confirmation bias. – Strawberry Dec 21 '20 at 10:48
@Akina is it necessary to use trigger? – Jack Dec 21 '20 at 23:59
2

@jack No, of course, you may use any other method. But its usage simplifies the process, it seems. – Akina Dec 22 '20 at 09:03
@Akina - Your cleaver technique might be improved by changing `COUNT(*)` to `MAX(mid)`? – Rick James Dec 23 '20 at 17:55
@RickJames ??? ... fiddle is avaliable - you may try. – Akina Dec 23 '20 at 18:50
@Akina "INSERT INTO orginal_table" without (column1, column2, column3, ...) seems risky if I just want to insert specific columns like GMB 's script . And I do not know your script because I do not know how to use trigger. Would you mind explaining it? – Jack Dec 30 '20 at 08:45
@Jack *seems risky* If your data for insertion may be wrong (may have wrong structure) then add columns list - no problems. *I do not know how to use trigger.* The fiddle demanstrates this completely. Please do not ask general question, point to the place which is not clear for you. – Akina Dec 30 '20 at 08:53

score 1 · Accepted Answer · answered Dec 25 '20 at 19:41

I think it is possible to handle the logic in a single insert, even in MySQL 5.x.

The target rank is the number of rows that already exists in the target table for the same mid, plus the number of mid rows in the source table prior to the current row. You can compute that with correlated subqueries:

insert into orginal_table (mid, state, call_time, record_rank)
select mid, state, call_time,
    1 + (
        select count(*)
        from orginal_table o
        where o.mid = i.mid
    ) + (
        select count(*) 
        from insert_table i1 
        where i1.mid = i.mid and i1.call_time < i.call_time
    ) as record_rank
from insert_table i

This assumes that all new rows are more recent that existing rows, as mentioned in your question. But if you want otherwise, that's an easy fix to the first subquery:

(
    select count(*)
    from orginal_table o
    where o.mid = i.mid and o.call_time < i.call_time
)

Here is a demo based on the nice test case built by Akina.

Side note: in MySQL 8.0, we would use a window function instead of the second subquery, which would make the query much more efficient:

insert into orginal_table (mid, state, call_time, record_rank)
select mid, state, call_time,
    row_number() over(partition by mid order by call_time) 
    + (
        select count(*)
        from orginal_table o
        where o.mid = i.mid
    ) as record_rank
from insert_table i

Insert data and fill columns with rank

2 Answers2