0

I have the table with JSON-field (example)

# table1

id | json_column
---+------------------------
1  | {'table2_ids':[1,2,3], 'sone_other_data':'foo'}
---+------------------------
2  | {'foo_data':'bar', 'table2_ids':[3,5,11]}

And

# table2

id | title
---+------------------------
1  | title1
---+------------------------
2  | title2
---+------------------------
...
---+------------------------
11 | title11

Yes, I know about stored many-to-many relation in the third table. But it's a duplication data (in first case relations in Json_column, in second in third-table)

I know about generated columns in MySQL, but I don't understand how to use it for stored m2m relations. Maybe I have use views to get pairs of table1.id <-> table2.id. But how use index in this case?

Kas Elvirov
  • 7,394
  • 4
  • 40
  • 62
Dmitry
  • 551
  • 6
  • 19

1 Answers1

0

I can't understand your explanation for why you can't use a third table to represent the many-to-many pairs. Using a third table is of course the best solution.

I think views have no relevance to this problem.

You could use JSON_EXTRACT() to access individual members of the array. You can use a generated column to pull each member out so you can easily reference it as an individual value.

create table table1 (
  id int auto_increment primary key,
  json_column json,
  first_table2_id int as (json_extract(json_column, '$.table2_ids[0]'))
);

insert into table1 set json_column = '{"table2_ids":[1,2,3], "sone_other_data":"foo"}'

(You must use double-quotes inside a JSON string, and single-quotes to delimit the whole JSON string.)

select * from table1;
+----+-----------------------------------------------------+-----------------+
| id | json_column                                         | first_table2_id |
+----+-----------------------------------------------------+-----------------+
|  1 | {"table2_ids": [1, 2, 3], "sone_other_data": "foo"} |               1 |
+----+-----------------------------------------------------+-----------------+

But this is still a problem: In SQL, the table must have the columns defined by the table metadata, and all rows therefore have the same columns. There no such thing as each row populating additional columns based on the data.

So you need to create another extra column for each potential member of the array of table2_ids. If the array has fewer elements than the number of columns, JSON_EXTRACT() will fill in NULL when the expression returns nothing.

alter table table1 add column second_table2_id int as (json_extract(json_column, '$.table2_ids[1]'));
alter table table1 add column third_table2_id int as (json_extract(json_column, '$.table2_ids[2]'));
alter table table1 add column fourth_table2_id int as (json_extract(json_column, '$.table2_ids[3]'));

I'll query using vertical output, so the columns will be easier to read:

select * from table1\G
*************************** 1. row ***************************
              id: 1
     json_column: {"table2_ids": [1, 2, 3], "sone_other_data": "foo"}
 first_table2_id: 1
second_table2_id: 2
 third_table2_id: 3
fourth_table2_id: NULL

This is going to get very awkward. How many columns do you need? That depends on how many table2_ids is the maximum length of the array.

If you need to search for rows in table1 that reference some specific table2 id, which column should you search? Any of the columns may have that value.

select * from table1
where first_table2_id = 2
or second_table2_id = 2
or third_table2_id = 2
or fourth_table2_id = 2;

You could put an index on each of these generated columns, but the optimizer won't use them.

These are some reasons why storing comma-separated lists is a bad idea, even inside a JSON string, if you need to reference individual elements.

The better solution is to use a traditional third table to store the many-to-many data. Each value is stored on its own row, so you don't need many columns or many indexes. You can search one column if you need to look up references to a given value.

select * from table1_table2 where table2_id = 2;
Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • `I can't understand your explanation for why you can't use a third table to represent the many-to-many pairs. Using a third table is of course the best solution.` I dont want to use the relation table because the data about relations already exsist in json_column. But it's seems is not possible. Thanks for you response – Dmitry Apr 04 '18 at 08:44
  • By that reasoning, you shouldn't store the "table2_ids" array in the `json_column` either, because the values already exist as the primary keys in `table2.id`. Storing foreign key values is a normal and required part of a relational database. – Bill Karwin Apr 04 '18 at 14:14
  • I use the data in the json_column for other purposes: I storage the serialized text with links (on data in table2) in this column. And now I want to get related data in table2 for each publication. Yes, I know, It's not the best idea. I just try to understand it is possible or not – Dmitry Apr 06 '18 at 12:04