0

So far, I've read this and this links regarding to my issue, but can't quite comprehend what is going on here. Here is the query:

 explain SELECT
    b.fk_gds_busline_commission,
    COUNT(1)
  FROM dw.ft_booking b
  INNER JOIN dw.dm_date d ON b.fk_date_booking = d.sk_date  
  LEFT JOIN dw.dm_gds_busline_commission bl ON bl.sk_gds_busline_commission = b.fk_gds_busline_commission
  LEFT JOIN dw.dm_routes r ON r.sk_route = b.fk_route
  WHERE d.date >= '2018-01-01'
  GROUP BY fk_gds_busline_commission
  ORDER BY COUNT(1) DESC;

Tables information:

ft_booking -> 10M rows

dm_gds_busline_commission -> 500 rows

dm_date -> 10k rows

dm_routes -> 200k rows

The query above results in:

result of original query

There are two main issues here, although I'm only interested in one. The first is that the possible key (which is related to the corresponding column, as suggested by its name) indicated is not being used. About that, there is a lot of discussion here on SO, the cause is very likely to be related to table statistics not updated, and so on. I can handle that. In this situation, I tried, then, to use FORCE INDEX.

explain SELECT
    b.fk_gds_busline_commission,
    COUNT(1)
  FROM dw.ft_booking b FORCE INDEX (fk_date_booking_idx)
  LEFT JOIN dw.dm_gds_busline_commission bl ON bl.sk_gds_busline_commission = b.fk_gds_busline_commission
  INNER JOIN dw.dm_date d ON b.fk_date_booking = d.sk_date  
  LEFT JOIN dw.dm_routes r ON r.sk_route = b.fk_route
  WHERE d.date >= '2018-01-01'
  GROUP BY fk_gds_busline_commission
  ORDER BY COUNT(1) DESC;

This led to the very same result as before. I noticed, however, that, if I remove dm_gds_busline_commission or dm_routes tables, the results change dramatically. The select order is changed, and mysql chooses to use the index (unless I omit the FORCE INDEX clause - in that case, only the SELECT order is changed). Then, for instance, for

explain SELECT
    b.fk_gds_busline_commission,
    COUNT(1)
  FROM dw.ft_booking b FORCE INDEX (fk_date_booking_idx)
  INNER JOIN dw.dm_date d ON b.fk_date_booking = d.sk_date    
  LEFT JOIN dw.dm_routes r ON r.sk_route = b.fk_route
  WHERE d.date >= '2018-01-01'
  GROUP BY fk_gds_busline_commission
  ORDER BY COUNT(1) DESC;

the result is

changed result, now using index

The equivalent happens when I remove dm_routes table from the query.

In order to get all tables I need joined, I've had to remove one of the tables of the join, and then place this select query as a subquery, and, then, join the missing tables to the already aggregated result set. This means, then, I've performed two workarounds (usage of FORCE INDEX, and then the creation of a subquery) in order to bypass the MySQL query optimizer failure.

Comparing results, using my workarounds, the query, reading data from one week ago to now, runs in around 2 seconds; using the poor query optimizer choice, it takes about 6 and a half minutes.

The question, then, is: why is it doing so? Why doesn't it accept the force index condition with 3 tables present in the JOIN clauses, but does with only 2? My MySQL server version is 5.6.34.

EDIT: suggestions applied. Added Link to the create queries of all tables involded.

EDIT2: changed the queries to use not dm_date_booking, which is a view, but dm_date table, which is the actual table from the view data was pulled from.

Lucas Lima
  • 832
  • 11
  • 23
  • 1
    Side note: `GROUP BY 1 ORDER BY 2 DESC;` isn't syntax that you should use.. Use colum names or sql expression dont use column number.. This syntax is hard to read and maintain if your query is large. – Raymond Nijland Apr 17 '18 at 16:51
  • Show `SHOW CREATE TABLE [table]` output for every table involved. – Raymond Nijland Apr 17 '18 at 16:53
  • Try changing `LEFT JOIN dw.dm_date_booking d` into an `INNER JOIN` instead; with the current `WHERE` condition, having it has a left join makes very little sense and could be confusing the query optimizer. – Uueerdo Apr 17 '18 at 17:08
  • Changes suggested made. No results change. Please see edit. I used INNER mainly because, in other MySQL databases, I've already seen queries taking even 10 times longer to run because an INNER JOIN was being used, rathen than a LEFT. – Lucas Lima Apr 17 '18 at 17:45
  • INNER JOINs should generally take less time than LEFT JOINs; but that would be highly dependent on data. – Uueerdo Apr 17 '18 at 18:43
  • Those CREATEs you linked do not reflect the tables involved in the query (ex: `sk_date_booking` is nowhere in them, nor is the `dm_date_booking` table it should be a part of.) – Uueerdo Apr 17 '18 at 18:49
  • Please see edit. The table dm_date_booking was only a view (without which I tested the queries for the same result, just in case) of the table dm_date. – Lucas Lima Apr 18 '18 at 00:02
  • side note - you are left joining to `routes`, but not using anything from it in the select, where or group by clauses. Isn't it redundant (assuming that `fk_gds_busline_commission` is from `ft_booking` or `dm_gds_busline_commission`) – Tomer Shay Apr 18 '18 at 21:35
  • @Tomer kinda. I want to select columns from it, and I did, after my workaround. I just don't feel it was necessary here, since it would only add information which is not directly related to my issue. It will be necessary, but, for the comprehension of this subject, I did not include those columns. – Lucas Lima Apr 19 '18 at 01:19

0 Answers0