So far, I've read this and this links regarding to my issue, but can't quite comprehend what is going on here. Here is the query:
explain SELECT
b.fk_gds_busline_commission,
COUNT(1)
FROM dw.ft_booking b
INNER JOIN dw.dm_date d ON b.fk_date_booking = d.sk_date
LEFT JOIN dw.dm_gds_busline_commission bl ON bl.sk_gds_busline_commission = b.fk_gds_busline_commission
LEFT JOIN dw.dm_routes r ON r.sk_route = b.fk_route
WHERE d.date >= '2018-01-01'
GROUP BY fk_gds_busline_commission
ORDER BY COUNT(1) DESC;
Tables information:
ft_booking -> 10M rows
dm_gds_busline_commission -> 500 rows
dm_date -> 10k rows
dm_routes -> 200k rows
The query above results in:
There are two main issues here, although I'm only interested in one. The first is that the possible key (which is related to the corresponding column, as suggested by its name) indicated is not being used. About that, there is a lot of discussion here on SO, the cause is very likely to be related to table statistics not updated, and so on. I can handle that. In this situation, I tried, then, to use FORCE INDEX.
explain SELECT
b.fk_gds_busline_commission,
COUNT(1)
FROM dw.ft_booking b FORCE INDEX (fk_date_booking_idx)
LEFT JOIN dw.dm_gds_busline_commission bl ON bl.sk_gds_busline_commission = b.fk_gds_busline_commission
INNER JOIN dw.dm_date d ON b.fk_date_booking = d.sk_date
LEFT JOIN dw.dm_routes r ON r.sk_route = b.fk_route
WHERE d.date >= '2018-01-01'
GROUP BY fk_gds_busline_commission
ORDER BY COUNT(1) DESC;
This led to the very same result as before. I noticed, however, that, if I remove dm_gds_busline_commission or dm_routes tables, the results change dramatically. The select order is changed, and mysql chooses to use the index (unless I omit the FORCE INDEX clause - in that case, only the SELECT order is changed). Then, for instance, for
explain SELECT
b.fk_gds_busline_commission,
COUNT(1)
FROM dw.ft_booking b FORCE INDEX (fk_date_booking_idx)
INNER JOIN dw.dm_date d ON b.fk_date_booking = d.sk_date
LEFT JOIN dw.dm_routes r ON r.sk_route = b.fk_route
WHERE d.date >= '2018-01-01'
GROUP BY fk_gds_busline_commission
ORDER BY COUNT(1) DESC;
the result is
The equivalent happens when I remove dm_routes table from the query.
In order to get all tables I need joined, I've had to remove one of the tables of the join, and then place this select query as a subquery, and, then, join the missing tables to the already aggregated result set. This means, then, I've performed two workarounds (usage of FORCE INDEX, and then the creation of a subquery) in order to bypass the MySQL query optimizer failure.
Comparing results, using my workarounds, the query, reading data from one week ago to now, runs in around 2 seconds; using the poor query optimizer choice, it takes about 6 and a half minutes.
The question, then, is: why is it doing so? Why doesn't it accept the force index condition with 3 tables present in the JOIN clauses, but does with only 2? My MySQL server version is 5.6.34.
EDIT: suggestions applied. Added Link to the create queries of all tables involded.
EDIT2: changed the queries to use not dm_date_booking, which is a view, but dm_date table, which is the actual table from the view data was pulled from.