I have a large query in mysql that involves joining multiple tables together. It's too slow, so I've done "explain" and see that it's creating a temporary table, which I suspect of taking most of the execution time. I found some related information:
- The mysql docs describe conditions when a temporary table might be created. ("The server creates temporary tables under conditions such as these..." [Emphasis added])
- This related SO question Using index, using temporary, using filesort - how to fix this?, which provides a link to the doc and applies it in a specific case.
- This related SO question Order of join conditions important? that talks about the order in which joins are evaluated.
My query does not appear to meet any of the conditions listed in the docs #1, in the order that the joins were written by me. By experimentation, however, I find that if I remove my order by
clause, the temporary table is not created. That makes me look at this rule from the doc:
Evaluation of statements that contain an ORDER BY clause and a different GROUP BY clause, or for which the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue.
This is the same rule that played in the example at #2 above, but in #2, the OP explicitly had columns from multiple tables in the order by
clause, so that's at least superficially different.
Moreover, when I look at the output from explain
, it appears that the table that I listed first is not used first by the optimizer. Putting down a pseudo-query for example:
select * from A
join B on A.c1=B.c1
join C on A.c2=C.c2
where A.c3='value'
order by A.c4
I would say that my order by
clause does use only columns from the "first table in the join queue" based on the order that I've written the query. On the other hand, the output from explain
suggests that it first considers table B then A.
So here are the questions:
- Does the quoted rule above for use of temporary tables refer to the order that I write the tables or the order that the software chooses to evaluate them?
- If it's the order that I write them, does this mean that the order of the joins does impact performance? (Seems to contradict the claims at #3 above.)
- If it's the order that the software chooses to evaluate them, is there any way to coerce or trick it into selecting and order that doesn't use the table?