2

I have two questions:

  1. As my title, which one is more efficient for running large query on large volume of data?

I've looked at MySQL doc which explains about workbench's performance in https://www.mysql.com/products/workbench/performance/ however I cannot seem to find any resource that specifically talks about efficiency difference between running query on cmd and query on workbench.

  1. How to optimize this query

    select
      r.user_id,
      k.id as kickscooter_id,
      st_astext(k.location) as location,
      k.created_at,
      k.serial_number,
      k_st.serial_number as states_serial_number,
      st_astext(k_st.gps) as gps_location,
      k_st.gps_updated_at,
      r.start_time,
      r.end_time
    from kickscooters k
    join rents r
      on k.id= r.kickscooter_id
    join kickscooter_states_190614 k_st
      on k.serial_number = k_st.serial_number
    order by r.rent_date
    limit 999;
    

I've learned that creating an index allows mysql to sort things quickly therefore I've added index by

ALTER TABLE `tablename` ADD INDEX `indexname` (`columnname`);

following answers from one of SO post "order by" taking too much time in mysql

As suggested in comments I've executed

analyze <my query> 

since my server is MariaDB.

which gave me ERROR CODE 2013: LOST connection to server during query.

When I ran

explain <my query>

It worked and outputs:

id    select_type   table   type          possible_keys                                                  
1     SIMPLE        k_st    ALL  kickscooter_states_190614_serial_number_date_index             
1     SIMPLE        k       ref  PRIMARY,kickscooters_serial_number_unique,kickscooters_serial_number_index     
1     SIMPLE        r        ref    rents_kickscooter_id_foreign    

-table continued
/       key                       key_len      ref                        rows         extra

        null                        null       null                      192818947      Using temporary; Using filesort
   kickscooters_serial_number_unique 27     kickgoing_db.k_st.serial_number 1   
    rents_kickscooter_id_foreign     4      kickgoing_db.k.id              143  
haneulkim
  • 4,406
  • 9
  • 38
  • 80
  • 2
    Please run `EXPLAIN ` first and then post the result of this explain plan to the question. – Madhur Bhaiya Oct 07 '19 at 09:14
  • 1
    Is the question "why is my query slow", or is the question "why is it slower on CMD than in workbench?". You seem to suggest they are both slow... – Neville Kuyt Oct 07 '19 at 10:17
  • @MadhurBhaiya I've ran it however it seems to take a while and gives me Error Code:2013: Lost connection – haneulkim Oct 08 '19 at 00:24
  • @NevilleKuyt I have two questions and yes I am suggesting both are taking a long time therefore I cannot measure which one is faster which is what I want to ask. – haneulkim Oct 08 '19 at 00:24
  • @makewhite Replace `JOIN` with `Straight_Join` and add a index on `rent_date` in `rents` table – Madhur Bhaiya Oct 08 '19 at 03:53
  • @MadhurBhaiya rent_date in rents table is already set as an non-unique index with 724 cardinality – haneulkim Oct 08 '19 at 03:56
  • 1
    @makewhite please change the order of joins from table with low cardinality to high. And then specify `Straight_Join` so that optimizer does not change the join order – Madhur Bhaiya Oct 08 '19 at 04:02
  • @MadhurBhaiya Thanks Madhur, could you recommend some resource that cover query optimizations that explains why one is faster than the other? – haneulkim Oct 08 '19 at 04:03
  • @makewhite Did the solution worked ? No official resource(s), but I find these two quite enlightening: mysql.rjweb.org/doc.php/index_cookbook_mysql and http://www.unofficialmysqlguide.com/ – Madhur Bhaiya Oct 08 '19 at 04:59
  • @MadhurBhaiya Nope, I got rid of order by and limit, ordered joins from low to high cardinality however still taking a long time to finish., – haneulkim Oct 08 '19 at 05:05
  • Please provide `SHOW CREATE TABLE` for each of the 3 tables. – Rick James Oct 15 '19 at 00:48

2 Answers2

1

Based on the Explain plan, optimizer is not able to use any index for ORDER BY rent. So try the following:

  1. Ensure that an index exists on the rent_date column of the rents table. This index will be used to optimize the ORDER BY clause. It can be a single column index, or a multi-column one (used in other scenarios). But, in case of multi-column one, you need to ensure that the rent column is the first column in the index order.
  2. Ensure that an index exists on the id column of the kickscooters table. Details about single-column / multi-column index remains the same as in the point #1.
  3. Ensure that an index exists on the serial_number column of the kickscooter_states_190614 table. Details about single-column / multi-column index remains the same as in the point #1.

Now, after ensuring these indexes, try your original query. Most likely, optimizer should be able to optimize the Join Order. Besides, the above query, you can enforce join order by using STRAIGHT_JOIN optimizer hint. So, try the following query as well, and benchmark between the two of them:

select
  r.user_id,
  k.id as kickscooter_id,
  st_astext(k.location) as location,
  k.created_at,
  k.serial_number,
  k_st.serial_number as states_serial_number,
  st_astext(k_st.gps) as gps_location,
  k_st.gps_updated_at,
  r.start_time,
  r.end_time
from kickscooters k
straight_join rents r
  on k.id= r.kickscooter_id
straight_join kickscooter_states_190614 k_st
  on k.serial_number = k_st.serial_number
order by r.rent_date
limit 999;
Madhur Bhaiya
  • 28,155
  • 10
  • 49
  • 57
  • Thanks for your help. however when I run show index from table_name; for all the tables it satisfies all the conditions. rent_date in rents table is non_unique index however its sequence is 1 therefore satisfies your condition. Id in kickscooters table is a primary key, serial_number on kickscooter_states_190614 table is also non_unique with sequence of 1. Maybe it is because rent_date has cardinality of 724 which is very low compared to 2M rows? – haneulkim Oct 08 '19 at 09:09
  • @makewhite check your original query please and my answer again. I am not asking you to index on `rent_date`. but I am asking you to index on `rent` column (the column which is in your `ORDER BY` clause) – Madhur Bhaiya Oct 08 '19 at 09:12
  • Sorry, actually it is supposed to be grouped by on rent_date I wrote it wrong in my query I will fix it. – haneulkim Oct 08 '19 at 09:14
  • @makewhite Please edit the question and add the result for `SHOW CREATE TABLE ` for all the three tables; this will give an idea of indexes. – Madhur Bhaiya Oct 08 '19 at 09:20
  • @MadhurBhaiya - Please show us the `EXPLAIN` rather than talking about it. – Rick James Oct 15 '19 at 00:49
  • @RickJames ?? I think you misunderstood. I am suggesting solution "based on the explain plan" in the original problem statement – Madhur Bhaiya Oct 15 '19 at 03:58
  • @MadhurBhaiya - Alas `EXPLAIN` usually fails to tell you how to improve performance. About all it does is provide statistics on what is going on _with the chosen execution plan_. `EXPLAIN FORMAT=JSON` and `OPTIMIZER TRACE` provide some extra clues, but still not enough to know, for example, what index to add. – Rick James Dec 27 '19 at 01:48
0

There is no WHERE clause, but there is an ORDER BY and LIMIT involving r. So, the Optimizer would like to start with r. But... There seems to be no index starting with serial_number for k_st (or there is a type or collation conflict), so it abandoned r.

Please provide SHOW CREATE TABLE for all 3 tables.

Rick James
  • 135,179
  • 13
  • 127
  • 222