2

I am trying to optimize my query therefore join from low-cardinality index to high-cardinality index.

On my table called "rents" I've executed

show index from rents;

which tells me that column_name "rent_date" has cardinality of 724. Rather when I execute:

select
  user_id, 
  kickscooter_id, 
  start_time, 
  rent_date 
from rents
group by rent_date;

It returns 300 rows

From what I understand cardinality of index refers to number of unique values according to http://www.mysqltutorial.org/mysql-index/mysql-show-indexes/

From the resource I've mentioned above it says it gives an estimate. How does it estimate the cardinality and how can mine be off by such large number?

Query that I want to optimize:

select
  r.user_id,
  k.id as kickscooter_id,
  st_astext(k.location) as location,
  k.created_at,
  k.serial_number,
  k_st.serial_number as states_serial_number,
  st_astext(k_st.gps) as gps_location,
  k_st.gps_updated_at,
  r.start_time,
  r.end_time
from kickscooters k 
Straight_Join kickscooter_states_190614 k_st
  on k.serial_number = k_st.serial_number
Straight_Join rents r
  on k.id= r.kickscooter_id

;

when I r.kickscooter_id is set as non_unique index with cardinality of 10395,

k.id is primary key with cardinality 4636

k.serial_number has two index: one as unique and other as non-unique, both have cardinality of 4636.

k_st.serial_number in non-unique and has cardinality of 4162

iminiki
  • 2,549
  • 12
  • 35
  • 45
haneulkim
  • 4,406
  • 9
  • 38
  • 80
  • 1
    You need to show us the query relating to your doubt (assuming there is a query). Note that MySQL, like most other SQL databases, maintains some internal statistics about your database and its tables. It isn't always 100% accurate, but should be accurate most of the time. – Tim Biegeleisen Oct 08 '19 at 04:35
  • 1
    InnoDB statistics is a large topic. One important observation, InnoDB doesn't check every row in the table. InnoDB uses a sampling technique, with some random deep dives to get some sample pages, and then extrapolates statistics for the whole table. There are cases where the samples are not representative of the entire table. Sampling can be influenced by innodb parameters, such as `innodb_stats_persistent_sample_pages` (or `innodb_stats_sample_pages` which depends whether persistent stats is enabled.) (If the table is not InnoDB storage engine, then that's a whole 'nuther ballgame.) – spencer7593 Oct 08 '19 at 04:35
  • Reference: https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html#sysvar_innodb_stats_persistent_sample_pages and Reference: https://dev.mysql.com/doc/refman/8.0/en/innodb-performance-optimizer-statistics.html – spencer7593 Oct 08 '19 at 04:39
  • 1
    Forcing a specific join order is only a workaround (as it e.g. can't adept to changed data) for cases where MySQL decided incorrectly, normally, you just add indexes. What did MySQL decide on its own? MySQL might be smarter than you think, and while stats are relevant, incorrect stats are only a problem if they lead to wrong choices. The stats are usually in the right ballpark (to e.g. decide between 300 and 500k). If you need help optimizing (and not just wonder about the stats), add the explain plan (add `explain` infront of your query) with and without `straight_join` and execution times. – Solarflare Oct 08 '19 at 06:34
  • @Solarflare I've added index and also executed explain infront of my query w/o straight_join https://stackoverflow.com/questions/58266389/performance-difference-in-query-between-cmd-and-workbench-mysql/58280546#58280546 here is my previous questions. Thanks! – haneulkim Oct 08 '19 at 07:11
  • (Not) adding `order by ... limit ...` can make a completely different query. As Madhur said in his answer, "Most likely, optimizer should be able to optimize the Join Order." Test that first. Straight join is just in case it doesn't. The forced join there was for a different query (with "order by"), you'd need to reevaluate the best join order here. If you need help with that, please add the explain plan for *this* query (or add the "order by" if you forgot it here) or clarify that your question is about why stats are wrong/how to fix that and not about how to optimize your query. – Solarflare Oct 08 '19 at 08:21
  • Please provide `SHOW CREATE TABLE`, the `EXPLAIN SELECT` -- with `STRAIGHT_JOIN` and with just `JOIN`, and `SHOW INDEXES`. There are too many ambiguities in the prose you provided. – Rick James Dec 27 '19 at 01:36
  • This is not the same as the other SO question -- That had `ORDER BY` and `LIMIT`; this does not. – Rick James Dec 27 '19 at 01:50

0 Answers0