
A table raw_data has an index ix_raw_data_timestamp:

CREATE TABLE IF NOT EXISTS public.raw_data
(
    ts timestamp without time zone NOT NULL,
    log_msg character varying COLLATE pg_catalog."default",
    log_image bytea
);

CREATE INDEX IF NOT EXISTS ix_raw_data_timestamp
    ON public.raw_data USING btree
    (ts ASC NULLS LAST)
    TABLESPACE pg_default;

For some reason the index is not used for the following query, which is therefore very slow:

SELECT ts,
    log_msg
FROM raw_data
ORDER BY ts ASC
LIMIT 5e6;

The result of EXPLAIN (analyze, buffers, format text) for the query above:

                                                                       QUERY PLAN                                                                       
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=9752787.07..10336161.14 rows=5000000 width=50) (actual time=789124.600..859046.614 rows=5000000 loops=1)
   Buffers: shared hit=12234 read=888521, temp read=2039471 written=2664654
   ->  Gather Merge  (cost=9752787.07..18421031.89 rows=74294054 width=50) (actual time=789085.442..822547.099 rows=5000000 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=12234 read=888521, temp read=2039471 written=2664654
         ->  Sort  (cost=9751787.05..9844654.62 rows=37147027 width=50) (actual time=788203.880..795491.054 rows=1667070 loops=3)
               Sort Key: "ts"
               Sort Method: external merge  Disk: 1758904kB
               Worker 0:  Sort Method: external merge  Disk: 1762872kB
               Worker 1:  Sort Method: external merge  Disk: 1756216kB
               Buffers: shared hit=12234 read=888521, temp read=2039471 written=2664654
               ->  Parallel Seq Scan on raw_data  (cost=0.00..1272131.27 rows=37147027 width=50) (actual time=25.436..119352.861 rows=29717641 loops=3)
                     Buffers: shared hit=12141 read=888520
 Planning Time: 5.240 ms
 JIT:
   Functions: 7
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 0.578 ms, Inlining 76.678 ms, Optimization 24.578 ms, Emission 13.060 ms, Total 114.894 ms
 Execution Time: 877489.531 ms
(20 rows)

But it is used for this one:

SELECT ts,
    log_msg
FROM raw_data
ORDER BY ts ASC
LIMIT 4e6;

EXPLAIN (analyze, buffers, format text) of the query above is:

                                                                           QUERY PLAN                                                                           
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.57..9408157.15 rows=4000000 width=50) (actual time=15.081..44747.127 rows=4000000 loops=1)
   Buffers: shared hit=24775 read=61155
   ->  Index Scan using ix_raw_data_timestamp on raw_data  (cost=0.57..209691026.73 rows=89152864 width=50) (actual time=2.218..16077.755 rows=4000000 loops=1)
         Buffers: shared hit=24775 read=61155
 Planning Time: 1.306 ms
 JIT:
   Functions: 3
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 0.406 ms, Inlining 1.121 ms, Optimization 7.917 ms, Emission 3.721 ms, Total 13.165 ms
 Execution Time: 59028.951 ms
(10 rows)

Needless to say, the aim is to get all such queries to use the index no matter the LIMIT size, but I cannot seem to find a solution.
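As a diagnostic (suggested in the comments), you can temporarily steer the planner away from the sort-based plan to compare how the index scan actually performs at this size. This is only a sketch for testing, not a permanent fix:

```sql
-- Diagnostic only: disable sort-based plans for this session so the
-- planner falls back to the index scan, then compare timings.
SET enable_sort = off;

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT ts, log_msg
FROM raw_data
ORDER BY ts ASC
LIMIT 5e6;

-- Restore the default so normal planning resumes.
RESET enable_sort;
```

If the forced index-scan plan is genuinely faster here, that points to the planner's cost estimates (e.g. `random_page_cost`) being off for this hardware rather than the index being unusable.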

PS:

  • There are about 89,152,922 rows in the table.

Edit:

After increasing work_mem to 2 GB (SET work_mem = '2GB';), the query is a little faster (it no longer spills to disk) but still nowhere near as fast as the index scan:

                                                                       QUERY PLAN                                                                       
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=5592250.54..6175624.61 rows=5000000 width=50) (actual time=215887.445..282393.743 rows=5000000 loops=1)
   Buffers: shared hit=12224 read=888531
   ->  Gather Merge  (cost=5592250.54..14260731.75 rows=74296080 width=50) (actual time=215874.072..247030.062 rows=5000000 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=12224 read=888531
         ->  Sort  (cost=5591250.52..5684120.62 rows=37148040 width=50) (actual time=215854.323..221828.921 rows=1667147 loops=3)
               Sort Key: "ts"
               Sort Method: top-N heapsort  Memory: 924472kB
               Worker 0:  Sort Method: top-N heapsort  Memory: 924379kB
               Worker 1:  Sort Method: top-N heapsort  Memory: 924281kB
               Buffers: shared hit=12224 read=888531
               ->  Parallel Seq Scan on raw_data  (cost=0.00..1272141.40 rows=37148040 width=50) (actual time=25.899..107034.903 rows=29717641 loops=3)
                     Buffers: shared hit=12130 read=888531
 Planning Time: 0.058 ms
 JIT:
   Functions: 7
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 0.642 ms, Inlining 53.860 ms, Optimization 23.848 ms, Emission 11.768 ms, Total 90.119 ms
 Execution Time: 300281.654 ms
(20 rows)

MSIS
    Please **[edit]** your question and add the [execution plan](https://www.postgresql.org/docs/current/static/using-explain.html) generated using **`explain (analyze, buffers, format text)`** (_not_ just a "simple" explain) as [formatted text](http://stackoverflow.com/help/formatting) and make sure you preserve the indention of the plan. Paste the text, then put `\`\`\`` on the line before the plan and on a line after the plan. –  Nov 24 '21 at 19:19
  • @jjanes - I'm trying to have it use the index so it is faster, which is the case for the `4e6` query. – MSIS Nov 24 '21 at 19:30
  • You can force it to use the index by doing `set enable_sort=off`. But that is not a good permanent solution, so then show the `explain (analyze, buffers, format text)` for the query run both ways, so we can advise on how to make it make better decisions on its own. Turn track_io_timing to on first, if it is not already on. – jjanes Nov 24 '21 at 19:34
  • Have you tried [including](https://www.postgresql.org/docs/14/sql-createindex.html#:~:text=EXISTS%20is%20specified.-,INCLUDE,included%20in%20upper%2Dlevel%20index%20entries%20used%20for%20tree%20navigation.,-name) `log_msg` as a non-key index column? This could do something if the messages are small/short enough, but also slow things down otherwise. – Zegarek Nov 24 '21 at 19:40
  • Why do you think that using the index is always (!) faster? Without the results from the explain, there is just the assumption. – Frank Heikens Nov 24 '21 at 19:40
  • @a_horse_with_no_name , @jjanes , @Frank Heikens - I'm still waiting for the `EXPLAIN` on the `5e6` to complete... – MSIS Nov 24 '21 at 19:43
  • @FrankHeikens I guess "EXPLAIN clearly shows that for the first query goes: Gather Merge -> Sort -> Seq Scan, while the second query only uses Index Scan." can easily lead to that assumption in this case. I don't see anyone suggesting it's always faster, but those who do, do deserve your exclamation mark :) – Zegarek Nov 24 '21 at 19:53
  • @MSIS If this is the most commonly issued query and/or leveraged index against this table, a periodic [CLUSTER](https://www.postgresql.org/docs/14/sql-cluster.html) and subsequent `ANALYZE` could also help improve the situation, especially at this size. However, if the [non-key column inclusion](https://www.postgresql.org/docs/14/sql-createindex.html#:~:text=EXISTS%20is%20specified.-,INCLUDE,included%20in%20upper%2Dlevel%20index%20entries%20used%20for%20tree%20navigation.,-name) of `log_msg` works, `cluster` shouldn't be necessary in this case, as planner won't ever reach into the actual table – Zegarek Nov 24 '21 at 20:08
  • @Zegarek, I already have run a `CLUSTER` on the table and `ANALYZE`. What is small/short enough for the `log_msg`? – MSIS Nov 24 '21 at 20:14
  • Could you redo the sorting plan with max_parallel_workers_per_gather=0 and add that? Non-parallel plans are easier to think about, plus I would like to see if you actually get any benefit from the parallel operations – jjanes Nov 24 '21 at 20:21
  • @MSIS As long as it's [below the limit](https://stackoverflow.com/q/57945442/5298879) it's a philosophical question. – Zegarek Nov 24 '21 at 20:22
  • Can you give a brief summary of your hardware, and any changes you have already made to the cost settings? Also, what is work_mem? – jjanes Nov 24 '21 at 20:25

1 Answer


The problem here is that you're getting a Parallel Seq Scan feeding a Gather Merge. The Gather Merge is estimated to process 74,294,054 rows to output 5,000,000. That makes sense, because you say there are 89,152,922 rows in the table and the query has no WHERE clause to limit them.

Why would the planner choose this plan? Probably because the sort is larger than work_mem and is forced to spill to disk (note the "external merge  Disk" sort method). So increase your work_mem. If PostgreSQL thinks it can fit the sort in memory and doesn't have to go to disk, it will be massively faster.
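A minimal sketch of that tuning (the values are illustrative, not prescriptive; note that work_mem is allocated per sort operation and per parallel worker, so a large value multiplies quickly):

```sql
-- Session-level only; each sort in each parallel worker may use up to
-- this much memory, so be careful making it a global default.
SET work_mem = '2GB';

-- If the index-scan plan is still not chosen, the planner may be
-- overestimating the cost of random I/O. On SSD storage a value near
-- seq_page_cost is often more realistic (illustrative value, tune for
-- your hardware).
SET random_page_cost = 1.1;
```

With a lower random_page_cost the estimated cost of the index scan drops relative to the seq-scan-plus-sort plan, which is usually what flips the planner's decision for large LIMITs like this.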

Evan Carroll
  • Then does `work_mem` need to be set to something greater than `external merge Disk: 1758904kB` to have the best results? – MSIS Nov 29 '21 at 15:20
  • It may have to be set for more than that even. I'm not sure if that number is the swap space on the query (ie, after it has exhausted work_mem) or the total space including work_mem. But you can crank up your work_mem to an obscene number to figure out if that fixes it, and play with it from there. – Evan Carroll Nov 29 '21 at 16:24
  • I've updated the question to include the new setting for `work_mem`. However, it is still doing a Gather->Sort, and not an Index Scan. Is there another setting? – MSIS Nov 30 '21 at 00:51