2

Given a string column with a value similar to /123/12/34/56/5/, what is the optimal way of querying for all the records that include the given number (12 for example)?

The solution from top of my head is:

SELECT id FROM things WHERE things.path LIKE '%/12/%'

But AFAIK this query can't use indexes on the column due to the leading %.

There must be something better. What is it?

Using PostgreSQL, but would prefer the solution that would work across other DBs too.

Dmytrii Nagirniak
  • 23,696
  • 13
  • 75
  • 130
  • 2
    It would be better not to have a multivalued field :) – Mosty Mostacho Apr 16 '12 at 03:49
  • I can go with additional table 'thing_paths' with path in it. Then join it and query it ~ `select DISTINCT things.id from things inner join thing_paths on thing_path.thing_id = things.id WHERE thing.path LIKE '/12/%'`. But at this stage it is more than I ideally want to do. – Dmytrii Nagirniak Apr 16 '12 at 03:56

2 Answers2

4

If you're happy turning that column into an array of integers, like:

'/123/12/34/56/5/' becomes ARRAY[123,12,34,56,5]

So that path_arr is a column of type INTEGER[], then you can create a GIN index on that column:

CREATE INDEX ON things USING gin(path_arr);

A query for all items containing 12 then becomes:

SELECT * FROM things WHERE ARRAY[12] <@ path_arr;

Which will use the index. In my test (with a million rows), I get plans like:

EXPLAIN SELECT * FROM things WHERE ARRAY[12]  <@ path_arr;
                                      QUERY PLAN
----------------------------------------------------------------------------------------
 Bitmap Heap Scan on things  (cost=5915.75..9216.99 rows=1000 width=92)
   Recheck Cond: (path_arr <@ '{12}'::integer[])
   ->  Bitmap Index Scan on things_path_arr_idx  (cost=0.00..5915.50 rows=1000 width=0)
         Index Cond: ('{12}'::integer[] <@ path_arr)
(4 rows)
Edmund
  • 10,533
  • 3
  • 39
  • 57
  • Thanks. That's another option. But AFAIK it is technically the same as `LIKE` + GIN index as Erwin suggested, just expressed differently. – Dmytrii Nagirniak Apr 16 '12 at 07:03
  • It is another GIN index. But it's one that more closely corresponds to the actual data. In principle the index will be more efficient because it won't be recording items containing "27/" and "1/2", etc. – Edmund Apr 16 '12 at 07:06
  • Edmund, I accepted Erwin's answer simply because it was simpler (didn't have to change any code yet). But I wish I could accept 2 answers since your solution is more elegant. – Dmytrii Nagirniak Apr 18 '12 at 03:04
  • There is often a tradeoff between pragmatism and elegance. And the syntax for indexes over lists doesn't help, so I might have picked Erwin's answer too (tbh). Thanks for your comment. – Edmund Apr 18 '12 at 03:27
3

In PostgreSQL 9.1 you could utilize the pg_trgm module and build a GIN index with it.

CREATE EXTENSION pg_trgm; -- once per database

CREATE INDEX things_path_trgm_gin_idx ON things USING gin (path gin_trgm_ops);

Your LIKE expression can use this index even if it is not left-anchored.

See a detailed demo by depesz here.

Normalize it If you can, though.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228