
I am quite new to Redis. I initially used KEYS to iterate through my dataset, but from what I read in the documentation on Redis worst practices, that is not recommended, especially for larger datasets with many keys: KEYS walks the whole keyspace in a single call and can block the server for a long time, while SCAN iterates over the dataset in chunks and therefore only blocks for short periods at a time. If I have understood that correctly, I am wondering whether there is any way to optimize the SCAN iteration so that, instead of iterating over (say) 10,000 keys from an arbitrary starting point, it would iterate from a given key.

Example:

a1
a2
a3
b1 < --- start iterating from here instead of from a1
b2
b3

and thereby save us a lot of work?
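For reference, the chunked behaviour described above can be sketched with a toy client-side simulation (this is not real Redis — real SCAN cursors are opaque values, not positional offsets — it only illustrates why no single call blocks for long):

```python
# Sketch: a cursor-based SCAN-style loop versus a one-shot KEYS.
# Each call returns a small chunk plus a cursor; cursor 0 means done.

def scan_chunk(keys, cursor, count=2):
    """Return (next_cursor, chunk); a next_cursor of 0 ends the iteration."""
    chunk = keys[cursor:cursor + count]
    next_cursor = cursor + count
    if next_cursor >= len(keys):
        next_cursor = 0
    return next_cursor, chunk

keys = ["a1", "a2", "a3", "b1", "b2", "b3"]

cursor, seen = 0, []
while True:
    cursor, chunk = scan_chunk(keys, cursor)
    seen.extend(chunk)
    if cursor == 0:
        break

# All keys are eventually returned, but in small non-blocking chunks.
```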

nelion
  • You will need a bit of a mental reframing to tackle this. Instead of scanning natural keys, it's actually better to use meaningless random or sequential values for keys. In order to do smart lookups, you should instead use lexicographically sorted sets that act like btree indexes in relational dbs. Your natural keys should become values in those indexes, with the id appended at the end. If you read [this article](https://redis.io/topics/indexes) very carefully, especially from the "Adding auxiliary information in the index" part, it will become revelatory; just trust that it addresses your use case. – Max Chernyak Jun 13 '20 at 06:33
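The comment's suggestion can be illustrated without a live Redis: a lexicographically sorted index lets you seek straight to a prefix, the way ZRANGEBYLEX does over a sorted set where all members share score 0 (a minimal Python simulation; the member names are just the example keys from the question):

```python
import bisect

# Sketch: simulate a Redis sorted set used as a lexicographic index,
# queried with a ZRANGEBYLEX-style prefix range.
index = sorted(["a1", "a2", "a3", "b1", "b2", "b3"])

def range_by_lex_prefix(index, prefix):
    """Return members starting with `prefix` via binary search,
    analogous to: ZRANGEBYLEX key [prefix (prefix\xff"""
    lo = bisect.bisect_left(index, prefix)
    hi = bisect.bisect_left(index, prefix + "\xff")
    return index[lo:hi]

print(range_by_lex_prefix(index, "b"))  # → ['b1', 'b2', 'b3']
```

The binary search jumps directly to `b1` without ever touching the `a*` entries — exactly the "start iterating from here" behaviour the question asks for, just on an index rather than on the keyspace itself.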

1 Answer


The SCAN command traverses the database's hash table, ordered by hash value.

The cursor argument controls where the SCAN starts, but at best it lets you pick a position in the hash-ordered table, not in key order. See Redis `SCAN`: how to maintain a balance between newcomming keys that might match and ensure eventual result in a reasonable time?.

But this is rather impractical, because the hashes of the keys can be considered pseudo-random with respect to the keys themselves. They do not follow lexicographical order or any other useful order: the whole purpose of the hash is to distribute the keys evenly across the table.

So, even if you try `SCAN 0 MATCH b1*`, the implementation still needs to go through all the entries of the hash table to make a full scan, and hence you need to call SCAN repeatedly until the returned cursor comes back to zero.
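This can be sketched in a few lines (a simulation, assuming CRC32 as a stand-in hash — Redis uses SipHash internally, but the scattering effect is the same): the table is walked in hash order, and MATCH only filters each chunk before it is returned, so every entry is still visited.

```python
import zlib

# Sketch: SCAN with MATCH still walks the whole table.
# Hash order is simulated by sorting keys on a stand-in hash value.
keys = ["a1", "a2", "a3", "b1", "b2", "b3"]
hash_order = sorted(keys, key=lambda k: zlib.crc32(k.encode()))

visited, matched = 0, []
for k in hash_order:          # full traversal, chunk by chunk in real Redis
    visited += 1
    if k.startswith("b"):     # MATCH b* is applied only after the visit
        matched.append(k)

# Every key was visited even though only the b* keys were wanted.
```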

LeoMurillo