
I have a huge dict that I keep adding data to. I check whether a key already exists in the dict, but this takes too long as the dictionary grows. How can I run this search in parallel on a multiprocessor system?

    def __getVal(self, key, val):
        # Return val plus the stored value for key, or just val if the key is missing.
        ret = 0
        if key in self.mydict:
            ret = val + self.mydict[key]
        else:
            ret = val
        return ret
  • Looking something up in a dictionary should be an O(1) operation unless you have a lot of collisions. How many entries are in the dictionary? Also, what is `valor` and why are you returning it instead of `ret`? – kylieCatt May 23 '15 at 19:43
  • Almost certainly your problem is not search speed, but the cost of making the dictionary larger as you continue to add elements. – Nick Bastin May 23 '15 at 19:47
  • Profile your code before optimizing. – jwilner May 23 '15 at 19:50
  • There was a typo in the return (changed valor to ret). The dict gets slow with 90 million entries. – user3630531 May 23 '15 at 19:57
  • This question might be helpful: http://stackoverflow.com/questions/16256913/improving-performance-of-very-large-dictionary-in-python – dano May 23 '15 at 21:17
  • You are going through the operation of searching for the key twice. Once to see if the key is in the dictionary, once to access it if it is. I'd suggest a try/catch approach, as already suggested below. – Jblasco May 23 '15 at 22:33

3 Answers


Perhaps before trying to split the work across multiple processes, you should try this:

Instead of checking whether the key is in the dictionary, access it directly inside a try...except block.

On my various computers, this has so far been faster than checking for the key first.

So your final code would be something like:

    try:
        ret = val + self.mydict[key]
    except KeyError:
        ret = val
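
Put together as a full method, a minimal sketch (assuming the same `self.mydict` attribute as in the question) could look like:

    def __getVal(self, key, val):
        # EAFP style: hash and look up the key only once, instead of
        # testing membership first and then indexing.
        try:
            return val + self.mydict[key]
        except KeyError:
            return val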
Alexandre Mazel

Just use `.get` with a default value of 0:

    return self.mydict.get(key, 0) + val

Setting `ret = 0` first and then adding to it is pointless; just return directly as above.
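
A quick illustration of the `.get` behaviour (the dict and values here are made up for the example):

    d = {"a": 10}

    print(d.get("a", 0) + 5)   # 15: key present, so its stored value is added
    print(d.get("b", 0) + 5)   # 5: key missing, so the default 0 is used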

Padraic Cunningham

The problem is, as Nick Bastin said, that "it is not search speed, but the cost of making the dictionary larger as you continue to add elements".

The cost comes from the hash map used to store the items: because the table is relatively small, collisions eventually occur as new items are inserted, which slows down further insertions.

One solution is to rebuild the hash map with a larger size.

In this case, switching to a list was sufficient; it grows without the inconvenience of collisions.
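
If you want to confirm that insertion cost, rather than lookup speed, is the bottleneck, a rough timing sketch along these lines can help (the sizes are arbitrary and not taken from the original code):

    import time

    def time_inserts(n):
        # Time n insertions into a dict and n appends to a list,
        # to compare how each behaves as it grows.
        start = time.time()
        d = {}
        for i in range(n):
            d[i] = i
        dict_time = time.time() - start

        start = time.time()
        lst = []
        for i in range(n):
            lst.append(i)
        list_time = time.time() - start

        return dict_time, list_time

    for n in (10**5, 10**6, 10**7):
        d_t, l_t = time_inserts(n)
        print("n=%d  dict: %.2fs  list: %.2fs" % (n, d_t, l_t))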