
I have a huge dict that I keep adding data to. I check whether a key already exists in the dict, but this takes too long as the dictionary grows. How can I run this search in parallel on a multiprocessor system?

    def __getVal(self, key, val):
        # Return val plus the stored value for key, or just val if the key is missing.
        ret = 0
        if key in self.mydict:
            ret = val + self.mydict[key]
        else:
            ret = val
        return ret
  • Looking something up in a dictionary should be an O(1) operation unless you have a lot of collisions. How many entries are in the dictionary? Also, what is `valor` and why are you returning it instead of `ret`? – kylieCatt May 23 '15 at 19:43
  • Almost certainly your problem is not search speed, but the cost of making the dictionary larger as you continue to add elements. – Nick Bastin May 23 '15 at 19:47
  • Profile your code before optimizing. – jwilner May 23 '15 at 19:50
  • There was a typo in the return (changed valor to ret). The dict gets slow with 90 million entries. – user3630531 May 23 '15 at 19:57
  • This question might be helpful: http://stackoverflow.com/questions/16256913/improving-performance-of-very-large-dictionary-in-python – dano May 23 '15 at 21:17
  • You are going through the operation of searching for the key twice. Once to see if the key is in the dictionary, once to access it if it is. I'd suggest a try/catch approach, as already suggested below. – Jblasco May 23 '15 at 22:33

3 Answers


Perhaps before trying to split the work across multiple processes, you should try this:

Instead of checking whether the key is in the dictionary, access it directly inside a try...except block.

On my various computers, this has so far been faster than checking for the key first.

So your final code would be something like:

    try:
        ret = val + self.mydict[key]
    except KeyError:
        ret = val
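
Put together as a full method, a minimal sketch (assuming the same `self.mydict` attribute as in the question) could look like:

    def __getVal(self, key, val):
        # EAFP style: hash and look up the key only once, instead of
        # testing membership first and then indexing.
        try:
            return val + self.mydict[key]
        except KeyError:
            return val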
Alexandre Mazel

Just use `.get` with a default value of 0:

    return self.mydict.get(key, 0) + val

Setting `ret = 0` first and then adding to it is pointless; just return directly as above.
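
A quick illustration of the `.get` behaviour (the dict and values here are made up for the example):

    d = {"a": 10}

    print(d.get("a", 0) + 5)   # 15: key present, so its stored value is added
    print(d.get("b", 0) + 5)   # 5: key missing, so the default 0 is used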

Padraic Cunningham

The problem is, as Nick Bastin said, that "it is not search speed, but the cost of making the dictionary larger as you continue to add elements".

The cost comes from the hash map used to store the items: because the table is relatively small, collisions eventually occur as new items are inserted, which slows down further insertions.

One solution is to rebuild the hash map with a larger size.

In this case, switching to a list was sufficient; it grows without the inconvenience of collisions.
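
If you want to confirm that insertion cost, rather than lookup speed, is the bottleneck, a rough timing sketch along these lines can help (the sizes are arbitrary and not taken from the original code):

    import time

    def time_inserts(n):
        # Time n insertions into a dict and n appends to a list,
        # to compare how each behaves as it grows.
        start = time.time()
        d = {}
        for i in range(n):
            d[i] = i
        dict_time = time.time() - start

        start = time.time()
        lst = []
        for i in range(n):
            lst.append(i)
        list_time = time.time() - start

        return dict_time, list_time

    for n in (10**5, 10**6, 10**7):
        d_t, l_t = time_inserts(n)
        print("n=%d  dict: %.2fs  list: %.2fs" % (n, d_t, l_t))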