
I am trying to build a Q-table, stored in a dictionary, for an AI I am making. While filling the dictionary, the process starts to slow down noticeably after about 40,000,000 positions have been inserted, and by about 80,000,000 it is crawling (roughly 18 hours to reach 80,000,000) and seems to keep slowing down. Is there a way to optimize my dictionary or my code to speed this up? At this rate it will take about a year to finish building the Q-table (about 160,000,000 positions in total).

Here is my code if it helps:

import numpy as np

start_q_table = None

if start_q_table is None:
    q_table = {}
    # All possible height differences between the bird and the bottom pipe
    # (display_height = 800)
    for i in range(-display_height, display_height):
        # All possible distances between the bird and the end of the nearest pipe
        # (bird_size = 15, display_height + pipe_distance ≈ 1000 values total)
        for ii in range(-bird_size, display_height + pipe_distance):
            # Bird speed (speed_range = range(1000))
            for iii in speed_range:
                # One Q-value per action (3 actions), drawn uniformly from [-1, 0)
                q_table[(i, ii, iii)] = [np.random.uniform(-1, 0) for _ in range(3)]
  • some ideas here: https://stackoverflow.com/questions/16256913/improving-performance-of-very-large-dictionary-in-python – Joe Oct 08 '19 at 02:12
  • It seems like a 4D array. I think you should at least use a `list` or an `ndarray`, and maybe threading...? – ghchoi Oct 08 '19 at 02:13
  • `np.random.uniform(-1, 0, size=3)` would probably be slightly faster if you are ok with dictionary values being ndarrays (see the sketch after these comments). – Matt Eding Oct 08 '19 at 02:13
  • GyuHyeon Choi: I would, but it would be easier to do it this way and would help me a lot long term, so I wouldn't have to worry about lists every time. –  Oct 08 '19 at 02:14
  • Also you could consider using Cython or Numba to make the loop iteration really fast (or bypass the GIL) if you need to use a dictionary to store the values. – Matt Eding Oct 08 '19 at 02:19
  • I don't know what those are. –  Oct 08 '19 at 02:23
  • A dictionary is a hash table. You need quite a lot of memory for it to provide fast random access... If you do not have enough memory, I think you can try a database... – ghchoi Oct 08 '19 at 02:45
  • You are definitely going to want to use a database for this. A `dict` of that size is going to be slow to initialize and slow to work with, unless you've got access to some very high end equipment. – mypetlion Oct 09 '19 at 23:46
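A minimal sketch of the value-creation idea from Matt Eding's comment, assuming the dictionary is kept but each value becomes a small ndarray; the constants below are placeholders based on the question's comments (`pipe_distance` in particular is not given exactly):

import numpy as np

display_height = 800     # from the question's comments
bird_size = 15           # from the question's comments
pipe_distance = 200      # placeholder; the question only says the range is ~1000 total
speed_range = range(1000)

q_table = {}
for i in range(-display_height, display_height):
    for ii in range(-bird_size, display_height + pipe_distance):
        for iii in speed_range:
            # One vectorized call per state instead of a 3-element list comprehension
            q_table[(i, ii, iii)] = np.random.uniform(-1, 0, size=3)

This shaves a little per-entry work, but it does not change the underlying memory problem discussed in the answer below.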

1 Answer


Even if you were only storing the values (64 bits each), you'd be topping out close to 40 GB of RAM usage for a 1600 * 1000 * 1000 * 3 array. Adding in the overhead from the dict means you're almost certainly running out of RAM.
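For reference, the arithmetic behind that estimate (a rough sketch of the value data alone, before any dict or key overhead):

positions = 1600 * 1000 * 1000   # height diffs * distances * speeds
values = positions * 3           # three Q-values per position
size_gb = values * 8 / 1e9       # 8 bytes per 64-bit float
print(size_gb)                   # ~38.4 GB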

Check to see if your page file is going up (visible in Task Manager via Ctrl + Alt + Del on Windows, Activity Monitor on Mac, or the `free` command on Linux).
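If you would rather watch it from inside the script, here is a small sketch using the third-party psutil package (an assumption on my part; it is not in the standard library and needs `pip install psutil`):

import psutil

mem = psutil.virtual_memory()
swap = psutil.swap_memory()
# If swap keeps climbing while the loop runs, the dict has outgrown physical RAM
print(f"RAM used: {mem.percent}%, swap used: {swap.percent}%")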

Technically, you can just increase your memory to compensate, but you might need a lot.

Here's an example on my machine:

import numpy

# Preallocate the whole table as 32-bit floats: 1600 * 1000 * 1000 * 3 values
v = numpy.zeros([1600, 1000, 1000, 3], dtype='float32')
# Fill one 1000 * 1000 * 3 slice at a time with values uniform in [-1, 0)
for i in range(1600):
    v[i, :, :, :] = numpy.random.uniform(-1, 0, size=(1000, 1000, 3))

That took 10.4 seconds and about 19 GB of RAM on my system (which has 40 GB of RAM and a 3.6 GHz CPU).
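If you go this route, a dict lookup like `q_table[(i, ii, iii)]` becomes index arithmetic into the array. A hedged sketch, assuming the same ranges as in the question (the offsets and the helper name are mine, not part of either the question or the answer):

def q_values(table, height_diff, distance, speed,
             display_height=800, bird_size=15):
    # Shift each state component so its smallest legal value maps to index 0
    return table[height_diff + display_height, distance + bird_size, speed]

# Example: the three action values for height diff -200, distance 120, speed 42
actions = q_values(v, -200, 120, 42)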

Pi Marillion
  • So would my best bet be to just use a list for data of this size, or would I run into this problem with a list also? –  Oct 08 '19 at 04:58
  • @LeonShams-Schaal You would probably run into issues with any data type except possibly a Numpy array... In fact, I would suggest a numpy array of 32-bit floats dimensioned to 1600*1000*1000*3 or thereabouts. It would only take up about 19 or 20 GB of RAM that way. – Pi Marillion Oct 08 '19 at 23:31
  • Would it work better if I used numpy.random.uniform() with the size argument to create an array that large up front, since I know the size that it needs to be? –  Oct 09 '19 at 05:22
  • Would I even be able to create an array of that size? –  Oct 09 '19 at 05:25
  • @LeonShams-Schaal `numpy.random.uniform()` is a good idea. I'll update the answer. – Pi Marillion Oct 09 '19 at 23:17