
I have a block of code that I need to optimize as much as possible since I have to run it several thousand times.

For a random float, it finds the closest float in one sub-list of a given array and stores the corresponding float (i.e., the one with the same index) from the other sub-list of that array. It repeats this process until the sum of the stored floats reaches a certain limit.

Here's the MWE to make it clearer:

import numpy as np

# Define array with two sub-lists.
a = [np.random.uniform(0., 100., 10000), np.random.random(10000)]

# Initialize empty final list.
b = []

# Run until the condition is met.
while (sum(b) < 10000):

    # Draw random [0,1) value.
    u = np.random.random()
    # Find closest value in sub-list a[1].
    idx = np.argmin(np.abs(u - a[1]))
    # Store value located in sub-list a[0].
    b.append(a[0][idx])

The code is reasonably simple but I haven't found a way to speed it up. I tried to adapt the great (and very fast) answer given to a similar question I asked some time ago, to no avail.

Gabriel

4 Answers


OK, here's a slightly left-field suggestion. As I understand it, you are just trying to sample uniformly from the elements in a[0] until you have a list whose sum exceeds some limit.

Although it will be more costly memory-wise, I think you'll probably find it's much faster to generate a large random sample from a[0] first, then take the cumsum and find where it first exceeds your limit.

For example:

import numpy as np

# array of reference float values, equivalent to a[0]
refs = np.random.uniform(0, 100, 10000)

def fast_samp_1(refs, lim=10000, blocksize=10000):

    # sample uniformly from refs
    samp = np.random.choice(refs, size=blocksize, replace=True)
    samp_sum = np.cumsum(samp)

    # find where the cumsum first exceeds your limit
    last = np.searchsorted(samp_sum, lim, side='right')
    return samp[:last + 1]

    # # if it's ok to be just under lim rather than just over then this might
    # # be quicker
    # return samp[samp_sum <= lim]

Of course, if the sum of the sample of blocksize elements is < lim then this will fail to give you a sample whose sum is >= lim. You could check whether this is the case, and append to your sample in a loop if necessary.

def fast_samp_2(refs, lim=10000, blocksize=10000):

    samp = np.random.choice(refs, size=blocksize, replace=True)
    samp_sum = np.cumsum(samp)

    # is the sum of our current block of samples >= lim?
    while samp_sum[-1] < lim:

        # if not, we'll sample another block and try again until it is
        newsamp = np.random.choice(refs, size=blocksize, replace=True)
        samp = np.hstack((samp, newsamp))
        samp_sum = np.hstack((samp_sum, np.cumsum(newsamp) +  samp_sum[-1]))

    last = np.searchsorted(samp_sum, lim, side='right')
    return samp[:last + 1]

Note that concatenating arrays is pretty slow, so it would probably be better to make blocksize large enough to be reasonably sure that the sum of a single block will be >= your limit, without being excessively large.
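
If the repeated concatenation does turn out to be a bottleneck, a minimal sketch of an alternative (this variant is mine, not part of the original answer, and the name fast_samp_3 is just for illustration) is to collect whole blocks in a Python list and concatenate them only once at the end:

def fast_samp_3(refs, lim=10000, blocksize=10000):

    # accumulate whole blocks in a plain list rather than growing an array
    blocks = []
    total = 0.
    while total < lim:
        newsamp = np.random.choice(refs, size=blocksize, replace=True)
        blocks.append(newsamp)
        total += newsamp.sum()

    # concatenate once, then trim at the limit as before
    samp = np.concatenate(blocks)
    samp_sum = np.cumsum(samp)
    last = np.searchsorted(samp_sum, lim, side='right')
    return samp[:last + 1]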

Update

I've adapted your original function a little bit so that its syntax more closely resembles mine.

def orig_samp(refs, lim=10000):

    # Initialize empty final list.
    b = []

    a1 = np.random.random(10000)

    # Run until the condition is met.
    while (sum(b) < lim):

        # Draw random [0,1) value.
        u = np.random.random()
        # Find closest value in sub-list a[1].
        idx = np.argmin(np.abs(u - a1))
        # Store value located in sub-list a[0].
        b.append(refs[idx])

    return b

Here's some benchmarking data.

%timeit orig_samp(refs, lim=10000)
# 100 loops, best of 3: 11 ms per loop

%timeit fast_samp_2(refs, lim=10000, blocksize=1000)
# 10000 loops, best of 3: 62.9 µs per loop

That's over two orders of magnitude faster. You can do a bit better by reducing the blocksize a fraction - you basically want it to be comfortably larger than the length of the arrays you're getting out. In this case, you know that on average the output will be about 200 elements long, since the mean of a uniform distribution over [0, 100) is 50, and 10000 / 50 = 200.
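
Following that logic, here's a small sketch (mine, not from the original answer; guess_blocksize is a hypothetical helper) of how you might pick blocksize automatically: estimate the expected number of draws as lim / refs.mean() and pad it with a safety margin so a single block is usually enough.

# expected number of draws is roughly lim / mean(refs); pad it with a margin
def guess_blocksize(refs, lim, margin=1.5):
    return int(np.ceil(margin * lim / refs.mean()))

samp = fast_samp_2(refs, lim=10000, blocksize=guess_blocksize(refs, 10000))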

Update 2

It's easy to get a weighted sample rather than a uniform sample - you can just pass the p= parameter to np.random.choice:

def weighted_fast_samp(refs, weights=None, lim=10000, blocksize=10000):

    samp = np.random.choice(refs, size=blocksize, replace=True, p=weights)
    samp_sum = np.cumsum(samp)

    # is the sum of our current block of samples >= lim?
    while samp_sum[-1] < lim:

        # if not, we'll sample another block and try again until it is
        newsamp = np.random.choice(refs, size=blocksize, replace=True, 
                                   p=weights)
        samp = np.hstack((samp, newsamp))
        samp_sum = np.hstack((samp_sum, np.cumsum(newsamp) +  samp_sum[-1]))

    last = np.searchsorted(samp_sum, lim, side='right')
    return samp[:last + 1]
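
As a usage sketch (mine, not part of the original answer): np.random.choice expects probabilities that sum to 1, so a CDF stored in the second sub-list would first have to be differenced and normalized. Here cdf is a hypothetical array standing in for the asker's real a[1]:

# `cdf` is assumed to be a monotonically increasing array aligned with `refs`
pdf = np.hstack((cdf[:1], np.diff(cdf)))  # approximate PDF from the CDF
pdf = pdf / pdf.sum()                     # probabilities must sum to 1
samp = weighted_fast_samp(refs, weights=pdf, lim=10000)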
ali_m
  • This answer is indeed much faster than my original code @ali_m, but I see that you make no use of the `a[1]` sub-list. I believe the reason for this is that you assume I want to sample `a[0]` _uniformly_ when I actually need to sample it according to `a[1]` which in my real code is a CDF and not simply random numbers. Could you modify your code to make use of `a[1]`? – Gabriel Feb 06 '14 at 13:09
  • I'm getting a `ValueError: probabilities do not sum to 1` probably because I use a CDF, not a probability distribution but I'll try to work around this. One thing is not clear: how do you set the `blocksize` value? It appears that by setting it as a large number I could end up by pure chance with a sample that sums up to a _much_ larger number than `lim`, when I need that value to be as close as possible to it (hence the condition to stop adding elements as soon as the sum passes the `lim` limit in my original code) – Gabriel Feb 06 '14 at 15:56
  • 1
    The PDF is the first derivative of the CDF - you could estimate it by doing something like `np.diff(cdf)`. Don't get too hung up on trying to choose `blocksize` perfectly - you just want it to be sufficiently large that you're unlikely to need to take a second sample. If it's a bit too big there will be some performance penalty because of greater memory allocation etc., but the largest possible sum you could get out will still be <= to `lim` + the largest value in `refs`. Generating a larger sample will still probably be cheaper than generating multiple smaller ones and concatenating them. – ali_m Feb 06 '14 at 17:15
  • Thanks @ali_m, your comments helped me understand your code a lot better. Cheers! – Gabriel Feb 07 '14 at 02:07

Sort your reference array.

That allows O(log n) lookups instead of scanning the whole list (using bisect, for example, to find the closest element).

For starters, I swap a[0] and a[1] so that the sub-list we search comes first, and I sort both sub-lists by those lookup values with argsort so that the index pairing between them is preserved:

order = np.argsort(a[1])
a = [a[1][order], a[0][order]]

Now a is sorted by a[0], meaning that if you are looking for the value closest to an arbitrary number, you can start with a bisect:

import bisect

b = []
while sum(b) < 10000:
    # Draw random [0,1) value.
    u = np.random.random()
    # Find the insertion point of u in the sorted sub-list a[0].
    idx = bisect.bisect(a[0], u)
    # The closest value is at either idx or idx - 1 (guard both ends).
    if idx == len(a[0]) or (idx != 0 and np.abs(a[0][idx] - u) > np.abs(a[0][idx - 1] - u)):
        idx -= 1
    # Store value located in sub-list a[1].
    b.append(a[1][idx])
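
As an aside (my addition, not part of this answer), np.searchsorted performs the same binary search without leaving NumPy, so the lookup inside the loop can equivalently be written as:

while sum(b) < 10000:
    u = np.random.random()
    # same closest-value lookup as above, with np.searchsorted instead of bisect
    idx = np.searchsorted(a[0], u)
    if idx == len(a[0]) or (idx != 0 and np.abs(a[0][idx] - u) > np.abs(a[0][idx - 1] - u)):
        idx -= 1
    b.append(a[1][idx])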
njzk2

One obvious optimization: don't recalculate the sum on each iteration; accumulate it instead.

b_sum = 0
while b_sum<10000:
    ....
    idx = np.argmin(np.abs(u - a[1]))
    add_val = a[0][idx]
    b.append(add_val)
    b_sum += add_val

EDIT:

I think some minor improvement (check it out if you feel like it) may be achieved by binding the sub-lists to local names before the loop:

a_0 = a[0]
a_1 = a[1]
...
while ...:
    ....
    idx = np.argmin(np.abs(u - a_1))
    b.append(a_0[idx])

It may save some run time, though I don't believe it will matter that much.
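
Putting both suggestions together, here's a runnable sketch (mine) of the question's original loop with the running sum and the local references in place:

a_0, a_1 = a[0], a[1]   # local references to the sub-lists
b = []
b_sum = 0.
while b_sum < 10000:
    u = np.random.random()
    idx = np.argmin(np.abs(u - a_1))
    add_val = a_0[idx]
    b.append(add_val)
    b_sum += add_val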

volcano

Write it in Cython. That's going to buy you a lot more for a high-iteration operation.

http://cython.org/

joel3000