
I have a block of code that I need to optimize as much as possible since I have to run it several thousand times.

For a random float, it finds the closest float in one sub-list of a given array and stores the corresponding float (i.e., the one with the same index) from the other sub-list of that array. It repeats this process until the sum of the stored floats reaches a certain limit.

Here's the MWE to make it clearer:

import numpy as np

# Define array with two sub-lists.
a = [np.random.uniform(0., 100., 10000), np.random.random(10000)]

# Initialize empty final list.
b = []

# Run until the condition is met.
while (sum(b) < 10000):

    # Draw random [0,1) value.
    u = np.random.random()
    # Find closest value in sub-list a[1].
    idx = np.argmin(np.abs(u - a[1]))
    # Store value located in sub-list a[0].
    b.append(a[0][idx])

The code is reasonably simple but I haven't found a way to speed it up. I tried to adapt the great (and very fast) answer given to a similar question I asked some time ago, to no avail.

Gabriel

4 Answers


OK, here's a slightly left-field suggestion. As I understand it, you are just trying to sample uniformly from the elements in a[0] until you have a list whose sum exceeds some limit.

Although it will be more costly memory-wise, I think you'll probably find it's much faster to generate a large random sample from a[0] first, then take the cumsum and find where it first exceeds your limit.

For example:

import numpy as np

# array of reference float values, equivalent to a[0]
refs = np.random.uniform(0, 100, 10000)

def fast_samp_1(refs, lim=10000, blocksize=10000):

    # sample uniformly from refs
    samp = np.random.choice(refs, size=blocksize, replace=True)
    samp_sum = np.cumsum(samp)

    # find where the cumsum first exceeds your limit
    last = np.searchsorted(samp_sum, lim, side='right')
    return samp[:last + 1]

    # # if it's ok to be just under lim rather than just over then this might
    # # be quicker
    # return samp[samp_sum <= lim]

Of course, if the sum of the sample of blocksize elements is < lim then this will fail to give you a sample whose sum is >= lim. You could check whether this is the case, and append to your sample in a loop if necessary.

def fast_samp_2(refs, lim=10000, blocksize=10000):

    samp = np.random.choice(refs, size=blocksize, replace=True)
    samp_sum = np.cumsum(samp)

    # is the sum of our current block of samples >= lim?
    while samp_sum[-1] < lim:

        # if not, we'll sample another block and try again until it is
        newsamp = np.random.choice(refs, size=blocksize, replace=True)
        samp = np.hstack((samp, newsamp))
        samp_sum = np.hstack((samp_sum, np.cumsum(newsamp) +  samp_sum[-1]))

    last = np.searchsorted(samp_sum, lim, side='right')
    return samp[:last + 1]

Note that concatenating arrays is pretty slow, so it would probably be better to make blocksize large enough to be reasonably sure that the sum of a single block will be >= your limit, without being excessively large.
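
If the repeated concatenation does turn out to be a bottleneck, a minimal sketch of an alternative (this variant is mine, not part of the original answer, and the name fast_samp_3 is just for illustration) is to collect whole blocks in a Python list and concatenate them only once at the end:

def fast_samp_3(refs, lim=10000, blocksize=10000):

    # accumulate whole blocks in a plain list rather than growing an array
    blocks = []
    total = 0.
    while total < lim:
        newsamp = np.random.choice(refs, size=blocksize, replace=True)
        blocks.append(newsamp)
        total += newsamp.sum()

    # concatenate once, then trim at the limit as before
    samp = np.concatenate(blocks)
    samp_sum = np.cumsum(samp)
    last = np.searchsorted(samp_sum, lim, side='right')
    return samp[:last + 1]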

Update

I've adapted your original function a little bit so that its syntax more closely resembles mine.

def orig_samp(refs, lim=10000):

    # Initialize empty final list.
    b = []

    a1 = np.random.random(10000)

    # Run until the condition is met.
    while (sum(b) < lim):

        # Draw random [0,1) value.
        u = np.random.random()
        # Find closest value in sub-list a[1].
        idx = np.argmin(np.abs(u - a1))
        # Store value located in sub-list a[0].
        b.append(refs[idx])

    return b

Here's some benchmarking data.

%timeit orig_samp(refs, lim=10000)
# 100 loops, best of 3: 11 ms per loop

%timeit fast_samp_2(refs, lim=10000, blocksize=1000)
# 10000 loops, best of 3: 62.9 µs per loop

That's over two orders of magnitude faster. You can do a bit better by reducing the blocksize a fraction - you basically want it to be comfortably larger than the length of the arrays you're getting out. In this case, you know that on average the output will be about 200 elements long, since the mean of a uniform distribution over [0, 100) is 50, and 10000 / 50 = 200.
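
Following that logic, here's a small sketch (mine, not from the original answer; guess_blocksize is a hypothetical helper) of how you might pick blocksize automatically: estimate the expected number of draws as lim / refs.mean() and pad it with a safety margin so a single block is usually enough.

# expected number of draws is roughly lim / mean(refs); pad it with a margin
def guess_blocksize(refs, lim, margin=1.5):
    return int(np.ceil(margin * lim / refs.mean()))

samp = fast_samp_2(refs, lim=10000, blocksize=guess_blocksize(refs, 10000))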

Update 2

It's easy to get a weighted sample rather than a uniform sample - you can just pass the p= parameter to np.random.choice:

def weighted_fast_samp(refs, weights=None, lim=10000, blocksize=10000):

    samp = np.random.choice(refs, size=blocksize, replace=True, p=weights)
    samp_sum = np.cumsum(samp)

    # is the sum of our current block of samples >= lim?
    while samp_sum[-1] < lim:

        # if not, we'll sample another block and try again until it is
        newsamp = np.random.choice(refs, size=blocksize, replace=True, 
                                   p=weights)
        samp = np.hstack((samp, newsamp))
        samp_sum = np.hstack((samp_sum, np.cumsum(newsamp) +  samp_sum[-1]))

    last = np.searchsorted(samp_sum, lim, side='right')
    return samp[:last + 1]
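
As a usage sketch (mine, not part of the original answer): np.random.choice expects probabilities that sum to 1, so a CDF stored in the second sub-list would first have to be differenced and normalized. Here cdf is a hypothetical array standing in for the asker's real a[1]:

# `cdf` is assumed to be a monotonically increasing array aligned with `refs`
pdf = np.hstack((cdf[:1], np.diff(cdf)))  # approximate PDF from the CDF
pdf = pdf / pdf.sum()                     # probabilities must sum to 1
samp = weighted_fast_samp(refs, weights=pdf, lim=10000)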
ali_m
  • This answer is indeed much faster than my original code @ali_m, but I see that you make no use of the `a[1]` sub-list. I believe the reason for this is that you assume I want to sample `a[0]` _uniformly_ when I actually need to sample it according to `a[1]` which in my real code is a CDF and not simply random numbers. Could you modify your code to make use of `a[1]`? – Gabriel Feb 06 '14 at 13:09
  • I'm getting a `ValueError: probabilities do not sum to 1` probably because I use a CDF, not a probability distribution but I'll try to work around this. One thing is not clear: how do you set the `blocksize` value? It appears that by setting it as a large number I could end up by pure chance with a sample that sums up to a _much_ larger number than `lim`, when I need that value to be as close as possible to it (hence the condition to stop adding elements as soon as the sum passes the `lim` limit in my original code) – Gabriel Feb 06 '14 at 15:56
  • 1
    The PDF is the first derivative of the CDF - you could estimate it by doing something like `np.diff(cdf)`. Don't get too hung up on trying to choose `blocksize` perfectly - you just want it to be sufficiently large that you're unlikely to need to take a second sample. If it's a bit too big there will be some performance penalty because of greater memory allocation etc., but the largest possible sum you could get out will still be <= to `lim` + the largest value in `refs`. Generating a larger sample will still probably be cheaper than generating multiple smaller ones and concatenating them. – ali_m Feb 06 '14 at 17:15
  • Thanks @ali_m, your comments helped me understand your code a lot better. Cheers! – Gabriel Feb 07 '14 at 02:07

Sort your reference array.

That allows O(log n) lookups instead of scanning the whole list (using bisect, for example, to find the closest element).

For starters, I swap a[0] and a[1] so that the sub-list we search comes first, and I sort both sub-lists by those lookup values with argsort so that the index pairing between them is preserved:

order = np.argsort(a[1])
a = [a[1][order], a[0][order]]

Now a is sorted by a[0], meaning that if you are looking for the value closest to an arbitrary number, you can start with a bisect:

import bisect

b = []
while sum(b) < 10000:
    # Draw random [0,1) value.
    u = np.random.random()
    # Find the insertion point of u in the sorted sub-list a[0].
    idx = bisect.bisect(a[0], u)
    # The closest value is at either idx or idx - 1 (guard both ends).
    if idx == len(a[0]) or (idx != 0 and np.abs(a[0][idx] - u) > np.abs(a[0][idx - 1] - u)):
        idx -= 1
    # Store value located in sub-list a[1].
    b.append(a[1][idx])
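
As an aside (my addition, not part of this answer), np.searchsorted performs the same binary search without leaving NumPy, so the lookup inside the loop can equivalently be written as:

while sum(b) < 10000:
    u = np.random.random()
    # same closest-value lookup as above, with np.searchsorted instead of bisect
    idx = np.searchsorted(a[0], u)
    if idx == len(a[0]) or (idx != 0 and np.abs(a[0][idx] - u) > np.abs(a[0][idx - 1] - u)):
        idx -= 1
    b.append(a[1][idx])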
njzk2

One obvious optimization: don't recalculate the sum on each iteration; accumulate it instead.

b_sum = 0
while b_sum<10000:
    ....
    idx = np.argmin(np.abs(u - a[1]))
    add_val = a[0][idx]
    b.append(add_val)
    b_sum += add_val

EDIT:

I think some minor improvement (check it out if you feel like it) may be achieved by binding the sub-lists to local names before the loop:

a_0 = a[0]
a_1 = a[1]
...
while ...:
    ....
    idx = np.argmin(np.abs(u - a_1))
    b.append(a_0[idx])

It may save some run time, though I don't believe it will matter that much.
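
Putting both suggestions together, here's a runnable sketch (mine) of the question's original loop with the running sum and the local references in place:

a_0, a_1 = a[0], a[1]   # local references to the sub-lists
b = []
b_sum = 0.
while b_sum < 10000:
    u = np.random.random()
    idx = np.argmin(np.abs(u - a_1))
    add_val = a_0[idx]
    b.append(add_val)
    b_sum += add_val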

volcano

Write it in Cython. That's going to buy you a lot more for a high-iteration operation.

http://cython.org/

joel3000