2

I have a list:

a = [1,2,1,1,3,5,6,2]

I want to select, say 3 elements at random from this list, but they must all be different.

I need to preserve the 'weight' of each element, so sampling from set(a) is not possible.
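For concreteness, here is a quick illustrative check of the multiplicities that need to be preserved (using `collections.Counter`; this snippet is just to demonstrate the point):

```python
from collections import Counter

a = [1, 2, 1, 1, 3, 5, 6, 2]

# the weights that need to be preserved: 1 occurs three times, 2 twice, 3/5/6 once each
print(Counter(a))

# sampling from set(a) would make every value equally likely, losing the weights
print(set(a))
```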

So far, my solution is:

import random

condition = False
while condition == False:
    mysample = random.sample(a, 3)
    if len(set(mysample)) != len(mysample):
        condition = False
    else:
        condition = True

But this forces me to re-sample as many times as it takes for the elements to all be different. This works fine for small samples, but for large samples my code becomes very inefficient...

amphetamachine
  • It can return duplicates if you give it a list with duplicates. The OP wants a list with no duplicates but that is still weighted such that more common elements are more likely to appear than less common ones. – John Kugelman Mar 03 '15 at 21:51
  • @aruisdante `random.sample(a, 3)` yielded `[1, 1, 6]` on my machine. It returned two of the three `1`s in the list. – John Kugelman Mar 03 '15 at 21:52
  • 1
    Sample one element, then remove all occurrences of that value from the original list. Repeat until you have the required number of elements. Have a look here: http://stackoverflow.com/questions/1157106/remove-all-occurences-of-a-value-from-a-python-list – Calum Mar 03 '15 at 21:53
  • @JohnKugelman Ah wait, they do say later that it's uniqueness based on index, not value. They don't highlight that well. My bad. – aruisdante Mar 03 '15 at 21:53

4 Answers

3

You can shuffle and take the first three non-repeated elements:

import random

your_list = [1, 2, 1, 1, 3, 5, 6, 2]  # the OP's list

random.shuffle(your_list)
three_elements = set()
for v in your_list:
    if len(three_elements) == 3:
        break
    three_elements.add(v)
John Kugelman
Luís Bianchin
1
from random import choice

a = [1, 2, 1, 1, 3, 5, 6, 2]  # the OP's list

l = []
seen = set()
while len(l) < 3:
    ch = choice(a)
    if ch not in seen:
        l.append(ch)
        seen.add(ch)
print(l)

Depending on the ratio of distinct values to total elements, different approaches will have different advantages:

In [6]: import random

In [7]: a = [choice(range(10000)) for _ in range(100000)]

In [8]: %%timeit
random.shuffle(a)
three_elements = set()
for v in a:
    if len(three_elements) == 5000:
        break
    if not v in three_elements:
        three_elements.add(v)
   ...: 
10 loops, best of 3: 36.5 ms per loop

In [9]: %%timeit                          
l = []
seen = set()
while len(l) < 5000:
    ch = choice(a)
    if ch not in seen:
        l.append(ch)
        seen.add(ch)
   ...: 
100 loops, best of 3: 5.16 ms per loop

Running your code, I had to kill the process after 10 minutes as it was still going, so whatever you choose will be a major improvement.

Using shuffle would be more efficient if you had a greater ratio of repeats to distinct items in the list and you wanted a very large sample size; otherwise the cost of shuffling makes it less efficient than simply using a set and choice.
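To check where the crossover lies for your own data, the two approaches can be timed side by side (the function names below are mine, not from the answers):

```python
import random
import timeit

def sample_by_shuffle(a, k):
    # shuffle a copy, then take the first k distinct values
    b = list(a)
    random.shuffle(b)
    picked = set()
    for v in b:
        if len(picked) == k:
            break
        picked.add(v)
    return picked

def sample_by_choice(a, k):
    # draw with choice(), skipping values already seen
    l, seen = [], set()
    while len(l) < k:
        ch = random.choice(a)
        if ch not in seen:
            l.append(ch)
            seen.add(ch)
    return l

a = [random.choice(range(10000)) for _ in range(100000)]
for fn in (sample_by_shuffle, sample_by_choice):
    t = timeit.timeit(lambda: fn(a, 5000), number=3)
    print(fn.__name__, t)
```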

Padraic Cunningham
0
import random

mysample = []
count = 0
while count < sampleSize:  # where sampleSize is the number of values you want
    s = random.sample(a, 1)[0]
    # keep only the other values so s cannot be drawn again
    a = [x for x in a if x != s]
    mysample.append(s)
    count += 1
Calum
0

This is probably more complicated than necessary, but here is an implementation that uses a modified version of reservoir sampling.

import itertools
import random

def element_at(iterable, index, default=None):
    return next(itertools.islice(iterable, index, None), default)

def sample_unique(iterable, size):
    S = set()
    for index, item in enumerate(iterable):
        if len(S) < size:
            S.add(item)
        else:
            r = random.randint(0, index)
            if r < size:
                other = element_at(S, r)
                if item not in S:
                    S.remove(other)
                    S.add(item)
    return S
Timothy Shields