Exclude runs of the same element from itertools.product

Question

I have the following code:

possible_keys = map(lambda combo: ''.join(combo), itertools.product(LETTERS, repeat=key_len))

It generates all the possible combinations in the alphabet based on the key length I'm passing it. For example, if I pass in 2, it will generate aa, ab, ac, and so on. If I pass in 3, it will generate aaa, aab, aac, and so on.

I would like to optimize my code a little better by removing instances where the string is all the same letters, e.g. aaa, bbbb, ccccccc. Essentially, removing the diagonals if this was a matrix. How can I accomplish that with this or if there a better way?

EDIT: I'm working on the Vigenere Cipher by dictionary attack. I didn't want to reveal big problem I was working on because I was afraid people would give answers to that instead haha. Though, now any suggestions welcomed :) This is my first iteration of my program so it's really inefficient because I'm going through all possible keys and matching it with what's in the dictionary provided..

You could possibly use `len(set(x)) != 1`. Try this: `[i for i in possible_keys if len(set(i)) != 1]` — Anton vBR, Feb 13 '18 at 21:07
Could you provide more information about the problem? I mean, the real problem. This seems to be a solution for something inside another thing you want to do. If you tell us, we could give you a even better solution! :) (read this for more info: http://xyproblem.info/) — Mikael Souza, Feb 13 '18 at 21:14
@heyyo Sorry if you didn't want to tell us the big problem. It is just that most people do this by accident. I hope we can help you without spoiling it! :D — Mikael Souza, Feb 13 '18 at 21:43

ivan_pozdeev · Accepted Answer · 2018-02-14T00:47:25.463

3

Using the fact that the runs of the same element are located

L⁰+L¹+...+L^L-1=(L^L-1)/(L-1)

elements apart, and there are L of them in total, we can exclude them from the product at the end without encumbering the inner loop or computing hashes:

LETTERS='abc'
l = len(LETTERS)
p = [''.join(i) for i in itertools.product(LETTERS,repeat=l)]
step=(l**l-1)/(l-1)
for i in range(l):
    del p[i*step-i]    #allow for the fact that each time we delete an element,
                       #all indices shift backward by one

Comparative performance:

In [88]: letters=string.ascii_lowercase[:8]    # 8**8=16777216 elements in product

In [89]: timeit ex(letters)           # this solution
1 loop, best of 3: 6.1 s per loop  

In [90]: timeit minus_set(letters)    # subtracting a set at the end
1 loop, best of 3: 28.1 s per loop

In [92]: timeit ck_len(letters)       # checking len(set(i))
1 loop, best of 3: 15.1 s per loop

In [94]: timeit not_set(letters)      # checking `not in exclude'
1 loop, best of 3: 7.54 s per loop

def ex_mod_iter(letters):   # counter in the loop like it'd be done in C
    l = len(letters)
    step=(l**l-1)/(l-1)
    p = [''.join(v) for i,v in enumerate(itertools.product(letters,repeat=l)) if i % step]
    return p

In [5]: timeit ex_mod_iter(letters)
1 loop, best of 3: 6.61 s per loop

edited Feb 14 '18 at 00:47

answered Feb 13 '18 at 21:54

ivan_pozdeev

33,874
19
107
152

Nice! I thought about it theoretically (popping the indexes). But you made it possible. Most likely the fastest too (but not most readable I'd say). – Anton vBR Feb 13 '18 at 21:58
@AntonvBR To my surprise, not everyone else is left in the dust :-) – ivan_pozdeev Feb 13 '18 at 23:06
Fastest indeed!! However my belief is that ppl don’t use Python for computation speed and in that case I’d vote for the exclude code for readabiity. – Anton vBR Feb 13 '18 at 23:11
Filtering earlier: `p = [''.join(p) for (i, p) in enumerate(itertools.product(LETTERS,repeat=l)) if i % step]` – Steven Rumbalski Feb 14 '18 at 00:31
I realize that's similar to your `ex_mod_iter` except you composed `enumerate()` using `izip()` and `count()` and explicitly checked the mod as not equal zero. I wonder if that impacted performance. (I'll check tomorrow when I'm at a computer.) – Steven Rumbalski Feb 14 '18 at 00:43
1

@StevenRumbalski thanks, I forgot that `enumerate()` returns an iterator, so it's safe to use for humongous sequences. Yes, the change reduced the time by about a second. – ivan_pozdeev Feb 14 '18 at 00:48

score 2 · Answer 2 · answered Feb 13 '18 at 21:26

You could simply check if the first letter occupies the entire string and exclude strings that match the criterion:

possible_keys = [''.join(x) for x in product(L, repeat=key_len) 
                                      if len(x) != x.count(x[0])]

Although tuple.count has the same O(n) complexity as set(), counting is relatively cheaper and thus likely to be much faster than building sets from the tuples.

Anton vBR · Answer 3 · 2018-02-13T21:48:37.923

1

Using set (which can only hold unique values), we can do this e.g.:

import itertools
possible_keys = [''.join(i) for i in itertools.product('AB', repeat=2) if len(set(i)) !=1]
print(possible_keys)

Returns:

['AB', 'BA']

Side note: lambda is not necessary here

For more speed: we could also make an exception list first if speed is more important.

exclude = {(x,)*3 for x in 'ABC'}
possible_keys= [''.join(i) for i in itertools.product('ABC', repeat=3) if i not in exclude]

For [A-Z] you can use:

import string
n = 4
letters = string.ascii_uppercase
exc = {(x,)*n for x in letters}
l = [''.join(i) for i in itertools.product(letters, repeat=n) if i not in exc]

timings

%timeit l = {(x,)*3 for x in 'ABC'};[''.join(i) for i in itertools.product('ABC', repeat=3) if i not in l ]
%timeit [''.join(i) for i in itertools.product('ABC', repeat=3) if len(set(i)) !=1]
%timeit [''.join(x) for x in itertools.product('ABC', repeat=3) if len(x) != x.count(x[0])]

100000 loops, best of 3: 5.48 µs per loop
100000 loops, best of 3: 10.7 µs per loop
100000 loops, best of 3: 8.19 µs per loop

edited Feb 13 '18 at 21:48

answered Feb 13 '18 at 21:11

Anton vBR

18,287
5
40
46

1

This has quadratic complexity. – ivan_pozdeev Feb 13 '18 at 21:17
@ivan_pozdeev: that is correct, but the problem is inherently quadratic, since if the charset contains 3 elements, there will be 6 elements, if it has 4, it will be 12, etc. Strictly speaking it has *n^k * k* complexity. – Willem Van Onsem Feb 13 '18 at 21:18
@WillemVanOnsem Hmm... it's cubic then? `set()` adds `N` additional steps at each iteration of `product`. – ivan_pozdeev Feb 13 '18 at 21:20
@ivan_pozdeev: no it is exponential, but there is not much we can do about it, because the result set has that size, and you can not generate *n* items in a time complexity less than *O(n)*. – Willem Van Onsem Feb 13 '18 at 21:22
Your timing is **unfair**. Why have you neglected the time for building the `exclude` list for the first approach? – Moses Koledoye Feb 13 '18 at 21:40
nice. I'm using A-Z in my implementation so either way it will take a long time ;) – heyyo Feb 13 '18 at 21:41
@AntonvBR Quantitatively, there is no such thing as it builds fast. Include it and see what happens. – Moses Koledoye Feb 13 '18 at 21:41
@MosesKoledoye Done. – Anton vBR Feb 13 '18 at 21:41
1

@AntonvBR Fair enough. However, you're using a generator expression which gets exhausted after the first membership check and just skips over the rest. Use a set comprehension instead. – Moses Koledoye Feb 13 '18 at 21:48

score 0 · Answer 4 · answered Feb 13 '18 at 21:17

Not sure how efficient this is but it should do the trick.

# Remove combos that are composed of the same letter.
possible_keys = map(lambda combo: ''.join(combo) if len(set(combo)) > 1 else False, itertools.product(LETTERS, repeat=key_len))

It also uses set but does the filtering in the lambda expression.

score 0 · Answer 5 · edited Jun 20 '20 at 09:12

0

Solution

You can use the itertools.repeat to get a list of the sequences you wish to take out of your original list.

import itertools
import string

# get all sequences with all same letters
all_same_sequences = [''.join(list(itertools.repeat(char, key_len))) for char in string.ascii_lowercase]

# take those sequences out of the original list
possible_keys = sort(list(set(possible_keys ) - set(all_same_sequences)))

edited Jun 20 '20 at 09:12

Community

1
1

answered Feb 13 '18 at 21:22

Amir Biran

54
1
8

True, you can either use list(set(x) - set(y)) or to use the solution given here to make this work rather easily: https://stackoverflow.com/questions/3428536/python-list-subtraction-operation – Amir Biran Feb 13 '18 at 21:28

Exclude runs of the same element from itertools.product

5 Answers5

Solution