0

I have the following code:

possible_keys = map(lambda combo: ''.join(combo), itertools.product(LETTERS, repeat=key_len))

It generates all the possible combinations in the alphabet based on the key length I'm passing it. For example, if I pass in 2, it will generate aa, ab, ac, and so on. If I pass in 3, it will generate aaa, aab, aac, and so on.

I would like to optimize my code a little better by removing instances where the string is all the same letters, e.g. aaa, bbbb, ccccccc. Essentially, removing the diagonals if this was a matrix. How can I accomplish that with this or if there a better way?

EDIT: I'm working on the Vigenere Cipher by dictionary attack. I didn't want to reveal big problem I was working on because I was afraid people would give answers to that instead haha. Though, now any suggestions welcomed :) This is my first iteration of my program so it's really inefficient because I'm going through all possible keys and matching it with what's in the dictionary provided..

ivan_pozdeev
  • 33,874
  • 19
  • 107
  • 152
heyyo
  • 139
  • 2
  • 13

5 Answers5

3

Using the fact that the runs of the same element are located

L0+L1+...+LL-1=(LL-1)/(L-1)

elements apart, and there are L of them in total, we can exclude them from the product at the end without encumbering the inner loop or computing hashes:

LETTERS='abc'
l = len(LETTERS)
p = [''.join(i) for i in itertools.product(LETTERS,repeat=l)]
step=(l**l-1)/(l-1)
for i in range(l):
    del p[i*step-i]    #allow for the fact that each time we delete an element,
                       #all indices shift backward by one

Comparative performance:

In [88]: letters=string.ascii_lowercase[:8]    # 8**8=16777216 elements in product

In [89]: timeit ex(letters)           # this solution
1 loop, best of 3: 6.1 s per loop  

In [90]: timeit minus_set(letters)    # subtracting a set at the end
1 loop, best of 3: 28.1 s per loop

In [92]: timeit ck_len(letters)       # checking len(set(i))
1 loop, best of 3: 15.1 s per loop

In [94]: timeit not_set(letters)      # checking `not in exclude'
1 loop, best of 3: 7.54 s per loop

def ex_mod_iter(letters):   # counter in the loop like it'd be done in C
    l = len(letters)
    step=(l**l-1)/(l-1)
    p = [''.join(v) for i,v in enumerate(itertools.product(letters,repeat=l)) if i % step]
    return p

In [5]: timeit ex_mod_iter(letters)
1 loop, best of 3: 6.61 s per loop
ivan_pozdeev
  • 33,874
  • 19
  • 107
  • 152
  • Nice! I thought about it theoretically (popping the indexes). But you made it possible. Most likely the fastest too (but not most readable I'd say). – Anton vBR Feb 13 '18 at 21:58
  • @AntonvBR To my surprise, not everyone else is left in the dust :-) – ivan_pozdeev Feb 13 '18 at 23:06
  • Fastest indeed!! However my belief is that ppl don’t use Python for computation speed and in that case I’d vote for the exclude code for readabiity. – Anton vBR Feb 13 '18 at 23:11
  • Filtering earlier: `p = [''.join(p) for (i, p) in enumerate(itertools.product(LETTERS,repeat=l)) if i % step]` – Steven Rumbalski Feb 14 '18 at 00:31
  • I realize that's similar to your `ex_mod_iter` except you composed `enumerate()` using `izip()` and `count()` and explicitly checked the mod as not equal zero. I wonder if that impacted performance. (I'll check tomorrow when I'm at a computer.) – Steven Rumbalski Feb 14 '18 at 00:43
  • 1
    @StevenRumbalski thanks, I forgot that `enumerate()` returns an iterator, so it's safe to use for humongous sequences. Yes, the change reduced the time by about a second. – ivan_pozdeev Feb 14 '18 at 00:48
2

You could simply check if the first letter occupies the entire string and exclude strings that match the criterion:

possible_keys = [''.join(x) for x in product(L, repeat=key_len) 
                                      if len(x) != x.count(x[0])]

Although tuple.count has the same O(n) complexity as set(), counting is relatively cheaper and thus likely to be much faster than building sets from the tuples.

Moses Koledoye
  • 77,341
  • 8
  • 133
  • 139
1

Using set (which can only hold unique values), we can do this e.g.:

import itertools
possible_keys = [''.join(i) for i in itertools.product('AB', repeat=2) if len(set(i)) !=1]
print(possible_keys)

Returns:

['AB', 'BA']

Side note: lambda is not necessary here

For more speed: we could also make an exception list first if speed is more important.

exclude = {(x,)*3 for x in 'ABC'}
possible_keys= [''.join(i) for i in itertools.product('ABC', repeat=3) if i not in exclude]

For [A-Z] you can use:

import string
n = 4
letters = string.ascii_uppercase
exc = {(x,)*n for x in letters}
l = [''.join(i) for i in itertools.product(letters, repeat=n) if i not in exc]

timings

%timeit l = {(x,)*3 for x in 'ABC'};[''.join(i) for i in itertools.product('ABC', repeat=3) if i not in l ]
%timeit [''.join(i) for i in itertools.product('ABC', repeat=3) if len(set(i)) !=1]
%timeit [''.join(x) for x in itertools.product('ABC', repeat=3) if len(x) != x.count(x[0])]

100000 loops, best of 3: 5.48 µs per loop
100000 loops, best of 3: 10.7 µs per loop
100000 loops, best of 3: 8.19 µs per loop
Anton vBR
  • 18,287
  • 5
  • 40
  • 46
0

Not sure how efficient this is but it should do the trick.

# Remove combos that are composed of the same letter.
possible_keys = map(lambda combo: ''.join(combo) if len(set(combo)) > 1 else False, itertools.product(LETTERS, repeat=key_len))

It also uses set but does the filtering in the lambda expression.

Joe Linoff
  • 761
  • 9
  • 13
0

Solution

You can use the itertools.repeat to get a list of the sequences you wish to take out of your original list.

import itertools
import string

# get all sequences with all same letters
all_same_sequences = [''.join(list(itertools.repeat(char, key_len))) for char in string.ascii_lowercase]

# take those sequences out of the original list
possible_keys = sort(list(set(possible_keys ) - set(all_same_sequences)))
Community
  • 1
  • 1
Amir Biran
  • 54
  • 1
  • 8
  • True, you can either use list(set(x) - set(y)) or to use the solution given here to make this work rather easily: https://stackoverflow.com/questions/3428536/python-list-subtraction-operation – Amir Biran Feb 13 '18 at 21:28