This is going to search every element of b
for each element of a
. It's also going to do a linear remove
on each list for each element that matches. So, your algorithm takes quadratic time—O(max(N, M)^2)
where N
is the length of a
and M
is the length of b
.
If you just copy b
into a set
instead of a list
, that solves the problem. Now you're just doing a constant-time set lookup for each element in a
, and a constant-time set remove instead of a list remove. But you've still got the problem with the linear-time and incorrect removing from the a
copy. And you can't just copy a
into a set
, because that loses duplicates.
On top of that, a_sub_b.remove(e)
removes an element matching e
. That isn't necessarily the same element as the element you just looked up. It's going to be an equal element, and if identity doesn't matter at all, that's fine… but if it does, then remove
may do the wrong thing.
At any rate, performance is already a good enough reason not to use remove
. Once you've solved the problems above, this is the only thing making your algorithm quadratic instead of linear.
The easiest way to solve this problem is to build up a new list, rather than copying the list and removing from it.
Solving both problems, you have O(2N+M)
time, which is linear.
So, putting the two together:
b_set = set(b)
new_a = []
for element in a:
if a in b_set:
b_set.remove(element)
else:
new_a.append(element)
However, this still may have a problem. You haven't stated things very clearly, so it's hard to be sure, but can b
contain duplicates, and, if so, does that mean the duplicated elements should be removed from a
multiple times? If so, you need a multi-set, not a set
. The easiest way to do that in Python is with a Counter
:
from collections import Counter
b_counts = Counter(b)
new_a = []
for element in a:
if b_counts[element]:
b_counts[element] -= 1
else:
new_a.append(element)
On the other hand, if the order of neither a
nor b
matters, this just reduces to multiset difference, which makes it even easier:
new_a = list((Counter(a) - Counter(b)).elements())
But really, if the order of both is meaningless, you probably should have been using a Counter
or other multiset representation in the first place, not a list…