1

Possible Duplicate:
Python list subtraction operation

In Python you can concatenate lists like so:

print([3,4,5]+[4,5])

which gives this output:

[3,4,5,4,5]

But what I'm looking for is an equivalent 'subtraction' operation, so that doing something like this:

print([3,4,5]-[4,5])

Will output this:

[3]

However, the subtraction operator isn't defined for lists. I've tried this:

a = [3,4,5]
b = [4,5]
print(list(filter(lambda x : x not in b,a)))

Which works, but I'm uncertain whether or not this is the best way to do this. I would also like to preserve the original item positions

Community
  • 1
  • 1
Max
  • 7,957
  • 10
  • 33
  • 39
  • 7
    Turn list into set and do the subtraction? It is not very well-defined to do subtraction with list, especially if you have duplicate. – nhahtdh Sep 26 '12 at 13:07
  • 1
    What should happen in the case of `a = [3,4,5]`, `b = [5,4]` since you want order to matter ... – mgilson Sep 26 '12 at 13:09
  • 1
    Adding lists like that is called 'concatenation'. Subtraction is not the opposite procedure. – deadly Sep 26 '12 at 13:17
  • 2
    What happens if the first list contains duplicates? if `a = [3, 4, 4, 5]` and `b = [4, 5]`, does only one 4 get removed? Would the answer be `[3, 4]` or `[3]` – jsvk Sep 26 '12 at 13:17
  • 1
    It's also unclear to me what the OP wants. Should ``[3, 4, 5]-[4]`` produce ``[3, 5]`` or not be a valid operation (as in, does this only work on tails - as my answer presumes)? – Gareth Latty Sep 26 '12 at 13:27
  • 1
    @nhahtdh: There is an established an well defined technique in Prolog programming that does exactly that - subtracting lists. Such lists are called [Difference Lists](http://homepages.inf.ed.ac.uk/pbrna/prologbook/node180.html). I'm not sure though if the OP had something like that in mind. – pillmuncher Sep 26 '12 at 14:07

7 Answers7

5

You can easily do this with a list comprehension:

nl = [elem for elem in a if elem not in b]

Edit

Better to use a set to test against. This will remove duplicates from your list.

bb= set(b)
nl = [elem for elem in a if elem not in bb]
LarsVegas
  • 6,522
  • 10
  • 43
  • 67
  • If `a` and `b` are large lists, it'd be better to convert `b` to set before using this – Anton Guryanov Sep 26 '12 at 13:10
  • yeah this is n^2, better to use a set – jterrace Sep 26 '12 at 13:19
  • 2
    @larsvegas -- I'm not positive, but I'm pretty sure that the (Cpython) interpreter isn't smart enough to know to avoid creating a set for each element in a. Better put `bb = set(b)` on a separate line, and then test if `elem in bb` – mgilson Sep 26 '12 at 13:30
  • @mgilson Doesn't appear to be - or at least, it runs print statements for each item in my test. ``[_ for _ in [1, 2, 3] if not print("Run")]`` – Gareth Latty Sep 26 '12 at 13:32
  • @mgilson, edited my answer anyway. – LarsVegas Sep 26 '12 at 13:33
  • @Lattyware -- Yeah, but `print` is an unpure function (it has side-effects) -- but `set` doesn't have side effects. It's conceivable that the interpreter could have optimizations to check if the function is builtin and then if it is pure and then use that in the list comp. The best way to know would be to use `timeit` with moderately sized lists and see what happens I'd think. Anyway, I *highly* doubt Cpython does any of this, and maybe it never will. pypy *might*, but I'm not even sure if it is that smart about this stuff. – mgilson Sep 26 '12 at 13:38
  • @mgilson Yeah, I was making the presumption that given Python's dynamic nature, there was going to be no good way for the interpreter to know if a function had side effects or not. You are right it could be hard coded for built-ins though, I hadn't thought of that. – Gareth Latty Sep 26 '12 at 13:47
4

This is a somewhat poorly-defined problem. I can think of several non-equivalent definitions of list "subtraction," two of which are already represented: truncating (via slicing) -- a true inverse of concatenation; and filtering, which resembles the definition of "subtraction" (really relative complementation) for sets. For filtering, using a list comprehension over a with b converted to a set is the best approach. (I.e. larsvegas's answer.)

But one version that hasn't been considered is the multiset definition of subtraction. Python's Counter type provides us with a multiset:

>>> from collections import Counter
>>> a = [3, 4, 5]
>>> b = [4, 5]
>>> a_counter = Counter(a)
>>> b_counter = Counter(b)
>>> a_counter
Counter({3: 1, 4: 1, 5: 1})
>>> b_counter
Counter({4: 1, 5: 1})
>>> a_counter - b_counter
Counter({3: 1})

Of course, this doesn't preserve order, but we can fix that by filtering a based on the result of a_counter - b_counter:

def subtract_lists(a, b):
    multiset_difference = Counter(a) - Counter(b)
    result = []
    for i in a:
        if i in multiset_difference:
            result.append(i)
            multiset_difference -= Counter((i,))
    return result

This has several nice properties. It preserves order; it functions as a true inverse of concatenation; it implements an intuitively consistent version of subtraction on a datatype that can contain duplicates; and it works in linear time.

>>> subtract_lists(a, b)
[3]
>>> subtract_lists([1, 2, 3, 4], [2, 3, 4])
[1]
>>> subtract_lists([1, 2, 3, 4], [2, 4])
[1, 3]
>>> subtract_lists([1, 2, 3, 4, 4, 4], [2, 4])
[1, 3, 4, 4]
Community
  • 1
  • 1
senderle
  • 145,869
  • 36
  • 209
  • 233
2
a = [3,4,5]
b = [4,5]

list(set(a) -  set(b))
[3]
Rakesh
  • 81,458
  • 17
  • 76
  • 113
2

If you mean subtraction as in removing the last elements from the list, then it's quite a simple operation using list slicing:

def list_subtraction(seq, remove):
    l = len(remove)
    if seq[-l:] == remove:
        return seq[:-l]
    else:
        raise ValueError("Subtraction not possible, "
                         "{} is not a tail of {}.".format(remove, seq))
Gareth Latty
  • 86,389
  • 17
  • 178
  • 183
1

This is of course since it's just appending, which is why the duplicates aren't removed or affected at all.

Subtraction would be just slicing off the end:

a = [3, 4, 5]
b = [4, 5]
c = a + b

d = c[:-len(b)]

This will make d equal a, i.e. [3, 4, 5].

unwind
  • 391,730
  • 64
  • 469
  • 606
1

Given:

a = [3, 4, 5]
b = [4, 5]

Then one of the following should work, depending on what you want.

# remove 'b' from the end of 'a' if it's there (strict de-concatenation)
if a[-len(b):] == b:
    a = a[:-len(b)]

# remove any elements from 'a' that are in `b` (including multiples)
bset = set(b)
a = [x for x in a if x not in bset]

# faster version of above but doesn't preserve order
a = list(set(a) - set(b))

# remove elements from 'a' that are in 'b' (one leftmost item only)
bset = set(b)
a = [x for x in a if x not in bset or bset.remove(x)]

# remove elements from 'a' that are in 'b' (one rightmost item only)
bset = set(b)
a = list(reversed([x for x in reversed(a) if x not in bset or bset.remove(x)]))
kindall
  • 178,883
  • 35
  • 278
  • 309
0

If you want this to remove things from anywhere in the list, and only remove them as many times as they appear in the second list (so that sub([1, 2, 3, 3, 4, 4, 5], [3, 4, 5]) == [1, 2, 3, 4]), you need to be a little trickier and remove each element from (a copy of) the right-hand list as you use it:

def sub(l, r):
    '''
    Remove all elements in r from l
    '''
    r = r[:]
    res = []
    for a in l:
        try:
            i = r.index(a)
        except ValueError:
            res.append(a)
        else:
            del r[i]
    return res

If you want, eg, [1, 2, 3] - [4] to be an error, you can check after the loop if r is non-empty.

lvc
  • 34,233
  • 10
  • 73
  • 98