How do I subtract one list from another?

Question

I want to take the difference between lists x and y:

>>> x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> y = [1, 3, 5, 7, 9]  
>>> x - y
# should return [0, 2, 4, 6, 8]

What should [2, 1, 2, 3, 2, 4, 2] - [2, 3, 2] return, and why? Should it find the 232 in the middle and return 2142? or should it find the first each time and return 1242? Or something else? What I'm saying is that these are not obvious answers and depend on need. — McKay, Jul 05 '17 at 15:07

score 488 · Accepted Answer · edited Jan 31 '23 at 01:34

488

Use a list comprehension to compute the difference while maintaining the original order from x:

[item for item in x if item not in y]

If you don't need list properties (e.g. ordering), use a set difference, as the other answers suggest:

list(set(x) - set(y))

To allow x - y infix syntax, override __sub__ on a class inheriting from list:

class MyList(list):
    def __init__(self, *args):
        super(MyList, self).__init__(args)

    def __sub__(self, other):
        return self.__class__(*[item for item in self if item not in other])

Usage:

x = MyList(1, 2, 3, 4)
y = MyList(2, 5, 2)
z = x - y

edited Jan 31 '23 at 01:34

Mateen Ulhaq

24,552
19
101
135

answered Aug 07 '10 at 00:19

aaronasterling

68,820
20
127
125

37

If you do `[1,1,2,2] - [1,2]` you will get empty list. `[1,1,2,2] - [2]` gives `[1,1]` So it is not really list substraction, it is more like _"List from List **X** without elements from set **Y**"_. – Alfred Zien Feb 06 '16 at 10:25
The list comprehension method is way slower (in my example) than the set difference method. – redfiloux Feb 26 '19 at 10:22
2

@BarnabasSzabolcs: That won't save a thing, because it will convert `y` to a `set` before *every* check (which is similar cost to original work). You'd need to either do `yset = set(y)` outside the listcomp, then test `if item not in yset`, or as an egregious hack, do `[item for yset in [set(y)] for item in x if item not in yset]` which abuses nested listcomps to cache the `yset` as a one-liner. A slightly less ugly one-liner solution that performs adequately would be to use `list(itertools.filterfalse(set(y).__contains__, x))` because the argument to `filterfalse` is only constructed once. – ShadowRanger Sep 05 '19 at 22:03
A solution that is O(n) [fast and preserves order](https://stackoverflow.com/a/75292565/365102): `ys_set = set(ys); result = [x for x in xs if x not in ys_set]`. – Mateen Ulhaq Feb 05 '23 at 22:37

score 368 · Answer 2 · answered Aug 06 '10 at 23:45

368

Use set difference

>>> z = list(set(x) - set(y))
>>> z
[0, 8, 2, 4, 6]

Or you might just have x and y be sets so you don't have to do any conversions.

answered Aug 06 '10 at 23:45

quantumSoup

27,197
9
43
57

96

this will lose any ordering. That may or may not matter depending on context. – aaronasterling Aug 07 '10 at 00:19
96

This will also loose any possible duplicates that may need/want maintaining. – Opal Jun 24 '11 at 05:31
2

This is way faster in cases where the lists being compared are large – JqueryToAddNumbers Oct 06 '18 at 02:57

nguyên · Answer 3 · 2013-11-28T06:54:51.217

47

if duplicate and ordering items are problem :

[i for i in a if not i in b or b.remove(i)]

a = [1,2,3,3,3,3,4]
b = [1,3]
result: [2, 3, 3, 3, 4]

edited Nov 28 '13 at 06:54

answered Nov 28 '13 at 06:49

nguyên

5,156
5
43
45

4

This works, though it's `O(m * n)` runtime (and I cringe whenever a listcomp includes side-effects); [you can improve on it using `collections.Counter`](https://stackoverflow.com/a/57827145/364696) to get `O(m + n)` runtime. – ShadowRanger Sep 06 '19 at 18:50
1

@anushka Rather than `[item for item in a if not item in b]` (which works more like set subtraction), this has `... if not item in b or b.remove(item)`. `b.remove(item)` returns `false` if `item` is not in `b` and removes `item` from `b` otherwise. This prevents items in the second list (`a - b`, in this case) from being subtracted more than once for each occurrence. This prevents de-duping, which happens if you follow some of the other answers. It's not super-efficient (definitely follow @ShaworRangers suggestion for efficiency), but I think this is probably the most-correct answer. – jmorganmartin Jul 01 '22 at 00:38

Santa · Answer 4 · 2015-07-27T20:08:04.907

42

That is a "set subtraction" operation. Use the set data structure for that.

In Python 2.7:

x = {1,2,3,4,5,6,7,8,9,0}
y = {1,3,5,7,9}
print x - y

Output:

>>> print x - y
set([0, 8, 2, 4, 6])

edited Jul 27 '15 at 20:08

answered Aug 06 '10 at 23:46

Santa

11,381
8
51
64

1

list(set([1,2,3,4,5]) - set([1,2,3])) = [4, 5] so that's lists each to set first, then subtract (or one-way diff) and back to list. – gseattle Jun 05 '17 at 08:22
4

Not good if you like to maintain original item order of the x set. – Zahran Aug 26 '17 at 17:16

score 25 · Answer 5 · edited May 23 '17 at 11:47

For many use cases, the answer you want is:

ys = set(y)
[item for item in x if item not in ys]

This is a hybrid between aaronasterling's answer and quantumSoup's answer.

aaronasterling's version does len(y) item comparisons for each element in x, so it takes quadratic time. quantumSoup's version uses sets, so it does a single constant-time set lookup for each element in x—but, because it converts both x and y into sets, it loses the order of your elements.

By converting only y into a set, and iterating x in order, you get the best of both worlds—linear time, and order preservation.*

However, this still has a problem from quantumSoup's version: It requires your elements to be hashable. That's pretty much built into the nature of sets.** If you're trying to, e.g., subtract a list of dicts from another list of dicts, but the list to subtract is large, what do you do?

If you can decorate your values in some way that they're hashable, that solves the problem. For example, with a flat dictionary whose values are themselves hashable:

ys = {tuple(item.items()) for item in y}
[item for item in x if tuple(item.items()) not in ys]

If your types are a bit more complicated (e.g., often you're dealing with JSON-compatible values, which are hashable, or lists or dicts whose values are recursively the same type), you can still use this solution. But some types just can't be converted into anything hashable.

If your items aren't, and can't be made, hashable, but they are comparable, you can at least get log-linear time (O(N*log M), which is a lot better than the O(N*M) time of the list solution, but not as good as the O(N+M) time of the set solution) by sorting and using bisect:

ys = sorted(y)
def bisect_contains(seq, item):
    index = bisect.bisect(seq, item)
    return index < len(seq) and seq[index] == item
[item for item in x if bisect_contains(ys, item)]

If your items are neither hashable nor comparable, then you're stuck with the quadratic solution.

_{* Note that you could also do this by using a pair of OrderedSet objects, for which you can find recipes and third-party modules. But I think this is simpler.}

_{** The reason set lookups are constant time is that all it has to do is hash the value and see if there's an entry for that hash. If it can't hash the value, this won't work.}

Alain T. · Answer 6 · 2019-09-06T20:21:58.560

16

If the lists allow duplicate elements, you can use Counter from collections:

from collections import Counter
result = list((Counter(x)-Counter(y)).elements())

If you need to preserve the order of elements from x:

result = [ v for c in [Counter(y)] for v in x if not c[v] or c.subtract([v]) ]

edited Sep 06 '19 at 20:21

answered Mar 06 '19 at 03:53

Alain T.

40,517
4
31
51

This is good, though it does lose ordering; fixing that [is a bit more complicated](https://stackoverflow.com/a/57827145/364696). – ShadowRanger Sep 06 '19 at 18:47
Don't mind me, I'm just going to shudder at listcomps with caching and side-effects (although I suppose the combination of the two removes the externally visible side-effects?). :-) – ShadowRanger Sep 06 '19 at 19:41
1

Also, this code won't work as written; `Counter.subtract` doesn't remove zero valued elements (`-` and `-=` do, but not `subtract`), so you'd never stop removing elements. You'd want to replace `not v in c` with `not c[v]` (which returns zero for non-existent elements, so you can safely test the return for "zeroiness" via `not`). – ShadowRanger Sep 06 '19 at 19:44
.elements() is unncessary. list((Counter(x) - Counter(y))) works for my case – Machine Yadav Sep 05 '21 at 16:16
As long as you don't have duplicate values in x, it does indeed, but in that case, using a set would probably be more efficient: `[vx for ySet in [set(y)] for vx in x if vx not in ySet]` (preserving the order of x), or `list(set(x)-set(y))` if the order doesn't matter – Alain T. Sep 05 '21 at 16:22

ShadowRanger · Answer 7 · 2019-09-06T18:57:35.187

The other solutions have one of a few problems:

They don't preserve order, or
They don't remove a precise count of elements, e.g. for x = [1, 2, 2, 2] and y = [2, 2] they convert y to a set, and either remove all matching elements (leaving [1] only) or remove one of each unique element (leaving [1, 2, 2]), when the proper behavior would be to remove 2 twice, leaving [1, 2], or
They do O(m * n) work, where an optimal solution can do O(m + n) work

Alain was on the right track with Counter to solve #2 and #3, but that solution will lose ordering. The solution that preserves order (removing the first n copies of each value for n repetitions in the list of values to remove) is:

from collections import Counter

x = [1,2,3,4,3,2,1]  
y = [1,2,2]  
remaining = Counter(y)

out = []
for val in x:
    if remaining[val]:
        remaining[val] -= 1
    else:
        out.append(val)
# out is now [3, 4, 3, 1], having removed the first 1 and both 2s.

Try it online!

To make it remove the last copies of each element, just change the for loop to for val in reversed(x): and add out.reverse() immediately after exiting the for loop.

Constructing the Counter is O(n) in terms of y's length, iterating x is O(n) in terms of x's length, and Counter membership testing and mutation are O(1), while list.append is amortized O(1) (a given append can be O(n), but for many appends, the overall big-O averages O(1) since fewer and fewer of them require a reallocation), so the overall work done is O(m + n).

You can also test for to determine if there were any elements in y that were not removed from x by testing:

remaining = +remaining  # Removes all keys with zero counts from Counter
if remaining:
    # remaining contained elements with non-zero counts

Note: This *does* require the values to be hashable, but any solution that doesn't require hashable objects either isn't general purpose (e.g. can count `int`s into fixed length array) or has to do more than `O(m + n)` work (e.g. the next best big-O would be to make a sorted `list` of unique value/count pairs, changing `O(1)` `dict` lookups into `O(log n)` binary searches; you'd need unique values with their counts, not just sorted non-unique values, because otherwise you'd be paying `O(n)` costs to remove the elements from the sorted `list`). — ShadowRanger, Sep 06 '19 at 18:47
I think this is the best answer so far, but for reference purposes I think it would be better if it was refactored into a function, as I presume typing out 5+ lines of code every time one wants to subtract two lists is a bit cumbersome. — Somebody Out There, May 06 '22 at 14:31

rudolfbyker · Answer 8 · 2022-05-08T13:04:31.340

11

Looking up values in sets are faster than looking them up in lists:

[item for item in x if item not in set(y)]

I believe this will scale slightly better than:

[item for item in x if item not in y]

Both preserve the order of the lists.

edited May 08 '22 at 13:04

answered Jul 21 '15 at 14:44

rudolfbyker

2,034
1
20
25

Will it cache `set(y)` and not convert `y` to a new set on each loop? Otherwise, you'd need abarnert's answer: `ys = set(y); [i for i in x if i not in ys]`. – Jacktose May 23 '19 at 23:02
3

Some rough testing suggests that `if i not in set(y)` takes 25% longer than `if i not in y` (where `y` is a list). Pre-converting the set takes 55% less time. Tested with pretty short `x` and `y`, but differences should get more pronounced with length, if anything. – Jacktose May 23 '19 at 23:17
1

@Jacktose: Yeah, this solution does more work, because it has to iterate and hash *every* element of `y` for *every* element of `x`; unless the equality comparison is really expensive relative to the hash computation, this will always lose to plain `item not in y`. – ShadowRanger Sep 06 '19 at 18:28
@ShadowRanger which makes sense. If set conversion was a reliably quicker way to do that check, you'd think the compiler would just always do the check that way. – Jacktose Sep 10 '19 at 15:54

score 2 · Answer 9 · edited Sep 23 '21 at 15:28

2

We can use set methods as well to find the difference between two list

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
y = [1, 3, 5, 7, 9]
list(set(x).difference(y))
[0, 2, 4, 6, 8]

edited Sep 23 '21 at 15:28

Simas Joneliunas

2,890
20
28
35

answered Sep 23 '21 at 05:02

Python Developer

21
3

Mateen Ulhaq · Answer 10 · 2023-01-31T04:08:10.890

Let:

>>> xs = [1, 2, 3, 4, 3, 2, 1]
>>> ys = [1, 3, 3]

Keep each unique item only once `xs - ys == {2, 4}`

Take the set difference:

>>> set(xs) - set(ys)
{2, 4}

Remove all occurrences `xs - ys == [2, 4, 2]`

>>> [x for x in xs if x not in ys]
[2, 4, 2]

If ys is large, convert only¹ ys into a set for better performance:

>>> ys_set = set(ys)
>>> [x for x in xs if x not in ys_set]
[2, 4, 2]

Only remove same number of occurrences `xs - ys == [2, 4, 2, 1]`

from collections import Counter, defaultdict

def diff(xs, ys):
    counter = Counter(ys)
    for x in xs:
        if counter[x] > 0:
            counter[x] -= 1
            continue
        yield x

>>> list(diff(xs, ys))
[2, 4, 2, 1]

¹ Converting xs to set and taking the set difference is unnecessary (and slower, as well as order-destroying) since we only need to iterate once over xs.

score 1 · Answer 11 · answered Aug 22 '13 at 14:06

Try this.

def subtract_lists(a, b):
    """ Subtracts two lists. Throws ValueError if b contains items not in a """
    # Terminate if b is empty, otherwise remove b[0] from a and recurse
    return a if len(b) == 0 else [a[:i] + subtract_lists(a[i+1:], b[1:]) 
                                  for i in [a.index(b[0])]][0]

>>> x = [1,2,3,4,5,6,7,8,9,0]
>>> y = [1,3,5,7,9]
>>> subtract_lists(x,y)
[2, 4, 6, 8, 0]
>>> x = [1,2,3,4,5,6,7,8,9,0,9]
>>> subtract_lists(x,y)
[2, 4, 6, 8, 0, 9]     #9 is only deleted once
>>>

score 1 · Answer 12 · edited Sep 26 '17 at 08:54

The answer provided by @aaronasterling looks good, however, it is not compatible with the default interface of list: x = MyList(1, 2, 3, 4) vs x = MyList([1, 2, 3, 4]). Thus, the below code can be used as a more python-list friendly:

class MyList(list):
    def __init__(self, *args):
        super(MyList, self).__init__(*args)

    def __sub__(self, other):
        return self.__class__([item for item in self if item not in other])

Example:

x = MyList([1, 2, 3, 4])
y = MyList([2, 5, 2])
z = x - y

score 0 · Answer 13 · edited Feb 20 '22 at 11:49

0

from collections import Counter

y = Counter(y)
x = Counter(x)

print(list(x-y))

edited Feb 20 '22 at 11:49

RiveN

2,595
11
13
26

answered Feb 20 '22 at 08:54

abi

9
3

score 0 · Answer 14 · answered May 16 '22 at 10:44

list1 = ['a', 'c', 'a', 'b', 'k'] 
list2 = ['a', 'a', 'a', 'a', 'b', 'c', 'c', 'd', 'e', 'f'] 
for e in list1: 
    try: 
        list2.remove(e) 
    except ValueError: 
        print(f'{e} not in list') 
list2 
# ['a', 'a', 'c', 'd', 'e', 'f']

This will change list2. if you want to protect list2 just copy it and use the copy of list2 in this code.

score -1 · Answer 15 · answered Jan 13 '17 at 15:24

This example subtracts two lists:

# List of pairs of points
list = []
list.append([(602, 336), (624, 365)])
list.append([(635, 336), (654, 365)])
list.append([(642, 342), (648, 358)])
list.append([(644, 344), (646, 356)])
list.append([(653, 337), (671, 365)])
list.append([(728, 13), (739, 32)])
list.append([(756, 59), (767, 79)])

itens_to_remove = []
itens_to_remove.append([(642, 342), (648, 358)])
itens_to_remove.append([(644, 344), (646, 356)])

print("Initial List Size: ", len(list))

for a in itens_to_remove:
    for b in list:
        if a == b :
            list.remove(b)

print("Final List Size: ", len(list))

13

Avoid this, it's O(N^2) – Alexander Mar 10 '17 at 05:38

score -1 · Answer 16 · answered Nov 29 '22 at 08:58

-1

def listsubtraction(parent,child):
    answer=[]
    for element in parent:
        if element not in child:
            answer.append(element)
    return answer

I think this should work. I am a beginner so pardon me for any mistakes

answered Nov 29 '22 at 08:58

Harshit Manocha

1

1

This will work, but it will be very slow for large lists, because `child` must be scanned once for each element in `parent`. – snakecharmerb Dec 02 '22 at 17:24

How do I subtract one list from another?

16 Answers16

Keep each unique item only once `xs - ys == {2, 4}`

Remove all occurrences `xs - ys == [2, 4, 2]`

Only remove same number of occurrences `xs - ys == [2, 4, 2, 1]`

Linked

Related

How do I subtract one list from another?

16 Answers16

Keep each unique item only once xs - ys == {2, 4}

Remove all occurrences xs - ys == [2, 4, 2]

Only remove same number of occurrences xs - ys == [2, 4, 2, 1]

Linked

Related

Keep each unique item only once `xs - ys == {2, 4}`

Remove all occurrences `xs - ys == [2, 4, 2]`

Only remove same number of occurrences `xs - ys == [2, 4, 2, 1]`