0

I have 2 lists of strings

List1 = [["Hey there"], ["hi"], ["hello"]]
List2 = [["hi"], ["hello"]]

Is there a O(n) way to remove elements of List2 from List1?

Desired output = [["Hey there"]]

  • List1 - List2 maybe? – rv.kvetch Sep 03 '22 at 02:15
  • 3
    I think you'll have to convert them to sets of tuples to get O(n), using basic language structures. – Kenny Ostrom Sep 03 '22 at 02:17
  • 1
    FWIW, [here's an *O(n\*m)* method](/a/3428637/4518341) – wjandrea Sep 03 '22 at 02:18
  • @Codingamethyst I don't see a nested loop in the question statement. – rv.kvetch Sep 03 '22 at 02:19
  • Is there a way to conserve the list of list format and do it? – Codingamethyst Sep 03 '22 at 02:19
  • Yes, exactly as how it appears in the linked answer. – rv.kvetch Sep 03 '22 at 02:20
  • @rv.kvetch sorry I meant to say O(n) way only, I'll edit it – Codingamethyst Sep 03 '22 at 02:20
  • Are the lists all one element? I tried out Kenny's suggestion (actually a variation on it), but I realized that converting to tuple is *O(k)*, which means `set(map(tuple, List2))` is *O(n\*k)*, and I'm basically doing the same thing for `List1`, so in total it's *O(n\*k + m\*j)*. But if all the lists are one-element, *k=1* and *j=1*. In that case I'd also question why they're lists in the first place instead of just the strings, but maybe that's beside the point. – wjandrea Sep 03 '22 at 02:35
  • 1
    @wjandrea No, the lists actually contain more than one elements, long sentences rather. I used 1 element lists just as an example. – Codingamethyst Sep 03 '22 at 02:40
  • 1
    Then I don't think it's possible to get *O(n)* since the data itself is *n\*k* size, so anything that consumes all of it must be *O(n\*k)*. Even comparison for example: `List1 == List2` is *O(n\*k + m\*j)*. – wjandrea Sep 03 '22 at 02:53

1 Answers1

1

You can do this with two O(n) steps:

List2_set = set(map(tuple, List2))
List1_filtered = [row for row in List1 if tuple(row) not in List2_set]
  1. Convert the list of items to exclude to a set of tuples

    • This conversion is O(n)
    • set is required because checking membership for sets is O(1) instead of O(n)
    • tuple is required for set items instead of list, because tuple is hashable
  2. Check membership for each element of List1

    • This check is also O(n)
    • The set collection uses a hash table to allow O(1) membership testing

The total then is O(n) + O(n) => O(n), that is, the performance is linear with number of total elements of List1 + List2.

Pi Marillion
  • 4,465
  • 1
  • 19
  • 20