I have a table notifications
which contain a payload
column of type jsonb
with a gin index on this column. The talbe currently contains 2,742,691 rows
The table looks something like this:
id | payload | created_at |
---|---|---|
1 | {"customer": {"email": "foo@example.com", "externalId": 111 } | 2022-06-21 |
2 | {"customer": {"email": "foo@example.com", "externalId": 222 } | 2022-06-20 |
3 | {"customer": {"email": "bar@example.com", "externalId": 333 } | 2022-06-20 |
4 | {"customer": {"email": "baz@example.com", "externalId": 444 } | 2022-04-14 |
5 | {"customer": {"email": "baz@example.com", "externalId": 555 } | 2022-04-12 |
6 | {"customer": {"email": "gna@example.com", "externalId": 666 } | 2022-06-10 |
7 | {"customer": {"email": "gna@example.com", "externalId": 666 } | 2022-06-11 |
I am trying to query a list of email addresses that match the following condition:
- multiple rows for the same
email
address exist - one of those rows does have a different
externalId
than one of the previous ones created_at
is within the last month
For the example table contents, this should only return foo@example.com
because
bar@example.com
only appears oncebaz@example.com
doesnt have a row that was created within the last monthgna@example.com
has multiple rows but all of them have the sameexternalId
What I was trying is using a LEFT JOIN LATERAL
like this:
select
n.payload -> 'customer' -> 'email'
from
notifications n
left join lateral (
select
n2.payload -> 'customer' ->> 'externalId' tid
from
notifications n2
where
n2.payload @> jsonb_build_object(
'customer',
jsonb_build_object('email', n.payload -> 'customer' -> 'email')
)
and not (n2.payload @> jsonb_build_object(
'customer',
jsonb_build_object('externalId', n.payload -> 'customer' -> 'externalId')
))
and n2.created_at > NOW() - INTERVAL '1 month'
limit
1
) sub on true
where
n.created_at > NOW() - INTERVAL '1 month'
and sub.tid is not null;
however, this is taking ages to run. The Query plan for this looks like https://explain.depesz.com/s/mriB
QUERY PLAN
Nested Loop (cost=0.17..53386349.38 rows=259398 width=32)
-> Index Scan using index_notifications_created_at on notifications n (cost=0.09..51931.08 rows=259398 width=514)
Index Cond: (created_at > (now() - '1 mon'::interval))
-> Subquery Scan on sub (cost=0.09..205.60 rows=1 width=0)
Filter: (sub.tid IS NOT NULL)
-> Limit (cost=0.09..205.60 rows=1 width=32)
-> Index Scan using index_notifications_created_at on notifications n2 (cost=0.09..53228.33 rows=259 width=32)
Index Cond: (created_at > (now() - '1 mon'::interval))
Filter: ((payload @> jsonb_build_object('customer', jsonb_build_object('email', ((n.payload -> 'customer'::text) -> 'email'::text)))) AND (NOT (payload @> jsonb_build_object('customer', jsonb_build_object('externalId', ((n.payload -> 'customer'::text) -> 'externalId'::text))))))
JIT:
Functions: 13
Options: Inlining true, Optimization true, Expressions true, Deforming true
Any pointers what I'm doing wrong here / how to optimize this?