
I have a table with around 200 million records, and I have added 2 new columns to it. The 2 new columns need to be populated with values from a different table. Nearly 80% of the rows will be updated.

I tried a plain UPDATE, but it takes more than 2 hours to complete.

The main table has a composite primary key of 4 columns. Before updating, I dropped it, along with an index that was present on one of the columns. Now the update takes a little over 1 hour.

Is there any other way to speed up this update process (for example, batch processing)?

Edit: I used the other table (the one the values are matched from) in the FROM clause of the UPDATE statement.
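
For reference, a hypothetical shape of such a statement; the table and column names below are made up for illustration, they are not from the question:

    -- Hypothetical names: main_table, lookup, and the key/value columns are placeholders.
    UPDATE main_table AS m
    SET    new_col1 = l.val1,
           new_col2 = l.val2
    FROM   lookup AS l
    WHERE  m.k1 = l.k1
      AND  m.k2 = l.k2;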

Mano

3 Answers


Not really. Make sure that max_wal_size is high enough that you don't get too many checkpoints.
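
A minimal sketch of that setting; the value is only an assumption and depends on your disk space and workload, it is not from the answer:

    -- Raise the checkpoint distance so the bulk UPDATE triggers fewer checkpoints.
    -- 16GB is an illustrative value only; tune it to your environment.
    ALTER SYSTEM SET max_wal_size = '16GB';
    SELECT pg_reload_conf();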

After the update, the table will be bloated to about twice its original size. That bloat can be avoided if you update in batches and VACUUM in between, but that will not make processing faster.

Laurenz Albe

Do you need the whole update in a single transaction? I had quite a similar problem, with a table that was under heavy load and a column that required a NOT NULL constraint. To deal with it, I took these steps:

  1. Add the columns without constraints like NOT NULL, but with defaults. That way it goes really fast.
  2. Update the columns in steps of, say, 1000 entries per transaction (a sketch is shown below). In my case the DB load rose, so I had to add a small delay between batches.
  3. Alter the columns to add the NOT NULL constraints.

That way you don't block the table for a long time, but that is not a direct answer to your question.
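
A minimal sketch of one such batch, assuming hypothetical names (main_table, lookup, the composite key columns k1..k4 and the value columns are placeholders, not from the question). An external script runs this statement in a loop, one transaction per batch, passing in the last key of the previous batch; the keyset lookup is cheap only if an index over (k1, k2, k3, k4) is available during this phase:

    -- One batch of ~1000 rows; :last_k1..:last_k4 are supplied by the driving script.
    WITH batch AS (
        SELECT k1, k2, k3, k4
        FROM   main_table
        WHERE  (k1, k2, k3, k4) > (:last_k1, :last_k2, :last_k3, :last_k4)
        ORDER  BY k1, k2, k3, k4
        LIMIT  1000
    )
    UPDATE main_table AS m
    SET    new_col1 = l.val1,
           new_col2 = l.val2
    FROM   batch AS b
    JOIN   lookup AS l ON l.k1 = b.k1 AND l.k2 = b.k2
    WHERE  m.k1 = b.k1 AND m.k2 = b.k2 AND m.k3 = b.k3 AND m.k4 = b.k4;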

First, to validate where you are, I would check iostat to see whether I/O is the limit. To speed things up, I would consider:

  • a higher free space map setting, to be sure the DB is aware of entries that can be reused; note that if pages are already packed to the limit it will not bring much,
  • maybe foreign keys referring to the table can also be removed, to avoid locking on the table,
  • removing all indexes, since they slow the update down, and recreating them afterwards; that shifts the work rather than removing it, but it is an option, so it counts.
Michał Zaborowski
  • Thank you. Can you let me know how to update about 1000 entries per transaction? My primary key has 4 columns. – Mano Feb 19 '20 at 12:14
  • @Manoharan https://stackoverflow.com/questions/22351039/committing-transactions-while-executing-a-postgreql-function - you can't do that directly from PostgreSQL, so I have a Python script for that... – Michał Zaborowski Feb 19 '20 at 12:31

There are 2 types of solution to your problem.

1) This approach works if your main table is not updated or inserted into during the process (a sketch follows the steps below):

  1. First, create a table with the same schema but a different name, without the composite primary key and the index.
  2. Then insert the data into the new table, joining in the data from the other table.
  3. Apply all constraints and indexes on the new table after the insert.
  4. Drop the old table and rename the new table to the old table name.
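
A minimal sketch of that approach; all names and the column list are hypothetical placeholders (it assumes the main table's columns are k1, k2, k3, k4, payload, new_col1, new_col2):

    -- 1. Same structure as the main table, but no primary key or indexes yet.
    CREATE TABLE main_table_new (LIKE main_table INCLUDING DEFAULTS);

    -- 2. One bulk INSERT ... SELECT that fills the new columns from the other table.
    INSERT INTO main_table_new (k1, k2, k3, k4, payload, new_col1, new_col2)
    SELECT m.k1, m.k2, m.k3, m.k4, m.payload, l.val1, l.val2
    FROM   main_table AS m
    LEFT   JOIN lookup AS l ON l.k1 = m.k1 AND l.k2 = m.k2;

    -- 3. Constraints and indexes only after the bulk load.
    ALTER TABLE main_table_new ADD PRIMARY KEY (k1, k2, k3, k4);
    CREATE INDEX ON main_table_new (payload);

    -- 4. Swap the tables.
    DROP TABLE main_table;
    ALTER TABLE main_table_new RENAME TO main_table;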

2) Or you can use a trigger to populate those two columns on insert or update events. (This will make insert/update operations slightly slower.)
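
A sketch of such a trigger, again with placeholder names that are not from the answer:

    -- Fills the two new columns from the lookup table before each insert/update.
    CREATE OR REPLACE FUNCTION fill_new_cols() RETURNS trigger AS $$
    BEGIN
        SELECT l.val1, l.val2
        INTO   NEW.new_col1, NEW.new_col2
        FROM   lookup AS l
        WHERE  l.k1 = NEW.k1 AND l.k2 = NEW.k2;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER fill_new_cols_trg
        BEFORE INSERT OR UPDATE ON main_table
        FOR EACH ROW
        EXECUTE FUNCTION fill_new_cols();  -- use EXECUTE PROCEDURE on PostgreSQL < 11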

Harshit Shah
  • Thanks a lot for this suggestion. Will try and post if it brings down the execution time. – Mano Feb 21 '20 at 15:07