
In SQL Server 2008 I have several million rows of data that need to be deleted. They are scattered across a handful of tables. Deletion takes up to 20 seconds, which I think is way too slow! The data to be deleted is identified by a timestamp column. Here is what I have done so far to optimize:

  • Using isolation level read uncommitted. I don't care about transactions. If we fail, the user will issue the delete operation again, and new data is guaranteed not to have the timestamp we are deleting.
  • Deleting leaf tables before parent tables.
  • The timestamp column is part of the clustered PK index; in fact it is the first column of the PK/index.
  • Each table is emptied using a loop that deletes the top 200,000 rows at a time in order to reduce transaction log overhead (see the sketch after this list).
  • Neither I/O nor CPU is maxed out on the server.
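
For reference, each table is emptied roughly like this (a minimal sketch; dbo.MyLeafTable and @timestampToDelete are placeholder names, not my actual schema):

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;  -- as described above

DECLARE @timestampToDelete datetime2 = '2012-10-28';
DECLARE @rows int = 1;

WHILE @rows > 0
BEGIN
    -- Delete in batches of 200,000 rows to keep each transaction small
    -- and limit transaction log growth.
    DELETE TOP (200000) FROM dbo.MyLeafTable
    WHERE [timestamp] = @timestampToDelete;

    SET @rows = @@ROWCOUNT;
END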

What have I overlooked?

Also, I am in doubt about the effect of moving the timestamp column to the first position in the PK. After doing so, must I reorganize the tables, or is SQL Server smart enough to do this itself? My understanding of a clustered index is that since it defines the physical layout of the rows, SQL Server is forced to reorganize the data. But we have had no complaints from the customer that the operation changing the clustered index took a long time to perform.
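
To illustrate, the kind of change I mean looks like this (placeholder names; dropping and re-adding the PK rebuilds the clustered index in the new key order):

-- Any foreign keys referencing this PK must be dropped first.
ALTER TABLE dbo.MyLeafTable DROP CONSTRAINT PK_MyLeafTable;

-- Recreating the clustered PK with [timestamp] first physically reorders the rows.
ALTER TABLE dbo.MyLeafTable ADD CONSTRAINT PK_MyLeafTable
    PRIMARY KEY CLUSTERED ([timestamp], ID);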

  • Why does this need to be fast? Have you considered partitioning? – Mat Oct 28 '12 at 12:33
  • My understanding was that delete operations taking more than a few seconds were done wrong. Thus my thinking was that I've overlooked something. – Carlo V. Dango Oct 28 '12 at 12:48
  • Delete is an expensive operation. 20s for millions of lines does not seem "wrong" on the face of it. – Mat Oct 28 '12 at 12:51
  • I agree with Mat - you should consider table partitioning by the timestamp mentioned above. You can do a monthly or weekly partition, whatever you need. This will make your deletes easier because you can "drop" the entire partition that you do not want anymore. There are two ways to do it: 1) you can delete using the partition function so that the query only affects one partition; 2) you can switch the partition to a different table and truncate it (see the sketch after these comments). – Nabheet Dec 13 '12 at 23:04
  • Optimal deletes speeds are about 200k rows per second. You might be getting an optimal case here. Are you deleting 4m rows? If yes there is little that can be done. Are you interested in modest increases or do you need radical improvements? – usr Aug 24 '15 at 10:47
  • Try to drop/recreate the indexes before/after delete and/or change the fill factor. – Dexion Oct 20 '15 at 15:56
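
For reference, the switch-and-truncate idea from the comment above looks roughly like this, assuming the table is already partitioned on the timestamp column (all names and the partition number are placeholders):

-- Assumes dbo.MyLeafTable is partitioned on [timestamp] and dbo.MyLeafTable_Staging
-- has an identical schema on the same filegroup as the partition being removed.
ALTER TABLE dbo.MyLeafTable
    SWITCH PARTITION 3 TO dbo.MyLeafTable_Staging;  -- metadata-only, nearly instant

TRUNCATE TABLE dbo.MyLeafTable_Staging;             -- minimally logged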

1 Answer


Please make sure the tables you want to delete data from have a primary key explicitly declared.

Wrong: create table myTable (ID int)

Right: create table myTable (ID int PRIMARY KEY)

In addition to that, try adding OPTION (RECOMPILE), which can help performance:

DELETE FROM myTable 
WHERE timestamp in (select timestamp from other_table)
OPTION (RECOMPILE)
– Eray Balkanli