I was doing some simple tuning of procedures for a client when I came across some delete statements that I wanted to look into some more. They were filtering based on values in a temp table.
In a majority of cases I have come across at clients a row is deleted from a single table based on some static and unique identifying characteristic that is directly related to that row and that’s it. In this case we were looking to remove a more complex item from a shopping cart and had to make sure any ‘child’ items were removed as well. This can be done in a variety of ways but I wanted to look into 3 methods in particular. Below was a quick test I threw together to test out some timings and IO for different methods of performing the same delete transaction using a sub-select, exists and an inner join.
First – I setup a couple of tables, loaded the first with 50k random values and the second I randomly selected 5k form the first table to be deleted
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | SET NOCOUNT ON; CREATE TABLE dbo.DEL1 (ident INT IDENTITY PRIMARY key, Descript VARCHAR(10)); CREATE TABLE dbo.del2 (ident2 INT); go -- insert 50k rows into first table INSERT dbo.DEL1 (descript) SELECT (LEFT(NEWID(),10)); GO 50000 -- insert random 5k of the first table's rows into second table INSERT dbo.del2 (ident2) SELECT TOP 5000 ident FROM dbo.del1 ORDER BY NEWID(); -- verify values there SELECT * FROM dbo.del1 SELECT * FROM dbo.del2 |
Then I went and wrapped a series of delete statements in transactions so I could capture the info and not have to reload tables constantly.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 | -- start testing the deletes SET STATISTICS TIME ON; SET STATISTICS IO ON; -- trial 1 - subselect ~ 60 ms BEGIN TRANSACTION DELETE FROM dbo.del1 WHERE ident IN (SELECT ident2 FROM dbo.del2); ROLLBACK transaction; -- trial 2 - Exists ~ 1750 ms BEGIN TRANSACTION DELETE FROM dbo.del1 WHERE EXISTS (SELECT 1 FROM dbo.del2 where del1.ident = ident2); ROLLBACK transaction; -- trial 3 - Inner join ~70 ms BEGIN TRANSACTION DELETE D1 FROM del1 D1 INNER JOIN del2 ON d1.ident = del2.ident2; ROLLBACK TRANSACTION; COMMIT TRANSACTION; SET STATISTICS TIME OFF; SET STATISTICS IO OFF; |
Running the above I was getting the following results:
*Note – I have a multi-core laptop, so CPU time is collective of all threads & cores and may show higher than the actual execution time. I did not show parse or compile times since they were both zero
Trial 1: Sub-select
1 2 3 4 5 6 7 | SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. Table 'DEL1'. Scan count 1, logical reads 256, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'del2'. Scan count 1, logical reads 17, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 47 ms, elapsed time = 72 ms. |
Trial 2: Exists
1 2 3 4 5 6 | Table 'del2'. Scan count 4, logical reads 256, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'DEL1'. Scan count 5, logical reads 17, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 48 ms, elapsed time = 72 ms. |
Trial 3: Inner join
1 2 3 4 5 | Table 'DEL1'. Scan count 1, logical reads 256, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'del2'. Scan count 1, logical reads 17, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 78 ms, elapsed time = 71 ms. |
With the way the SQL Engine tunes things I was thinking that the inner join could be quicker, but in this example it was running in about the same time as the sub-select statement. but was taking more CPU to perform. Due to that simple little bit the sub-select was more efficient at doing what it had to do, even if it took the same amount of time to complete. By using less CPU resources & time the system has more resources to use for other transactions and tasks.
Just goes to show how a simple little test while showing the statistics from the execution of the statements can reveal what may seem like just as efficient a process in time spent really is taking more resource. SQL isn’t just all about the time it takes to finish something, but it is about the sharing of resources as well.
After having run the tests I cleaned up the tables from my test DB
1 2 | DROP TABLE dbo.del1; drop table dbo.del2; |
Hopefully this will help to inspire you to run some tests of your own when you have multiple ways you can perform a task in SQL.
Comments 2
Thanks for the article. I learned a couple of things I didn’t know before.
There is one mistake however, the EXISTS method will delete all the rows in del1. I think what you wanted is this:
DELETE FROM dbo.del1
WHERE EXISTS (SELECT 1 FROM dbo.del2 where del1.ident = ident2);
Author
Yay for typos! Now you know I’m human! You also know to never just trust anyone’s code online. proof it yourself first and understand what’s going on!
Good catch! I have updated the article with what you provided here.
When I reran the test on a newer laptop I received the following results:
Trial 1: (Sub select)
Table ‘DEL1’. Scan count 1, logical reads 240, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘del2’. Scan count 1, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 32 ms, elapsed time = 31 ms.
Trial 2: (Exists)
Table ‘DEL1’. Scan count 1, logical reads 240, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘del2’. Scan count 1, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 31 ms, elapsed time = 37 ms.
Trial 3: (Inner Join)
Table ‘DEL1’. Scan count 1, logical reads 240, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘del2’. Scan count 1, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 31 ms, elapsed time = 28 ms.
Which in this scenario showed that all steps had the same number of reads, but with varying times and that the Inner join check was the least resource intensive. Does this mean it’s the best. Actually, no! When I reran the tests I would get varying times as different things were running on my system. The key is to try different methods, and make sure you are picking the one that works best for your environment and situation. Test it a few times and figure out how consistent it is and you are certainly on the path to a better query overall.