10-minute SQL: Filtered Delete: join vs. sub-select vs. exist

I was doing some simple tuning of procedures for a client when I came across some delete statements that I wanted to look into some more. They were filtering based on values in a temp table.

In a majority of cases I have come across at clients a row is deleted from a single table based on some static and unique identifying characteristic that is directly related to that row and that’s it. In this case we were looking to remove a more complex item from a shopping cart and had to make sure any ‘child’ items were removed as well. This can be done in a variety of ways but I wanted to look into 3 methods in particular. Below was a quick test I threw together to test out some timings and IO for different methods of performing the same delete transaction using a sub-select, exists and an inner join.

First – I setup a couple of tables, loaded the first with 50k random values and the second I randomly selected 5k form the first table to be deleted

Then I went and wrapped a series of delete statements in transactions so I could capture the info and not have to reload tables constantly.

Running the above I was getting the following results:

*Note – I have a multi-core laptop, so CPU time is collective of all threads & cores and may show higher than the actual execution time. I did not show parse or compile times since they were both zero

Trial 1: Sub-select

Trial 2: Exists

Trial 3: Inner join

With the way the SQL Engine tunes things I was thinking that the inner join could be quicker, but in this example it was running in about the same time as the sub-select statement. but was  taking more CPU to perform. Due to that simple little bit the sub-select was more efficient at doing what it had to do, even if it took the same amount of time to complete. By using less CPU resources  & time the system has more resources to use for other transactions and tasks.

Just goes to show how a simple little test while showing the statistics from the execution of the statements can reveal what may seem like just as efficient a process in time spent really is taking more resource. SQL isn’t just all about the time it takes to finish something, but it is about the sharing of resources as well.

After having run the tests I cleaned up the tables from my test DB

Hopefully this will help to inspire you to run some tests of your own when you have multiple ways you can perform a task in SQL.

Comments 2

  1. Thanks for the article. I learned a couple of things I didn’t know before.

    There is one mistake however, the EXISTS method will delete all the rows in del1. I think what you wanted is this:

    DELETE FROM dbo.del1
    WHERE EXISTS (SELECT 1 FROM dbo.del2 where del1.ident = ident2);

    1. Post
      Author

      Yay for typos! Now you know I’m human! You also know to never just trust anyone’s code online. proof it yourself first and understand what’s going on!
      Good catch! I have updated the article with what you provided here.
      When I reran the test on a newer laptop I received the following results:

      Trial 1: (Sub select)
      Table ‘DEL1’. Scan count 1, logical reads 240, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
      Table ‘del2’. Scan count 1, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

      SQL Server Execution Times:
      CPU time = 32 ms, elapsed time = 31 ms.

      Trial 2: (Exists)
      Table ‘DEL1’. Scan count 1, logical reads 240, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
      Table ‘del2’. Scan count 1, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

      SQL Server Execution Times:
      CPU time = 31 ms, elapsed time = 37 ms.


      Trial 3: (Inner Join)

      Table ‘DEL1’. Scan count 1, logical reads 240, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
      Table ‘del2’. Scan count 1, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

      SQL Server Execution Times:
      CPU time = 31 ms, elapsed time = 28 ms.

      Which in this scenario showed that all steps had the same number of reads, but with varying times and that the Inner join check was the least resource intensive. Does this mean it’s the best. Actually, no! When I reran the tests I would get varying times as different things were running on my system. The key is to try different methods, and make sure you are picking the one that works best for your environment and situation. Test it a few times and figure out how consistent it is and you are certainly on the path to a better query overall.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.