Type 2 SCD with SQL Merge

I was going through some notes I had from previous projects and came across a sample script for created a Type 2 Slow Changing Dimension (SCD) in a database or data warehouse.
A Type 2 SCD is one where new records are added, but old ones are marked as archived and then a new row with the change is inserted. This allows for a complete historical trail of the row’s changes in detail. Sometimes this can be overkill, but in some cases it is required. The client was doing medical claims processing, so we were required to keep a full history of key information.

In the past (read: pre-2008 versions of SQL) this used to have to be done with a bunch of little code and triggers to make sure the old data was kept.
On one of my recent projects I decided I wanted to use the new MERGE statement to create this same effect, with a LOT less code to maintain.

What actually made this possible is the MERGE statement’s output option. The merge statement actually captured all the changes it makes and tags them in an output table that you can reference. This feature let me capture information that was in the merge portion and pass it through to another statement.

Ok, enough blather, here’s the code sample:

I know, it’s a little jumbled, but let’s break it into its basic parts:

First, the main insert statement:

That should seem pretty familiar – just a simple insert statement from a source that contains a field (action_out) we can filter on.

Now for the more tricky part, the merge statement:

OK, still a little confusing so I’m going to describe what’s going on in plain English but first a quick note:
The current_flag field was added to the SCD so that an index could be created on it for real time reporting rather than spending the time searching for the current record through the detailed history. This is one of those cases that knowing what a customer really wants, regardless of what they ask for can make for simple performance tuning.

Now for the logic of the “Joe on the street” description of the merge statement:

I am going to attempt to merge data into the Customer Master table from the Customer Source table where the ID values in each table match.

If I do not find a match in the customer source table I am going to insert a row with the values I require and move to the next record.

If I do find a match, and the current_flag is ‘y’ and the last names do not match, then I want to update the Master table with the new values and then move to the next record.
Every other match/no match scenario in this sample, I don’t need so there are no further checks done.

When I’m done with the merge, I want to access the internal table and pass certain values out to any other statements. Note the $Action field in the output, this is what tells us the action the merge statement performed and will be what is used on the outer insert statement to perform the new record insert.

So putting it all together you get this:

Hopefully this will help you see just how powerful the merge statement in SQL Server can be.
By using this process we streamlined not only our code base for the data warehouse being built, but also the performance of maintaining a Type 2 SCD.

 

Helpful links:

SQL Merge Statement: http://technet.microsoft.com/en-us/library/bb510625.aspx

SCD Types: http://www.folkstalk.com/2012/03/slowly-changing-dimensions-scd-types.html

Comments 7

  1. Pingback: Type 2 SCD with SQL Merge - SQL Server - SQL Server - Toad World

  2. awesome post – thank you – exactly what I was looking for and so much cleaner and more efficient than an SSIS SCD type 2. The beauty of this is that it’s sooooo easy to replicate across other tables. Can’t thank you enough.

  3. Hello,

    thanks a lot for explaining the Merge. However there’s certain scenario and a question that I have: can you please help me?

    Question: is it OK to have the expired row’s ENDing timestamp same as the STARTing of the newer version? or should there be a intentional lag? (this will affect how the READ program is coded for: the former cannot be written with a BETWEEN clause.

    Scenario: for example,

    we loaded records on Business day 1 so all are current
    On business day 2, there were few records “updated” hence SCD 2 generated expired / current rows

    now if there are some retro adjustments needed for Business Day 1, how can this be handled?

    Best,

    GS

  4. very useful. I have a question though. if merge fails, then how to log errors in table saying which row,column data caused what error, so that support could take a look and fix data and rerun job?

  5. Post
    Author

    GS – for setting the dates, that is entirely up to you and your company/system requirements. This post is more to show you the tools and scripts and the basics of the idea so you can take it and customize it any way you need. Maybe not the exact answer you are looking for, but hopefully, it heads you down the right path for your needs.

     

    Rajen – There is no error handling in this sample code. You could add in try logic and then redirect if there is an error. If you mean a row that does not match, it would be handled by the ‘WHEN NOT MATCHED’ logic, which could just write it off to a different table if you wanted. If that doesn’t cover what you were looking for, I’m not sure I understand the question entirely.

  6. Nice post, saved my butt just now :), spend a whole day banging my head as to why why why! it was closing the rows but not inserting new rows. found out i was missing the whole wrapper around the merge parts that does the inserting of new rows.

     

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.