SQL 2008 CDC – Net vs All and Slow Changing Dimensions of type 2

As noted before I am using SQL 2008 CDC to incrementally update a data warehouse for the project I’m currently working on. I found this snippet of code that inserts a new record. If matching an existing row, it marks that row as no longer active and then inserts the new row as the new active row. A process I’ve seen referenced as Type 2 Slow Changing Dimensions (SCD).

The Code Snippet:

8<—————————————————————————————————————

( Assumes that the DW.<Tablename> has 3 additional columns appended:

        DW_Insert_DT (datetime),
        DW_Change_Dt  (datetime)
        and DW_Active (Bit)
)

INSERT INTO DW.<Tablename>
    (Key_1,
     Field_1,
     Field_2)

SELECT
    Key_1,
    Field_1,
    Field_2
FROM (
    MERGE INTO DW.<tablename> as trgt
    USING Src.<tablename> as src
        ON (trgt.Key_1 = src.Key_1)
    WHEN NOT MATCHED
        THEN
            INSERT
            VALUES (
                Key_1,
                Field_1,
                Field_2
                ,getdate() — DW_Insert_Dt
                ,NULL — DW_Change_Dt
                ,1 — DW_Active
                )
    WHEN MATCHED
        AND trgt.dw_active = 1
        THEN
            UPDATE
            SET trgt.dw_active = 0
                ,trgt.dw_date_changed = getdate()
                
    OUTPUT $ACTION Action_Out
            ,src.Key_1
            ,src.Field_1
            ,src.Field_2
    ) AS MERGE_OUT
WHERE MERGE_OUT.Action_Out = ‘UPDATE’;

8<—————————————————————————————————————

At first the nested statements did not make sense to me, and I had to pick it apart to really get it. I tend to do this by staging a conversation between me and the SQL server process. Sometimes humanizing it makes it easier to understand, even if you do get some funny looks from co-workers.

The Conversation:

Me: Hey SQL, check this record I’m trying to merge. Does it match en existing one?

SQL: Nope, no match.

Me: OK, cool. Insert the record and populate the additional DW change auditing columns appropriately.

SQL: Sure thing boss

Me: Hey SQL, are you keeping track of what we are doing?

SQL: Yes, I am building an output table that records all the things you are doing in this merge. I’m even categorizing them so you can look at just the updates, inserts, deletes etc.

Me: Ok, next record: check this record I’m trying to merge. Does it match en existing one?

SQL: Yepp. It does

Me: Is that record marked active?

SQL: Yes

Me: OK, Change the active flag to 0 (false) and mark the DW_Updated column with the current datetime.

SQL: Sure thing boss

Me: This is going overly well, you still have that list of actions right?

SQL: Yes

Me: Great, I want to see just those lines that I updated and marked as inactive because they matched on the key. Can I get that easily?

SQL: Yepp, just query my output from the merge and specify you want the $Action to be ‘UPDATE’

Me: What if I want the values from the source that we didn’t do anything with?

SQL: Just add them to the output clause

Me: So, if I do that, we could just insert them into the Data Warehouse table right?

SQL: Yepp

Me: Awesome! This is such a time and code saver! Thank you!

SQL: Sure thing boss

 

So it went, we worked our way through the various merged data and things were good.

Then we came across a table that was updated using the cdc.fn_cdc_get_all_changes_capture_instance which I mentioned briefly in a previous post. This table did not have a unique key of any type, so all changes were captured, not just the NET changes using the cdc.fn_cdc_get_net_changes_capture_instance function. This meant the row that matched, had multiple records in the source table that would have to merge with the DW table.

Needless to say SQL didn’t like that

 

Me: Ok, next table and recordset, let’s do this thing!

SQL: Uhmmm… No.

Me: Wha?

SQL: The MERGE statement attempted to UPDATE or DELETE the same row more than once. This happens when a target row matches more than one source row. A MERGE statement cannot UPDATE/DELETE the same row of the target table multiple times. Refine the ON clause to ensure a target row matches at most one source row, or use the GROUP BY clause to group the source rows.

Me: So, I have to rework this wonderful snippet to make ti work differently with cdc.fn_cdc_get_all_changes_capture_instance data?

SQL: Yes

Me: DAMMIT!

 

So, here I am, heading of to figure how I want to work around this for the few tables I need in the Data Warehouse that do not have unique keys in the source. I’ll be sure to update the conversation with what I come up with, when I get to that point.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.