×
INTELLIGENT WORK FORUMS
FOR COMPUTER PROFESSIONALS

Log In

Come Join Us!

Are you a
Computer / IT professional?
Join Tek-Tips Forums!
  • Talk With Other Members
  • Be Notified Of Responses
    To Your Posts
  • Keyword Search
  • One-Click Access To Your
    Favorite Forums
  • Automated Signatures
    On Your Posts
  • Best Of All, It's Free!
  • Students Click Here

*Tek-Tips's functionality depends on members receiving e-mail. By joining you are opting in to receive e-mail.

Posting Guidelines

Promoting, selling, recruiting, coursework and thesis posting is forbidden.

Students Click Here

Jobs

SCD 2 and fact tables.

SCD 2 and fact tables.

SCD 2 and fact tables.

(OP)
Hello Members,
I am in the process of learning datawarehouse concepts. I have read through few books. However, i have one question about SCD Type 2 dimensions. For example, if there is an person X and he is located in CO and he has some data in fact table. These two tables are joined by personid key. Now person x has moved to CA. So a new record has been inserted into the person dimension which will be having a new id. How the fact data of the person x is now linked to person dimension for person x for the newly inserted record ?


Thanks in Advance!

RE: SCD 2 and fact tables.

The primary key of your Person dimension will be a surrogate key.  This will be something with no meaning, like an incrementing integer.

So let's say PersonID #5 has moved from CO to CA.  His row for CO may have PersonKey #100.  His row for CA may have PersonKey #800.  But they both have PersonID #5.  Your fact table will have PersonKey as the foreign key, not PersonID.

Does this make sense?

RE: SCD 2 and fact tables.

To build on RiverGuy's example

When the fact record was inserted, the surrogate key pointed to the first PersonId #5 record (i.e with surrogate key #100).

When Person#5 moved to "CA" a new record was inserted into the Person dimension table with a surrogate key of #800.

The fact record however continues to point to record #100 in the Person dimension table because that record represents the way that person looked like when the fact record was created. This is how the fact record continues to maintain an accurate representation of history. In your example, the person was a salesman for the CO area at the time the sale was made. Therefore his sales get credited for the CO area.

Type 2 Dimension tables can be a pain:

For example, you may not always want your fact record to hang on to all the history of the dimension record. For example, you are likely to want to have the most current mailing address of the person which would be represented by the last generation of the record.

Then there is the problem of corrections being made to fields that trigger the creation of a new record. I have seen fields toggle back and forth.

 

RE: SCD 2 and fact tables.

(OP)
RiverGuy and dkyrtata,

Thanks so much for valuable inputs. The examples you guys have given make perfect sense. How ever the issue mentioned by dkyrtata thrown me off again. So with SCD type2, if the person has a different address after moving to CA, if you want your facts tied to the most recent address, how to manage the facts then ? Do you insert a new record in fact table as well ? If yes, how would the load the data in facts table ? What should be the approach ?

Thanks in Advance!

RE: SCD 2 and fact tables.

Good Question. I would like to hear what others do to handle this. Because of the problems associated with Type 2 dimensions, I avoid using them - preferring to stick with Type 1 when possible

No, do not insert a new fact record. That will just mess up your aggregations when you inadvertently double-count your sales from the redundant records. Plus you would create a maintenance nightmare when you have to traverse all your fact tables to make adjustments caused by changes (if any) to your dimension records. Keep in mind, your dimension tables will be referenced by any number of fact tables. And your fact tables will increase in number and size as your data warehouse/mart matures.

Here is something you can consider:

Add a surrogate key field, CurrPersonKey, to your dimension table that would point to the last generation of each record. This means that every time you insert a new record, the CurrPersonKey field would be updated for all previous generations of the record. So in the example, set CurrPersonKey=800

Here is the SQL code that would update all generations except the last (Last generation would have an expiry-date of 31-Dec-9999)

CODE

UPDATE Person prev
   SET prev.currPersonKey=(SELECT personKey FROM person curr
                            WHERE curr.personID = prev.personID
                              AND curr.row_expiry_date = TO_DATE('31Dec9999','DDMonYYYY') -- Current generation
                          )
 WHERE prev.PersonID='#5'
   AND prev.row_expiry_date != TO_DATE('31Dec9999','DDMonYYYY') -- Expired (previous) generation

RE: SCD 2 and fact tables.

Quote:


If yes, how would the load the data in facts table ? What should be the approach ?

You handle it in your ETL.

Here is how your dimension looks (note that I am not displaying the state columns):

CODE

PersonKey     PersonID     ValidFrom     ValidTo
---------     --------     ---------     -------
100           5            1900-01-01    2011-03-31
800           5            2011-04-01    NULL


When your ETL runs to populate your dimension, if you detect a Type-2 change, you pick your dimension row which has a NULL ValidTo.  You then "close out" that row by putting in the date that the row is good until.  You then insert a new row with today's date as ValidFrom, and a NULL for ValidTo.

When your ETL runs to populate your fact table, you lookup or  join on the business key, which is PersonID in the case.  But you also lookup on the date range between ValidFrom and ValidTo.  So if your fact row is from 2010, your going to pick up PersonKey # 100 because 2010 is between 1900 and 2011-03-31.  If your fact row is from 2011-04-25, you're going to pick up PersonKey #800.  Get it?
 

Red Flag This Post

Please let us know here why this post is inappropriate. Reasons such as off-topic, duplicates, flames, illegal, vulgar, or students posting their homework.

Red Flag Submitted

Thank you for helping keep Tek-Tips Forums free from inappropriate posts.
The Tek-Tips staff will check this out and take appropriate action.

Reply To This Thread

Posting in the Tek-Tips forums is a member-only feature.

Click Here to join Tek-Tips and talk with other members! Already a Member? Login

Close Box

Join Tek-Tips® Today!

Join your peers on the Internet's largest technical computer professional community.
It's easy to join and it's free.

Here's Why Members Love Tek-Tips Forums:

Register now while it's still free!

Already a member? Close this window and log in.

Join Us             Close