Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gmmastros on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Populating Dimension and Fact Tables 1

Status
Not open for further replies.

Jony77

IS-IT--Management
May 4, 2008
9
0
0
LU
Hello,

I am a newbie in datawarehousing and gradually working hard to understand how to implement it. Assume one of my dimesion table is called Location which has the following columns: Location_id, Country, State, District, Town. My question is if I want to populate the above table will each row have distinct Location_id or will all the rows have the same Location_id?

The same question applies on the Fact_Table.

Thanks,

Jony
 
Hello,

I think I was able to figure-out the solution. I previously wanted to know if when populating a dimension table if all the rows will have the same surrogate key which is yes.
 
Each row should have a different Surrogate Key. Your surrogate key should be your primary key on the dimension table. Even if you have have Slowly Changing Dimensions and a type 2 event occurs to cause a new dimension member to be added the keys should be unique.

Paul
---------------------------------------
Shoot Me! Shoot Me NOW!!!
- Daffy Duck
 
Hello MDXer,

You said and I quote "Each row should have a different Surrogate Key". So, do you mean that each row in a dimension table should have different surrogate key or do you mean in the fact table? If each row in a dimension table should have a different surrogate key then how do we populate the fact table?

Thanks,

Jony
 
The fact table is tied to the dimension table via keys. You can use natural keys (not recommended) or surrogate (also known as synthetic) keys. These are integers, which are efficient at joining. The SK's also allow for changing dimensions across time. Let's say you have a customer address dimension, and the customer moves. Well, you just start a new customer address record with a new SK and retire the obsolete customer address record. However, all history associated with the old address remains. And you can still unify everything if needed via a UNION query.

-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
 
Hello,
I still want to get it right. Assume my Sales Dimension table is:

sales_key smallint not null,
sales_id smallint not null,
sales_date date not null,
product_id smallint not null,
sold_by varchar(200) not null

The above table is populated with the following:

1,4,2007-07-01,36,Marley
2,5,2008-06-05,37,Peter
3,8,2008-06-08,38,Marley

Branch dimension:
branch_key smallint not null,
branch_id smallint not null,
branch_name varchar(200) not null,
country varchar(200) not null,

The above table is populated with the following:

37,17,Brisbane,Australia
38,99,Berlin,Germany
39,44,Tokio,Japan

Fact Table:
sales_key smallint not null,
branch_key smallint not null,
total_sales integer

My problem is each dimension row in the dimension table will have different surrogate key so how will my fact table look like. Just an example.

Thanks,

Jony
 
Take this example:

Product dimension:
Key
Natural_key
description
valid_from
valid_to

Products:
key natural key description valid_from valid_to
1 BNA Bananas 1-1-1800 15-12-2007
2 APL Apples 1-1-1800 31-12-9999
3 ORA Oranges 1-1-1800 27-08-2007
...
30 BNA Chiquitas 16-12-2007 31-12-9999
31 ORA Jellybeans 28-08-2007 31-12-9999

The date 31-12-9999 is a date to indicate infinity, the end date is not yet known.

Dimension Salesmanager
Key
Natural_key
Name
Title
Valid_from
Valid_to

Salesmanagers
Key Natural_key Name Valid_from Valid_to
1 19600301001 George Bush 03-01-1960 15-11-2007
2 19631126002 Arnold Schwarz 26-11-1963 31-12-9999
..
432 19600301001 George W Bush 16-11-2007 31-12-9999

And the actual sales:
19600301001 sold 20 BNA and 10 APL on october 25 2007
19600301001 sold 5 ORA and 7 APL on january 4 2008
19631126002 sold 102 APL on april 21 2008

The fact table:
Product_key
Salesmanager_Key
Sales_date
Number_sold

Both keys here refer to the respective dimensions and are surrogate keys.

Contents of the fact table:
Product_key Salesmanager_key Sales_date Number_sold
1 1 20071025 20
2 1 20071025 10
31 432 20080104 5
2 432 20080104 7
2 2 20080421 102

So you have 2 records in the product dimension with a natural key of BNA, but with a different description.
Same goes for ORA.

There are 2 records in the salesmanager dimension for George (W) Bush.
The fact table refers to the correct record, which is based on the natural key and the sales_date.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top