Smart questions
Smart answers
Smart people
INTELLIGENT WORK FORUMS
FOR COMPUTER PROFESSIONALS

Member Login

Come Join Us!

Are you a
Computer / IT professional?
Join Tek-Tips now!
  • Talk With Other Members
  • Be Notified Of Responses
    To Your Posts
  • Keyword Search
  • One-Click Access To Your
    Favorite Forums
  • Automated Signatures
    On Your Posts
  • Best Of All, It's Free!

Join Tek-Tips
*Tek-Tips's functionality depends on members receiving e-mail. By joining you are opting in to receive e-mail.

LINK TO THIS FORUM!

Add Stickiness To Your Site By Linking To This Professionally Managed Technical Forum.
Just copy and paste the
code below into your site.

Partner With Us!

"Best Of Breed" Forums Add Stickiness To Your Site
Partner Button
(Download This Button Today!)

Feedback

"...An excellent site which has quite possibly prevented me from having a mental/nervous breakdown..."

Geography

Where in the world do Tek-Tips members come from?
ngthompson (IS/IT--Management)
6 Jan 09 11:49
Hi all,

Is it a bad idea to have the same field on both a dimension and fact table?

Here's my scenario:
I have about 10 descriptive numeric fields that reside on a dimension table. While the fields are numeric they are truly descriptive and rarely change and so they are dimension attributes. However, these same fields often participate in complex calculations. The dimension table is fairly large (15-20 million loan records).

In our environment it is possible run a meaningful report by simply hitting the fact table. If the fields in question don't exist on the fact table then we'll perform an unnecessary join to the large dimension table.

In order to improve performance, we've decided to duplicate the 10 fields on both the dimension & fact table. This way any reports that don't need other dimensional data can run just from the fact table.

Design-wise, does it seem like a bad idea to duplicate the fields?

Any insight will be greatly appreciated.
Thanks!
Hans63 (Programmer)
7 Jan 09 2:50
I wonder hwo you can run a report on just a fact table. You don't use a date dimension? Are you using surrogate keys in teh fact table, or are you using the keys from teh source system?

Design wise I would say, yes, it is a bad idea to duplicate these fields.
Consider the effects of a change of one of these fields as well. Impact on ETL would be significant.

But it depends on the size of the fact table (nr of rows) as well. If this is way less then the 20Million from the dimension you might consider this, if you take the growth of the fact table into account.

How many reports would need these fields? Nearly all or just a few?
 
johnherman (MIS)
7 Jan 09 9:13
Hans63 makes some good points, but I believe there are situations where duplicating a column onto the fact table can be beneficial. From a theoretical point of view, one of the tenets of DW is to trade storage (duplicate data) for better performance.  One case which might apply:  Let's say you have a small dimension, like State, or maybe some company specific attribute like product group.  With small reference tables, sometimes performance can be gained by not having to join to the dimension.  I would think this is the exception rather than the rule. Also of consideration is whether the attribute which you are denormalizing into the fact is a slowly changing dimenions.  

-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright

ngthompson (IS/IT--Management)
7 Jan 09 11:08
Hans63 and johnherman thank you both so much for your responses.

The fact table in question contains loan balance information -- the grain is 1 record per loan per month. The large dimension in question is a loan attribute dimension and contains 1 record per loan.

Hans63, I misspoke when I said we can run reports solely from the fact table. What I meant is we can run reports solely from the fact table & date dimension (and other smaller dimensions) without having to go to the large dimension mentioned above. For example, our users have a need to do complex calculations (using the duplicated fields) across all loans in a given month. However, those duplicated fields are also attributes of the loan and can be used in the "where" clause for other reports. Because of the size of the loan attribute dimension (20 million or so records), we thought it best to duplicate the fields to improve performance.

johnherman, all the duplicated fields are SCD1 on the large loan attribute dimension table.

From a design perspective, I would prefer not to duplicate the fields, but the performance consideration makes me unsure... Have either of you had this situation before? Is a bad idea because it complicates the ETL process?
Hans63 (Programmer)
8 Jan 09 2:57
The fact table you describe will probably get way bigger then the loan dimension, unless you somehow manage to reduce the number of records. e.g. by holding only 1 year of history. But that will give you 12 * 20 million = 240 million records in the fact table. (this is the space part of the trade-off)

I take it that you need both the balance information and the other attributes from the dimension to perform those complex calculations. Meaning you will have to join 20M dimension records to 20M fact records.

As to the impact on the ETL proces, consider the fact that, for a change in a loan attribute, you will have to update the fact table and the dimension table. As shown above the fact table may get quite big. And this is an extra piece of ETL that needs to be maintained.

Performance, both for the ETL and the report, will depend on available hardware and indexing strategies.

The right choice also depends on the number of reports needing these complex calculations, the available timeframe and the importance of these report for the business. Is it just 1 report, I wouild opt for NOT duplicating these fields. If you have about 20% of your reports needing (some of) those calculations I would consider duplicating.
Please make sure that the business understands that this report may take some time to produce.

Does the business need these calculations on ALL loans every time? Or can you filter on the loan as well?

Reply To This Thread

Posting in the Tek-Tips forums is a member-only feature.

Click Here to join Tek-Tips and talk with other members!

Close Box

Join Tek-Tips® Today!

Join your peers on the Internet's largest technical computer professional community.
It's easy to join and it's free.

Here's Why Members Love Tek-Tips Forums:

Register now while it's still free!

Already a member? Close this window and log in.

Join Us             Close