×
INTELLIGENT WORK FORUMS
FOR COMPUTER PROFESSIONALS

Log In

Come Join Us!

Are you a
Computer / IT professional?
Join Tek-Tips Forums!
  • Talk With Other Members
  • Be Notified Of Responses
    To Your Posts
  • Keyword Search
  • One-Click Access To Your
    Favorite Forums
  • Automated Signatures
    On Your Posts
  • Best Of All, It's Free!
  • Students Click Here

*Tek-Tips's functionality depends on members receiving e-mail. By joining you are opting in to receive e-mail.

Posting Guidelines

Promoting, selling, recruiting, coursework and thesis posting is forbidden.

Students Click Here

Jobs

duplicating fields on both dimension and fact table.

duplicating fields on both dimension and fact table.

duplicating fields on both dimension and fact table.

(OP)
Hi all,

Is it a bad idea to have the same field on both a dimension and fact table?

Here's my scenario:
I have about 10 descriptive numeric fields that reside on a dimension table. While the fields are numeric they are truly descriptive and rarely change and so they are dimension attributes. However, these same fields often participate in complex calculations. The dimension table is fairly large (15-20 million loan records).

In our environment it is possible run a meaningful report by simply hitting the fact table. If the fields in question don't exist on the fact table then we'll perform an unnecessary join to the large dimension table.

In order to improve performance, we've decided to duplicate the 10 fields on both the dimension & fact table. This way any reports that don't need other dimensional data can run just from the fact table.

Design-wise, does it seem like a bad idea to duplicate the fields?

Any insight will be greatly appreciated.
Thanks!

RE: duplicating fields on both dimension and fact table.

I wonder hwo you can run a report on just a fact table. You don't use a date dimension? Are you using surrogate keys in teh fact table, or are you using the keys from teh source system?

Design wise I would say, yes, it is a bad idea to duplicate these fields.
Consider the effects of a change of one of these fields as well. Impact on ETL would be significant.

But it depends on the size of the fact table (nr of rows) as well. If this is way less then the 20Million from the dimension you might consider this, if you take the growth of the fact table into account.

How many reports would need these fields? Nearly all or just a few?
 

RE: duplicating fields on both dimension and fact table.

Hans63 makes some good points, but I believe there are situations where duplicating a column onto the fact table can be beneficial. From a theoretical point of view, one of the tenets of DW is to trade storage (duplicate data) for better performance.  One case which might apply:  Let's say you have a small dimension, like State, or maybe some company specific attribute like product group.  With small reference tables, sometimes performance can be gained by not having to join to the dimension.  I would think this is the exception rather than the rule. Also of consideration is whether the attribute which you are denormalizing into the fact is a slowly changing dimenions.  

-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright

RE: duplicating fields on both dimension and fact table.

(OP)
Hans63 and johnherman thank you both so much for your responses.

The fact table in question contains loan balance information -- the grain is 1 record per loan per month. The large dimension in question is a loan attribute dimension and contains 1 record per loan.

Hans63, I misspoke when I said we can run reports solely from the fact table. What I meant is we can run reports solely from the fact table & date dimension (and other smaller dimensions) without having to go to the large dimension mentioned above. For example, our users have a need to do complex calculations (using the duplicated fields) across all loans in a given month. However, those duplicated fields are also attributes of the loan and can be used in the "where" clause for other reports. Because of the size of the loan attribute dimension (20 million or so records), we thought it best to duplicate the fields to improve performance.

johnherman, all the duplicated fields are SCD1 on the large loan attribute dimension table.

From a design perspective, I would prefer not to duplicate the fields, but the performance consideration makes me unsure... Have either of you had this situation before? Is a bad idea because it complicates the ETL process?

RE: duplicating fields on both dimension and fact table.

The fact table you describe will probably get way bigger then the loan dimension, unless you somehow manage to reduce the number of records. e.g. by holding only 1 year of history. But that will give you 12 * 20 million = 240 million records in the fact table. (this is the space part of the trade-off)

I take it that you need both the balance information and the other attributes from the dimension to perform those complex calculations. Meaning you will have to join 20M dimension records to 20M fact records.

As to the impact on the ETL proces, consider the fact that, for a change in a loan attribute, you will have to update the fact table and the dimension table. As shown above the fact table may get quite big. And this is an extra piece of ETL that needs to be maintained.

Performance, both for the ETL and the report, will depend on available hardware and indexing strategies.

The right choice also depends on the number of reports needing these complex calculations, the available timeframe and the importance of these report for the business. Is it just 1 report, I wouild opt for NOT duplicating these fields. If you have about 20% of your reports needing (some of) those calculations I would consider duplicating.
Please make sure that the business understands that this report may take some time to produce.

Does the business need these calculations on ALL loans every time? Or can you filter on the loan as well?

Red Flag This Post

Please let us know here why this post is inappropriate. Reasons such as off-topic, duplicates, flames, illegal, vulgar, or students posting their homework.

Red Flag Submitted

Thank you for helping keep Tek-Tips Forums free from inappropriate posts.
The Tek-Tips staff will check this out and take appropriate action.

Reply To This Thread

Posting in the Tek-Tips forums is a member-only feature.

Click Here to join Tek-Tips and talk with other members! Already a Member? Login

Close Box

Join Tek-Tips® Today!

Join your peers on the Internet's largest technical computer professional community.
It's easy to join and it's free.

Here's Why Members Love Tek-Tips Forums:

Register now while it's still free!

Already a member? Close this window and log in.

Join Us             Close