×
INTELLIGENT WORK FORUMS
FOR COMPUTER PROFESSIONALS

Log In

Come Join Us!

Are you a
Computer / IT professional?
Join Tek-Tips Forums!
  • Talk With Other Members
  • Be Notified Of Responses
    To Your Posts
  • Keyword Search
  • One-Click Access To Your
    Favorite Forums
  • Automated Signatures
    On Your Posts
  • Best Of All, It's Free!
  • Students Click Here

*Tek-Tips's functionality depends on members receiving e-mail. By joining you are opting in to receive e-mail.

Posting Guidelines

Promoting, selling, recruiting, coursework and thesis posting is forbidden.

Students Click Here

Jobs

Wide fact table vs. multiple fact tables

Wide fact table vs. multiple fact tables

Wide fact table vs. multiple fact tables

(OP)
I am currently developing a data warehouse for a telecom company.
We collect several different measures (100+) from the network elements on an hourly basis.
I'm wondering what the optimal data warehouse design would be:
1) Since all measures bear the same granularity (1 hour), and originate from similar network elements, it would make sense to put them all in the same fact table. I would therefore end up with a fact table composed of about 5 dimensions and 100+ fact fields. Does a 100+ field table sound like an acceptable design?
2) I could create several fact tables with each about 5 dimensions and each only a subset of the 100+ facts, grouped in a way likely to be queried by the end users. I would therefore have several fact tables with maximum 25 fact fields (which seems more reasonable?), but I would loose some flexibility in the sense that users could no longer query the data in ways that were unplanned or unexpected.
I would greatly appreciate any advice or comment or field experience with the 2 options proposed (or a 3rd alternative...).

RE: Wide fact table vs. multiple fact tables

You're right, 100 measures is quite a bit.  The first thing I am wondering is if these are all 100 actual different measures, or are there repeated measures stated in different units of measure?  For example:  WeightSoldPounds, WeightSoldKilograms.  

RE: Wide fact table vs. multiple fact tables

(OP)
Well, these are clearly 100+ different measures (and most of them are already groupings of an original set of 1000+ 'raw' counters). These telecom network elements are quite complex and thus require a fair amount of indicators. This is inherent to most complex radio transmission systems.
From a pure radio engineering point of view, of course not all queries involving any of the 100 facts make sense. However, I expect the end users to ask for the possibility to build unexpected or uncommon query types at certain points in time, for example when troubleshooting a particular network element, or doing root cause analysis.

RE: Wide fact table vs. multiple fact tables

Although 100+ measures is indeed much, if they are in reality related and have the same grain, I would suggest to opt for 1 fact table.
But is this case the expected queries may be of considerable impact. Test and check if the 100+ measures fact table can answer the expected questions with a reasonable performance. Or if you need to split the fact table.

And I wonder what queries you anticipate that you cannot answer with several fact tables.
How far do you want to go to meet the users in designing for Unexpected questions?

RE: Wide fact table vs. multiple fact tables

You could establish multiple fact tables, each with a 1:1 relation to each other. And the facts could be grouped either by subject area or perhaps by frequency of usage. You would want to try to limit the number of joins from fact to fact (to fact, to fact, ....) to get optimal performance.

-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright

RE: Wide fact table vs. multiple fact tables

There are also some other potential drawbacks I see with multiple fact tables in this scenario.  For example, if a record errors out in your ETL to one fact table, you will want to make sure that the corresponding records in the other fact tables are deleted and staged to your error table so you don't end up with any bogus information.  This is especially true if end users have their own calculations in the front end tool.

Secondly, you might need to come up with a scheme to make incremental loads to the fact tables available to the end users only after all the related fact tables are loaded.

Thirdly, and least importantly, multiple fact tables could end up taking more storage space as you will have the foreign keys to the dimensions repeated multiple times.
    

RE: Wide fact table vs. multiple fact tables

How is the data going to be exposed to the users?  For example if you are using MS SSAS  then the issue of multiple fact tables actually becomes the better option as you would set the source of a measure group to a specific fact table.  If you had a single fact table you would have to have multiple queries hitting a single fact table to build your measure groups.  

If the users are reporting directly from the Warehouse then I would probably still go multiple fact tables following John's advice by grouping them into related facts. I would place each table on a seperate file group and drive set to maximize available IO.

Paul
---------------------------------------
Shoot Me! Shoot Me NOW!!!
                           - Daffy Duck

RE: Wide fact table vs. multiple fact tables

Following the general line of thought (subject area fact tables), you could even duplicate data across subject area fact tables to help prevent the joining of fact tables. Trading disk space and off-hours CPU time for rapid query response is a major tenet of DW.  

-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright

RE: Wide fact table vs. multiple fact tables

(OP)
Thank you all for the great comments!
After discussing the issue thoroughly with my client, it appears that the 'unexpected' queries might not be so infrequent after all (thus probably calling for the wide fact table option). It seems that the flexibility is really a key factor to them, and they are ready to loose a bit on the response time to keep all analysis options open.
Anyway, based on the comments above, my intention is indeed to start with the wide fact table option because it also means that the front-end applications (normally this DW will be queried directly) will remain simple. I will then monitor closely the main query types and emerge most likely groupings if performance becomes an issue.
Migrating to the option of multiple fact tables will thus remain a possible design for the future. At this stage, I don't think that the added complexity of the ETL to keep data consistent across multiple fact tables under error conditions is worth the effort for the potential performance gain.
The whole project is developed using the Extreme Programming method, so we remain open to changes all along, but we generally go for the simpler solutions in the short term.
Thanks again for your help!

Red Flag This Post

Please let us know here why this post is inappropriate. Reasons such as off-topic, duplicates, flames, illegal, vulgar, or students posting their homework.

Red Flag Submitted

Thank you for helping keep Tek-Tips Forums free from inappropriate posts.
The Tek-Tips staff will check this out and take appropriate action.

Reply To This Thread

Posting in the Tek-Tips forums is a member-only feature.

Click Here to join Tek-Tips and talk with other members! Already a Member? Login

Close Box

Join Tek-Tips® Today!

Join your peers on the Internet's largest technical computer professional community.
It's easy to join and it's free.

Here's Why Members Love Tek-Tips Forums:

Register now while it's still free!

Already a member? Close this window and log in.

Join Us             Close