Log In

Come Join Us!

Are you a
Computer / IT professional?
Join Tek-Tips Forums!
  • Talk With Other Members
  • Be Notified Of Responses
    To Your Posts
  • Keyword Search
  • One-Click Access To Your
    Favorite Forums
  • Automated Signatures
    On Your Posts
  • Best Of All, It's Free!
  • Students Click Here

*Tek-Tips's functionality depends on members receiving e-mail. By joining you are opting in to receive e-mail.

Posting Guidelines

Promoting, selling, recruiting, coursework and thesis posting is forbidden.

Students Click Here


Storing complex order pipeline data in a DW, is Kimball method applic?

Storing complex order pipeline data in a DW, is Kimball method applic?

Storing complex order pipeline data in a DW, is Kimball method applic?


On page 128 of Kimball's book: The Data Warehouse Toolkit (2002), he describes the Accumulating Snapshot for tracking the movement of orders through a pipeline.

For those who don't have the book, he basiscally recomends to put one row per order in a fact table and to have many status attributes in the fact table that is referenced to a date dimension every time an order reaches a certain milestone an order chain. Therefore, the fact is updated everytime an order has reached a next milestone in the chain.

The example that Kimball presents is pretty simple, and contains six milestones in the chain. However, my question is what to do in real life situations where you're chain would be more complicated, e.g. where you have around 50 milestones and time becomes important too. Then, you would easily have over 100 attributes that link to the date and time dimensions. Would this be a good design choise, since the fact table gets soo many attributes. Is the Kimball methodology still applicable here?

Many thanks in advance,
Remco van Strien.


RE: Storing complex order pipeline data in a DW, is Kimball method applic?

With such a large and indeterminate chain of events, I might look at "state change" fact table, which is basically a transaction fact table, where each fact is the event of an entity (in your case, an order) changing state (in your case, going from one point in the pipeline to another). Put a pair of datetime fields in there to track the beginning and end of each state; I'd make them redundant, putting real datetime fields directly in the fact table, for ease of querying and understandability, in addition to foriegn keys to your calendar dimension.

What's often interesting in state changes are the pairs of changes (from this state > to that state). How many times, for instance, does an order go into a "backorder" state?There are a few ways to model this, but one I'd suggest making a dimension table of all the pairs-of-states, and then putting a fk to this in the fact.

You'd need to define what you mean by "state", of course, and if you want "state" to mean "all the attributes of the order", rather than just "the part of the pipeline", then you'll have lots of state change transaction records (and you may need more than one dimension to track state change pairs, if you can organize them, for instance, into different spheres of interest).

The approach is somewhat similar to Ralphie's HR model -- see http://www.fortunecity.com/skyscraper/oracle/699/orahtml/dbmsmag/9802d05.html

Jeff Prenevost

RE: Storing complex order pipeline data in a DW, is Kimball method applic?

I agree that a separate Order Event table may be the best solution.  So, reading the most recent event date for an order will give its current status and attributes. Set up a concatonated index on the order and date columns and another on the date and order columsn. So, when the interest in tracing the history of one or more orders, the first index (order/date) can be used.  When interested in what happened on a particular date, the second index (date/order) would be used.    

The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright

RE: Storing complex order pipeline data in a DW, is Kimball method applic?

This is dependend on the queries you have to run. If you want to compare the time between 2 states, a query on a transaction fact table will be ugly and ask a lot of resources (but it can be done).

Adding time to an accumulating snapshot poses no problem whatsoever. Time is inherent to datetime datatype.
And 100+ date fields (of which many may be null) is no problem either (except for a clear understanding perhaps).
The ETL proces for 100+ fields looks daunting, but is in essence the same trick over and over again. So it is not really complex.

What I have done before is having 2 datetime fields for every state as a product may move back and forth between states. So I had a date with the first time a certain state was reached and another one for the last time the state was reached.

Red Flag This Post

Please let us know here why this post is inappropriate. Reasons such as off-topic, duplicates, flames, illegal, vulgar, or students posting their homework.

Red Flag Submitted

Thank you for helping keep Tek-Tips Forums free from inappropriate posts.
The Tek-Tips staff will check this out and take appropriate action.

Reply To This Thread

Posting in the Tek-Tips forums is a member-only feature.

Click Here to join Tek-Tips and talk with other members! Already a Member? Login

Close Box

Join Tek-Tips® Today!

Join your peers on the Internet's largest technical computer professional community.
It's easy to join and it's free.

Here's Why Members Love Tek-Tips Forums:

Register now while it's still free!

Already a member? Close this window and log in.

Join Us             Close