OLAP modeling: one cube versus many cubes?
OLAP modeling: one cube versus many cubes?
(OP)
I am still puzzled about the correct way to design a cube vs. several cubes so that reports and queries that compare several different measurements can be made as easily as possible. The problem is that when there are different measurements, these measurements may only share some dimensions or some levels of a hierarchical dimension.
For example (a very artificial example just to illustrate the question) a measurement about working hours may be available on a daily basis, involve the dimension male/female, job category, and also the dimension overtime. Another measurement e.g. production may be available only on a monthly basis, and have the dimension product type. Both measurements share the hierarchies year and month of the time dimension.
These measurements are very different and only share a subset of dimensions, and dimension values, yet we want to create reports and queries where one measurement is compared with the other. For example we want to compare the total female working hours of job category A with the total production of product type X on a monthly basis. Or we want to compare the average daily working hours of all males with total production on a monthly basis.
The usual advice given is not to make the cube to big, avoid "unrelated" measures and avoid too many dimensions, but for some applications, reporting over many measures that are related only by e.g. the time dimension is important. There seems to be no easy, "standard" way to create reports over many cubes in an ad-hoc fashion.
So - how is this kind of problem usually dealt with? Is there some good reading stuff for this?
For example (a very artificial example just to illustrate the question) a measurement about working hours may be available on a daily basis, involve the dimension male/female, job category, and also the dimension overtime. Another measurement e.g. production may be available only on a monthly basis, and have the dimension product type. Both measurements share the hierarchies year and month of the time dimension.
These measurements are very different and only share a subset of dimensions, and dimension values, yet we want to create reports and queries where one measurement is compared with the other. For example we want to compare the total female working hours of job category A with the total production of product type X on a monthly basis. Or we want to compare the average daily working hours of all males with total production on a monthly basis.
The usual advice given is not to make the cube to big, avoid "unrelated" measures and avoid too many dimensions, but for some applications, reporting over many measures that are related only by e.g. the time dimension is important. There seems to be no easy, "standard" way to create reports over many cubes in an ad-hoc fashion.
So - how is this kind of problem usually dealt with? Is there some good reading stuff for this?
RE: OLAP modeling: one cube versus many cubes?
====================================
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
RE: OLAP modeling: one cube versus many cubes?
The problem with several cubes then appears to be (I am not certain) that the user (or at least the GUI code that has to be developed) will have to know which cube contains and which queries can be done on which cube.
This is bad.
We would like to come up with a GUI that contains all the measures and all the dimensions and that lets the user naturally navigate to all the combinations of measures and dimensions that make sense while hiding everything else. Since the potential queries/reports may combine so many different measures with so many different dimensions, there is no practical way to really anticipate them all and plan for all. Rather, we would like to describe the compatibilities on a meta-level and then have the GUI and MDX or generated SQL take care of the rest.
RE: OLAP modeling: one cube versus many cubes?
What are you using at the moment?
Ties Blom
RE: OLAP modeling: one cube versus many cubes?
We are aware of the sparsity problem, but despite the huge number of dimensions, the final fact table would probably not exceed a few hundred million or maybe a billion rows by much.
The real issue that is puzzling here is this: depending on the way how the data is modeled in one or several cubes, there will be situations where a row has a combination of dimension values that apply to measurement A but not measurement B. For another combination of dimensions the reverse may be true and these dimensions may apply to measurement B but not A. In other words, for certain sets of dimensions, some measurements are more granular than others. The most obvious way to deal with this seems to be to make all measurements null where the dimensions combination does not apply. Would that be the standard approach that can also be expected to be supported by the usual tools and e.g. MDX?
RE: OLAP modeling: one cube versus many cubes?
Ties Blom
RE: OLAP modeling: one cube versus many cubes?
The dilemma we are in looks rather basic and principal so I wonder if that is something that is discussed in some whitepaper or book or some web-page -- it is hard to believe that we are the first one who face these kinds of problems.
But most stuff I read about BI, reporting and OLAP either ignores this problem completely or assumes that it is possible to factor the data in a way so that each user group gets their own cube that would satisfy all their query and reporting needs.
RE: OLAP modeling: one cube versus many cubes?
www.ssas-info.com
IBM Cognos also offers TM1 which is basically an in-memory OLAP solution that allows up 90% reduction of sparsity, but it will not be cheap :(
Ties Blom