×
INTELLIGENT WORK FORUMS
FOR COMPUTER PROFESSIONALS

Contact US

Log In

Come Join Us!

Are you a
Computer / IT professional?
Join Tek-Tips Forums!
  • Talk With Other Members
  • Be Notified Of Responses
    To Your Posts
  • Keyword Search
  • One-Click Access To Your
    Favorite Forums
  • Automated Signatures
    On Your Posts
  • Best Of All, It's Free!

*Tek-Tips's functionality depends on members receiving e-mail. By joining you are opting in to receive e-mail.

Posting Guidelines

Promoting, selling, recruiting, coursework and thesis posting is forbidden.

Students Click Here

Category Count

Category Count

Category Count

(OP)
Hello Everyone,

I need a measures to give distinct counts over various dimensions. The column(lets call it col_id) on which the counts are based on is not required as a dimension/level. But I embed it as a level only for using category counts and later suppress that complete category. Also, because there is no direct Activity Measure associated with this, I create a dummy column in IMR/IQD to hold constant value '1'.
Now the problem is that this col_id field has over 200k values and it grows rapidly. So the cube generation is slow and also violates the category ratio rule. Is there any efficient way of creating this distinct count.
(I went through the archives and there were suggestions of doing this is through Impromptu; if thats the answer, please explain how?)

Regards,
Prashar

RE: Category Count

Hi Pr07,

I'm experiencing the same problem and it looks like that using a category count measure to distinctly count the number of categories that exceed 65,536 in a dimension will not work. Even worse for my case, I have to distinctly count over 1M categories (distinct customers) in a customer dimension and even if I suppress the customer_id level which is the basis of my category count measure, transformer says that the children of a category count measure cannot exceed 65,536. I suggest that you create separate cubes based on rolled-up transactions for the distinct column you want to measure. I know this is not the best way to do it but I'm running out of options. Hope someone else can helps us though :P

RE: Category Count

Could you gentlemen give a layout of your dimension map plan and the measures. Basically category counts are easy to establish by simply assigning a count of "1" in the underlying IMR.

Example

Country   Region   State    Customer   Customer Count

USA       West     CA       ABC        1
USA       East     NJ       XYZ        1
USA       East     NC       DEF        1

In this instance, Country, Region and State would form the dimensions and "Customer Count" will be the measure. "Customer" per se will NOT be part of your dimension plan or a level in any other dimension.

The measure "Customer Count" will give you the category counts for any dimesion depending upon the level of rollup being viewed.

I am sure you would have known all this. Perhaps I am wrong in assessing you problem. Let me know.

RE: Category Count

(OP)
Thank you guys for your input.
Correct me if I am wrong but I think the solution mentioned above will work( without the need of category count) if the data source is specifically customer file. But things are different if the data source(IQD) is in the form of fact table. The col_count(field containing 1s) will not return the distinct count and thus show higher value than the category count i.e. by putting the col_id field as a level or dimension.
I apologize for missing this point previously. Any help would be greatly appreciated.

regards

RE: Category Count

There are many ways to achieve a distinct count.

1) Select the "Eliminate Duplicate Rows" in the filter tab of the report, if your report does not have FACTs but only dimensional data.

2) Use Count(Distinct Cust_ID) in the count column grouped at the lowest level.

3) Use the Running-Count summary function to mark only the first line of the row with "1".

if ( Running-Count ( Cust_ID ) for Country, State, Region ) = 1 ) then ( 1 ) else ( 0 )

Hope that helps.

RE: Category Count

Hi Nagrajm: Can we also use the running count for  our situation. Right now, we use category count on Case_id in a similar manner as in this problem.
We are trying to build a cube with data from two tables sharing Master-Detail relationship. The master table stores the static information like Case_type, Received_date, Origin etc. about 'Cases' which go through different phases. The changes in the phase along with Date_Phase_change are stored in the child table.

Dimensions:
*Time: Year- Quarter- Month
* Case Type: Various Types
* Case Phase:Various Phases( Open,... Trial,... Closed etc.) (sequences of these phases is not fixed)
*Origin: State- Dist- City
*Zone: Zone- Sector (another geographical location totally independent from Origin)

The measures requested is:
*Number of Cases ( for any combinations of the dimension selection)

We build a staging table(source for IQD) from the above two tables.The catch is if Case is "Opened" in Mar 2002 and "Trial" in Jun 2002 : it should be counted under "Opened" in Mar,Apr & May 2002 and "Trial" from Jun 2002 till date until the file is closed. We generate extra records in the staging table to show the phase Case is in for intermediate months.

Thanks in advance

RE: Category Count

Hi Nchd15

Ordinarily this report is simple and bookish except when you consider the catch

"The catch is if Case is "Opened" in Mar 2002 and "Trial" in Jun 2002 : it should be counted under "Opened" in Mar,Apr & May 2002 and "Trial" from Jun 2002 till date until the file is closed."

But when I read

"We build a staging table(source for IQD) from the above two tables......We generate extra records in the staging table to show the phase Case is in for intermediate months."

I said you had the solution figured. This is the only possible way, in my opinion, to count those cases during the intermediate months. Running-Count will not give you the fuctionality to achieve this.

Does that answer your question?

RE: Category Count

nagrajm,

In one of your earlier posts, you suggest the following:
2) Use Count(Distinct Cust_ID) in the count column grouped at the lowest level.

I have a similiar situation as is being discussed in this thread and I have tried doing a count(distinct ss_num) in Impromptu when I create the IQD.  But when I use that column in Transformer as a measure, I do not get a distinct count.  It counts all the occurrances.  What am I doing wrong?

Neal

RE: Category Count

The way I would do this, although indirect:
pick one field that has a decent spread in the group, like city or customer manager, and break the cube on that field, via a cube group . This will create several cubes, identical in every way, except for that one field (and obviously the measure values). You'll be able to keep your category count, and it's a great selling point to the users that a cube is split on some logical pivotal piece of data. Look under the options in Transformer Admin for creating a cube, or search the documentation for 'cube group'.  

RE: Category Count

Hi Neal

Does the IMR return distinct values correctly?

Else, try the Running-Count function to mark the first occurance of ss_num with 1 and other occurances with 0. This way you will avoid counting multiple occurances.

Nagraj

RE: Category Count

Hi !
I also have the same problem!
My solution is to use external meatures. I build a special table only for this meature, where I calculate the distinct count of customers directly by the oracle database. There ist a special function, called CUBE(), to calculate all combinations of different columns.

Example:
select count(distinct cust_id),sex,region
from table1 group by cube(sex,region)

It will generate you a list of all values and all combinations of sex and region.
So my solution is not the simplist one, but the cube is very small and fast, also for more then 1 Million customers/lines to count !!

RE: Category Count

Nagrajm's response from Aug 7, # 3), is the best method I've found so far.  
We "count" the number of specific dimensions on several levels.  This is kinda complex and
limited to only the way the "count" 1's are placed, especially in a CUBE.  Because you can't drill deeper than
what you calculated the "count" 1's or you risk losing the 1's in their correct place.

Red Flag This Post

Please let us know here why this post is inappropriate. Reasons such as off-topic, duplicates, flames, illegal, vulgar, or students posting their homework.

Red Flag Submitted

Thank you for helping keep Tek-Tips Forums free from inappropriate posts.
The Tek-Tips staff will check this out and take appropriate action.

Reply To This Thread

Posting in the Tek-Tips forums is a member-only feature.

Click Here to join Tek-Tips and talk with other members! Already a Member? Login

Close Box

Join Tek-Tips® Today!

Join your peers on the Internet's largest technical computer professional community.
It's easy to join and it's free.

Here's Why Members Love Tek-Tips Forums:

Register now while it's still free!

Already a member? Close this window and log in.

Join Us             Close