INTELLIGENT WORK FORUMS
FOR COMPUTER PROFESSIONALS

Member Login




Remember Me
Forgot Password?
Join Us!

Come Join Us!

Are you a
Computer / IT professional?
Join Tek-Tips Forums!
  • Talk With Other Members
  • Be Notified Of Responses
    To Your Posts
  • Keyword Search
  • One-Click Access To Your
    Favorite Forums
  • Automated Signatures
    On Your Posts
  • Best Of All, It's Free!

Join Tek-Tips
*Tek-Tips's functionality depends on members receiving e-mail. By joining you are opting in to receive e-mail.

Posting Guidelines

Promoting, selling, recruiting, coursework and thesis posting is forbidden.
Jobs from Indeed

Link To This Forum!

Partner Button
Add Stickiness To Your Site By Linking To This Professionally Managed Technical Forum.
Just copy and paste the
code below into your site.

Count unique codes by group

Count unique codes by group

(OP)
I am trying to count the number of unique codes in a group. Each group consist of multiple rows and the codes are pipe delimited in each cell. A sample of my dataset looks like this:

GROUP CODES
1 |231|322|414|
1 |231|322|
2 |231|
2 |231|114|
2
3
3

So in GROUP 1 there are 3 unique codes (231, 322, and 414) and in GROUP 2 there are 2 unique codes (231 and 114) There are 0 codes in GROUP 3. Can anyone tell me how to get SAS to give me an output dataset like this:

Group Count
1 3
2 2
3 0

Thanks in advance.

Chris

RE: Count unique codes by group

You just count the number of "|" and subtract 1?

data x;
set master;
rcount =(length(Name)-length(compress(Name,"|"))) - 1;
run;

compress removes the '|' from whatever the variable is leaving you with the number of '|'. Since there is an extra '|' just subtract one. There would be other ways to do this, but this was the easiest I could think of.

RE: Count unique codes by group

Oh, and if some don't have the '|', you could do the index function to make sure and then use an If, then statement.

So, basically:

if index(name,'|') > 0 then rcount = (statement above); else 0;
 

RE: Count unique codes by group

In SAS 9 there is a countw function that counts the 'words' in your string. All you have to do is set the delimiter to the pipe '|' char.

ex

CODE

total = countw(codes,'|');

Hope this helps..
Klaz

RE: Count unique codes by group

(OP)
Thanks for the help. I really appreciate it. That worked great for counting the number of codes within each cell. The problem is I need to know the number of unique codes across multiple observations (or rows). See example above. Any ideas?

  

RE: Count unique codes by group

I figured that you would want the unique codes count too (and I was curious) so I experimented with some code.

CODE

data YouroutputDS(keep=outstr wrdtot agestr total_unq);
  set inputds end = last;
  *** INITIALIZE YOUR OUTPUT STRING VAR ***;
  length outstr $500;
  *** SET UP AN ARRAY WITH 5000 FIELDS ***;
  array strsort (*) $500 strs1-strs5000 ;
  *** COUNT THE ITEMS IN YOUR STRING (CODES) ***;
  wrdtot = countw(CODE,'|');
  do i=1 to wrdtot;
    strsort(i) = scan(CODE,i,'|');
  end;
  *** THIS IS A SORT FUNCTION ***;
  *** THIS FUNCTION SHOULD HAVE A UNIQUE OPTION ***;
  *** WE SHOULD PETITION SAS :)                 ***;
  call sortc(of strsort (*));
  *** CHECK FOR UNIQUE CODES ***;
  flag = 0;
  do i=1 to dim(strsort);
    *** WE DONT CHECK ON THE FIRST POSITION ***;
    if i gt 1 and trim(strsort(i)) ne '' then do;
     if trim(strsort(i))= trim(strsort(i-1)) then do;
           *** BLANK OUT ALL DUPS ***;
       strsort(i)='';
       call sortc(of strsort (*));
     end;
    end;
  end;

  CALL CATX('|', OF strs1-strs5000);
  outstr = strsort(1);
  total_unq = countw(outstr,'|');
run;
I hope this helps...
Klaz

RE: Count unique codes by group

(OP)
Thanks Klaz. I ran the code, however it is only giving the number of unique codes within each cell. For example, if my dataset looks like this:

CODE
|191|
|191|
|187|189|191|278|934|
|187|189|191|278|
|191|278|

When I run the code I get a dataset that looks like this:

outstr wrdtot total_unq
|191| 2 1
|191| 2 1
|187|189|191|278|934| 6 5
|187|189|191|278| 5 4
|191|278| 3 2

It is giving the total unique codes within each cell but not the total unique codes accross the group. In this case there are 6 unique codes within this group (or within these 5 rows).

I really appreciate your help. Do you have any other ideas?

Thanks,
Chris

RE: Count unique codes by group

(OP)
Sorry, there was an error in my last post. The example should look like this.

CODE
|190|
|191|
|187|189|191|278|934|
|187|189|191|278|
|191|278|

When I run the code I get a dataset that looks like this:

outstr wrdtot total_unq
|190| 2 1
|191| 2 1
|187|189|191|278|934| 6 5
|187|189|191|278| 5 4
|191|278| 3 2

It is giving the total unique codes within each cell but not the total unique codes accross the group. In this case there are 6 unique codes within this group (or within these 5 rows).

Thanks again,
Chris

RE: Count unique codes by group

Yes the code only gives you the unique count in each cell (record). If you wanted to combine rows and then get calculate the unique codes you have two options.  
1) Use the retain statement and 'grow' your code string until all codes are in one record. Then run the script you have.

2) Use Proc transpose to transpose your data values into a vertical structure (1 column for all your values by groupID) then use Proc SQL to count with a distinct option.

Personaly, I would use the second option as the first may run out of room if you have a few hundred codes in a group.

Klaz

RE: Count unique codes by group

(OP)
Thanks Klaz. I will try that.  

RE: Count unique codes by group

Here is an example of the Proc Transpose method.

CODE

data f2;
  set your_ds end = last;
  
  array strsort (*) $500 strs1-strs5000 ;
  wrdtot = countw(code,'|');
  do i=1 to wrdtot;
    strsort(i) = scan(code,i,'|');
  end;
run;
*** DATA MUST BE SORTED BY ID VAR ***;
proc transpose data =f2 out =f3;
by id;
var strs1-strs5000;
run;

*** COUNTS NON-MISSING VALUES ***;
proc sql;
  create table f4 as
  select id , count(distinct col1) as unq_tot
  from f3
  group by id;
quit;

I hope this helps you.
Klaz

RE: Count unique codes by group

(OP)
Hi Klaz,

First, thanks so much for you help. The code you posted ran fine but it is only giving me the unique (or distinct) count for col1, which is  not the distinct count across the entire group. In the sql statement I need the distinct count across all of the columns (i.e. col1, col2, col3, etc.). Any ideas?

Thanks again,

Chris

RE: Count unique codes by group

I thought that you wanted the unique total across all your columns.  If you don't all you have to do is in the PROC SQL step add a group by varname.

Look at the transposed dataset and you will find a column that has the original transposed variable name. Add that column to both the select and group statements.

select id, thetransposedcolumnname, count(distinct col1) as...

group by id, thetransposedcolumnname

change the above variable name to the correct one (I can't recall off-hand what that name is)

klaz

RE: Count unique codes by group

(OP)
Hi Klaz,

Thanks again for getting back to me. I do want the unique total across all columns. The code is not giving that result. Here is the code that you sent that I ran:

data f2;
  set cltest end = last;
  array strsort (*) $500 strs1-strs5000 ;
  wrdtot = countw(code,'|');
  do i=1 to wrdtot;
    strsort(i) = scan(code,i,'|');
  end;
run;
proc sort data=f2; by id; run;
proc transpose data =f2 out =f3;
by id;
var strs1-strs5000;
run;
*** COUNTS NON-MISSING VALUES ***;
proc sql;
  create table f4 as
  select id , count(distinct col1) as unq_tot
  from f3
  group by id;
quit;

The input data file, cltest, is available here:

http://rapidshare.com/files/415124745/cltest.sas7bdat

For example there are 11 unique CODES for ID 601 in the cltest dataset. The codes are 178, 181, 186, 278, 204, 816, 208, 212, 214, 176, and 202. However the dataset f4 only shows 9 unique codes for ID 601, as it is only counting the unique observations in the first transposed column (col1).

Sorry for so many questions. This stuff seems so easy for you. If you have any ideas to fix this problem it would really help me out a lot. Again, I really appreciate all of your help.

thanks,
chris

RE: Count unique codes by group

Hi Chris,

Hopefully this is what you are looking for (one of many ways):

First transposing, then getting the unique group/counts and using proc summary to perform the counts. The completetypes and classdata options allow proc summary to retrieve the groups that have 0 codes.

CODE

data have ;
   infile cards missover ;
   input group codes :$20. ;
cards ;
1 |231|322|414|
1 |231|322|
2 |231|
2 |231|114|
3      
3       
3        
;;;;;
proc print;run;
data long (keep=group code);
   set have ;
   do _n_=1 to countW(codes,'|')-1;
      code =input(scan(codes,_n_,'|'),best.) ;
      output;
      end ;
   run;
proc sort data=long out=unique nodupkey ;
   by group code ;
   run;
proc summary data=unique missing completetypes classdata=have nway ;
   class group ;
   output out= want(keep=group count) n(code)=count ;
   run;
proc print;run;

RE: Count unique codes by group

(OP)
That worked perfectly! Thank you so much.

Chris

Reply To This Thread

Posting in the Tek-Tips forums is a member-only feature.

Click Here to join Tek-Tips and talk with other members!

Close Box

Join Tek-Tips® Today!

Join your peers on the Internet's largest technical computer professional community.
It's easy to join and it's free.

Here's Why Members Love Tek-Tips Forums:

Register now while it's still free!

Already a member? Close this window and log in.

Join Us             Close