Smart questions
Smart answers
Smart people
Join Tek-Tips Forums
INTELLIGENT WORK FORUMS
FOR COMPUTER PROFESSIONALS

Member Login




Remember Me
Forgot Password?
Join Us!

Come Join Us!

Are you a
Computer / IT professional?
Join Tek-Tips now!
  • Talk With Other Members
  • Be Notified Of Responses
    To Your Posts
  • Keyword Search
  • One-Click Access To Your
    Favorite Forums
  • Automated Signatures
    On Your Posts
  • Best Of All, It's Free!

Join Tek-Tips
*Tek-Tips's functionality depends on members receiving e-mail. By joining you are opting in to receive e-mail.

Posting Guidelines

Promoting, selling, recruiting, coursework and thesis posting is forbidden.
Jobs from Indeed

Link To This Forum!

Partner Button
Add Stickiness To Your Site By Linking To This Professionally Managed Technical Forum.
Just copy and paste the
code below into your site.

cte1 (TechnicalUser) (OP)
16 Aug 10 14:29
I am trying to count the number of unique codes in a group. Each group consist of multiple rows and the codes are pipe delimited in each cell. A sample of my dataset looks like this:

GROUP CODES
1 |231|322|414|
1 |231|322|
2 |231|
2 |231|114|
2
3
3

So in GROUP 1 there are 3 unique codes (231, 322, and 414) and in GROUP 2 there are 2 unique codes (231 and 114) There are 0 codes in GROUP 3. Can anyone tell me how to get SAS to give me an output dataset like this:

Group Count
1 3
2 2
3 0

Thanks in advance.

Chris
mdieckman (MIS)
17 Aug 10 12:23
You just count the number of "|" and subtract 1?

data x;
set master;
rcount =(length(Name)-length(compress(Name,"|"))) - 1;
run;

compress removes the '|' from whatever the variable is leaving you with the number of '|'. Since there is an extra '|' just subtract one. There would be other ways to do this, but this was the easiest I could think of.
mdieckman (MIS)
17 Aug 10 12:26
Oh, and if some don't have the '|', you could do the index function to make sure and then use an If, then statement.

So, basically:

if index(name,'|') > 0 then rcount = (statement above); else 0;
 
klaz2002 (Programmer)
19 Aug 10 15:07
In SAS 9 there is a countw function that counts the 'words' in your string. All you have to do is set the delimiter to the pipe '|' char.

ex

CODE

total = countw(codes,'|');

Hope this helps..
Klaz
cte1 (TechnicalUser) (OP)
19 Aug 10 15:18
Thanks for the help. I really appreciate it. That worked great for counting the number of codes within each cell. The problem is I need to know the number of unique codes across multiple observations (or rows). See example above. Any ideas?

  
klaz2002 (Programmer)
19 Aug 10 18:06
I figured that you would want the unique codes count too (and I was curious) so I experimented with some code.

CODE

data YouroutputDS(keep=outstr wrdtot agestr total_unq);
  set inputds end = last;
  *** INITIALIZE YOUR OUTPUT STRING VAR ***;
  length outstr $500;
  *** SET UP AN ARRAY WITH 5000 FIELDS ***;
  array strsort (*) $500 strs1-strs5000 ;
  *** COUNT THE ITEMS IN YOUR STRING (CODES) ***;
  wrdtot = countw(CODE,'|');
  do i=1 to wrdtot;
    strsort(i) = scan(CODE,i,'|');
  end;
  *** THIS IS A SORT FUNCTION ***;
  *** THIS FUNCTION SHOULD HAVE A UNIQUE OPTION ***;
  *** WE SHOULD PETITION SAS :)                 ***;
  call sortc(of strsort (*));
  *** CHECK FOR UNIQUE CODES ***;
  flag = 0;
  do i=1 to dim(strsort);
    *** WE DONT CHECK ON THE FIRST POSITION ***;
    if i gt 1 and trim(strsort(i)) ne '' then do;
     if trim(strsort(i))= trim(strsort(i-1)) then do;
           *** BLANK OUT ALL DUPS ***;
       strsort(i)='';
       call sortc(of strsort (*));
     end;
    end;
  end;

  CALL CATX('|', OF strs1-strs5000);
  outstr = strsort(1);
  total_unq = countw(outstr,'|');
run;
I hope this helps...
Klaz
cte1 (TechnicalUser) (OP)
20 Aug 10 9:17
Thanks Klaz. I ran the code, however it is only giving the number of unique codes within each cell. For example, if my dataset looks like this:

CODE
|191|
|191|
|187|189|191|278|934|
|187|189|191|278|
|191|278|

When I run the code I get a dataset that looks like this:

outstr wrdtot total_unq
|191| 2 1
|191| 2 1
|187|189|191|278|934| 6 5
|187|189|191|278| 5 4
|191|278| 3 2

It is giving the total unique codes within each cell but not the total unique codes accross the group. In this case there are 6 unique codes within this group (or within these 5 rows).

I really appreciate your help. Do you have any other ideas?

Thanks,
Chris
cte1 (TechnicalUser) (OP)
20 Aug 10 9:25
Sorry, there was an error in my last post. The example should look like this.

CODE
|190|
|191|
|187|189|191|278|934|
|187|189|191|278|
|191|278|

When I run the code I get a dataset that looks like this:

outstr wrdtot total_unq
|190| 2 1
|191| 2 1
|187|189|191|278|934| 6 5
|187|189|191|278| 5 4
|191|278| 3 2

It is giving the total unique codes within each cell but not the total unique codes accross the group. In this case there are 6 unique codes within this group (or within these 5 rows).

Thanks again,
Chris
klaz2002 (Programmer)
20 Aug 10 9:37
Yes the code only gives you the unique count in each cell (record). If you wanted to combine rows and then get calculate the unique codes you have two options.  
1) Use the retain statement and 'grow' your code string until all codes are in one record. Then run the script you have.

2) Use Proc transpose to transpose your data values into a vertical structure (1 column for all your values by groupID) then use Proc SQL to count with a distinct option.

Personaly, I would use the second option as the first may run out of room if you have a few hundred codes in a group.

Klaz
cte1 (TechnicalUser) (OP)
20 Aug 10 9:45
Thanks Klaz. I will try that.  
klaz2002 (Programmer)
20 Aug 10 12:31
Here is an example of the Proc Transpose method.

CODE

data f2;
  set your_ds end = last;
  
  array strsort (*) $500 strs1-strs5000 ;
  wrdtot = countw(code,'|');
  do i=1 to wrdtot;
    strsort(i) = scan(code,i,'|');
  end;
run;
*** DATA MUST BE SORTED BY ID VAR ***;
proc transpose data =f2 out =f3;
by id;
var strs1-strs5000;
run;

*** COUNTS NON-MISSING VALUES ***;
proc sql;
  create table f4 as
  select id , count(distinct col1) as unq_tot
  from f3
  group by id;
quit;

I hope this helps you.
Klaz
cte1 (TechnicalUser) (OP)
24 Aug 10 16:57
Hi Klaz,

First, thanks so much for you help. The code you posted ran fine but it is only giving me the unique (or distinct) count for col1, which is  not the distinct count across the entire group. In the sql statement I need the distinct count across all of the columns (i.e. col1, col2, col3, etc.). Any ideas?

Thanks again,

Chris
klaz2002 (Programmer)
25 Aug 10 11:51
I thought that you wanted the unique total across all your columns.  If you don't all you have to do is in the PROC SQL step add a group by varname.

Look at the transposed dataset and you will find a column that has the original transposed variable name. Add that column to both the select and group statements.

select id, thetransposedcolumnname, count(distinct col1) as...

group by id, thetransposedcolumnname

change the above variable name to the correct one (I can't recall off-hand what that name is)

klaz
cte1 (TechnicalUser) (OP)
25 Aug 10 16:31
Hi Klaz,

Thanks again for getting back to me. I do want the unique total across all columns. The code is not giving that result. Here is the code that you sent that I ran:

data f2;
  set cltest end = last;
  array strsort (*) $500 strs1-strs5000 ;
  wrdtot = countw(code,'|');
  do i=1 to wrdtot;
    strsort(i) = scan(code,i,'|');
  end;
run;
proc sort data=f2; by id; run;
proc transpose data =f2 out =f3;
by id;
var strs1-strs5000;
run;
*** COUNTS NON-MISSING VALUES ***;
proc sql;
  create table f4 as
  select id , count(distinct col1) as unq_tot
  from f3
  group by id;
quit;

The input data file, cltest, is available here:

http://rapidshare.com/files/415124745/cltest.sas7bdat

For example there are 11 unique CODES for ID 601 in the cltest dataset. The codes are 178, 181, 186, 278, 204, 816, 208, 212, 214, 176, and 202. However the dataset f4 only shows 9 unique codes for ID 601, as it is only counting the unique observations in the first transposed column (col1).

Sorry for so many questions. This stuff seems so easy for you. If you have any ideas to fix this problem it would really help me out a lot. Again, I really appreciate all of your help.

thanks,
chris
kdt82 (Programmer)
27 Aug 10 9:44
Hi Chris,

Hopefully this is what you are looking for (one of many ways):

First transposing, then getting the unique group/counts and using proc summary to perform the counts. The completetypes and classdata options allow proc summary to retrieve the groups that have 0 codes.

CODE

data have ;
   infile cards missover ;
   input group codes :$20. ;
cards ;
1 |231|322|414|
1 |231|322|
2 |231|
2 |231|114|
3      
3       
3        
;;;;;
proc print;run;
data long (keep=group code);
   set have ;
   do _n_=1 to countW(codes,'|')-1;
      code =input(scan(codes,_n_,'|'),best.) ;
      output;
      end ;
   run;
proc sort data=long out=unique nodupkey ;
   by group code ;
   run;
proc summary data=unique missing completetypes classdata=have nway ;
   class group ;
   output out= want(keep=group count) n(code)=count ;
   run;
proc print;run;
cte1 (TechnicalUser) (OP)
30 Aug 10 11:11
That worked perfectly! Thank you so much.

Chris

Reply To This Thread

Posting in the Tek-Tips forums is a member-only feature.

Click Here to join Tek-Tips and talk with other members!

Close Box

Join Tek-Tips® Today!

Join your peers on the Internet's largest technical computer professional community.
It's easy to join and it's free.

Here's Why Members Love Tek-Tips Forums:

Register now while it's still free!

Already a member? Close this window and log in.

Join Us             Close