## Count unique codes by group

## Count unique codes by group

(OP)

I am trying to count the number of unique codes in a group. Each group consist of multiple rows and the codes are pipe delimited in each cell. A sample of my dataset looks like this:

GROUP CODES

1 |231|322|414|

1 |231|322|

2 |231|

2 |231|114|

2

3

3

So in GROUP 1 there are 3 unique codes (231, 322, and 414) and in GROUP 2 there are 2 unique codes (231 and 114) There are 0 codes in GROUP 3. Can anyone tell me how to get SAS to give me an output dataset like this:

Group Count

1 3

2 2

3 0

Thanks in advance.

Chris

GROUP CODES

1 |231|322|414|

1 |231|322|

2 |231|

2 |231|114|

2

3

3

So in GROUP 1 there are 3 unique codes (231, 322, and 414) and in GROUP 2 there are 2 unique codes (231 and 114) There are 0 codes in GROUP 3. Can anyone tell me how to get SAS to give me an output dataset like this:

Group Count

1 3

2 2

3 0

Thanks in advance.

Chris

## RE: Count unique codes by group

data x;

set master;

rcount =(length(Name)-length(compress(Name,"|"))) - 1;

run;

compress removes the '|' from whatever the variable is leaving you with the number of '|'. Since there is an extra '|' just subtract one. There would be other ways to do this, but this was the easiest I could think of.

## RE: Count unique codes by group

So, basically:

if index(name,'|') > 0 then rcount = (statement above); else 0;

## RE: Count unique codes by group

ex

## CODE

Hope this helps..

Klaz

## RE: Count unique codes by group

## RE: Count unique codes by group

## CODE

set inputds end = last;

*** INITIALIZE YOUR OUTPUT STRING VAR ***;

length outstr $500;

*** SET UP AN ARRAY WITH 5000 FIELDS ***;

array strsort (*) $500 strs1-strs5000 ;

*** COUNT THE ITEMS IN YOUR STRING (CODES) ***;

wrdtot = countw(CODE,'|');

do i=1 to wrdtot;

strsort(i) = scan(CODE,i,'|');

end;

*** THIS IS A SORT FUNCTION ***;

*** THIS FUNCTION SHOULD HAVE A UNIQUE OPTION ***;

*** WE SHOULD PETITION SAS :) ***;

call sortc(of strsort (*));

*** CHECK FOR UNIQUE CODES ***;

flag = 0;

do i=1 to dim(strsort);

*** WE DONT CHECK ON THE FIRST POSITION ***;

if i gt 1 and trim(strsort(i)) ne '' then do;

if trim(strsort(i))= trim(strsort(i-1)) then do;

*** BLANK OUT ALL DUPS ***;

strsort(i)='';

call sortc(of strsort (*));

end;

end;

end;

CALL CATX('|', OF strs1-strs5000);

outstr = strsort(1);

total_unq = countw(outstr,'|');

run;

Klaz

## RE: Count unique codes by group

CODE

|191|

|191|

|187|189|191|278|934|

|187|189|191|278|

|191|278|

When I run the code I get a dataset that looks like this:

outstr wrdtot total_unq

|191| 2 1

|191| 2 1

|187|189|191|278|934| 6 5

|187|189|191|278| 5 4

|191|278| 3 2

It is giving the total unique codes within each cell but not the total unique codes accross the group. In this case there are 6 unique codes within this group (or within these 5 rows).

I really appreciate your help. Do you have any other ideas?

Thanks,

Chris

## RE: Count unique codes by group

CODE

|190|

|191|

|187|189|191|278|934|

|187|189|191|278|

|191|278|

When I run the code I get a dataset that looks like this:

outstr wrdtot total_unq

|190| 2 1

|191| 2 1

|187|189|191|278|934| 6 5

|187|189|191|278| 5 4

|191|278| 3 2

It is giving the total unique codes within each cell but not the total unique codes accross the group. In this case there are 6 unique codes within this group (or within these 5 rows).

Thanks again,

Chris

## RE: Count unique codes by group

1) Use the retain statement and 'grow' your code string until all codes are in one record. Then run the script you have.

2) Use Proc transpose to transpose your data values into a vertical structure (1 column for all your values by groupID) then use Proc SQL to count with a distinct option.

Personaly, I would use the second option as the first may run out of room if you have a few hundred codes in a group.

Klaz

## RE: Count unique codes by group

## RE: Count unique codes by group

## CODE

set your_ds end = last;

array strsort (*) $500 strs1-strs5000 ;

wrdtot = countw(code,'|');

do i=1 to wrdtot;

strsort(i) = scan(code,i,'|');

end;

run;

*** DATA MUST BE SORTED BY ID VAR ***;

proc transpose data =f2 out =f3;

by id;

var strs1-strs5000;

run;

*** COUNTS NON-MISSING VALUES ***;

proc sql;

create table f4 as

select id , count(distinct col1) as unq_tot

from f3

group by id;

quit;

I hope this helps you.

Klaz

## RE: Count unique codes by group

First, thanks so much for you help. The code you posted ran fine but it is only giving me the unique (or distinct) count for col1, which is not the distinct count across the entire group. In the sql statement I need the distinct count across all of the columns (i.e. col1, col2, col3, etc.). Any ideas?

Thanks again,

Chris

## RE: Count unique codes by group

Look at the transposed dataset and you will find a column that has the original transposed variable name. Add that column to both the select and group statements.

select id, thetransposedcolumnname, count(distinct col1) as...

group by id, thetransposedcolumnname

change the above variable name to the correct one (I can't recall off-hand what that name is)

klaz

## RE: Count unique codes by group

Thanks again for getting back to me. I do want the unique total across all columns. The code is not giving that result. Here is the code that you sent that I ran:

data f2;

set cltest end = last;

array strsort (*) $500 strs1-strs5000 ;

wrdtot = countw(code,'|');

do i=1 to wrdtot;

strsort(i) = scan(code,i,'|');

end;

run;

proc sort data=f2; by id; run;

proc transpose data =f2 out =f3;

by id;

var strs1-strs5000;

run;

*** COUNTS NON-MISSING VALUES ***;

proc sql;

create table f4 as

select id , count(distinct col1) as unq_tot

from f3

group by id;

quit;

The input data file, cltest, is available here:

htt

For example there are 11 unique CODES for ID 601 in the cltest dataset. The codes are 178, 181, 186, 278, 204, 816, 208, 212, 214, 176, and 202. However the dataset f4 only shows 9 unique codes for ID 601, as it is only counting the unique observations in the first transposed column (col1).

Sorry for so many questions. This stuff seems so easy for you. If you have any ideas to fix this problem it would really help me out a lot. Again, I really appreciate all of your help.

thanks,

chris

## RE: Count unique codes by group

Hopefully this is what you are looking for (one of many ways):

First transposing, then getting the unique group/counts and using proc summary to perform the counts. The completetypes and classdata options allow proc summary to retrieve the groups that have 0 codes.

## CODE

infile cards missover ;

input group codes :$20. ;

cards ;

1 |231|322|414|

1 |231|322|

2 |231|

2 |231|114|

3

3

3

;;;;;

proc print;run;

data long (keep=group code);

set have ;

do _n_=1 to countW(codes,'|')-1;

code =input(scan(codes,_n_,'|'),best.) ;

output;

end ;

run;

proc sort data=long out=unique nodupkey ;

by group code ;

run;

proc summary data=unique missing completetypes classdata=have nway ;

class group ;

output out= want(keep=group count) n(code)=count ;

run;

proc print;run;

## RE: Count unique codes by group

Chris