There are lots of ways to do this. One is to flag a value as an outlier based on how many standard deviations it lies from the mean. In the example below, I used 1.5 standard deviations.
You can copy and paste this code into a new program, run it step by step, and look at the results.
You might want to experiment with the cutoff (1.5, 2, etc.). What I like about this approach is that the cutoff is measured in standard deviations, so the same value means the same thing whatever the units or scale of the data (winter temperatures, new car prices, ...). It's all relative to the data it's being applied to.
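Written out, the rule is a z-score cutoff, and a short derivation shows why the same cutoff carries over between units. Here k is the cutoff (1.5 in the example), and x̄ and s are the sample mean and standard deviation that PROC MEANS computes. The linear change of units y = a·x + b (for example Fahrenheit = 1.8 × Celsius + 32) is an illustration I've added, not something from the code.

```latex
\text{flag } x_i \text{ as an outlier if } |z_i| > k,
\qquad z_i = \frac{x_i - \bar{x}}{s_x}

\text{If } y_i = a\,x_i + b \ (a \neq 0):\qquad
\bar{y} = a\,\bar{x} + b, \qquad s_y = |a|\, s_x

\frac{y_i - \bar{y}}{s_y}
  = \frac{a\,(x_i - \bar{x})}{|a|\, s_x}
  = \pm z_i
\quad\Longrightarrow\quad |z_i| \text{ is unchanged}
```

Because the code takes abs(), even a sign flip (a < 0) leaves the flags unchanged. What this does not cover is the shape of the distribution: a cutoff of 1.5 flags a different share of points in skewed data than in roughly symmetric data.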
Here's the code
---------------------------------------
* Quick macro to make a sample of 100 obs ;
* all between 0 and 1 ;
%macro makedata ;
%do i=1 %to 100 ;
RANDNUM = ranuni(1234) ; output ;
%end;
%mend makedata ;
data SAMPLE100 ;
%makedata ;
format RANDNUM 9.4 ;
run;
* Force observation #50 to be relatively huge ;
data SAMPLE100 ;
set SAMPLE100 ;
if _n_ = 50 then RANDNUM = 1000 ;
run;
* Get the mean and standard deviation of RANDNUM ;
proc means data=SAMPLE100 noprint ;
var RANDNUM ;
output out=STATS (drop=_type_ _freq_)
std=STD_RANDNUM
mean=AVG_RANDNUM ;
run;
* Put these values into macro variables ;
* data _null_ avoids rewriting STATS, and SYMPUTX strips blanks ;
data _null_ ;
set STATS ;
call symputx("STDDEV",STD_RANDNUM) ;
call symputx("AVERAGE",AVG_RANDNUM) ;
run;
* Find the outliers ;
data SAMPLE100 ;
set SAMPLE100 ;
* If the value is more than 1.5 Standard Deviations ;
* from the mean, flag it as an outlier ;
if abs( (RANDNUM-&AVERAGE.) / &STDDEV ) > 1.5
then OUTLIER='Y' ;
else OUTLIER='N' ;
run;
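If you want to try several cutoffs without editing the code each time, one option is to wrap the rule in a macro. This is a sketch of my own, not part of the original post: the macro name %flag_outliers, its parameters and the SQL_OUTLIER column are hypothetical, and it swaps the PROC MEANS / CALL SYMPUT steps for PROC SQL's remerging of summary statistics so the mean, standard deviation and flag all come from one step. It assumes SAMPLE100 already exists from the program above.

```sas
/* Hypothetical helper, not from the original post: flag values of VAR  */
/* that are more than K standard deviations from the mean, in one step. */
%macro flag_outliers(data=, var=, k=1.5, out=FLAGGED, flag=SQL_OUTLIER) ;
  proc sql ;
    create table &out. as
    select *,
           /* MEAN() and STD() are summary functions: PROC SQL computes */
           /* them over the whole table and remerges them onto each row */
           /* (the log shows a NOTE about remerging, which is expected) */
           case
             when abs(&var. - mean(&var.)) / std(&var.) > &k. then 'Y'
             else 'N'
           end as &flag. length=1
    from &data. ;
  quit ;
%mend flag_outliers ;

* Try two cutoffs on the sample data without editing the rule itself ;
%flag_outliers(data=SAMPLE100, var=RANDNUM, k=1.5, out=FLAG_15) ;
%flag_outliers(data=SAMPLE100, var=RANDNUM, k=2,   out=FLAG_20) ;

* List only the flagged rows for the 2 SD cutoff ;
proc print data=FLAG_20 ;
  where SQL_OUTLIER = 'Y' ;
run ;
```

PROC SQL's STD() is the sample (n-1) standard deviation, like the PROC MEANS default, so at k=1.5 the SQL_OUTLIER flags should agree with OUTLIER from the original code. Without an ORDER BY, PROC SQL does not promise to keep the input row order, so look up rows by value rather than by position.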
Alan J. Volkert
Fleet Services
GE Commercial Finance Capital Solutions
(World's longest company title)
Eden Prairie, MN