data step or proc sql for performance time
data step or proc sql for performance time
(OP)
data lib.NewTable (keep=ID Provider);
set table1
table2
table3;
if status = '3';
if code = '2';
run;
compared to
proc sql;
create table lib.NewTable as
select ID,Provide
From
table1 a
left outer join
table2 b
on a.ID=b.ID
left outer join table3 c
on a.ID=c.ID
where status = '3'
and code = '2'
quit;
I want to do is include ID and Provider from all tables regardless. I am trying to reduce processing time as table1 2 and 3 have thousands of records. If proc sql is not the answer in this case, I am looking for ways to reduce processing time using the data step method (current method)
set table1
table2
table3;
if status = '3';
if code = '2';
run;
compared to
proc sql;
create table lib.NewTable as
select ID,Provide
From
table1 a
left outer join
table2 b
on a.ID=b.ID
left outer join table3 c
on a.ID=c.ID
where status = '3'
and code = '2'
quit;
I want to do is include ID and Provider from all tables regardless. I am trying to reduce processing time as table1 2 and 3 have thousands of records. If proc sql is not the answer in this case, I am looking for ways to reduce processing time using the data step method (current method)
RE: data step or proc sql for performance time
In your example you are trying to do different things I think: the datastep simply appends 3 tables and keeps those records where the status and code match as defined - no merge here. To accelerate that you can put a where clause on each single table: "status eq '3' and code eq '2'". Could do a bit, subsetting at the source is always good, but is not so significant if the table is not wide.
The SQL is a join, something completely different, so it would be interesting here what table each variable comes from. Or was "union" the intention because all tables have the same layout as implied by the first datastep? In case of the latter, the system has to read each table completely, with limiting the I/O to the referenced variables (KEEP statement or variable list in SQL) I see no performance difference... via SAS/SHARE I don't know, too long ago since I used that. Limit fairly much should be physically defined by CPU and disk / storage when you work locally and have "append".
RE: data step or proc sql for performance time