Contact US

Log In

Come Join Us!

Are you a
Computer / IT professional?
Join Tek-Tips Forums!
  • Talk With Other Members
  • Be Notified Of Responses
    To Your Posts
  • Keyword Search
  • One-Click Access To Your
    Favorite Forums
  • Automated Signatures
    On Your Posts
  • Best Of All, It's Free!

*Tek-Tips's functionality depends on members receiving e-mail. By joining you are opting in to receive e-mail.

Posting Guidelines

Promoting, selling, recruiting, coursework and thesis posting is forbidden.

Students Click Here

data step or proc sql for performance time

data step or proc sql for performance time

data step or proc sql for performance time

data lib.NewTable (keep=ID Provider);
set table1
if status = '3';
if code = '2';

compared to

proc sql;
create table lib.NewTable as
select ID,Provide
table1 a
left outer join
table2 b
on a.ID=b.ID
left outer join table3 c
on a.ID=c.ID
where status = '3'
and code = '2'

I want to do is include ID and Provider from all tables regardless.  I am trying to reduce processing time as table1 2 and 3 have thousands of records.  If proc sql is not the answer in this case, I am looking for ways to reduce processing time using the data step method (current method)

RE: data step or proc sql for performance time

Thousands of records isn't so bad, I don't think you will see much difference either way - unless sitting on >really< slow CPUs and I/O. You'd need more complex joins and lookups with 100k's or millions of records crossed with thousands to see a real difference. Then things like "SET KEY=" on datastep or using HASH objects become interesting.

In your example you are trying to do different things I think: the datastep simply appends 3 tables and keeps those records where the status and code match as defined - no merge here. To accelerate that you can put a where clause on each single table: "status eq '3' and code eq '2'". Could do a bit, subsetting at the source is always good, but is not so significant if the table is not wide.

The SQL is a join, something completely different, so it would be interesting here what table each variable comes from. Or was "union" the intention because all tables have the same layout as implied by the first datastep? In case of the latter, the system has to read each table completely, with limiting the I/O to the referenced variables (KEEP statement or variable list in SQL) I see no performance difference... via SAS/SHARE I don't know, too long ago since I used that. Limit fairly much should be physically defined by CPU and disk / storage when you work locally and have "append".

RE: data step or proc sql for performance time

I have to agree here, when I comes to creating tables/datasets sql or sas base is more or less the same, its when you come to exacting 100k's/million rows of data and/or merging (joining) as you dont have to pre sort a join, whereas a datastep merge you do..

Red Flag This Post

Please let us know here why this post is inappropriate. Reasons such as off-topic, duplicates, flames, illegal, vulgar, or students posting their homework.

Red Flag Submitted

Thank you for helping keep Tek-Tips Forums free from inappropriate posts.
The Tek-Tips staff will check this out and take appropriate action.

Reply To This Thread

Posting in the Tek-Tips forums is a member-only feature.

Click Here to join Tek-Tips and talk with other members! Already a Member? Login

Close Box

Join Tek-Tips® Today!

Join your peers on the Internet's largest technical computer professional community.
It's easy to join and it's free.

Here's Why Members Love Tek-Tips Forums:

Register now while it's still free!

Already a member? Close this window and log in.

Join Us             Close