Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations bkrike on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

selecting sample data 1

Status
Not open for further replies.

mpramods

Technical User
Jun 3, 2003
50
US
I have to select sample data from a large table containing millions of rows. I have to follow the following rule to select the sample data.

First I have to select a number that is near to 100 as an interval e.g., suppose I select 99 as my interval.

Then I have to decide a particular number of record from the table e.g, I select # 3.

My first sample record will be the 3rd Record. Subsequently, since 99 is my sampling interval, my 2nd sample record should be Record 102 from the table, my 3rd sample record should be Record 201 etc and so on......

Anybody knows how do I go about selecting the records with the above rule ?

Thanks,
Pramod
 
If you don't assume that records in table are stored in specific order (that is not the case), you may:

select * from (select <table>.*, rownum rn)
where mod(rn, interval) = num

I think that in any case you can not avoid full scan performed at least twice.

Regards, Dima
 
Thanks Dima, but what is rownum, rn and num in the above query

Also I didn’t get the part select * from(select<table>.* in the query.

Can you please explain a little bit.

Thanks,
Pramod
 
MPramods,

Since it's past midnight in the Ukraine (Dima's home) and it's probably after his bedtime, I'll go ahead and respond to your questions in Dima's behalf.

ROWNUM is a zero-argument Oracle function which returns the ordinal number of a row from a query. The ROWNUM value for any row equals 1 + the number of rows that preceded this row in the RESULT set. (Notice RESULT set, not TABLE.)

In your original request, you wanted every 100th row. It might be nice, then, to say:

SELECT * from table_name where mod(rownum,100) = 0;

...which logically, should display every 100th row. But remember, ROWNUM increments only when a row is welcomed into the RESULT set. With the code, above, which TABLE row is the first row to join the RESULT set? ROWNUM remains zero for every row in the table since no other row ever makes it past the WHERE clause criterion.

So, in Dima's clever code, he executes an &quot;inner&quot; query that unrestrictively assigns a ROWNUM value to each row, and (vitally) assigns an alias (&quot;RN&quot;) to the ROWNUM value to retain the ROWNUM value beyond the life of the inner query. Next, the results from the inner query (which proceed to the outer query with a &quot;temporarily static&quot; ROWNUM, thanks to the RN alias) attempt to survive the outer WHERE clause, &quot;WHERE mod(rn,100)=0&quot;. And, yes, every 100 rows, the MOD remainder = 0.

Here is sample code that uses the Oracle Education &quot;S_EMP&quot; table to illustrates the concept:
Code:
Query 1 (that shows the full set of ROWNUM-bered data):
select last_name, rownum from s_emp;

LAST_NAME                     ROWNUM
------------------------- ----------
Velasquez                          1
Ngao                               2
Nagayama                           3
Quick-To-See                       4
Ropeburn                           5
Urguhart                           6
Menchu                             7
Biri                               8
Catchpole                          9
Havel                             10
Magee                             11
Giljum                            12
Sedeghi                           13
Nguyen                            14
Dumas                             15
Maduro                            16
Smith                             17
Nozaki                            18
Patel                             19
Newman                            20
Markarian                         21
Chang                             22
Patel                             23
Dancs                             24
Schwartz                          25

25 rows selected.

Query 2 (, the naively simple, but unsuccessful query):

select last_name from s_emp where mod(rownum,2)=0;

no rows selected

Query 3 (, the successful variant of Dima's code, which selects every other row from S_EMP):

select last_name, rn from (select last_name, rownum rn from s_emp)
where mod(rn,2) = 0

LAST_NAME                         RN
------------------------- ----------
Ngao                               2
Quick-To-See                       4
Urguhart                           6
Biri                               8
Havel                             10
Giljum                            12
Nguyen                            14
Maduro                            16
Nozaki                            18
Newman                            20
Chang                             22
Dancs                             24

12 rows selected.
====================================
The second part of your question: 
(what does) &quot;select<table>.* in the query&quot; (mean)?

Dima's code originally read (in part):
&quot;...select <table>.*, rownum rn...&quot;
This means, &quot;Display all columns (*) from whatever table name you use instead of <table>, then append to that the results of the function, &quot;ROWNUM&quot;, for that row, and assign an alias of &quot;RN&quot; to the resulting value.

If any questions linger, please followup here.

Dave
Sandy, Utah, USA @ 22:14 GMT, 15:14 Mountain Time
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top