Random Number Problem
Random Number Problem
(OP)
Hi there,
I am trying to get 200 (records) sample data from 50,000 records. I am using the RDNORM function but it's not giving me the random data, instead the output is in chronological order. Here's the code:
TABLE FILE BAN
PRINT
BAN
BAN_STATUS
BAN_TYPE
BAN_AR_BALANCE
COMPUTE RANDOM/D10.2=RDNORM(RANDOM);
BY BAN_BAN
WHERE READLIMIT EQ 50000
ON TABLE HOLD AS 'RDM2A'
END
TABLE FILE RDM2A
PRINT
BAN
BAN_STATUS
BAN_TYPE
BAN_AR_BALANCE
RANDOM
BY BAN_BAN
WHERE RANDOM LT 0
WHERE RECORDLIMIT EQ 100
ON TABLE PCHOLD FORMAT EXCEL
END
Any help would be much appreciated. Thank You.
I am trying to get 200 (records) sample data from 50,000 records. I am using the RDNORM function but it's not giving me the random data, instead the output is in chronological order. Here's the code:
TABLE FILE BAN
BAN
BAN_STATUS
BAN_TYPE
BAN_AR_BALANCE
COMPUTE RANDOM/D10.2=RDNORM(RANDOM);
BY BAN_BAN
WHERE READLIMIT EQ 50000
ON TABLE HOLD AS 'RDM2A'
END
TABLE FILE RDM2A
BAN
BAN_STATUS
BAN_TYPE
BAN_AR_BALANCE
RANDOM
BY BAN_BAN
WHERE RANDOM LT 0
WHERE RECORDLIMIT EQ 100
ON TABLE PCHOLD FORMAT EXCEL
END
Any help would be much appreciated. Thank You.
RE: Random Number Problem
For random selection, use the function RDUNIF, which returns a value which is uniformly distributed (all values are equally likely), having values between 0 and 1.
Now, if you want approximately 200 records, you can do the following, in ONE pass:
1. what's the ratio of the number of records desired to the total population? Here, it's 200 out of 50000, or .004.
2. Calculate RDUNIF for each record (as you've done), but ONLY accept records where the result is LE .004
The problem with wanting EXACTLY 200 records is that, then the selection is no longer random (independent). Whether a record gets selected is not only a function of randomness, but also how many records you've already retrieved.
However, if THAT'S what you want, you can do it, but it will take two passes, which you're already doing.
First, calculate the value for RDUNIF with a DEFINE, rather than a COMPUTE. In your first pass, retrieve EVERYTHING, but SORT it by the DEFINEd random number. In your second pass, use RECORDLIMIT to retrieve the first 200 records.
The code might look like this:
CODE
RANDOM/D10.2 WITH BAN=RDUNIF(RANDOM);
END
TABLE FILE BAN
PRINT
BAN
BAN_STATUS
BAN_TYPE
BAN_AR_BALANCE
BAN_BAN
BY RANDOM
WHERE READLIMIT EQ 50000
ON TABLE HOLD AS 'RDM2A'
END
TABLE FILE RDM2A
PRINT
BAN
BAN_STATUS
BAN_TYPE
BAN_AR_BALANCE
RANDOM
BY BAN_BAN
WHERE RECORDLIMIT EQ 200
ON TABLE PCHOLD FORMAT EXCEL
END
Note the use of 'WITH' in the DEFINE. This says to calculate a new value every time you get a different value for the specified field. It should be at the lowest level of the retrieved structure. If the structure is a flat single segment, then ANY field will do.