INTELLIGENT WORK FORUMS FOR COMPUTER PROFESSIONALS
Come Join Us!
- Talk With Other Members
- Be Notified Of Responses
To Your Posts
- Keyword Search
- Turn Off Ad Banners
- One-Click Access To Your
Favorite Forums
- Automated Signatures
On Your Posts
- Best Of All, It's Free!
*Tek-Tips's functionality depends on members receiving e-mail. By joining you are opting in to receive e-mail.
Partner With Us!
"Best Of Breed" Forums Add Stickiness To Your Site

(Download This Button Today!)
Member Feedback
"...I have tons of books, have book marked tons of tutorials, which have helped, but this forum has answered those "impossible to find" solutions. I am thrilled with this site..."
Geography
Where in the world do Tek-Tips members come from?
|
Microsoft SQL Server: Programming FAQ
|
Random Numbers
|
Random numbers: reloaded
Posted: 30 Aug 05 (Edited 12 Feb 06)
|
From time to time features involving some sort of "randomness" are necessity. Although SQL Server is capable of going random, things don't always behave as expected - at least compared to procedural languages. Purpose of this FAQ is to explore common pitfalls and provide code examples that work in a set-based fashion - with minimal looping of any kind and no dreaded cursors at all.
RAND() fundamentals
OK. We know there is RAND() function. What next?
According to Books Online, RAND() returns "random float value from 0 through 1". Apparently this means [0, 1) - 0.000 can be generated (albeit very rarely) but 1.000 can not. By using simple linear math this interval can be translated/stretched to suit real-life needs. Here are two common examples:
CODE-- random float from 0 up to 20 - [0, 20) SELECT 20*RAND()
-- random float from 10 up to 30 - [10, 30) SELECT 10 + (30-10)*RAND() When it comes to random integers, things are a little bit different:
CODE-- random integer between 0 and 20 - [0, 20] SELECT CONVERT(int, (20+1)*RAND())
-- random integer between 10 and 30 - [10, 30] SELECT 10 + CONVERT(int, (30-10+1)*RAND()) Why +1 stuff? Number of integers between 0 and 20 (including boundaries) is 21, not 20. Without added +1 RAND() would never generate value 20. This way it has an equal chance to be generated as 0, 5, 13 or any other number within specified interval.
Function determinism
When called with same input arguments (if any), some functions always return the same value. These functions are deterministic. Some functions don't behave that way and are therefore nondeterministic. Naturally, RAND() belongs to second group. Because of "unpredictable" output such functions cannot be used for creation of persistent DB objects, including indexed views and indexes over computed column.
Arguably no less important, certain ND functions cannot be used within user-defined functions (UDFs). List includes RAND(), GETDATE(), GETUTCDATE() and NEWID(). Why is that so, we can only guess . Anyway, wanna write function called fn_rollDice - forgetaboutit :(. *
* Valid claim for SQL Server 2000. Behaviour is changed in 2005; GETDATE() in UDFs is now allowed. .
Seed value
When supplied with optional integer seed value, RAND() shows deterministic behaviour. To illustrate:
CODESELECT RAND(2) WAITFOR DELAY '00:00:02' SELECT RAND(2) USE Northwind SELECT RAND(2) USE myDB While generated number appears to be random, it is in fact pseudo-random. This determinism can be used for generating sequence of pseudo-random numbers:
CODEdeclare @seed int; set @seed = 391247 declare @randnum int declare @t table( randnum int )
DECLARE @cnt int; SET @cnt = 0 WHILE @cnt <=100 BEGIN SET @cnt = @cnt + 1 SET @randnum = CONVERT(int, 1e6 * rand(@seed)) INSERT INTO @t SELECT @randnum SET @seed = @randnum END
select randnum from @t In order to reconstruct (remember?) 100 numbers from above, all you have to know is initial seed - and have identical pseudo-random number generator. This opens space for some interesting applications, most of which fall beyond scope of this FAQ.
Another purpose of seed is to "initialize" random number generator with... random value. Kind of shuffling dices before you let 'em roll. Example:
CODESELECT RAND(RAND()*999983) However, with time-based seeds there is always the danger that subsequent calls happen too soon, especially on those 3+ GHz room heaters we're running these days. Example (partially taken from BOL):
CODEDECLARE @t table( randnum float )
DECLARE @cnt int; set @cnt = 0 WHILE @cnt <=10000 BEGIN set @cnt = @cnt + 1 INSERT INTO @t SELECT RAND( (DATEPART(mm, GETDATE()) * 100000 ) + (DATEPART(ss, GETDATE()) * 1000 ) + DATEPART(ms, GETDATE()) ) END
SELECT randnum, count(*) FROM @t GROUP BY randnum As a result subsequent calls may produce identical values - and overall, complete distribution is far from random.
RAND() and set-based operations
Let's make small sample table first:
CODECREATE TABLE myTable ( id int identity(1, 1) primary key, name varchar(20), randnum int )
INSERT INTO myTable(name) VALUES('Joe') INSERT INTO myTable(name) VALUES('Bill') INSERT INTO myTable(name) VALUES('Mary') INSERT INTO myTable(name) VALUES('Fred') INSERT INTO myTable(name) VALUES('Sue') Our objective is to assign random number in range [1, 10] to each row individually. Instant answer is usually:
CODEUPDATE myTable SET randnum = 1 + 10*RAND()
SELECT * FROM myTable Ouch. Doesn't work. Optimizer evaluates RAND() only once per query - seed remains unchanged - and all generated values are the same. At this point it seems looping is inevitable.. not.
Using NEWID() and CHECKSUM()
NEWID() function also generates random value. Bad thing is about type of returned value (uniqueidentifier). Good thing is that NEWID() is "super-nondeterministic". It returns different value for each call, even within a single SELECT query. To illustrate:
CODESELECT name, NEWID() FROM myTable To solve bad thing we must somehow convert GUID to integer seed. That's where CHECKSUM() comes handy:
CODEUPDATE myTable SET randnum = 1 + 10*RAND(CHECKSUM(NEWID())) If you need pseudo-random values that depend on row contents, try CHECKSUM(*):
CODEUPDATE myTable SET randnum = 1 + 10*RAND(CHECKSUM(*)) Both functions - especially NEWID() - are very useful for solving various "random" problems.
Random TOP N rows
Simple:
CODESELECT TOP 3 * FROM myTable ORDER BY NEWID()
Unique random numbers
How to fill above table with random numbers, and ensure there will be no duplicates? Try this:
CODESELECT IDENTITY(int, 1, 1) as randid, id INTO #tmp FROM ( SELECT TOP 100 PERCENT id FROM myTable ORDER BY NEWID() ) X
UPDATE myTable SET randnum = #tmp.randid FROM myTable INNER JOIN #tmp ON myTable.id=#tmp.id
DROP TABLE #tmp
SELECT * FROM myTable Here is singleton SELECT INTO statement used to fill temp table with identity column in random order. After that UPDATE with INNER JOIN gives us unique random values back to source table. Generated values fall in range [1, COUNT(*)], as dictated by identity. TOP 100 PERCENT prevents optimizer from being too smart .
Side note: INSERT with ORDER BY is... well, not "clean" SQL. While theory says that order of insertions is irrelevant, identity column fundamentally breaks theoretical stuff - as long as SQL Server allows such constructs.
Random numbers per group
To make new playground let's duplicate sample data:
CODEUPDATE myTable SET randnum = NULL
INSERT INTO myTable (name, randnum) SELECT name, randnum FROM myTable We'll assume that name is grouping column. The goal is to assign different random numbers in range [1, 10] for each group. Numbers inside group must be the same.
CODEUPDATE T SET randnum = Y.randnum FROM myTable T INNER JOIN ( SELECT TOP 100 PERCENT name, 1+10*RAND(CHECKSUM(NEWID())) AS randnum FROM ( SELECT DISTINCT name FROM myTable ) X ORDER BY name ) Y ON T.name=Y.name
SELECT * FROM myTable ORDER BY name Again, TOP 100 PERCENT with explicit ORDER BY gets the job done. Same code without these additions won't produce desired results (aka: try it).
If that looks clunky, the same thing can be done with temp table:
CODESELECT DISTINCT name, CONVERT(int, NULL) as randnum INTO #tmp FROM myTable
UPDATE #tmp SET randnum = 1+10*RAND(CHECKSUM(NEWID()))
UPDATE myTable SET randnum = #tmp.randnum FROM myTable INNER JOIN #tmp on myTable.name=#tmp.name
DROP TABLE #tmp
------------------ Revision history 2005/08/31 - typos, typos, typos 2005/08/30 - initial draft. Added some smileys :P 2006/02/12 - minor addendum about GETDATE()/UDF changes in SQL2005 |
Back to Microsoft SQL Server: Programming FAQ Index
Back to Microsoft SQL Server: Programming Forum
My FAQ Archive
Email This FAQ To A Friend |
|
 |
|