From time to time features involving some sort of "randomness" are necessity. Although SQL Server is capable of going random, things don't always behave as expected - at least compared to procedural languages. Purpose of this FAQ is to explore common pitfalls and provide code examples that work in a set-based fashion - with minimal looping of any kind and no dreaded cursors at all.
RAND() fundamentals
OK. We know there is RAND() function. What next?
According to Books Online, RAND() returns "
random float value from 0 through 1". Apparently this means [0, 1) - 0.000 can be generated (albeit very rarely) but 1.000 can not. By using simple linear math this interval can be translated/stretched to suit real-life needs. Here are two common examples:
Code:
-- random float from 0 up to 20 - [0, 20)
SELECT 20*RAND()
-- random float from 10 up to 30 - [10, 30)
SELECT 10 + (30-10)*RAND()
When it comes to random integers, things are a little bit different:
Code:
-- random integer between 0 and 20 - [0, 20]
SELECT CONVERT(int, (20+1)*RAND())
-- random integer between 10 and 30 - [10, 30]
SELECT 10 + CONVERT(int, (30-10+1)*RAND())
Why +1 stuff? Number of integers between 0 and 20 (including boundaries) is 21, not 20. Without added +1 RAND() would never generate value 20. This way it has an equal chance to be generated as 0, 5, 13 or any other number within specified interval.
Function determinism
When called with same input arguments (if any), some functions always return the same value. These functions are
deterministic. Some functions don't behave that way and are therefore
nondeterministic. Naturally, RAND() belongs to second group. Because of "unpredictable" output such functions cannot be used for creation of persistent DB objects, including indexed views and indexes over computed column.
Arguably no less important, certain ND functions
cannot be used within user-defined functions (UDFs). List includes RAND(), GETDATE(), GETUTCDATE() and NEWID(). Why is that so, we can only guess
![[borg2] [borg2] [borg2]](/data/assets/smilies/borg2.gif)
. Anyway, wanna write function called fn_rollDice - forgetaboutit

. *
* Valid claim for SQL Server 2000. Behaviour is
changed in 2005; GETDATE() in UDFs is now allowed.
![[smile] [smile] [smile]](/data/assets/smilies/smile.gif)
.
Seed value
When supplied with optional integer seed value, RAND() shows deterministic behaviour. To illustrate:
Code:
SELECT RAND(2)
WAITFOR DELAY '00:00:02'
SELECT RAND(2)
USE Northwind
SELECT RAND(2)
USE myDB
While generated number appears to be random, it is in fact
pseudo-random. This determinism can be used for generating
sequence of pseudo-random numbers:
Code:
declare @seed int; set @seed = 391247
declare @randnum int
declare @t table( randnum int )
DECLARE @cnt int; SET @cnt = 0
WHILE @cnt <=100
BEGIN
SET @cnt = @cnt + 1
SET @randnum = CONVERT(int, 1e6 * rand(@seed))
INSERT INTO @t SELECT @randnum
SET @seed = @randnum
END
select randnum from @t
In order to reconstruct (remember?) 100 numbers from above, all you have to know is initial seed - and have identical pseudo-random number generator. This opens space for some interesting applications, most of which fall beyond scope of this FAQ.
Another purpose of seed is to "initialize" random number generator with... random value. Kind of shuffling dices before you let 'em roll. Example:
Code:
SELECT RAND(RAND()*999983)
However, with
time-based seeds there is always the danger that subsequent calls happen too soon, especially on those 3+ GHz room heaters we're running these days. Example (partially taken from BOL):
Code:
DECLARE @t table( randnum float )
DECLARE @cnt int; set @cnt = 0
WHILE @cnt <=10000
BEGIN
set @cnt = @cnt + 1
INSERT INTO @t
SELECT RAND( (DATEPART(mm, GETDATE()) * 100000 )
+ (DATEPART(ss, GETDATE()) * 1000 )
+ DATEPART(ms, GETDATE()) )
END
SELECT randnum, count(*)
FROM @t
GROUP BY randnum
As a result subsequent calls may produce
identical values - and overall, complete distribution is far from random.
RAND() and set-based operations
Let's make small sample table first:
Code:
CREATE TABLE myTable
( id int identity(1, 1) primary key,
name varchar(20),
randnum int
)
INSERT INTO myTable(name) VALUES('Joe')
INSERT INTO myTable(name) VALUES('Bill')
INSERT INTO myTable(name) VALUES('Mary')
INSERT INTO myTable(name) VALUES('Fred')
INSERT INTO myTable(name) VALUES('Sue')
Our objective is to assign random number in range [1, 10] to each row individually. Instant answer is usually:
Code:
UPDATE myTable
SET randnum = 1 + 10*RAND()
SELECT * FROM myTable
Ouch. Doesn't work. Optimizer evaluates RAND()
only once per query - seed remains unchanged - and all generated values are the same. At this point it seems looping is inevitable.. not.
Using NEWID() and CHECKSUM()
NEWID() function also generates random value. Bad thing is about type of returned value (uniqueidentifier). Good thing is that NEWID() is "super-nondeterministic". It returns different value for each call, even within a single SELECT query. To illustrate:
Code:
SELECT name, NEWID()
FROM myTable
To solve bad thing we must somehow convert GUID to integer seed. That's where CHECKSUM() comes handy:
Code:
UPDATE myTable
SET randnum = 1 + 10*RAND(CHECKSUM(NEWID()))
If you need pseudo-random values that depend on row contents, try CHECKSUM(*):
Code:
UPDATE myTable
SET randnum = 1 + 10*RAND(CHECKSUM(*))
Both functions - especially NEWID() - are very useful for solving various "random" problems.
Random TOP N rows
Simple:
Code:
SELECT TOP 3 *
FROM myTable
ORDER BY NEWID()
Unique random numbers
How to fill above table with random numbers, and ensure there will be no duplicates? Try this:
Code:
SELECT IDENTITY(int, 1, 1) as randid, id
INTO #tmp
FROM
( SELECT TOP 100 PERCENT id
FROM myTable
ORDER BY NEWID()
) X
UPDATE myTable
SET randnum = #tmp.randid
FROM myTable
INNER JOIN #tmp ON myTable.id=#tmp.id
DROP TABLE #tmp
SELECT * FROM myTable
Here is singleton SELECT INTO statement used to fill temp table with identity column
in random order. After that UPDATE with INNER JOIN gives us unique random values back to source table. Generated values fall in range [1, COUNT(*)], as dictated by identity. TOP 100 PERCENT prevents optimizer from being too smart
![[pipe] [pipe] [pipe]](/data/assets/smilies/pipe.gif)
.
Side note: INSERT with ORDER BY is... well, not "clean" SQL. While theory says that order of insertions is irrelevant, identity column fundamentally breaks theoretical stuff - as long as SQL Server allows such constructs.
Random numbers per group
To make new playground let's duplicate sample data:
Code:
UPDATE myTable
SET randnum = NULL
INSERT INTO myTable (name, randnum)
SELECT name, randnum
FROM myTable
We'll assume that name is grouping column. The goal is to assign different random numbers in range [1, 10] for each group. Numbers inside group must be the same.
Code:
UPDATE T
SET randnum = Y.randnum
FROM myTable T
INNER JOIN
( SELECT TOP 100 PERCENT name, 1+10*RAND(CHECKSUM(NEWID())) AS randnum
FROM
( SELECT DISTINCT name
FROM myTable
) X
ORDER BY name
) Y ON T.name=Y.name
SELECT *
FROM myTable
ORDER BY name
Again, TOP 100 PERCENT with explicit ORDER BY gets the job done. Same code without these additions won't produce desired results (aka: try it).
If that looks clunky, the same thing can be done with temp table:
Code:
SELECT DISTINCT name, CONVERT(int, NULL) as randnum
INTO #tmp
FROM myTable
UPDATE #tmp
SET randnum = 1+10*RAND(CHECKSUM(NEWID()))
UPDATE myTable
SET randnum = #tmp.randnum
FROM myTable
INNER JOIN #tmp on myTable.name=#tmp.name
DROP TABLE #tmp
------------------
Revision history
2005/08/31 - typos, typos, typos
2005/08/30 - initial draft. Added some smileys 
2006/02/12 - minor addendum about GETDATE()/UDF changes in SQL2005