From time to time features involving some sort of "randomness" are necessity. Although SQL Server is capable of going random, things don't always behave as expected - at least compared to procedural languages. Purpose of this FAQ is to explore common pitfalls and provide code examples that work in a set-based fashion - with minimal looping of any kind and no dreaded cursors at all.
OK. We know there is RAND() function. What next?
According to Books Online, RAND() returns "random float value from 0 through 1". Apparently this means [0, 1) - 0.000 can be generated (albeit very rarely) but 1.000 can not. By using simple linear math this interval can be translated/stretched to suit real-life needs. Here are two common examples:
-- random float from 0 up to 20 - [0, 20) SELECT 20*RAND()
-- random float from 10 up to 30 - [10, 30) SELECT 10 + (30-10)*RAND()
When it comes to random integers, things are a little bit different:
-- random integer between 0 and 20 - [0, 20] SELECT CONVERT(int, (20+1)*RAND())
-- random integer between 10 and 30 - [10, 30] SELECT 10 + CONVERT(int, (30-10+1)*RAND())
Why +1 stuff? Number of integers between 0 and 20 (including boundaries) is 21, not 20. Without added +1 RAND() would never generate value 20. This way it has an equal chance to be generated as 0, 5, 13 or any other number within specified interval.
When called with same input arguments (if any), some functions always return the same value. These functions are deterministic. Some functions don't behave that way and are therefore nondeterministic. Naturally, RAND() belongs to second group. Because of "unpredictable" output such functions cannot be used for creation of persistent DB objects, including indexed views and indexes over computed column.
Arguably no less important, certain ND functions cannot be used within user-defined functions (UDFs). List includes RAND(), GETDATE(), GETUTCDATE() and NEWID(). Why is that so, we can only guess . Anyway, wanna write function called fn_rollDice - forgetaboutit :(. *
* Valid claim for SQL Server 2000. Behaviour is changed in 2005; GETDATE() in UDFs is now allowed. .
When supplied with optional integer seed value, RAND() shows deterministic behaviour. To illustrate:
SELECT RAND(2) WAITFOR DELAY '00:00:02' SELECT RAND(2) USE Northwind SELECT RAND(2) USE myDB
While generated number appears to be random, it is in fact pseudo-random. This determinism can be used for generating sequence of pseudo-random numbers:
declare @seed int; set @seed = 391247 declare @randnum int declare @t table( randnum int )
DECLARE @cnt int; SET @cnt = 0 WHILE @cnt <=100 BEGIN SET @cnt = @cnt + 1 SET @randnum = CONVERT(int, 1e6 * rand(@seed)) INSERT INTO @t SELECT @randnum SET @seed = @randnum END
select randnum from @t
In order to reconstruct (remember?) 100 numbers from above, all you have to know is initial seed - and have identical pseudo-random number generator. This opens space for some interesting applications, most of which fall beyond scope of this FAQ.
Another purpose of seed is to "initialize" random number generator with... random value. Kind of shuffling dices before you let 'em roll. Example:
However, with time-based seeds there is always the danger that subsequent calls happen too soon, especially on those 3+ GHz room heaters we're running these days. Example (partially taken from BOL):
DECLARE @t table( randnum float )
DECLARE @cnt int; set @cnt = 0 WHILE @cnt <=10000 BEGIN set @cnt = @cnt + 1 INSERT INTO @t SELECT RAND( (DATEPART(mm, GETDATE()) * 100000 ) + (DATEPART(ss, GETDATE()) * 1000 ) + DATEPART(ms, GETDATE()) ) END
SELECT randnum, count(*) FROM @t GROUP BY randnum
As a result subsequent calls may produce identical values - and overall, complete distribution is far from random.
RAND() and set-based operations
Let's make small sample table first:
CREATE TABLE myTable ( id int identity(1, 1) primary key, name varchar(20), randnum int )
INSERT INTO myTable(name) VALUES('Joe') INSERT INTO myTable(name) VALUES('Bill') INSERT INTO myTable(name) VALUES('Mary') INSERT INTO myTable(name) VALUES('Fred') INSERT INTO myTable(name) VALUES('Sue')
Our objective is to assign random number in range [1, 10] to each row individually. Instant answer is usually:
UPDATE myTable SET randnum = 1 + 10*RAND()
SELECT * FROM myTable
Ouch. Doesn't work. Optimizer evaluates RAND() only once per query - seed remains unchanged - and all generated values are the same. At this point it seems looping is inevitable.. not.
Using NEWID() and CHECKSUM()
NEWID() function also generates random value. Bad thing is about type of returned value (uniqueidentifier). Good thing is that NEWID() is "super-nondeterministic". It returns different value for each call, even within a single SELECT query. To illustrate:
SELECT name, NEWID() FROM myTable
To solve bad thing we must somehow convert GUID to integer seed. That's where CHECKSUM() comes handy:
UPDATE myTable SET randnum = 1 + 10*RAND(CHECKSUM(NEWID()))
If you need pseudo-random values that depend on row contents, try CHECKSUM(*):
UPDATE myTable SET randnum = 1 + 10*RAND(CHECKSUM(*))
Both functions - especially NEWID() - are very useful for solving various "random" problems.
Random TOP N rows
SELECT TOP 3 * FROM myTable ORDER BY NEWID()
Unique random numbers
How to fill above table with random numbers, and ensure there will be no duplicates? Try this:
SELECT IDENTITY(int, 1, 1) as randid, id INTO #tmp FROM ( SELECT TOP 100 PERCENT id FROM myTable ORDER BY NEWID() ) X
UPDATE myTable SET randnum = #tmp.randid FROM myTable INNER JOIN #tmp ON myTable.id=#tmp.id
DROP TABLE #tmp
SELECT * FROM myTable
Here is singleton SELECT INTO statement used to fill temp table with identity column in random order. After that UPDATE with INNER JOIN gives us unique random values back to source table. Generated values fall in range [1, COUNT(*)], as dictated by identity. TOP 100 PERCENT prevents optimizer from being too smart .
Side note: INSERT with ORDER BY is... well, not "clean" SQL. While theory says that order of insertions is irrelevant, identity column fundamentally breaks theoretical stuff - as long as SQL Server allows such constructs.
Random numbers per group
To make new playground let's duplicate sample data:
UPDATE myTable SET randnum = NULL
INSERT INTO myTable (name, randnum) SELECT name, randnum FROM myTable
We'll assume that name is grouping column. The goal is to assign different random numbers in range [1, 10] for each group. Numbers inside group must be the same.
UPDATE T SET randnum = Y.randnum FROM myTable T INNER JOIN ( SELECT TOP 100 PERCENT name, 1+10*RAND(CHECKSUM(NEWID())) AS randnum FROM ( SELECT DISTINCT name FROM myTable ) X ORDER BY name ) Y ON T.name=Y.name
SELECT * FROM myTable ORDER BY name
Again, TOP 100 PERCENT with explicit ORDER BY gets the job done. Same code without these additions won't produce desired results (aka: try it).
If that looks clunky, the same thing can be done with temp table:
SELECT DISTINCT name, CONVERT(int, NULL) as randnum INTO #tmp FROM myTable
UPDATE #tmp SET randnum = 1+10*RAND(CHECKSUM(NEWID()))
UPDATE myTable SET randnum = #tmp.randnum FROM myTable INNER JOIN #tmp on myTable.name=#tmp.name
DROP TABLE #tmp
------------------ Revision history 2005/08/31 - typos, typos, typos 2005/08/30 - initial draft. Added some smileys :P 2006/02/12 - minor addendum about GETDATE()/UDF changes in SQL2005