INTELLIGENT WORK FORUMS
FOR COMPUTER PROFESSIONALS

Log In

Come Join Us!

Are you a
Computer / IT professional?
Join Tek-Tips Forums!
  • Talk With Other Members
  • Be Notified Of Responses
    To Your Posts
  • Keyword Search
  • One-Click Access To Your
    Favorite Forums
  • Automated Signatures
    On Your Posts
  • Best Of All, It's Free!

*Tek-Tips's functionality depends on members receiving e-mail. By joining you are opting in to receive e-mail.

Posting Guidelines

Promoting, selling, recruiting, coursework and thesis posting is forbidden.

Jobs

Random Numbers

Random numbers: reloaded by vongrunt
Posted: 30 Aug 05 (Edited 12 Feb 06)

From time to time features involving some sort of "randomness" are necessity. Although SQL Server is capable of going random, things don't always behave as expected - at least compared to procedural languages. Purpose of this FAQ is to explore common pitfalls and provide code examples that work in a set-based fashion - with minimal looping of any kind and no dreaded cursors at all.


RAND() fundamentals

OK. We know there is RAND() function. What next?

According to Books Online, RAND() returns "random float value from 0 through 1". Apparently this means [0, 1) - 0.000 can be generated (albeit very rarely) but 1.000 can not. By using simple linear math this interval can be translated/stretched to suit real-life needs. Here are two common examples:

CODE

-- random float from 0 up to 20 - [0, 20)
SELECT 20*RAND()

-- random float from 10 up to 30 - [10, 30)
SELECT 10 + (30-10)*RAND()

When it comes to random integers, things are a little bit different:

CODE

-- random integer between 0 and 20 - [0, 20]
SELECT CONVERT(int, (20+1)*RAND())

-- random integer between 10 and 30 - [10, 30]
SELECT 10 + CONVERT(int, (30-10+1)*RAND())

Why +1 stuff? Number of integers between 0 and 20 (including boundaries) is 21, not 20. Without added +1 RAND() would never generate value 20. This way it has an equal chance to be generated as 0, 5, 13 or any other number within specified interval.


Function determinism

When called with same input arguments (if any), some functions always return the same value. These functions are deterministic. Some functions don't behave that way and are therefore nondeterministic. Naturally, RAND() belongs to second group. Because of "unpredictable" output such functions cannot be used for creation of persistent DB objects, including indexed views and indexes over computed column.

Arguably no less important, certain ND functions cannot be used within user-defined functions (UDFs). List includes RAND(), GETDATE(), GETUTCDATE() and NEWID(). Why is that so, we can only guess borg2. Anyway, wanna write function called fn_rollDice - forgetaboutit :(. *

* Valid claim for SQL Server 2000. Behaviour is changed in 2005; GETDATE() in UDFs is now allowed. smile.

Seed value

When supplied with optional integer seed value, RAND() shows deterministic behaviour. To illustrate:

CODE

SELECT RAND(2)
WAITFOR DELAY '00:00:02'
SELECT RAND(2)
USE Northwind
SELECT RAND(2)
USE myDB
While generated number appears to be random, it is in fact pseudo-random. This determinism can be used for generating sequence of pseudo-random numbers:

CODE

declare @seed int; set @seed = 391247
declare @randnum int
declare @t table( randnum int )

DECLARE @cnt int; SET @cnt = 0
WHILE @cnt <=100
BEGIN
    SET @cnt = @cnt + 1
    SET @randnum = CONVERT(int, 1e6 * rand(@seed))
    INSERT INTO @t SELECT @randnum
    SET @seed = @randnum
END

select randnum from @t
In order to reconstruct (remember?) 100 numbers from above, all you have to know is initial seed - and have identical pseudo-random number generator. This opens space for some interesting applications, most of which fall beyond scope of this FAQ.

Another purpose of seed is to "initialize" random number generator with... random value. Kind of shuffling dices before you let 'em roll. Example:

CODE

SELECT RAND(RAND()*999983)

However, with time-based seeds there is always the danger that subsequent calls happen too soon, especially on those 3+ GHz room heaters we're running these days. Example (partially taken from BOL):

CODE

DECLARE @t table( randnum float )

DECLARE @cnt int; set @cnt = 0
WHILE  @cnt <=10000
BEGIN
    set @cnt = @cnt + 1
    INSERT INTO @t
    SELECT RAND( (DATEPART(mm, GETDATE()) * 100000 )
               + (DATEPART(ss, GETDATE()) * 1000 )
               + DATEPART(ms, GETDATE()) )
END

SELECT randnum, count(*)
FROM @t
GROUP BY randnum
As a result subsequent calls may produce identical values - and overall, complete distribution is far from random.


RAND() and set-based operations

Let's make small sample table first:

CODE

CREATE TABLE myTable
(    id int identity(1, 1) primary key,
    name varchar(20),
    randnum int
)

INSERT INTO myTable(name) VALUES('Joe')
INSERT INTO myTable(name) VALUES('Bill')
INSERT INTO myTable(name) VALUES('Mary')
INSERT INTO myTable(name) VALUES('Fred')
INSERT INTO myTable(name) VALUES('Sue')

Our objective is to assign random number in range [1, 10] to each row individually. Instant answer is usually:

CODE

UPDATE myTable
SET randnum = 1 + 10*RAND()

SELECT * FROM myTable
Ouch. Doesn't work. Optimizer evaluates RAND() only once per query - seed remains unchanged - and all generated values are the same. At this point it seems looping is inevitable.. not.


Using NEWID() and CHECKSUM()

NEWID() function also generates random value. Bad thing is about type of returned value (uniqueidentifier). Good thing is that NEWID() is "super-nondeterministic". It returns different value for each call, even within a single SELECT query. To illustrate:

CODE

SELECT name, NEWID()
FROM myTable

To solve bad thing we must somehow convert GUID to integer seed. That's where CHECKSUM() comes handy:

CODE

UPDATE myTable
SET randnum = 1 + 10*RAND(CHECKSUM(NEWID()))

If you need pseudo-random values that depend on row contents, try CHECKSUM(*):

CODE

UPDATE myTable
SET randnum = 1 + 10*RAND(CHECKSUM(*))

Both functions - especially NEWID() - are very useful for solving various "random" problems.


Random TOP N rows

Simple:

CODE

SELECT TOP 3 *
FROM myTable
ORDER BY NEWID()


Unique random numbers

How to fill above table with random numbers, and ensure there will be no duplicates? Try this:

CODE

SELECT IDENTITY(int, 1, 1) as randid, id
INTO #tmp
FROM
(    SELECT TOP 100 PERCENT id
    FROM myTable
    ORDER BY NEWID()
) X

UPDATE myTable
SET randnum = #tmp.randid
FROM myTable
INNER JOIN #tmp ON myTable.id=#tmp.id

DROP TABLE #tmp

SELECT * FROM myTable
Here is singleton SELECT INTO statement used to fill temp table with identity column in random order. After that UPDATE with INNER JOIN gives us unique random values back to source table. Generated values fall in range [1, COUNT(*)], as dictated by identity. TOP 100 PERCENT prevents optimizer from being too smart pipe.

Side note: INSERT with ORDER BY is... well, not "clean" SQL. While theory says that order of insertions is irrelevant, identity column fundamentally breaks theoretical stuff - as long as SQL Server allows such constructs.


Random numbers per group

To make new playground let's duplicate sample data:

CODE

UPDATE myTable
SET randnum = NULL

INSERT INTO myTable (name, randnum)
SELECT name, randnum
FROM myTable
We'll assume that name is grouping column. The goal is to assign different random numbers in range [1, 10] for each group. Numbers inside group must be the same.

CODE

UPDATE T
SET randnum = Y.randnum
FROM myTable T
INNER JOIN
(    SELECT TOP 100 PERCENT name, 1+10*RAND(CHECKSUM(NEWID())) AS randnum
    FROM
    (    SELECT DISTINCT name
        FROM myTable
    ) X
    ORDER BY name
) Y ON T.name=Y.name

SELECT *
FROM myTable
ORDER BY name
Again, TOP 100 PERCENT with explicit ORDER BY gets the job done. Same code without these additions won't produce desired results (aka: try it).

If that looks clunky, the same thing can be done with temp table:

CODE

SELECT DISTINCT name, CONVERT(int, NULL) as randnum
INTO #tmp
FROM myTable

UPDATE #tmp
SET randnum = 1+10*RAND(CHECKSUM(NEWID()))

UPDATE myTable
SET randnum = #tmp.randnum
FROM myTable
INNER JOIN #tmp on myTable.name=#tmp.name

DROP TABLE #tmp


------------------
Revision history
2005/08/31 - typos, typos, typos
2005/08/30 - initial draft. Added some smileys :P
2006/02/12 - minor addendum about GETDATE()/UDF changes in SQL2005

Back to Microsoft SQL Server: Programming FAQ Index
Back to Microsoft SQL Server: Programming Forum

My Archive

Resources

Close Box

Join Tek-Tips® Today!

Join your peers on the Internet's largest technical computer professional community.
It's easy to join and it's free.

Here's Why Members Love Tek-Tips Forums:

Register now while it's still free!

Already a member? Close this window and log in.

Join Us             Close