INTELLIGENT WORK FORUMS
FOR COMPUTER PROFESSIONALS

Are you a
Computer / IT professional?
Join Tek-Tips Forums!
• Talk With Other Members
• Be Notified Of Responses
• Keyword Search
Favorite Forums
• Automated Signatures
• Best Of All, It's Free!

*Tek-Tips's functionality depends on members receiving e-mail. By joining you are opting in to receive e-mail.

#### Posting Guidelines

Promoting, selling, recruiting, coursework and thesis posting is forbidden.

# Microsoft SQL Server: Programming FAQ

## T-SQL Hints and Tips

Calculating a Median without use of cursors and dynamic SQL by vongrunt
Posted: 21 Feb 06

Yup. Same subject, 3rd FAQ :P

Tricks and methods found in this FAQ are recycled from thread183-1193016. First we'll explain yet another way how to calculate single median value, then how to calculate median per group without cursors and dynamic SQL - and finally how to exploit SQL2005 ranking functions.

Sample data

#### CODE

CREATE TABLE sampleData ( groupID int, numValue int )
INSERT INTO sampleData VALUES ( 1,   1 )
INSERT INTO sampleData VALUES ( 1,   2 )
INSERT INTO sampleData VALUES ( 1,   6 )
INSERT INTO sampleData VALUES ( 1,  16 )
INSERT INTO sampleData VALUES ( 1,   7 )
INSERT INTO sampleData VALUES ( 2,   5 )
INSERT INTO sampleData VALUES ( 2,   5 )
INSERT INTO sampleData VALUES ( 2,   5 )
INSERT INTO sampleData VALUES ( 2,  11 )
INSERT INTO sampleData VALUES ( 3,  10 )
INSERT INTO sampleData VALUES ( 3,  17 )
INSERT INTO sampleData VALUES ( 3,  52 )
INSERT INTO sampleData VALUES ( 3,  66 )
INSERT INTO sampleData VALUES ( 4,  18 )
INSERT INTO sampleData VALUES ( 5,   0 )
INSERT INTO sampleData VALUES ( 5,   0 )
Five groups of rows = five median values to calculate at once. For purpose of this testing we assumed numValue cannot be NULL.

Single median value

Let's take 1st group (groupID = 1). To calculate median value we need last value from first half and first value from second half, while taking care about odd count of elements. Looks like TOP 50 PERCENT works exactly as desired:

#### CODE

DECLARE @groupID int; SET @groupID = 1
DECLARE @M1 int, @M2 int

SELECT TOP 50 PERCENT @M1 = numValue FROM sampleData WHERE groupID = @groupID ORDER BY numValue ASC
SELECT TOP 50 PERCENT @M2 = numValue FROM sampleData WHERE groupID = @groupID ORDER BY numValue DESC

SELECT (@M1+@M2)/2.0
Credits go to donutman who posted that trick on TT long time ago.

Median values per group (SQL Server 2000 and above)

Simple: wrap above code into user-defined function:

#### CODE

CREATE FUNCTION fn_getMedian( @groupID int )
RETURNS decimal(5, 3)
AS
BEGIN
DECLARE @m1 int, @m2 int, @ret decimal(5, 3)
SELECT TOP 50 PERCENT @m1= numValue FROM sampleData WHERE groupID=@groupID ORDER BY numValue ASC
SELECT TOP 50 PERCENT @m2= numValue FROM sampleData WHERE groupID=@groupID ORDER BY numValue DESC

SET @ret = (@m1+@m2)/2.0

RETURN @ret
END
GO
... and then use it as custom aggregate:

#### CODE

SELECT groupID, dbo.fn_getMedian( groupID)
FROM sampleData
GROUP BY groupID
ORDER BY groupID
Although this code looks fairly elegant, scanning rows ASC/DESC combined with sequential nature of UDFs (called for each new groupID) may cause excessive amount of I/O reads (aka: query will run a bit slower on larger data set).

Another way is to select all rows into identity table and then use identity "row numbers" to get "middle" rows:

#### CODE

DECLARE @temp TABLE( rownum int identity(1, 1), GroupID int, numValue int )

INSERT INTO @temp( groupID, numValue )
SELECT groupID, numValue
FROM SampleData
ORDER BY groupID, numValue

SELECT A.groupID, AVG(1.0*A.NumValue) AS medianValue
FROM @temp A
INNER JOIN
(    SELECT groupID, FLOOR(AVG(1.0*rownum)) AS m1, CEILING(AVG(1.0*rownum)) AS m2
FROM @temp
GROUP BY groupID
) B
ON A.groupID = B.groupID AND A.rownum BETWEEN B.m1 AND B.m2
GROUP BY A.groupID
Alternatively you can use temp #table instead of table @variable. Weird as it may sound, this works faster on larger sets of data - at least in lab conditions where tempdb is not tortured by concurrent users/processes.

Median values per group (SQL Server 2005 only)

Yukon provides several new ways to calculate median. Arguably the most obvious one is with ROW_NUMBER() function:

#### CODE

SELECT R.groupID, AVG(1.0*R.numValue) AS medianValue
FROM
(    SELECT GroupID, numValue, ROW_NUMBER() OVER(PARTITION BY groupID ORDER BY NumValue) AS rowno
FROM sampleData
) R
INNER JOIN
(    SELECT GroupID, 1+count(*) as N
FROM sampleData
GROUP BY GroupID
) G
ON R.GroupID = G.GroupID AND R.rowNo BETWEEN N/2 AND N/2+N%2
GROUP BY R.groupID
Note that RANK() and DENSE_RANK() cannot be used for this purpose due to possible duplicate numValues in group. But NTILE() can be used, though code is not so elegant:

#### CODE

SET ANSI_WARNINGS OFF
SELECT groupID, CASE WHEN cnt%2=1 THEN t1max ELSE (t1max+t2min)/2.0 END AS medianValue
FROM
(    SELECT groupID, COUNT(numValue) AS cnt,
MAX(CASE WHEN tile=1 THEN numValue END) AS t1max,
MIN(CASE WHEN tile=2 THEN numValue END) AS t2min
FROM
(    SELECT groupID, numValue, NTILE(2) OVER( PARTITION BY groupID ORDER BY numValue ) AS tile
FROM sampleData
) T
GROUP BY T.groupID
) G
ORDER BY groupID

FYI method with ROW_NUMBER() is by far and large the fastest of all mentioned.

Revision history:
------------------
2006/02/21 - initial draft

Back to Microsoft SQL Server: Programming FAQ Index
Back to Microsoft SQL Server: Programming Forum

Close Box

# Join Tek-Tips® Today!

Join your peers on the Internet's largest technical computer professional community.
It's easy to join and it's free.

Here's Why Members Love Tek-Tips Forums:

• Talk To Other Members
• Notification Of Responses To Questions
• Favorite Forums One Click Access
• Keyword Search Of All Posts, And More...

Register now while it's still free!