INTELLIGENT WORK FORUMS
FOR COMPUTER PROFESSIONALS

Log In

Come Join Us!

Are you a
Computer / IT professional?
Join Tek-Tips Forums!
  • Talk With Other Members
  • Be Notified Of Responses
    To Your Posts
  • Keyword Search
  • One-Click Access To Your
    Favorite Forums
  • Automated Signatures
    On Your Posts
  • Best Of All, It's Free!

*Tek-Tips's functionality depends on members receiving e-mail. By joining you are opting in to receive e-mail.

Posting Guidelines

Promoting, selling, recruiting, coursework and thesis posting is forbidden.

Jobs

T-SQL Hints and Tips

Calculating a Median without use of cursors and dynamic SQL by vongrunt
Posted: 21 Feb 06

Yup. Same subject, 3rd FAQ :P

Tricks and methods found in this FAQ are recycled from thread183-1193016. First we'll explain yet another way how to calculate single median value, then how to calculate median per group without cursors and dynamic SQL - and finally how to exploit SQL2005 ranking functions.


Sample data

CODE

CREATE TABLE sampleData ( groupID int, numValue int )
INSERT INTO sampleData VALUES ( 1,   1 )
INSERT INTO sampleData VALUES ( 1,   2 )
INSERT INTO sampleData VALUES ( 1,   6 )
INSERT INTO sampleData VALUES ( 1,  16 )
INSERT INTO sampleData VALUES ( 1,   7 )
INSERT INTO sampleData VALUES ( 2,   5 )
INSERT INTO sampleData VALUES ( 2,   5 )
INSERT INTO sampleData VALUES ( 2,   5 )
INSERT INTO sampleData VALUES ( 2,  11 )
INSERT INTO sampleData VALUES ( 3,  10 )
INSERT INTO sampleData VALUES ( 3,  17 )
INSERT INTO sampleData VALUES ( 3,  52 )
INSERT INTO sampleData VALUES ( 3,  66 )
INSERT INTO sampleData VALUES ( 4,  18 )
INSERT INTO sampleData VALUES ( 5,   0 )
INSERT INTO sampleData VALUES ( 5,   0 )
Five groups of rows = five median values to calculate at once. For purpose of this testing we assumed numValue cannot be NULL.


Single median value

Let's take 1st group (groupID = 1). To calculate median value we need last value from first half and first value from second half, while taking care about odd count of elements. Looks like TOP 50 PERCENT works exactly as desired:

CODE

DECLARE @groupID int; SET @groupID = 1
DECLARE @M1 int, @M2 int

SELECT TOP 50 PERCENT @M1 = numValue FROM sampleData WHERE groupID = @groupID ORDER BY numValue ASC
SELECT TOP 50 PERCENT @M2 = numValue FROM sampleData WHERE groupID = @groupID ORDER BY numValue DESC

SELECT (@M1+@M2)/2.0
Credits go to donutman who posted that trick on TT long time ago.


Median values per group (SQL Server 2000 and above)

Simple: wrap above code into user-defined function:

CODE

CREATE FUNCTION fn_getMedian( @groupID int )
RETURNS decimal(5, 3)
AS
BEGIN
    DECLARE @m1 int, @m2 int, @ret decimal(5, 3)
    SELECT TOP 50 PERCENT @m1= numValue FROM sampleData WHERE groupID=@groupID ORDER BY numValue ASC
    SELECT TOP 50 PERCENT @m2= numValue FROM sampleData WHERE groupID=@groupID ORDER BY numValue DESC

    SET @ret = (@m1+@m2)/2.0

    RETURN @ret
END
GO
... and then use it as custom aggregate:

CODE

SELECT groupID, dbo.fn_getMedian( groupID)
FROM sampleData
GROUP BY groupID
ORDER BY groupID
Although this code looks fairly elegant, scanning rows ASC/DESC combined with sequential nature of UDFs (called for each new groupID) may cause excessive amount of I/O reads (aka: query will run a bit slower on larger data set).

Another way is to select all rows into identity table and then use identity "row numbers" to get "middle" rows:

CODE

DECLARE @temp TABLE( rownum int identity(1, 1), GroupID int, numValue int )

INSERT INTO @temp( groupID, numValue )
SELECT groupID, numValue
FROM SampleData
ORDER BY groupID, numValue

SELECT A.groupID, AVG(1.0*A.NumValue) AS medianValue
FROM @temp A
INNER JOIN
(    SELECT groupID, FLOOR(AVG(1.0*rownum)) AS m1, CEILING(AVG(1.0*rownum)) AS m2
    FROM @temp
    GROUP BY groupID
) B
ON A.groupID = B.groupID AND A.rownum BETWEEN B.m1 AND B.m2
GROUP BY A.groupID
Alternatively you can use temp #table instead of table @variable. Weird as it may sound, this works faster on larger sets of data - at least in lab conditions where tempdb is not tortured by concurrent users/processes.


Median values per group (SQL Server 2005 only)

Yukon provides several new ways to calculate median. Arguably the most obvious one is with ROW_NUMBER() function:

CODE

SELECT R.groupID, AVG(1.0*R.numValue) AS medianValue
FROM
(    SELECT GroupID, numValue, ROW_NUMBER() OVER(PARTITION BY groupID ORDER BY NumValue) AS rowno
    FROM sampleData
) R
INNER JOIN
(    SELECT GroupID, 1+count(*) as N
    FROM sampleData
    GROUP BY GroupID
) G
ON R.GroupID = G.GroupID AND R.rowNo BETWEEN N/2 AND N/2+N%2
GROUP BY R.groupID
Note that RANK() and DENSE_RANK() cannot be used for this purpose due to possible duplicate numValues in group. But NTILE() can be used, though code is not so elegant:

CODE

SET ANSI_WARNINGS OFF
SELECT groupID, CASE WHEN cnt%2=1 THEN t1max ELSE (t1max+t2min)/2.0 END AS medianValue
FROM
(    SELECT groupID, COUNT(numValue) AS cnt,
        MAX(CASE WHEN tile=1 THEN numValue END) AS t1max,
        MIN(CASE WHEN tile=2 THEN numValue END) AS t2min
    FROM
    (    SELECT groupID, numValue, NTILE(2) OVER( PARTITION BY groupID ORDER BY numValue ) AS tile
        FROM sampleData
    ) T
    GROUP BY T.groupID
) G
ORDER BY groupID

FYI method with ROW_NUMBER() is by far and large the fastest of all mentioned.



Revision history:
------------------
2006/02/21 - initial draft

Back to Microsoft SQL Server: Programming FAQ Index
Back to Microsoft SQL Server: Programming Forum

My Archive

Resources

Close Box

Join Tek-Tips® Today!

Join your peers on the Internet's largest technical computer professional community.
It's easy to join and it's free.

Here's Why Members Love Tek-Tips Forums:

Register now while it's still free!

Already a member? Close this window and log in.

Join Us             Close