SQL Puzzle #... 8?: Median values 4

vongrunt · Feb 17, 2006

Overview
We all probably know wtf is median value, but anyway...

Imagine any ordered set of values, like this one:

1 3 7 8 17 23 42

Median represents "middle" value in set, or in above case 4th element - 8. For even count of elements median becomes mean (average) of two middle values. So if we remove 8, median becomes (7+17)/2 = 12.

Purpose of median value is statistical - it gives "representative" value from set, and usually does it better than average value - assuming data distribution is not too f**ed up. Visit wikipedia for more information.

Objective
The goal of this puzzle is to calculate median per group. Sorta like custom aggregate, though even good SQL2005/CLR implementation is somewhat of tricky. Here is sample gen script:

Code:

create table sampleData ( groupID int, numValue int )
insert into sampleData values ( 1,  1 )
insert into sampleData values ( 1,  2 )
insert into sampleData values ( 1,  6 )
insert into sampleData values ( 1,  16 )
insert into sampleData values ( 1,  7 )
insert into sampleData values ( 2,  3 )
insert into sampleData values ( 2,  9 )
insert into sampleData values ( 2,  23 )
insert into sampleData values ( 2,  31 )
insert into sampleData values ( 3,  10 )
insert into sampleData values ( 3,  17)
insert into sampleData values ( 3,  52 )
insert into sampleData values ( 3,  66 )
insert into sampleData values ( 4,  18 )

To make things simpler, let's assume numValue cannot be NULL. Expected result set is:

Code:

groupID medianValue
-------.-----------
      1         6.0
      2        16.0
      3        34.5
      4        18.0

Rules & restrictions
No OLAP/Analysis services allowed

You can do it with SQL2000, Yukon, whatever. Personally I'd still value 2000 code more - simply because it is harder to do.

There are no restrictions about T-SQL features used - as long as code is written set-based.

The most elegant answers (yup, subjective) get some stars...

-----
P.S. to George: what happened with points in polygon thing? Shall we continue testing?

------
Theory: everybody knows everything, nothing works
Practice: everything works, nobody knows why
[banghead]

gmmastros · Feb 17, 2006

Vongrunt,

I've been very busy lately. I plan on looking at the point in polygon some time next week.

-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom

vongrunt · Feb 17, 2006

> I've been very busy lately. I plan on looking at the point in polygon some time next week.

Me too. I had exactly enough time to sleep, eat and fix buncha bugs made by... #!@!^$$

#8 will be opened 'til next Friday.

------
Theory: everybody knows everything, nothing works
Practice: everything works, nobody knows why
[banghead]

gmmastros · Feb 17, 2006

Done.

I Created a Procedure in my TekTips database called GetMedianValues_gmmastros [small](So i'll remember where it is next friday when I post the code).[/small]

I have 1 Temp Table, 2 queries with a Union ALL where each query has 1 sub query.

I ran it on a database I have here. There are 10,862 records in this table. I have 20 Groups of data. It took 0.106 seconds to run.

-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom

SQLDenis · Feb 17, 2006

try using RANK() in 2005 without temptable (identity value) ;-)

Denis The SQL Menace
SQL blog:

http://sqlservercode.blogspot.com/

Personal Blog:

http://otherthingsnow.blogspot.com/

SQLDenis · Feb 17, 2006

This should get us started

SELECT RANK() OVER(PARTITION BY groupID ORDER BY numValue ) AS 'rownumberdense',*
FROM sampleData
ORDER BY groupID,rownumberdense

Denis The SQL Menace
SQL blog:

http://sqlservercode.blogspot.com/

Personal Blog:

http://otherthingsnow.blogspot.com/

DotNetGnat · Feb 17, 2006

i know all these puzzles are not in my skill set range yet i visit these threads just to view the posts of you guys...
and as always von, denis and george ...you guys are awesome...

actually you guys know what...i am currently in las vegas and ofcourse losing money on gambling [mad]

...may be you guys could have created a small db and a magic query that fortells the numbers to bet on different games [tongue]

...create some puzzles on this concept guys...

-DNG

vongrunt · Feb 17, 2006

SQLDenis said:
This should get us started

OK, when you are finished do INSERT INTO sampleData SELECT * FROM sampleData - and try again

DNG said:
...may be you guys could have created a small db and a magic query that fortells the numbers to bet on different games

SELECT forty-two, as usual... assuming ya don't play blackjack or something [upsidedown]

.

------
Theory: everybody knows everything, nothing works
Practice: everything works, nobody knows why
[banghead]

SQLDenis · Feb 17, 2006

vongrunt said:
OK, when you are finished do INSERT INTO sampleData SELECT * FROM sampleData - and try again

That's why we have DENSE_RANK

Denis The SQL Menace
SQL blog:

http://sqlservercode.blogspot.com/

Personal Blog:

http://otherthingsnow.blogspot.com/

gmmastros · Feb 17, 2006

OK. I did the Insert Into SampleData Select * From SampleData.

I still get the same answers. That's good, right? [wink]

-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom

vongrunt · Feb 17, 2006

> That's why we have DENSE_RANK

Hehe... try it [pipe]

> I still get the same answers. That's good, right?

Yes, median values should remain the same.

------
Theory: everybody knows everything, nothing works
Practice: everything works, nobody knows why
[banghead]

SQLDenis · Feb 17, 2006

I got it to work with RANK and DENSE RANK, no temp tables, no unions
1 select (subqueries of course)

Denis The SQL Menace
SQL blog:

http://sqlservercode.blogspot.com/

Personal Blog:

http://otherthingsnow.blogspot.com/

vongrunt · Feb 17, 2006

I got it with ROW_NUMBER().

Looks like there is no need to wait for Friday... shoot.

------
Theory: everybody knows everything, nothing works
Practice: everything works, nobody knows why
[banghead]

SQLDenis · Feb 17, 2006

Okay here we go

Code:

create  table sampleData ( groupID int, numValue int )
insert into sampleData values ( 1,  1 )
insert into sampleData values ( 1,  2 )
insert into sampleData values ( 1,  6 )
insert into sampleData values ( 1,  16 )
insert into sampleData values ( 1,  7 )
insert into sampleData values ( 2,  3 )
insert into sampleData values ( 2,  9 )
insert into sampleData values ( 2,  23 )
insert into sampleData values ( 2,  31 )
insert into sampleData values ( 3,  10 )
insert into sampleData values ( 3,  17)
insert into sampleData values ( 3,  52 )
insert into sampleData values ( 3,  66 )
insert into sampleData values ( 4,  18 )

--RANK
select avg( convert(money,numvalue)),z.groupid from (
select case when count(groupid)%2 =0 then (count(groupid)/2)else (count(groupid)/2) + 1  end as GroupStart,
case when count(groupid)%2 =0 then (count(groupid)/2)  + sign(count(groupid) +1 %2) else (count(groupid)/2)+1 end as GroupEnd,groupid 
from(
select RANK() OVER(PARTITION BY groupID ORDER BY numValue ) as 'rownumberdense',* 
from (select distinct * from sampleData) zz
) c
group by groupid) z join (select RANK() OVER(PARTITION BY groupID ORDER BY numValue ) as 'rownumberdense',* 
from (select distinct * from sampleData) xx
) p on z.groupid =p.groupid
and p.rownumberdense between groupstart and GroupEnd
group by z.groupid

--DENSE_RANK
select avg( convert(money,numvalue)),z.groupid from (
select case when count(groupid)%2 =0 then (count(groupid)/2)else (count(groupid)/2) + 1  end as GroupStart,
case when count(groupid)%2 =0 then (count(groupid)/2)  + sign(count(groupid) +1 %2) else (count(groupid)/2)+1 end as GroupEnd,groupid 
from(
select DENSE_RANK() OVER(PARTITION BY groupID ORDER BY numValue ) as 'rownumberdense',* 
from (select distinct * from sampleData) zz
) c
group by groupid) z join (select DENSE_RANK() OVER(PARTITION BY groupID ORDER BY numValue ) as 'rownumberdense',* 
from (select distinct * from sampleData) xx
) p on z.groupid =p.groupid
and p.rownumberdense between groupstart and GroupEnd
group by z.groupid


--Let's double the data
insert into sampleData
select * from sampleData

--RANK
select avg( convert(money,numvalue)),z.groupid from (
select case when count(groupid)%2 =0 then (count(groupid)/2)else (count(groupid)/2) + 1  end as GroupStart,
case when count(groupid)%2 =0 then (count(groupid)/2)  + sign(count(groupid) +1 %2) else (count(groupid)/2)+1 end as GroupEnd,groupid 
from(
select RANK() OVER(PARTITION BY groupID ORDER BY numValue ) as 'rownumberdense',* 
from (select distinct * from sampleData) zz
) c
group by groupid) z join (select RANK() OVER(PARTITION BY groupID ORDER BY numValue ) as 'rownumberdense',* 
from (select distinct * from sampleData) xx
) p on z.groupid =p.groupid
and p.rownumberdense between groupstart and GroupEnd
group by z.groupid

--DENSE_RANK
select avg( convert(money,numvalue)),z.groupid from (
select case when count(groupid)%2 =0 then (count(groupid)/2)else (count(groupid)/2) + 1  end as GroupStart,
case when count(groupid)%2 =0 then (count(groupid)/2)  + sign(count(groupid) +1 %2) else (count(groupid)/2)+1 end as GroupEnd,groupid 
from(
select DENSE_RANK() OVER(PARTITION BY groupID ORDER BY numValue ) as 'rownumberdense',* 
from (select distinct * from sampleData) zz
) c
group by groupid) z join (select DENSE_RANK() OVER(PARTITION BY groupID ORDER BY numValue ) as 'rownumberdense',* 
from (select distinct * from sampleData) xx
) p on z.groupid =p.groupid
and p.rownumberdense between groupstart and GroupEnd
group by z.groupid

Denis The SQL Menace
SQL blog:

http://sqlservercode.blogspot.com/

Personal Blog:

http://otherthingsnow.blogspot.com/

vongrunt · Feb 17, 2006

I have no SQL2005 handy at the moment... are you sure RANK()/DENSE_RANK() work in all cases?

------
Theory: everybody knows everything, nothing works
Practice: everything works, nobody knows why
[banghead]

SQLDenis · Feb 17, 2006

yep, that's why I gave them both in the code
but since I am doing distinct it doesn't matter ( I know I cheated a little, ROWNUMBER would have worked also)
In am slo counting by partition meaning it start from 1 for every groupid

select DENSE_RANK() OVER(PARTITION BY groupID ORDER BY numValue ) as 'rownumberdense',*
from (select distinct * from sampleData) xx

Denis The SQL Menace
SQL blog:

http://sqlservercode.blogspot.com/

Personal Blog:

http://otherthingsnow.blogspot.com/

gmmastros · Feb 17, 2006

Denis,

I am not very familiar with SQL 2005 syntax. I see in your query where you are using... select distinct * from sampleData. Is this right?

Can you add this data and re-run it?

insert into sampleData values ( 5, 10 )
insert into sampleData values ( 5, 10 )
insert into sampleData values ( 5, 10 )
insert into sampleData values ( 5, 10 )
insert into sampleData values ( 5, 20 )

Clearly, the median for Group 5 should be 10. What does your query display?

-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom

vongrunt · Feb 17, 2006

OK, with distinct values both RANK and DENSE_RANK do the same.

Suppose one group contains the following numValues

3 3 3 5

What would return your code? Correct value is 3.

Sorry again, here @work I don't have 2005 yet - otherwise I would check code myself.

------
Theory: everybody knows everything, nothing works
Practice: everything works, nobody knows why
[banghead]

SQLDenis · Feb 17, 2006

Damn.......
Ok i will have to fix that ;-(

Denis The SQL Menace
SQL blog:

http://sqlservercode.blogspot.com/

Personal Blog:

http://otherthingsnow.blogspot.com/

gmmastros · Feb 17, 2006

Here is my code...

Code:

Declare @Temp Table(RowId Integer Identity(1,1), GroupId Integer, NumValue Integer)

Insert Into @Temp(GroupId, NumValue)
Select 	GroupId, NumValue
From 	SampleData
Order By GroupId, NumValue

Select	A.GroupId,
		Convert(Decimal(10,1), T.NumValue) As MedianValue
From	(
		Select 	GroupId,
				(Max(RowId) + Min(RowId)) / 2 As RowId
		From 	@Temp
		Group By GroupId
		Having (Max(RowId) - Min(RowId)) % 2 = 0
		) A
		Inner Join @Temp T 
			On 	A.GroupId = T.GroupId
			And A.RowId = T.RowId

Union All 

Select	A.GroupId,
		Convert(Decimal(10,1), (Convert(Decimal(10,5), A.NumValue) + Convert(Decimal(10,5), B.NumValue)) / 2.0) As MedianValue
From	(
		Select 	GroupId,
				Convert(Int, 0.5 + (Convert(Decimal(10,1), Max(RowId)) + Convert(Decimal(10,1), Min(RowId))) / 2.0) As FirstRowId,
				Convert(Int, -0.5 + (Convert(Decimal(10,1), Max(RowId)) + Convert(Decimal(10,1), Min(RowId))) / 2.0) As SecondRowId
		From 	@Temp
		Group By GroupId
		Having (Max(RowId) - Min(RowId)) % 2 = 1
		) T
		Inner Join @Temp A
			On 	T.GroupId = A.GroupId
			And	T.FirstRowId = A.RowId
		Inner Join @Temp B
			On	T.GroupId = B.GroupId
			And T.SecondRowId = B.RowId

Order By A.GroupId

-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

SQL Puzzle #... 8?: Median values 4

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Similar threads

Log in

Part and Inventory Search

Sponsor