Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chriss Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

SQL Puzzle #... 8?: Median values 4

Status
Not open for further replies.

vongrunt

Programmer
Mar 8, 2004
4,863
HR
Overview
We all probably know wtf is median value, but anyway...

Imagine any ordered set of values, like this one:

1 3 7 8 17 23 42

Median represents "middle" value in set, or in above case 4th element - 8. For even count of elements median becomes mean (average) of two middle values. So if we remove 8, median becomes (7+17)/2 = 12.

Purpose of median value is statistical - it gives "representative" value from set, and usually does it better than average value - assuming data distribution is not too f**ed up. Visit wikipedia for more information.

Objective
The goal of this puzzle is to calculate median per group. Sorta like custom aggregate, though even good SQL2005/CLR implementation is somewhat of tricky. Here is sample gen script:
Code:
create table sampleData ( groupID int, numValue int )
insert into sampleData values ( 1,  1 )
insert into sampleData values ( 1,  2 )
insert into sampleData values ( 1,  6 )
insert into sampleData values ( 1,  16 )
insert into sampleData values ( 1,  7 )
insert into sampleData values ( 2,  3 )
insert into sampleData values ( 2,  9 )
insert into sampleData values ( 2,  23 )
insert into sampleData values ( 2,  31 )
insert into sampleData values ( 3,  10 )
insert into sampleData values ( 3,  17)
insert into sampleData values ( 3,  52 )
insert into sampleData values ( 3,  66 )
insert into sampleData values ( 4,  18 )
To make things simpler, let's assume numValue cannot be NULL. Expected result set is:
Code:
groupID medianValue
-------.-----------
      1         6.0
      2        16.0
      3        34.5
      4        18.0
Rules & restrictions
No OLAP/Analysis services allowed :p

You can do it with SQL2000, Yukon, whatever. Personally I'd still value 2000 code more - simply because it is harder to do.

There are no restrictions about T-SQL features used - as long as code is written set-based.

The most elegant answers (yup, subjective) get some stars...

-----
P.S. to George: what happened with points in polygon thing? Shall we continue testing?

------
Theory: everybody knows everything, nothing works
Practice: everything works, nobody knows why

[banghead]
 
Vongrunt,

I've been very busy lately. I plan on looking at the point in polygon some time next week.

-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
> I've been very busy lately. I plan on looking at the point in polygon some time next week.

Me too. I had exactly enough time to sleep, eat and fix buncha bugs made by... #!@!^$$

#8 will be opened 'til next Friday.

------
Theory: everybody knows everything, nothing works
Practice: everything works, nobody knows why

[banghead]
 
Done.

I Created a Procedure in my TekTips database called GetMedianValues_gmmastros [small](So i'll remember where it is next friday when I post the code).[/small]

I have 1 Temp Table, 2 queries with a Union ALL where each query has 1 sub query.

I ran it on a database I have here. There are 10,862 records in this table. I have 20 Groups of data. It took 0.106 seconds to run.

-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
i know all these puzzles are not in my skill set range yet i visit these threads just to view the posts of you guys...
and as always von, denis and george ...you guys are awesome...

actually you guys know what...i am currently in las vegas and ofcourse losing money on gambling [mad]...may be you guys could have created a small db and a magic query that fortells the numbers to bet on different games [tongue]...create some puzzles on this concept guys...

-DNG
 
SQLDenis said:
This should get us started
OK, when you are finished do INSERT INTO sampleData SELECT * FROM sampleData - and try again :)

DNG said:
...may be you guys could have created a small db and a magic query that fortells the numbers to bet on different games
SELECT forty-two, as usual... assuming ya don't play blackjack or something [upsidedown].

------
Theory: everybody knows everything, nothing works
Practice: everything works, nobody knows why

[banghead]
 
OK. I did the Insert Into SampleData Select * From SampleData.

I still get the same answers. That's good, right? [wink]


-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
> That's why we have DENSE_RANK

Hehe... try it [pipe]

> I still get the same answers. That's good, right?

Yes, median values should remain the same.

------
Theory: everybody knows everything, nothing works
Practice: everything works, nobody knows why

[banghead]
 
I got it with ROW_NUMBER().

Looks like there is no need to wait for Friday... shoot.

------
Theory: everybody knows everything, nothing works
Practice: everything works, nobody knows why

[banghead]
 
Okay here we go

Code:
create  table sampleData ( groupID int, numValue int )
insert into sampleData values ( 1,  1 )
insert into sampleData values ( 1,  2 )
insert into sampleData values ( 1,  6 )
insert into sampleData values ( 1,  16 )
insert into sampleData values ( 1,  7 )
insert into sampleData values ( 2,  3 )
insert into sampleData values ( 2,  9 )
insert into sampleData values ( 2,  23 )
insert into sampleData values ( 2,  31 )
insert into sampleData values ( 3,  10 )
insert into sampleData values ( 3,  17)
insert into sampleData values ( 3,  52 )
insert into sampleData values ( 3,  66 )
insert into sampleData values ( 4,  18 )

--RANK
select avg( convert(money,numvalue)),z.groupid from (
select case when count(groupid)%2 =0 then (count(groupid)/2)else (count(groupid)/2) + 1  end as GroupStart,
case when count(groupid)%2 =0 then (count(groupid)/2)  + sign(count(groupid) +1 %2) else (count(groupid)/2)+1 end as GroupEnd,groupid 
from(
select RANK() OVER(PARTITION BY groupID ORDER BY numValue ) as 'rownumberdense',* 
from (select distinct * from sampleData) zz
) c
group by groupid) z join (select RANK() OVER(PARTITION BY groupID ORDER BY numValue ) as 'rownumberdense',* 
from (select distinct * from sampleData) xx
) p on z.groupid =p.groupid
and p.rownumberdense between groupstart and GroupEnd
group by z.groupid

--DENSE_RANK
select avg( convert(money,numvalue)),z.groupid from (
select case when count(groupid)%2 =0 then (count(groupid)/2)else (count(groupid)/2) + 1  end as GroupStart,
case when count(groupid)%2 =0 then (count(groupid)/2)  + sign(count(groupid) +1 %2) else (count(groupid)/2)+1 end as GroupEnd,groupid 
from(
select DENSE_RANK() OVER(PARTITION BY groupID ORDER BY numValue ) as 'rownumberdense',* 
from (select distinct * from sampleData) zz
) c
group by groupid) z join (select DENSE_RANK() OVER(PARTITION BY groupID ORDER BY numValue ) as 'rownumberdense',* 
from (select distinct * from sampleData) xx
) p on z.groupid =p.groupid
and p.rownumberdense between groupstart and GroupEnd
group by z.groupid


--Let's double the data
insert into sampleData
select * from sampleData

--RANK
select avg( convert(money,numvalue)),z.groupid from (
select case when count(groupid)%2 =0 then (count(groupid)/2)else (count(groupid)/2) + 1  end as GroupStart,
case when count(groupid)%2 =0 then (count(groupid)/2)  + sign(count(groupid) +1 %2) else (count(groupid)/2)+1 end as GroupEnd,groupid 
from(
select RANK() OVER(PARTITION BY groupID ORDER BY numValue ) as 'rownumberdense',* 
from (select distinct * from sampleData) zz
) c
group by groupid) z join (select RANK() OVER(PARTITION BY groupID ORDER BY numValue ) as 'rownumberdense',* 
from (select distinct * from sampleData) xx
) p on z.groupid =p.groupid
and p.rownumberdense between groupstart and GroupEnd
group by z.groupid

--DENSE_RANK
select avg( convert(money,numvalue)),z.groupid from (
select case when count(groupid)%2 =0 then (count(groupid)/2)else (count(groupid)/2) + 1  end as GroupStart,
case when count(groupid)%2 =0 then (count(groupid)/2)  + sign(count(groupid) +1 %2) else (count(groupid)/2)+1 end as GroupEnd,groupid 
from(
select DENSE_RANK() OVER(PARTITION BY groupID ORDER BY numValue ) as 'rownumberdense',* 
from (select distinct * from sampleData) zz
) c
group by groupid) z join (select DENSE_RANK() OVER(PARTITION BY groupID ORDER BY numValue ) as 'rownumberdense',* 
from (select distinct * from sampleData) xx
) p on z.groupid =p.groupid
and p.rownumberdense between groupstart and GroupEnd
group by z.groupid

Denis The SQL Menace
SQL blog:
Personal Blog:
 
I have no SQL2005 handy at the moment... are you sure RANK()/DENSE_RANK() work in all cases?

------
Theory: everybody knows everything, nothing works
Practice: everything works, nobody knows why

[banghead]
 
yep, that's why I gave them both in the code
but since I am doing distinct it doesn't matter ( I know I cheated a little, ROWNUMBER would have worked also)
In am slo counting by partition meaning it start from 1 for every groupid

select DENSE_RANK() OVER(PARTITION BY groupID ORDER BY numValue ) as 'rownumberdense',*
from (select distinct * from sampleData) xx


Denis The SQL Menace
SQL blog:
Personal Blog:
 
Denis,

I am not very familiar with SQL 2005 syntax. I see in your query where you are using... select distinct * from sampleData. Is this right?

Can you add this data and re-run it?

insert into sampleData values ( 5, 10 )
insert into sampleData values ( 5, 10 )
insert into sampleData values ( 5, 10 )
insert into sampleData values ( 5, 10 )
insert into sampleData values ( 5, 20 )

Clearly, the median for Group 5 should be 10. What does your query display?

-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
OK, with distinct values both RANK and DENSE_RANK do the same.

Suppose one group contains the following numValues

3 3 3 5

What would return your code? Correct value is 3.

Sorry again, here @work I don't have 2005 yet - otherwise I would check code myself.

------
Theory: everybody knows everything, nothing works
Practice: everything works, nobody knows why

[banghead]
 
Here is my code...

Code:
Declare @Temp Table(RowId Integer Identity(1,1), GroupId Integer, NumValue Integer)

Insert Into @Temp(GroupId, NumValue)
Select 	GroupId, NumValue
From 	SampleData
Order By GroupId, NumValue

Select	A.GroupId,
		Convert(Decimal(10,1), T.NumValue) As MedianValue
From	(
		Select 	GroupId,
				(Max(RowId) + Min(RowId)) / 2 As RowId
		From 	@Temp
		Group By GroupId
		Having (Max(RowId) - Min(RowId)) % 2 = 0
		) A
		Inner Join @Temp T 
			On 	A.GroupId = T.GroupId
			And A.RowId = T.RowId

Union All 

Select	A.GroupId,
		Convert(Decimal(10,1), (Convert(Decimal(10,5), A.NumValue) + Convert(Decimal(10,5), B.NumValue)) / 2.0) As MedianValue
From	(
		Select 	GroupId,
				Convert(Int, 0.5 + (Convert(Decimal(10,1), Max(RowId)) + Convert(Decimal(10,1), Min(RowId))) / 2.0) As FirstRowId,
				Convert(Int, -0.5 + (Convert(Decimal(10,1), Max(RowId)) + Convert(Decimal(10,1), Min(RowId))) / 2.0) As SecondRowId
		From 	@Temp
		Group By GroupId
		Having (Max(RowId) - Min(RowId)) % 2 = 1
		) T
		Inner Join @Temp A
			On 	T.GroupId = A.GroupId
			And	T.FirstRowId = A.RowId
		Inner Join @Temp B
			On	T.GroupId = B.GroupId
			And T.SecondRowId = B.RowId

Order By A.GroupId

-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top