Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chriss Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

simple query help 4

Status
Not open for further replies.

DotNetGnat

Programmer
Mar 10, 2005
5,548
IN
Guys,

I have the following table:

ID | dist
_________
1 | 100.12
1 | 100.24
1 | 100.35
2 | 300.12
2 | 100.14
2 | 300.35
3 | 10.12
3 | 10.24

I want to find out the record within the particular ID whose dist(distance) is greater than 5

for the above data:

2 | 100.35 is way off..

thanks

-DNG
 
Just to get the ball rolling....

Code:
Declare @Temp Table(Id Int, Dist decimal(10,2))

Insert Into @Temp Values(1  , 100.12)
Insert Into @Temp Values(1  , 100.24)
Insert Into @Temp Values(1  , 100.35)
Insert Into @Temp Values(2  , 300.12)
Insert Into @Temp Values(2  , 100.14)
Insert Into @Temp Values(2  , 300.35)
Insert Into @Temp Values(3  , 10.12)
Insert Into @Temp Values(3  , 10.24)

Select 	Id
From 	@Temp
Group By Id
Having	Max(Dist) - Min(Dist) > 5

The above query will get you the ID. However... how do you determine the particular record to return (especially if there is only 2 records to choose from)?

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Code:
SELECT ID
FROM Table
GROUP BY ID
HAVING Max(dist) - Min(dist) > 5
Code:
SELECT T.*
FROM
   Table T
   INNER JOIN (
      SELECT ID, Avgdist = Average(dist)
      FROM Table
      GROUP BY ID
      HAVING
         Max(dist) - Avg(dist) > 5
         OR Avg(dist) - Min(dist) > 5
   ) X ON T.ID = X.ID
WHERE
   Abs(T.dist - Avgdist) > 5
There's some interpretation here of what you meant by greater than 5. Greater than 5 away from the average of the other values? Greater than 5 away from any value? And identifying the ID that has the problem is easier than identifying the outlier exactly, especially for the "away from ANY value" calculation.

[COLOR=black #e0e0e0]For SQL and technical ideas, visit my blog, Squared Thoughts.

The best part about anything that has cheese is the cheese.[/color]
 
Thanks George, but i guess I did not explain it right...

I want to do a check on each record'distance with this preceding lesser distance to see if there are no weird jumps in the data with an ID.

For example:

1 | 10.1
1 | 10.2
1 | 10.3

has no problem

but if we look at the following

2 | 10.3
2 | 10.4
2 | 20.1
2 | 10.5

there is a problem with the third record because the difference between the second record's distance and third record's distance is greater than 5.

may be its a data entry problem...but because of that...even the fourth record with show up as a problem because we will be comparing with the third record's distance...

i am just trying to get a preliminary list of wrong data entries which will then be later checked manually...

-DNG
 
ESquared,

I am just looking for "Away from preceding value".

Thanks for your reply...

I was trying with a self-join, something like this:

Select a.dist from mytable a, mytable b
where a.id = b.id and A.dist < b.dist
and b.dist-a.dist > 5

-DNG
 
Can you use standard deviation to figure out the variance?

Code:
SELECT ID, STDEV(Value)
FROM
(SELECT 1  ID , 100.12 Value
UNION
SELECT 1  , 100.24
UNION
SELECT 1  , 100.35
UNION
SELECT 2  , 300.12
UNION
SELECT 2  , 100.14
UNION
SELECT 2  , 300.35
UNION
SELECT 3  , 10.12
UNION
SELECT 3  , 10.24
UNION
SELECT 4 , 10.3
UNION
SELECT 4 , 10.4
UNION
SELECT 4 , 20.1
UNION
SELECT 4 , 10.5
) TBL
GROUP BY ID
Having STDEV(Value) > 4

Sunil
 
I'm sure George and ESquared will chime in on this, but your simple query is quickly becoming not so simple. I am not a SQL expect by any means, but looking at the sample data, I don't see how you are going to get the results you are looking for.

Records in a table do not necessarily have any logical order...so unless you have a date or timestamp to order the records, how do you know which one is which in terms of order?

If you ignore the order issue and try to work with MIN, MAX, and AVG, how can you create an average to compare a value against when the value is included in the set? This errant value(s) will greatly skew the average.

Hopefully someone can help you find a solution...it's above my SQL knowledge but I wanted to share my observations. Good luck.

=======================================
People think it must be fun to be a super genius, but they don't realize how hard it is to put up with all the idiots in the world. (Calvin from Calvin And Hobbs)

Robert L. Johnson III
CCNA, CCDA, MCSA, CNA, Net+, A+, CHDP
VB.NET Programmer
 
All the data in my table is ordered...I just want to check the distance value of a particular record with in an ID with the previous record to see that there is no big jump...

May be I should use cursors for this...but I am just trying to avoid that path...

thanks

-DNG
 
Tables by definition are unordered. Unless you have a sortable column in there like a collection date or an identity value which reflects data insertion order, we cannot help you.

There is no next or previous in a table!

Even a clustered primary index cannot guarantee that the rows come out in that order. One feature of SQL Server is that if two requests come in at once, sometimes the engine can piggyback the data stream of one query into another. So if query 1 is reading sequentially from a table (as expected) and it gets to data that query 2 will need (but might not have queried yet) then query 2 may get injected into its stream data from query 1, then finally go get the parts that the piggybacking didn't cover. This means you can never trust row order unless you have an ORDER BY clause. Period.
 
Thanks guys, I really appreciate your assistance.Just one question though for ESquared. Why cannot I use Cursors in this particular case to get look at two rows at a time?

But any ways,
I have altered my table to look like below...

ID | OrderNum |dist
___________________
1 | 1 |100.12
1 | 2 |100.24
1 | 3 |100.35
2 | 1 |300.12
2 | 2 |100.14
2 | 3 |300.35
3 | 1 |10.12
3 | 2 |10.24

So I guess the query be easy now...

-DNG
 
Code:
SELECT *
FROM
   MyTable T1
   INNER JOIN MyTable T2 ON T1.ID = T2.ID AND T1.OrderNum = T2.OrderNum - 1
WHERE
   Abs(T1.dist - T2.dist) > 5
Identifying which one is the outlier is again more complicated. If there are only two rows, which one is it?

If there are three rows and the first and last row are the same values, is the middle row the outlier or are both the other rows the outlier?

By the way, mstrmage was absolutely correct in saying that AVG is nearly useless to you, because the outlier will affect the average as well, potentially causing false positives in the analysis. If your values are 1 1 1 1 100, they're all more than 5 away from the average. The difference between min and max does work if that was what you were looking for, but you were wanting a row-by-row difference comparison.
 
Thank you everyone. I have learned new stuff from this thread.

Have Stars...

-DNG
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top