Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chriss Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

query help - i'm an idiot 3

Status
Not open for further replies.

cLFlaVA

Programmer
Jun 14, 2004
6,450
US
hi people. i am a moron and apparently know nothing about SQL, despite what my employers may think.

i have the query below that is returning values that are sometimes 4x more than they should be. i thought i was handling for this by grouping everything. any idea what i'm doing wrong? i know i've run into this before and have just worked around it... now i'd like to know why it's happening.

Code:
select p.sub_id
     , p.sub_name
     , sum(pyb.amount) py_budget
     , sum(cyb.amount) cy_budget
     , sum(pya.amount) py_actual
     , sum(cya.amount) cy_actual
     , sum(aya.amount) ay_actual
     , sum(cyf.amount) cy_fcst
     , sum(ayf.amount) ay_fcst
  from sub_project p left join drilldown_approved pyb
    on (p.sub_id = pyb.sub_id and pyb.cost_budget_year = 2006
) left join drilldown_approved cyb
    on (p.sub_id = cyb.sub_id and cyb.cost_budget_year = 2007
) left join drilldown_actual pya
    on (p.sub_id = pya.sub_id and pya.cost_budget_year = 2006
) left join drilldown_actual cya
    on (p.sub_id = cya.sub_id and cya.cost_budget_year = 2007
) left join drilldown_actual aya
    on (p.sub_id = aya.sub_id
) left join drilldown_eac cyf
    on (p.sub_id = cyf.sub_id and cyf.cost_budget_year = 2007
) left join drilldown_eac ayf
    on (p.sub_id = ayf.sub_id)
 where p.program_id = 818
group by p.sub_id, p.sub_name

feel free to mock my code and offer insight as to how to improve...

thanks :)



*cLFlaVA
----------------------------
[tt]"quote goes here"[/tt]
[URL unfurl="true"]http://www.coryarthus.com/[/url]
 
Cut down the storm by using some case statements that use pivot strategies. Sorry about mangling your query and changing the style of everything.
Code:
select
   p.sub_id,
   p.sub_name,
   py_budget = sum(case when b.cost_budget_year = 2006 then b.amount else 0 end),
   cy_budget = sum(case when b.cost_budget_year = 2007 then b.amount else 0 end),
   py_actual = sum(case when a.cost_budget_year = 2006 then a.amount else 0 end),
   cy_actual = sum(case when a.cost_budget_year = 2007 then a.amount else 0 end),
   ay_actual = sum(a.amount),
   cy_fcst = sum(case when f.cost_budget_year = 2007 then f.amount else 0 end),
   ay_fcst = sum(f.amount)
from
   sub_project p
   left join drilldown_approved b on p.sub_id = b.sub_id
   left join drilldown_actual a on p.sub_id = a.sub_id
   left join drilldown_eac f on p.sub_id = f.sub_id and cyf.cost_budget_year = 2007
where
   p.program_id = 818
group by
   p.sub_id,
   p.sub_name
I tried valiantly to fix your query without changing everything, by adding conditions to the joins to make the years match. But I couldn't get it to work. Basically, I think you have an effective cross join because when there are multiple records for a 2006 or 2007 year the join doesn't specify to use those years when doing the join for the "all" data.

Find a sub_id where the amount is coming out wrong. Take out the grouping and the sums from your query and run it for just that sub_id. See that the join for the aya or the ayf is matching cost_budget_years other than 2006 or 2007 to the c or p corresponding tables that are 2006 or 2007.

[COLOR=black #d0d0d0]My alarm clock causes time travel. When I hit snooze, no apparent time passes before the alarm sounds again, but the universe has in fact moved forward through time by ten minutes![/color]
 
Your example is perfect to show why you're getting the wrong values. taking just your forecast table and joining to it twice, you now have four rows:

1483 2007 1000 2008 3000
1483 2007 1000 2008 4000
1483 2007 2000 2008 3000
1483 2007 2000 2008 4000

You're joining BEFORE the aggregation. If you were joining after aggregating by sub_id you'd be okay, but you're not.

[COLOR=black #d0d0d0]My alarm clock causes time travel. When I hit snooze, no apparent time passes before the alarm sounds again, but the universe has in fact moved forward through time by ten minutes![/color]
 
What's with the weird star awarding in this thread?

[COLOR=black #d0d0d0]My alarm clock causes time travel. When I hit snooze, no apparent time passes before the alarm sounds again, but the universe has in fact moved forward through time by ten minutes![/color]
 
Hi E2,

Sorry for the delay. I tried your method and I am still getting multiples of the data I'd expect. I have a feeling I need to determine an alternative way of retrieving/displaying this data.

As for the handing out of stars - not sure how it happened. There are some loonies on this site.

Thanks for your help.



*cLFlaVA
----------------------------
[tt]"quote goes here"[/tt]
[URL unfurl="true"]http://www.coryarthus.com/[/url]
 
You could try left joining to some subqueries. Something like this?

Code:
left join (select sub_id, sum(amount) as amount
			from drilldown_approved
			where cost_budget_year = 2006 
			group by sub_id) pyb
    on p.sub_id = pyb.sub_id

If this doesn't work then you may need to try correlated subqueries but that would probably be rather slow.

Hope this helps,

Alex

[small]----signature below----[/small]
I don't do any programming whatsoever

Ignorance of certain subjects is a great part of wisdom
 
tip: whenever combining aggregates from multiple tables, push the GROUP BYs down into subqueries
Code:
select p.sub_id
     , p.sub_name
     , py_budget
     , cy_budget
     , py_actual
     , cy_actual
     , ay_actual
     , cy_fcst
     , ay_fcst
  from sub_project p 
left 
  join ( 
       select sub_id
            , sum(amount) as py_budget
         from drilldown_approved 
        where cost_budget_year = 2006
       group
           by sub_id
       ) pyb
    on pyb.sub_id = p.sub_id 
left 
  join ( 
       select sub_id
            , sum(amount) as cy_budget
         from drilldown_approved 
        where cost_budget_year = 2007
       group
           by sub_id
       ) cyb
    on cyb.sub_id = p.sub_id 
left 
  join ( 
       select sub_id
            , sum(amount) as py_actual
         from drilldown_actual 
        where cost_budget_year = 2006
       group
           by sub_id
       ) pya
    on pya.sub_id = p.sub_id 
left 
  join ( 
       select sub_id
            , sum(amount) as cy_actual
         from drilldown_actual 
        where cost_budget_year = 2007
       group
           by sub_id
       ) cya
    on cya.sub_id = p.sub_id 
left 
  join ( 
       select sub_id
            , sum(amount) as ay_actual
         from drilldown_actual 
       group
           by sub_id
       ) aya
    on aya.sub_id = p.sub_id 
left 
  join ( 
       select sub_id
            , sum(amount) as cy_fcst
         from drilldown_eac 
        where cost_budget_year = 2007
       group
           by sub_id
       ) cyf
    on cyf.sub_id = p.sub_id 
left 
  join ( 
       select sub_id
            , sum(amount) as ay_fcst
         from drilldown_eac 
       group
           by sub_id
       ) ayf
    on ayf.sub_id = p.sub_id 
 where p.program_id = 818

r937.com | rudy.ca
 
Thanks Alex, Rudy.

Rudy,

This worked perfectly. My only folly is that I tried to simplify this whole thing for everyone's sanity. Yes, I am a victim of the one thing I tell people not to do on this site.

There is one additional level of granularity for this. In addition to Project (sub_id), I need to also group by Cost Category.

Here is my revised attempt:

Code:
select p.sub_id
     , p.sub_name
     [red], c.cost_category_cd[/red]
     , py_budget
     , cy_budget
     , py_actual
     , cy_actual
     , ay_actual
     , cy_fcst
     , ay_fcst
  from sub_project p
[red]left
  join cost c
    on p.sub_id = c.sub_id[/red]
left
  join (
       select sub_id
            [red], cost_category_cd[/red]
            , sum(amount) as py_budget
         from drilldown_approved
        where cost_budget_year = 2006
       group
           by sub_id
            [red], cost_category_cd[/red]
       ) pyb
    on pyb.sub_id = p.sub_id[red] and c.cost_category_cd = pyb.cost_category_cd[/red]
left
  join (
       select sub_id
            [red], cost_category_cd[/red]
            , sum(amount) as cy_budget
         from drilldown_approved
        where cost_budget_year = 2007
       group
           by sub_id
            [red], cost_category_cd[/red]
       ) cyb
    on cyb.sub_id = p.sub_id[red] and c.cost_category_cd = cyb.cost_category_cd[/red]
left
  join (
       select sub_id
            [red], cost_category_cd[/red]
            , sum(amount) as py_actual
         from drilldown_actual
        where cost_budget_year = 2006
       group
           by sub_id
            [red], cost_category_cd[/red]
       ) pya
    on pya.sub_id = p.sub_id[red] and c.cost_category_cd = pya.cost_category_cd[/red]
left
  join (
       select sub_id
            [red], cost_category_cd[/red]
            , sum(amount) as cy_actual
         from drilldown_actual
        where cost_budget_year = 2007
       group
           by sub_id
            [red], cost_category_cd[/red]
       ) cya
    on cya.sub_id = p.sub_id[red] and c.cost_category_cd = cya.cost_category_cd[/red]
left
  join (
       select sub_id
            [red], cost_category_cd[/red]
            , sum(amount) as ay_actual
         from drilldown_actual
       group
           by sub_id
            [red], cost_category_cd[/red]
       ) aya
    on aya.sub_id = p.sub_id [red]and c.cost_category_cd = aya.cost_category_cd[/red]
left
  join (
       select sub_id
            [red], cost_category_cd[/red]
            , sum(amount) as cy_fcst
         from drilldown_eac
        where cost_budget_year = 2007
       group
           by sub_id
            [red], cost_category_cd[/red]
       ) cyf
    on cyf.sub_id = p.sub_id [red]and c.cost_category_cd = cyf.cost_category_cd[/red]
left
  join (
       select sub_id
            [red], cost_category_cd[/red]
            , sum(amount) as ay_fcst
         from drilldown_eac
       group
           by sub_id
            [red], cost_category_cd[/red]
       ) ayf
    on ayf.sub_id = p.sub_id [red]and c.cost_category_cd = ayf.cost_category_cd[/red]
 where p.program_id = 818

however this is returning duplicate rows. if you have any advice it would be greatly appreciated as always.

thanks again,

Cory



*cLFlaVA
----------------------------
[tt]"quote goes here"[/tt]
[URL unfurl="true"]http://www.coryarthus.com/[/url]
 
Good job Cory.

Now where's my star?

-kaht

Lisa, if you don't like your job you don't strike. You just go in every day and do it really half-assed. That's the American way. - Homer Simpson
 
Cory,

I find the title of this thread to be a bit offensive.

[tt][blue]query help - i'm an idiot[/blue][/tt]

If you were an idiot, you would probably not bother to ask for help. When someone realizes that a problem must have a solution, but is unable to recognize what that solution is and is willing to ask for help, then that person is certainly not an idiot. Those people that are unable to find an elegant solution to a problem, and ultimately hack a solution together, especially when that solution is slow and prone to bugs, are idiots. This is not an accurate description of you.

Please do me a favor... In the future, when you are referring to yourself, do NOT call yourself an idiot. The true idiots of the world do not post questions on tek-tips because they simply don't know enough to recognize a hacked solution. As soon as someone is willing to accept that they don't "know it all", they are instantly removed from the idiot classification.

You, my friend, are not an idiot!



-George

"the screen with the little boxes in the window." - Moron
 
Try adding a bunch of things like this to your where clause?

and cya.cost_category_cd is not null

I must confess, it is getting even harder for me to understand your data, but your join to cost_category_cd has clearly messed things up. Can you use an inner join instead, or are there going to be times where you have an unassigned cost_category_cd?

Not having the data at my disposal, something like this is starting to look a lot better to me:

Code:
select p.sub_id
	, p.sub_name
	, c.cost_category_cd
	, (select sum(amount) from drilldown_approved
		where cost_budget_year = 2006 
			and sub_id = p.sub_id
			and cost_category_cd = c.cost_category_cd) py_budget
	, (select sum(amount) from drilldown_approved
		where  cost_budget_year = 2007
			and sub_id = p.sub_id
			and cost_category_cd = c.cost_category_cd) cy_budget
---etc....
from sub_project p
left
  join cost c
    on p.sub_id = c.sub_id


Hope this helps,

Alex

[small]----signature below----[/small]
I don't do any programming whatsoever

Ignorance of certain subjects is a great part of wisdom
 
Combine the pivot methods with the aggregating prior to joining. I should have seen it, although I did say 52 minutes after my flawed query, "you're joining BEFORE the aggregation. If you were joining after aggregating by sub_id you'd be okay, but you're not."

Code:
select
   p.sub_id,
   p.sub_name,
   py_budget = sum(case when b.cost_budget_year = 2006 then b.amount else 0 end),
   cy_budget = sum(case when b.cost_budget_year = 2007 then b.amount else 0 end),
   py_actual = sum(case when a.cost_budget_year = 2006 then a.amount else 0 end),
   cy_actual = sum(case when a.cost_budget_year = 2007 then a.amount else 0 end),
   ay_actual = sum(a.amount),
   cy_fcst = sum(case when f.cost_budget_year = 2007 then f.amount else 0 end),
   ay_fcst = sum(f.amount)
from
   sub_project p
   left join (
      select sub_id, cost_category_cd, amount = sum(amount)
      from drilldown_approved
      group by sub_id, cost_category_cd
   ) b on p.sub_id = b.sub_id and p.cost_category_cd = b.cost_category_cd
   left join (
      select sub_id, cost_category_cd, amount = sum(amount)
      from drilldown_actual 
      group by sub_id, cost_category_cd
   ) a on p.sub_id = a.sub_id and p.cost_category_cd = p.cost_category_cd
   left join (
      select sub_id, cost_category_cd, amount = sum(amount)
      from drilldown_eac
      group by sub_id, cost_category_cd
   ) f on p.sub_id = f.sub_id and p.cost_category_cd = p.cost_category_cd
where
   p.program_id = 818
group by
   p.sub_id,
   p.sub_name,
   p.cost_category_cd
You might see if performance improves by doing a join to sub_project in each derived table to include the "p.program_id = 818" where clause in each one.

[COLOR=black #e0e0e0]For SQL and technical ideas, visit my blog, Squared Thoughts.[/color]
 
ESquared, this won't work because there is no [tt]cost_budget_year[/tt] in the joined subqueries. i'll continue working to see if I can fix this.

Alex, I agree - that is looking like the more attractive solution. I began this as an exercise in learning, but it seems all I've done is confused everyone!

thanks



*cLFlaVA
----------------------------
[tt]"quote goes here"[/tt]
[URL unfurl="true"]http://www.coryarthus.com/[/url]
 
This seems to work beautifully. Although I was trying to avoid the subqueries in the select from the get-go (not even sure why anymore), it runs quickly and gets me what I need.

Code:
select distinct p.sub_id
     , p.sub_name
     , cd.code_short_descr cost_category
     , py_budget = (select sum(amount)
                      from drilldown_approved
                     where sub_id = p.sub_id
                       and cost_budget_year = 2006
                       and cost_category_cd = c.cost_category_cd)
     , py_actual = (select sum(amount)
                      from drilldown_actual
                     where sub_id = p.sub_id
                       and cost_budget_year = 2006
                       and cost_category_cd = c.cost_category_cd)
     , cy_budget = (select sum(amount)
                      from drilldown_approved
                     where sub_id = p.sub_id
                       and cost_budget_year = 2007
                       and cost_category_cd = c.cost_category_cd)
     , cy_actual = (select sum(amount)
                      from drilldown_actual
                     where sub_id = p.sub_id
                       and cost_budget_year = 2007
                       and cost_category_cd = c.cost_category_cd)
     , cy_frcast = (select sum(amount)
                      from drilldown_eac
                     where sub_id = p.sub_id
                       and cost_budget_year = 2007
                       and cost_category_cd = c.cost_category_cd)
     , ay_actual = (select sum(amount)
                      from drilldown_actual
                     where sub_id = p.sub_id
                       and cost_category_cd = c.cost_category_cd)
     , ay_frcast = (select sum(amount)
                      from drilldown_eac
                     where sub_id = p.sub_id
                       and cost_category_cd = c.cost_category_cd)
  from sub_project p
       join cost c on p.sub_id = c.sub_id
       join code_decode cd on c.cost_category_cd = cd.code
 where p.program_id = 818
   and cd.category_id = 7
   and c.cost_active_ind = 1

If you see any red flags here please feel free to chime in.

Thanks to all of you for your help, it is always very much appreciated.

Cory



*cLFlaVA
----------------------------
[tt]"quote goes here"[/tt]
[URL unfurl="true"]http://www.coryarthus.com/[/url]
 
While the correlated subquery is never going to be a speed demon, I think in this case it is fair to use. They eliminate some of the mess that you encountered trying to wade your way through all those joins, and that could be worth 5, even 10 milliseconds in slower execution ;-)

[small]----signature below----[/small]
I don't do any programming whatsoever

Ignorance of certain subjects is a great part of wisdom
 
I think this improvement will be more than 5 or 10 milliseconds:

Code:
select distinct p.sub_id
     , p.sub_name
     , cd.code_short_descr cost_category
     , p.sub_id,
     , p.cost_category_cd,
     , py_budget = sum(case when x.type = 'b' and x.cost_budget_year = 2006 then x.amount else 0 end),
     , cy_budget = sum(case when x.type = 'b' and x.cost_budget_year = 2007 then x.amount else 0 end),
     , py_actual = sum(case when x.type = 'a' and x.cost_budget_year = 2006 then x.amount else 0 end),
     , cy_actual = sum(case when x.type = 'a' and x.cost_budget_year = 2007 then x.amount else 0 end),
     , ay_actual = sum(case when x.type = 'a' then x.amount else 0 end),
     , cy_fcst = sum(case when x.type = 'f' and x.cost_budget_year = 2007 then x.amount else 0 end),
     , ay_fcst = sum(case when x.type = 'f' then x.amount else 0 end)
  from sub_project p
       join cost c on p.sub_id = c.sub_id
       join code_decode cd on c.cost_category_cd = cd.code
       left join (
          select type = 'b', sub_id, cost_category_cd, cost_budget_year, amount
          from drilldown_approved
          union all select 'a', sub_id, cost_category_cd, cost_budget_year, amount
          from drilldown_actual
          union all select 'f', sub_id, cost_category_cd, cost_budget_year, amount
          from drilldown_eac
       ) x
 where p.program_id = 818
   and cd.category_id = 7
   and c.cost_active_ind = 1
And I think it's also worth playing with some variations. It all depends on the size of the data and the unselected rows and indexes and execution plan and, well, everything.

Here's one that joins for each of the union queries in the derived table:

Code:
select distinct p.sub_id
     , p.sub_name
     , cd.code_short_descr cost_category
     , p.sub_id,
     , p.cost_category_cd,
     , py_budget = sum(case when x.type = 'b' and x.cost_budget_year = 2006 then x.amount else 0 end),
     , cy_budget = sum(case when x.type = 'b' and x.cost_budget_year = 2007 then x.amount else 0 end),
     , py_actual = sum(case when x.type = 'a' and x.cost_budget_year = 2006 then x.amount else 0 end),
     , cy_actual = sum(case when x.type = 'a' and x.cost_budget_year = 2007 then x.amount else 0 end),
     , ay_actual = sum(case when x.type = 'a' then x.amount else 0 end),
     , cy_fcst = sum(case when x.type = 'f' and x.cost_budget_year = 2007 then x.amount else 0 end),
     , ay_fcst = sum(case when x.type = 'f' then x.amount else 0 end)
  from sub_project p
       join cost c on p.sub_id = c.sub_id
       join code_decode cd on c.cost_category_cd = cd.code
       left join (
          select type = 'b', sub_id, cost_category_cd, cost_budget_year, amount
          from drilldown_approved b
               join project p on b.sub_id = p.sub_id and b.cost_category_cd = p.cost_category_cd
               join cost c on p.sub_id = c.sub_id
               join code_decode cd on c.cost_category_cd = cd.code
          where p.program_id = 818
                and c.cost_active_ind = 1
                and cd.category_id = 7
          union all
          select 'a', sub_id, cost_category_cd, cost_budget_year, amount
          from drilldown_actual a
               join project p on a.sub_id = p.sub_id and a.cost_category_cd = p.cost_category_cd
               join cost c on p.sub_id = c.sub_id
               join code_decode cd on c.cost_category_cd = cd.code
          where p.program_id = 818
                and c.cost_active_ind = 1
                and cd.category_id = 7
          union all
          select 'f', sub_id, cost_category_cd, cost_budget_year, amount
          from drilldown_eac f
               join project p on f.sub_id = p.sub_id and f.cost_category_cd = p.cost_category_cd
               join cost c on p.sub_id = c.sub_id
               join code_decode cd on c.cost_category_cd = cd.code
          where p.program_id = 818
                and c.cost_active_ind = 1
                and cd.category_id = 7
       ) x
 where p.program_id = 818
   and cd.category_id = 7
   and c.cost_active_ind = 1
You might also try a version of this with the join in the unions to code_decode removed, and its where clause removed. And also try that with the join in the unions to cost removed.

I'm sure to have made a mistake somewhere since I can't test this. In my own testing with some sample data (I don't have a cost or code_decode table) the top query was half the cost of the last one cLFlaVA posted.

Another variation would be to select the list of cost_category_cds where category_id = 7 into a temp table or table variable, and join to that. Or use a CTE if you're using SQL 2005.

[COLOR=black #e0e0e0]For SQL and technical ideas, visit my blog, Squared Thoughts.[/color]
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top