SQL and Star Schema 1

talenx · Mar 2, 2006

Hi all,
I was curious if anyone has heard of SQL server 2000 not optimizing it's self for a star schema.
Well... kind of...
I have a situation where the optimizer tends to display some odd results in execution plan. It wills attempt to first PULL ALL of the data from the fact table (which is huge by the way) something like 367 million last time I checked.
Anyways then it will join to the dimensional table to retrieval the filtered indexed value. It then rolls through a nested loop utilizing both indexes and finally spitting out a result. (10 min later) it’s almost like the optimizer is not take the table weight in to factor prior to execution

I have checked the index to confirm that they were valid and free of major fragmentation and also confirmed the index statistics were up to date.

Here is a dumbed down version of the SQL code

Code:

SELECT count(SP_Key) 
FROM FACT_TABLE with (nolock)
INNER JOIN DIMENSIONAL_TABLE
ON
DIMENSIONAL_TABLE.time_dimension_key = FACT_TABLE .Submitted_Date_Key
WHERE DIMENSIONAL_TABLE.date_number BETWEEN '1/1/2006 00:00:00.000' AND '3/31/2006 00:00:00.000'

Any thoughts

Thanks
Talenx

mrdenny · Mar 2, 2006

Update stats on the table as well as the indexes.
Rebuild the index on the fact table.
Is it doing a table scan, an index scan or an index seek?

Denny
MCSA (2003) / MCDBA (SQL 2000) / MCTS (SQL 2005)

--Anything is possible. All it takes is a little research. (Me)
[noevil]

http://www.mrdenny.com

Geoka · Mar 2, 2006

Denny,

Can you tell me whats the difference between Index Scan and Index Seek and which is better and how can we create it.

mrdenny · Mar 2, 2006

An index seek is better than an index scan.

An index seek happens when the data in the index is all correctly ordered. An index scan happens when SQL needs to read the entire index to find the rows.

With the index seek SQL reads through the index, get's to the correct data, then stops when it's found what it's looking for.

To get SQL to do a seek instead of a scan you'll want to look to rebuild the index, and make sure that your query is using the index correctly.

Denny
MCSA (2003) / MCDBA (SQL 2000) / MCTS (SQL 2005)

--Anything is possible. All it takes is a little research. (Me)
[noevil]

http://www.mrdenny.com

Geoka · Mar 2, 2006

Thanks Denny. That was a great piece of information. Thx a lot.. and yes ofcourse here is your star. Keep it up

talenx · Mar 3, 2006

mrdenny,

ya, that’s the strange thing... the stats are updated on both indexes and I have goes as far as to rebuild both indexes yet it still attempts to pull all of the data from the fact table. Then apply the filter via a nest loop.

Like I said... I haven't seen it before.... WEIRD!

Thanks

TalenX

mrdenny · Mar 4, 2006

Is this the estimated execution plan or the actual execution plan?

Denny
MCSA (2003) / MCDBA (SQL 2000) / MCTS (SQL 2005)

--Anything is possible. All it takes is a little research. (Me)
[noevil]

http://www.mrdenny.com

talenx · Mar 6, 2006

Oddly I thought there would be a difference as well... unfortunately the plans had the same result.

Thanks
TalenX

mrdenny · Mar 6, 2006

Is the index a single column index?

Denny
MCSA (2003) / MCDBA (SQL 2000) / MCTS (SQL 2005)

--Anything is possible. All it takes is a little research. (Me)
[noevil]

http://www.mrdenny.com

talenx · Mar 6, 2006

Yep, the index is on a Date_Key which is an int... OH!! One other thing I noticed about my code that I posted…
The dates are not hardcode as above. Rather they are pulling from another table. So it should look like this.

Code:

SELECT count(SP_Key) 
FROM FACT_TABLE with (nolock)
INNER JOIN DIMENSIONAL_TABLE
ON
DIMENSIONAL_TABLE.time_dimension_key = FACT_TABLE.Submitted_Date_Key
INNER JOIN DIMENSIONAL_AUTOPARAMETER_TABLE
ON
FACT_TABLE.Submitted_Date_Key.Date_Number >= DIMENSIONAL_AUTOPARAMETER_TABLE.From_Date AND
FACT_TABLE.Submitted_Date_Key.Date_Number >= DIMENSIONAL_AUTOPARAMETER_TABLE.To_Date 
WHERE DIMENSIONAL_AUTOPARAMETER_TABLE.ID = '1111'

When the values are hard coded the optimizer work fine.

hmmm...

mrdenny · Mar 6, 2006

I'm assuming that your second join should read:

Code:

FACT_TABLE.Submitted_Date_Key.Date_Number >= DIMENSIONAL_AUTOPARAMETER_TABLE.From_Date AND
FACT_TABLE.Submitted_Date_Key.Date_Number [red]<[/red]= DIMENSIONAL_AUTOPARAMETER_TABLE.To_Date

What if you change to using a between statement on the join?

Is the index clustered or not? (I don't see it in the thread.)

Denny
MCSA (2003) / MCDBA (SQL 2000) / MCTS (SQL 2005)

--Anything is possible. All it takes is a little research. (Me)
[noevil]

http://www.mrdenny.com

talenx · Mar 6, 2006

MrDenny,
Oh, sorry bout that...
Currently the index is not clustered, unfortunately the plan did not change when using the BETWEEN operator in the join clause.

I find it very odd that when a hard coded value is applied it uses the index and pulls from the dimension and pulls only those records that match the filter from the fact. I have even went as far as to code the hard coded dates as variables based on today’s date example ( odd thing here it also doesn't use the index like the hard coded date do.

Code:

SELECT count(SP_Key) 
FROM FACT_TABLE with (nolock)
INNER JOIN DIMENSIONAL_TABLE
ON
DIMENSIONAL_TABLE.time_dimension_key = FACT_TABLE.Submitted_Date_Key
INNER JOIN DIMENSIONAL_AUTOPARAMETER_TABLE
ON
FACT_TABLE.Submitted_Date_Key.Date_Number  BETWEEN dateadd(m, -1, cast(cast(month(getdate()) as varchar)+'/01/'+cast(year(getdate()) as varchar)  as datetime)) AND dateadd(d, -1, cast(cast(month(getdate()) as varchar)+'/01/'+cast(year(getdate()) as varchar)  as datetime))  
WHERE DIMENSIONAL_AUTOPARAMETER_TABLE.ID = '1111'

talenx · Mar 6, 2006

Well, it seems like the optimizer is looking at the date format...
Example
Will NOT use the index

Code:

select * from Time_Dimension where date_number between (dateadd (m, -1, cast (cast (month (getdate ())as varchar)+'/01/' +cast (year (getdate ())as varchar)as datetime))) and (dateadd (d, -1, cast (cast (month (getdate ())as varchar)+'/01/' +cast (year (getdate ())as varchar)as datetime)))

WILL used the index

Code:

select * from Time_Dimension where date_number between getdate() and getdate()

In addtion when appliing the the Date_Number from the Time_Dimension equal to the

Code:

(dateadd (m, -1, cast (cast (month (getdate ())as varchar)+'/01/' +cast (year (getdate ())as varchar)as datetime)))

(Which is the first set of code I had tried. the optimizer uses the index.... ugg I’m at a loss.)

Any thoughts

Thanks
TalenX

mrdenny · Mar 7, 2006

That was something I was going to recommend looking at this morning was data type issues. I know that if you have a function or a convert on the data then it won't use an index.

It may be that SQL is doing an implicit converion where it shouldn't be for some reason.

I assume that all your data types match up correctly?

Denny
MCSA (2003) / MCDBA (SQL 2000) / MCTS (SQL 2005)

--Anything is possible. All it takes is a little research. (Me)
[noevil]

http://www.mrdenny.com

talenx · Mar 7, 2006

I double checked the columns that are being used in the join and where clauses and they are compliant. We run pretty much all of our joining object off a segregate integer based key.

Very odd!! Even when I use an index hint the optimizer still performs a fact table scan.

hmm...

talenx · Mar 7, 2006

I know that if you have a function or a convert on the data then it won't use an index.

Do you happen to know where you found that documentation... that may be the issue here.

Thanks
TalenX

mrdenny · Mar 7, 2006

I don't think that I saw it anywhere. I think I got it from personal experience.

Denny
MCSA (2003) / MCDBA (SQL 2000) / MCTS (SQL 2005)

--Anything is possible. All it takes is a little research. (Me)
[noevil]

http://www.mrdenny.com

talenx · Mar 8, 2006

Ahh… I see. I think I’m going to have to concur with you ... it’s seem once a function is applied with some time of conversion the index becomes invalid to the optimizer.

I wonder if there is any way to force the optimizer to use the index... I have attempted to use an index hint... But with no avail.

mrdenny · Mar 8, 2006

If SQL is going an implicit conversion for some reason it won't use the index (even if you try and force it to).

How's the index on the dimention table look? Maybe it needs rebuilding and it's blocking the server from using the index on the fact table?

Denny
MCSA (2003) / MCDBA (SQL 2000) / MCTS (SQL 2005)

--Anything is possible. All it takes is a little research. (Me)
[noevil]

http://www.mrdenny.com

talenx · Mar 8, 2006

Thats the odd thing... everything looking good. here are the stats:

DBCC SHOW_STATISTICS
Statistics for FACT INDEX
Updated Rows Rows Sampled Steps Density Average key length
-------------------- -------------------- -------------------- ------ ------------------------ ------------------------
Mar 6 2006 2:40PM 71680256 304424 197 160.55275 4.0

Statistics for TIME DIMENSION INDEX .
Updated Rows Rows Sampled Steps Density Average key length
-------------------- -------------------- -------------------- ------ ------------------------ ------------------------
Feb 28 2006 6:46PM 7306 7306 4 1.368738E-4 4.0

DBCC SHOWCONTIG

DBCC SHOWCONTIG scanning Fact_TABLE' table...
Table: 'Pharmacy_Data_Fact_TABLE' (421576540); index ID: 4, database ID: 29
LEAF level scan performed.
- Pages Scanned................................: 155003
- Extents Scanned..............................: 24942
- Extent Switches..............................: 90108
- Avg. Pages per Extent........................: 6.2
- Scan Density [Best Count:Actual Count].......: 21.50% [19376:90109]
- Logical Scan Fragmentation ..................: 13.55%
- Extent Scan Fragmentation ...................: 11.16%
- Avg. Bytes Free per Page.....................: 1143.1
- Avg. Page Density (full).....................: 85.88%
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

DBCC SHOWCONTIG scanning 'Time_Dimension' table...
Table: 'Time_Dimension' (597577167); index ID: 27, database ID: 29
LEAF level scan performed.
- Pages Scanned................................: 14
- Extents Scanned..............................: 2
- Extent Switches..............................: 1
- Avg. Pages per Extent........................: 7.0
- Scan Density [Best Count:Actual Count].......: 100.00% [2:2]
- Logical Scan Fragmentation ..................: 0.00%
- Extent Scan Fragmentation ...................: 0.00%
- Avg. Bytes Free per Page.....................: 268.1
- Avg. Page Density (full).....................: 96.69%
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

SQL and Star Schema 1

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Similar threads

Log in

Part and Inventory Search

Sponsor