Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gmmastros on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Workround not being able to partition in SQL 2005?

Status
Not open for further replies.

Welshbird

IS-IT--Management
Jul 14, 2000
7,378
0
0
DE
Hi All,

I have a (fairly) large table of approx 4,000,000 records each month for presently 27 months.

I diable and rebuild the indices before each month's data load and currently it takes almost 2 hours to rebuild them.

As the veriosn of SQL i have won't allow me to partition the tables - should I be thinking of a new table each month and vuiew that unions them all? Or would this not help at all.

Just feels like I'm not optimising this very well and I'm not sure what to do.

Thanks,

Fee

Fee

"The cure for anything is salt water – sweat, tears, or the sea." Isak Dinesen
 
Have you, (or your DBA's) done any investigation as to how useful those indices really are and if there is any specific and identifiable reason for the time taken?...

Personally, if that's not viable I would separate out your maintenance tasks from your data load tasks and potentially look at archiving older data that is not frequently queried or accessed.

Rhys

"Technological progress is like an axe in the hands of a pathological criminal"
"Two things are infinite: the universe and human stupidity; and I'm not sure about the the universe"
Albert Einstein
 
You should create a new table each month. Before doing this, do some research on "SQL Server 2000 Horizontal Partitioning". The key to this process is adding check constraints to the table. This will allow SQL Server to ONLY check the relevant base tables.

For example, if you have 27 separate tables, each with a check constraint for the particular date range, and then you do a select for a single day, it will ignore 16 tables and only do an index seek on the one table with 4,000,000 rows. This will improve performance.

If the data never changes once it is loaded each month, then you shouldn't need to re-index the old data each month either.




-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Each month a load data which is all for that month. Only one date in that month too (it comes with a character field simply saying Dec 12 or whatever - I convert to 2012-12-01 so that I can use in-built date functionality.)

So if I never have to search on a date within a month - do I still gain performance on separate tables?

(Generally a query wants results from all months you see - hence why I don't know what is best.)

Fee

"The cure for anything is salt water – sweat, tears, or the sea." Isak Dinesen
 
If there is another column you can use to separate the data that would benefit the queries, then you could "partition" on that column instead. Otherwise, you will probably not see much benefit.

If you could put the data on separate physical disks you could probably benefit from parallel query plans, but that seems like a lot of overkill for a mere 4 million rows.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
I think I'll just leave as is then.

So, my remaining question is this: IF I don't need to disable and rebuild indices each time I load - isn't the index masking the load slower? Do I just leave it there and load and carry on?

(I wish we had a DBA!)

Fee

"The cure for anything is salt water – sweat, tears, or the sea." Isak Dinesen
 
The answer is..... well.... complicated.

Unfortunately, the only way you're going to know for sure is to try it various ways.

Indexes will usually slow down the inserts. This is true. The question is, by how much. If you don't disable the index during the import, how long will it take?

Unfortunately, this (load time) is not the only consideration. By disabling and rebuilding the indexes, you are ensuring that the indexes are "as good as it gets". Indexes take up space in the database. When inserting data (with an index), each insert will need to update the index data. When this happens, there is an opportunity for index fragmentation.

Index fragmentation occurs when there is not enough space on the data page (which is 8 KB) to hold the data you are trying to add. When this happens, SQL Server will split the single page in to 2 pages with 1/2 the data on one page and the other 1/2 on another page.

Think about a phone book. Each page is full of people and phone numbers. When a new person gets a phone, they need to be added to the appropriate page in the book. Since each page is full, you could modify each page after the new entry so that each page remains full, or you can insert a new page at the appropriate spot in the book with the information you need.

Now imagine a phone book with each page filled to 90% (10% free space). When a new entry is made, there is already room for it. It's still possible to get page splits because there is only 10% free space on each page, but it is less likely to occur.

SQL Server implements this "concept" with something called "Fill Factor". Each index can have a different fill factor, and the value you choose depends on your data. With fill factor, you specify the value as a percentage, from 0 to 100. Please note that 0 and 100 do exactly the same thing, they fill up the data pages with as much data as possible.

When you specify a fill factor less than 100 percent, there will be some amount of free space in the pages. To accommodate all of this free space, more pages will be necessary. More pages means larger database size. The other problem with fragmentation is query speed. If your data is fragmented, the query engine will need to examine more pages before it can return the results.

My suggestion is this:

Examine the fragmentation level of your indexes before you load data. You can see what your fragmentation level is for each index with the following code:

Code:
SELECT ps.database_id, ps.OBJECT_ID,
ps.index_id, b.name,
ps.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, NULL) AS ps
INNER JOIN sys.indexes AS b ON ps.OBJECT_ID = b.OBJECT_ID
AND ps.index_id = b.index_id
WHERE ps.database_id = DB_ID()
ORDER BY ps.avg_fragmentation_in_percent DESC

You don't need to worry about running this query on a large or heavily used database because it's just looking at meta data.

Then take a look at your fill factors with the following code:

Code:
Select object_name(object_id) As Index_Name, 
       fill_factor 
from   sys.indexes

If the fill factor for your huge indexes is 0, then I encourage you to change it to 90 and then rebuild the index. Be aware that a fill factor of 90% will take 10% additional storage.

Then load your data. After loading, examine your fragmentation again. If it's high, then rebuild just the indexes with high fragmentation.

There are proven scripts that exist that are used by thousands of people all over the world. The best one is probably this:


-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
I would suggest that the fillfactor depends on what the system is used for.

If it is an OLTP system then 90% could be a good choice. However if your loading data into OLAP system - on my past experiance (not as much as gmmastros however!) i would use 0 or 100 and disable the indexes during load.

I have found this to be the most efficient way of loading and give nice full indexes on the rebuild. I would also make sure that during the load you load the data in pk order so that the PK doesnt get fragmented.

Just my 2p.

Dan

----------------------------------------

Be who you are and say what you feel, because those who mind don't matter and those who matter don't mind - Bernard Baruch

Computer Science is no more about computers than astronomy is about telescopes - EW Dijkstra
----------------------------------------
 
Dan,

Please don't think I have a lot of experience with this. My largest database is approximately 4 GB.

I was thinking that if the index rebuild is taking a long time, Welshbird may be able to leave the indexes enabled and then do an index reorganize instead of index rebuild which may makes things a little faster. I was also thinking that there are 27 months of data already loaded, so adding another month would represent approximately 4% of the data, and that number will decline in subsequent months. With 4% data load, assuming an good mix of data, I would expect some index fragmentation during the load, but perhaps not too much (depending one the keys in the indexes).

This is why I suggested that Welshbird try it this way. This is the only way to know for sure.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Thanks guys,

I'm beginning to feel like I understand! I'll try and see how I get on.

Fee

"The cure for anything is salt water – sweat, tears, or the sea." Isak Dinesen
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top