Clustered Index 1

talenx · Feb 28, 2006

Hello all,
Now if I understand clustered index correctly, they will store the index in the same order as the data there for speeding up grouping and aggregation.
Though one issue that I have seen is during data loads. (In an EDW environment)
I have several data load being posted via bulk insert into various tables and have.
My testing has shown a 50 % decrease in load time with table containing a clustered index verses those that do not contain one.
Now I understand that the clustered index has to re-org it's self for every record inserted. This included updates and deletes.

Given example:
Say we have a file source that has 5 million record in it and it need to be loaded into an exiting table with 20 million records.
TABLE-A has a clustered index on the primary key comprised of 4 columns.
TABLE-A record count 20 million

Would it be more efficient to drop the clustered index prior to performing the load or load with the index in place?
My tests showed neither one were done with in a acceptable time (hence the reason the cluster index has been removed)

Unfortunately here in lies the issue, I want to reap the rewards of a clustered index but don’t want to deal with the time it takes to work around them.

Does anyone have any experience with this type of issue, and type of solution did you find work for you.

Thanks
Talenx

Catadmin · Mar 1, 2006

It would definitely be more efficient to drop the clustered index before the bulk loads and then rebuild it after the bulk loads. I know that can be time consuming and a pain, but truthfully, it's your best solution.

BTW, have you tried setting the database's recovery mode to Bulk-Logged? It might be part of your time is spent because the DB is in FULL recovery mode and logging each and every transaction rather than the sole product of the clustered index being a pain.

Also, clustered indexes don't really reorganize themselves unless you defrag & rebuild them. They reorginize all the data in the database in AlphaNumeric order depending on what column you've indexed. There is a difference between the two. Indexes actually reside on separate pages than the actual data does and they point to the data pages.

SQL Server 2005 makes this process much easier, BTW. You can disable the clustered index which keeps the data for the index and makes rebuilding it much faster and easier.

Hope this helps.

Catadmin - MCDBA, MCSA
"The only stupid question is the one that *wasn't* asked.

talenx · Mar 1, 2006

Catadmin , thanks for the quick’s reply.
Currently I have the recovery model set to simple (which I believe should be the least intrusive when it comes to data loads)
I had the understanding that if a clustered index existed on a table and you had performed a DML operation like and insert it would reorder the index upon commit.

Am I incorrect?
Thanks
TalenX

Catadmin · Mar 3, 2006

Okay. I went back and refreshed my memory of clustered indexes. So, here goes.

What is happening when you update your table or insert rows is that SQL is literally inserting the data inside the pages of the clustered index. So, it is searching for your index and then, yes you are correct, reorganizing those indexes as it inserts data. It even will split pages if enough data is getting inserted into a particular part of the index. Split Pages = BAD THING (most of the time).

Because of all this additional overhead, it is not surprising that having the clustered index on your machine is slowing things down. However, you're left with the choice of two evils. Slow things down on the inserts and keep your index for better data retrieval or kill your index for better insert time and kill your data retrieval time because nothing is sorted.

A third option is to do as I suggested earlier, disable/drop the index for the time of the insert and rebuild it after the insert. However, it's still going to take up server resources and server time to rebuild the index.

You're going to have to do what is best for you and your company. Regardless, you're still looking at possible production down/slow time.

I hope I didn't confuse you even more.

Catadmin - MCDBA, MCSA
"The only stupid question is the one that *wasn't* asked.

talenx · Mar 3, 2006

Catadmin ,
Ok... I thought I was losing my mind ... then again it would be the first time.
I would need to test to see how long it takes to rebuild a clustered index on 20 million line table verses the time it takes to perform any DML operations.
(BTW 20 million is one of the smaller fact tables some db have closer to the 500 million line mark)
I will also have to look into SQL 2005 and those new index properties. DISABLE index hmm... so i wonder if you disable a index prior to load and then re-enable it after the load is complete, if is the same a rebuilding the index.

Thanks
Talenx

Catadmin · Mar 6, 2006

There is no ENABLE or RENABLE command for indices on SQL Server 2005. You can ENABLE a trigger, but not an index. The index has to be rebuilt.

On the plus side, because information is saved when you DISABLE an index, I'm sure rebuilding in 2005 is much faster than rebuilding in 2000.

Catadmin - MCDBA, MCSA
"The only stupid question is the one that *wasn't* asked.

talenx · Mar 6, 2006

Hey CatAdmin,
Sounds like that might be a good assumption… Hope your right! I have a demo copy of 2005 that I will be testing with shortly.
If correct, I can see implementing it into some of my load logic.
Example
Disable the index
Load data
Rebuild index.
Obviously I will need to do some intrusive testing to see if the results are acceptable.

As far as SQL 2000 I think your right... I just wished there was a less intrusive way to gain performance via index maintenance...

Thanks
TalenX

Catadmin · Mar 7, 2006

Let us know how things go with your 2005 testing. I'm sure more than just me are interested in the results.

Catadmin - MCDBA, MCSA
"No, no. Yes. No, I tried that. Yes, both ways. No, I don't know. No again. Are there any more questions?"
-- Xena, "Been There, Done That"

talenx · Mar 7, 2006

Will do.. BTW thank you for the insite. your input has been very usefully.

Catadmin · Mar 7, 2006

Anytime. @=)

Catadmin - MCDBA, MCSA
"No, no. Yes. No, I tried that. Yes, both ways. No, I don't know. No again. Are there any more questions?"
-- Xena, "Been There, Done That"

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Clustered Index 1

talenx

Programmer

Catadmin

Programmer

talenx

Programmer

Catadmin

Programmer

talenx

Programmer

Catadmin

Programmer

talenx

Programmer

Catadmin

Programmer

talenx

Programmer

Catadmin

Programmer

Similar threads

Part and Inventory Search

Sponsor