Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations TouchToneTommy on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Split table to increase performance??

Status
Not open for further replies.

wg26

Programmer
Mar 21, 2002
135
US
Hi Everyone:

I have a table which has 7 columns. But this table becames really really large. I am thinking that I should split table into smaller table, based on the one of column,called "Item Name".So each Item Name will have its own table with same columns. But in that case, I have to create a lookup table, which stores Item Name and Lookup ID. I use LookupID as Name of each Item Name table...So when I retrieve data, my query will go to lookup table and get lookID...and use LookId to find individual table (in this case, my query is dynamic)...I have tested it, but data retrieve performance has not increased at all...instead it has decreased a bit...I don't really understand...the performance decreasing is becasuse of dynamic query or because of extra data instored in database, or the concept itself is wrong, or my query? Can anyone kindly give me some suggesitons? Thanks alot
 
How big is the table? How many rows? How many characters per row? Are the appropriate columns indexed? Have queries been optimized?

SQL Server can handle millions of rows quite easily if tables are properly indexed and queries are optimized. Terry L. Broadbent - DBA
SQL Server Page:
If you want to get the best answer for your question read faq183-874.
 
What kind of indexes have you got on the table? If you search on 'Item Name' the most, make sure it has a clustered index or if it is 'Item Name' and some other field then a clustered index on both. Also, it is quicker to run scans on numeric fields so if you could use an integer to identify an item rather than an alphanumeric name it would speed up the search. Durkin
 
Thanks guys for your suggestions.
here is my table:

[Detail]
[PK Detail] bigint identity(1,1) not null,
[FK Analyis] uniqueidentifier default nowid() not null,
[Detail Name] varchar(300) not null
[detail Min] float null
[detail Max] float null
[detail Avg] float null
[detail num] float null

And I have non clustered index:
IX_Detail including ([Detail Name],[FK Analysis],[Detail Min],[Detail Max],[Detail Avg],[Detail num])
and Clustered Index: on [PK Detail]...

The table could go as large as 120 million rows...Yes,the query has been optimized...since the table became so large, that is why I want to split it to increase performance...I don't know if this is a pratical solution...Any suggestion? Thanks alot...
 
Clustered indexes work best on a sinlgle small column. Remember that a clustered index is carried on ALL non clustered indexes in the table. If you create a clustered index on multiple columns, each column's data will be duplicated in ALL non-clustered indexes. This is usually not good for performance. Terry L. Broadbent - DBA
SQL Server Page:
If you want to get the best answer for your question read faq183-874.
 
Thanks Terry.

But My clustered index only has one column as I listed above and only my non_clustered index has multiple columns...what do you think my concept of spliting the table into multiple tables....do you think it will increase the performance?
 
How is the table queried? What criteria is most used to select data? Do the queries use the indexes? Many queries will not use indexes (i.e., Like '%string%', convert(char(12), colname) = value, etc.).

Do queries normally include all the data of the non-clustered index, either in the WHERE clause or SELECT list? The non-clustered index appears to be a covering index and can be very fast for searches by [Detail Name].

How many rows a normally returned with each query? Do the queries perform aggregation or sorting of the result set? What is the current performance of the main queries? Terry L. Broadbent - DBA
SQL Server Page:
If you want to get the best answer for your question read faq183-874.
 
How is your system configured? Are you spreading the load across different hard disks? Are your indexes and transaction logs also on differnt disk systems?
 
OOOOH, I got an "Excellent" from Terry. That's better than a star!
 
Thank you Terry and SQLSister for your help...

In order to make my question more clear, I'd like to post the whole story here so that we can discuss more if you guy don't mind...

Here is the original query:

SELECT [Wafer].[Tool ID],
[Wafer].[Chamber Number],
[Wafer].[Chamber ID],
[Wafer].[Lot ID],
[Wafer].[Substrate ID],
[Wafer].[Substrate Start Time],
[Wafer].[Recipe Lookup ID],
[Parameters].[Sensor Number],
[Parameters].[Sensor Type],
[Parameters].[File PathName],
[Analyses].[Analysis When],
[Analyses].[Analysis Type],
[Analyses].[Analysis Step],
[Analyses].[ReAnal Time],
[Analyses].[ReAnal Recipe File PathName],
[Detail].[Detail Name],
[Detail].[Detail NumData],
[Detail].[Detail Min],
[Detail].[Detail Max],
[Detail].[Detail Avg],
[Detail].[Detail SDev]

FROM [Wafer] INNER JOIN
( [Parameters] INNER JOIN
( [Analyses] INNER JOIN
[Detail]
ON [Analyses].[PK Run Analyses] = [Detail].[FK Run Analyses])
ON [Parameters].[PK Run Parameters] = [Run Analyses].[FK Run Parameters])
ON [Wafer].[PK Wafer Parameters] = [Run Parameters].[FK Wafer Parameters]

WHERE [Wafer].[Substrate Start Time] Between '1/1/1993 0:0:0' and '1/07/2003 17:20:00'
And ([Detail].[Detail Name] in ('2 amu', '44 amu'))

ORDER BY [Detail].[Detail Name],
[Analyses].[Analysis Step],
[Analyses].[Analysis Type],
[Analyses].[ReAnal Time],
[Parameters].[Sensor Type],
[Wafer].[Substrate Start Time],
[Wafer].[Lot ID],
[Wafer].[Tool ID],
[Wafer].[Chamber ID]

There are covering indexes on all the table above including the columns selected in select clause....but since [Detail] table gets so large...say 500million rows...the data retrieving costs long time...therefore, I split the [Detail] table based on the[Detail Name] and make each [Detail Name] has its own table...and built a lookup table where you can find [Detail Name] and lookupID(uniqueidentifier) and also use lookId as the name of each[Detail Name] table...therefore, query changes to the as the following (dynamic query):

Declare CursorTable Cursor
for
Select [Lookup ID]
from [Detail Name Lookup]
where [Detail Name] in ('2 amu', '44 amu')(name can be changed by user input)

declare @TableName uniqueidentifier
declare @SQLCode varchar(5000)
Open CursorTable
Fetch next from CursorTable into @TableName

while @@Fetch_status = 0
begin
Set @SQLCode = 'SELECT
[Wafer].[Tool ID],
[Wafer].[Chamber Number],
[Wafer].[Chamber ID],
[Wafer].[Lot ID],
[Wafer].[Substrate ID],
[Wafer].[Substrate start time], [Wafer].[Recipe Lookup ID],
[Parameters].[Sensor Number], [Parameters].[Sensor Type],
[Parameters].[File PathName],
[Analyses].[Analysis When], [Analyses].[Analysis Type],
[Analyses].[Analysis Step], [Analyses].[ReAnal Time],
[Analyses].[ReAnal Recipe File PathName],'+
'[' + convert(varchar(50), @TableName) + '].'+ '[Detail Name],' + '[' + convert(varchar(50), @TableName) + '].'+ '[Detail NumData],'+ '[' + convert(varchar(50), @TableName) + '].'+ '[Detail Min],' + '[' + convert(varchar(50), @TableName) + '].'+ '[Detail Max],' + '[' + convert(varchar(50), @TableName) + '].'+ '[Detail Avg],' + '[' + convert(varchar(50), @TableName) + '].'+ '[Detail SDev]' +

'FROM [Wafer] INNER JOIN
( [Parameters] INNER JOIN ( [Analyses] INNER JOIN'+
'[' + convert(varchar(50), @TableName) + ']'+
'ON [Analyses].[PK Run Analyses] =' + '[' + convert(varchar(50), @TableName) + '].' + '[FK Run Analyses])'+
'ON [Parameters].[PK Run Parameters] = [Analyses].[FK Run Parameters]) ON [Wafer].[PK Wafer Parameters] = [Parameters].[FK Wafer Parameters]' +

'WHERE [Wafer].[Substrate Start Time] Between ''1/1/1993 0:0:0'' and ''1/07/2003 17:20:00''

ORDER BY [Analyses].[Analysis Step],
[Analyses].[Analysis Type],
[Analyses].[ReAnal Time],
[Parameters].[Sensor Type],
[Wafer].[Substrate Start Time],
[Wafer].[Lot ID],
[Wafer].[Tool ID],
[Wafer].[Chamber ID]'

exec (@SQLCode)
Fetch next from CursorTable into @TableName
end
close CursorTable
deallocate CursorTable

Sorry for this long post...So do you think the performance will be increased by spliting the [Detail] table or is there anything that I need to change...?I have built all the needed indexes...but I can not put files in different hard disk. Thanks alot for your help...I really appreciate it...
 
You can test the split and see if it helps. I would try a lot other techniques before using cursors and dynamic SQL. Here is one example that uses a temporary table. You may want to use additional temporary tables or table variables in SQL 2000 for other intermediate results.

Select
[Detail].[FK Run Analyses]
[Detail].[Detail Name],
[Detail].[Detail NumData],
[Detail].[Detail Min],
[Detail].[Detail Max],
[Detail].[Detail Avg],
[Detail].[Detail SDev]
Into #tmpdtl
Where [Detail].[Detail Name] IN ('2 amu', '44 amu')

Create unique clustered index on #tmpdtl([FK Run Analyses])

SELECT [Wafer].[Tool ID],
[Wafer].[Chamber Number],
[Wafer].[Chamber ID],
[Wafer].[Lot ID],
[Wafer].[Substrate ID],
[Wafer].[Substrate Start Time],
[Wafer].[Recipe Lookup ID],
[Parameters].[Sensor Number],
[Parameters].[Sensor Type],
[Parameters].[File PathName],
[Analyses].[Analysis When],
[Analyses].[Analysis Type],
[Analyses].[Analysis Step],
[Analyses].[ReAnal Time],
[Analyses].[ReAnal Recipe File PathName],
[Detail].[Detail Name],
[Detail].[Detail NumData],
[Detail].[Detail Min],
[Detail].[Detail Max],
[Detail].[Detail Avg],
[Detail].[Detail SDev]

FROM [Wafer]
INNER JOIN [Parameters]
ON [Wafer].[PK Wafer Parameters] = [Parameters].[FK Wafer Parameters]
INNER JOIN [Analyses]
ON [Parameters].[PK Run Parameters] = [Analyses].[FK Run Parameters])
INNER JOIN [#tmpdtl] Detail
ON [Analyses].[PK Run Analyses] = [Detail].[FK Run Analyses])

WHERE [Wafer].[Substrate Start Time] Between '1/1/1993 0:0:0' and '1/07/2003 17:20:00'

ORDER BY
[Detail].[Detail Name],
[Analyses].[Analysis Step],
[Analyses].[Analysis Type],
[Analyses].[ReAnal Time],
[Parameters].[Sensor Type],
[Wafer].[Substrate Start Time],
[Wafer].[Lot ID],
[Wafer].[Tool ID],
[Wafer].[Chamber ID]
Terry L. Broadbent - DBA
SQL Server Page:
If you want to get the best answer for your question read faq183-874.
 
Thanks Terry for your generous help...

I will try both solutions(spliting the table and yours Temp table) and see if it makes any difference...
 
Questions.
Of the millions of rows of [Detail] is [Detail Name] different in each? If [Detail Name] is repeated many times, it would make sense to normalize it into a DetailNames table and reference with a foreign key from [Detail].

Depending on how the data is most often retrieved, the clustered index on Detail.PK would not benifit from being clustered. A clustered index on [Detail Name] or if normalized DetailNameID should help.

It feels like all the "Order By's" are an expensive part of the query. If the columns referenced by the order by clause are indexed, will their natural retrieval be in order?

Would using a local partitioned view over the [Detail Name] column show performance benifit or hindrance? If partitioned into multiple tables the table sizes would be reduced, and depending on hardware could be split onto more than one hard drive.

SQL BOL 'Using Partitioned Views' doesn't give much discussion regarding performance issues.
 
Thanks NullTerminator for your reply...

There are around one thousand [Detail Name] of the millions of rows. So as you can see, there are many repeated rows of each [Detail Name]...That is reason why I want to split the [Detail] Table into many [Detail Name] table, which has same columns as the [Detail], but each [Detail Name] will have its own table...So that I can get rid of [Detail] table...

Yes, you are right, 'Order By'is an very expensive part of the query. But since I have multiple Joins, even if the columns referenced in the Order By clause are indexed, they still won't be in natural order after Joins...

There is some performance gain in retrieving data after split the table...the average gain is about 20 percent ... But it will slow down writing data into the table since for each [Detail Name], it has to look for if the [Detail Name] already has the table, otherwise, new table for [Detail Name] will be created...There are not much if not at all overall performance gain....I guess that is why SQL BOL did not give much discussion on this issue...
 
I think 'partitiong' would help. You could keep your current structure but partition the data to 'n' tables. You would have the maintenance nightmare of thousands or tables holding similar information. By declaring a constraint
"check detail_name between 'A%' and 'CZZ%'",
"check detail_name between 'D%' and 'FZZ%'",
"check detail_name between 'G%' and 'IZZ%'", etc. on the Detail_Name field for as many partitions ( tables ) as you want, MSSQL will handle most of the nastiness for you.

Retrieve with
select a,b,c from Detail_A_C
union all
select a,b,c from Detail_D_F
union all
select a,b,c from Detail_G_H
...

\0
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top