refreshing summary tables

r937 · Jun 13, 2003

question for the DBAs (among whom i cannot count myself):

what's the best strategy for refreshing summary tables in a sql server database?

the summary tables will be reloaded in toto, i.e. all rows removed (doesn't matter how) and a new set of rows inserted, using, for example, INSERT INTO or SELECT INTO with GROUP BY, etc.

the summary tables may require several columns to have indexes, but these could be populated after loading is complete

i know one strategy: drop then create the table, load it, then create the indexes

another strategy: don't drop/recreate, just truncate the table before loading

advantages? disadvantages? differences in speed? best practice?

please keep in mind i am not a DBA, and i have to be able to explain what i design to someone who is...

thanks

rudy

http://r937.com/

JohnDTampaBay · Jun 13, 2003

Would it be possible to create VIEWs on the base tables that aggregate the summary information that you need instead of refreshing a summary table? If so, the VIEW wouldn't need to be rebuilt at all. The aggregates in the VIEW would change as the data in the base tables changes.

Just a thought. Good luck!

--Angel [rainbow]

-----------------------------------
Every time I lose my mind, I wonder
if it's really worth finding.

r937 · Jun 13, 2003

thanks angel, that was my first reaction too, but views will likely not help achieve some of the benefits of summary tables

for example, if i want to translate codes into descriptions, from several different lookup tables, and allow for missing descriptions, which i want to coalesce into strings that say *missing* or *null* as the case may be, with multiple left outer joins, oh, and some of the data will be coming from various different sources, typically through consecutive INSERTs or perhaps one INSERT with a multiple-subselect UNION, and then combine the results with category/subcategory groupings, a many-to-one consolidation driven by another table, again with left outer joins, so that, for example, the Defects report will include all categories and say "None" (a good thing) where appropriate, cross-referenced against sales region...

... then i can hardly make these views available for random querying in the Summary Database by, ahem, management users, if such views will bring the server to its knees

if i had to run a hundred queries a day against a fifty-row table versus a hundred queries a day using insanely complex views against live production tables with hundred of thousands of rows, i have a suspicion the views are not going to perform, no matter that they offer up-to-the-second accuracy...

it's perfectly okay (in fact it's more or less assumed) that the data will be refreshed overnight, and what you can browse in the Summary Database will be a snapshot of the data as of last night...

i'm hoping for DBA-type opinions on how to populate the summary (and intermediate) tables most efficiently

JamesLean · Jun 13, 2003

To be honest, if the result is only going to be 50 rows then I wouldn't be indexing the table at all anyway, there's no need. In fact, if it's even a few hundred then the same applies.

So if there's no indexes then a simple TRUNCATE and INSERT new data is what I would do.

Do you know how big the result is going to be (how many rows)?

--James

r937 · Jun 13, 2003

thanks james, there will be several summary tables, the largest will probably be 50,000 rows

i've found an article that talks about temp tables, which i might use as intermeiate tables:

"... SELECT INTO should be avoided at all costs in your coding due to the locking it places on system objects while it determines how to build the table. Take the time to script the temporary table out and use a separate INSERT INTO to populate the table. I will qualify this with that you can use a SELECT INTO if it includes WHERE 1=0 to create a table in the quickest way possible, but don't do this just to save a few keystrokes."

"... test the need for a clustered-index on your temporary table. If the data set is large a cluster-index will speed the operations against the temporary table, but you have to weigh in the performance needs of creating that index and inserting into the table with a clustered-index. This is one of those methods that needs to be tested both ways with the largest data set you think will be placed into the temporary table before deciding on the index."

http://www.transactsql.com/Articles/temptabl.html

that's the type of info i was hoping for (e.g. "use INSERT SELECT rather than SELECT INTO, and try your queries on the largest tables to see if indexes are really needed...&quot

any other ideas and suggestions warmly welcomed...

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

refreshing summary tables

r937

Technical User

JohnDTampaBay

Programmer

r937

Technical User

JamesLean

Programmer

r937

Technical User

Similar threads

Part and Inventory Search

Sponsor