×
INTELLIGENT WORK FORUMS
FOR COMPUTER PROFESSIONALS

Log In

Come Join Us!

Are you a
Computer / IT professional?
Join Tek-Tips Forums!
  • Talk With Other Members
  • Be Notified Of Responses
    To Your Posts
  • Keyword Search
  • One-Click Access To Your
    Favorite Forums
  • Automated Signatures
    On Your Posts
  • Best Of All, It's Free!
  • Students Click Here

*Tek-Tips's functionality depends on members receiving e-mail. By joining you are opting in to receive e-mail.

Posting Guidelines

Promoting, selling, recruiting, coursework and thesis posting is forbidden.

Students Click Here

Jobs

Initial Load Strategies

Initial Load Strategies

Initial Load Strategies

(OP)
In my company we are looking for the best strategy for our initial load to the datawarehouse. Will appreciate if people can share different strategies that they have used to accomplish this goal. We are moving data worth 3 years and we have something like this..

Tab1 - 21 Million
Tab2 - 28 Million
Tab3 - 122 Million
Tab4 - 65 Million
Tab5 - 65 Million

Tab 1 ( one to many ) Tab 2
Tab 2 ( One to Many ) Tab 3
Tab 3 ( many to one )  Tab 4
Tab 3 ( Many to one )  Tab 5

Main Target table is at the Tab 3 level. There are other targets fed from other tables but this target is the most difficult to load as it needs some columns from all the 5 tables.

What would be the best strategy for one time load for this target. I tried joining these 5 tables at the source database and then put a date condition on Tab 2 (where create date is between d1 and d2) types, to create smaller loads.However even one month data is taking too long. What would you recommend ?

Is sorting and merging in files is better alternative or moving the source tables to another sataging database and then having a large temp segment to merge and sort to achieve the join ?
 

RE: Initial Load Strategies

First thing that comes to mind is:
how long is too long? You are talking about an intial load, so it may be an option to let that processing take a few days and meanwhile capture (and save) the first incremental loads, to be processed immediately after the initial load.

How many fields of the tables have to be put together in the target? Can you get rid of all fields you don't need?
Do you need ALL records from tab1, tab2, tab4, tab5? or can you discard some of them?


I would probably make an intermediate resultset of the join of tab1 and tab2

What I have done at some time is create an intermediate resultset, with the same structure as the target and use some updates to get all columns right.
In the first step I would join tab3 to tab4 and tab5, next update the intermediate table with the values from tab1 and tab2.
Whether this will work, is dependend on the available resources.
 

RE: Initial Load Strategies

(OP)
Thanks Hans63,

How long = for one month worth of data where I just moved few fields from all the tables (not all the fields that I would need in actual load) - mapping took 5 and half hours to finish and moved 5 million rows. I used date between (d1 and d2)on tab2 to get one month. I will try your suggestion of joining Tab3, Tab4 and Tab5 and then updating with Tab1 and Tab2.

I will be moving this data to ODS tables as well as to Star Schema. Should these tables be partitioned in both the places..Will it help for DW performance later ?

RE: Initial Load Strategies

If you have the storage spac available you might look at staging your data for the intial load.  Export and Load 1 month of data from each table to a seperate set of tables.  No joins just a straight table to table copy of the columns you need.  You can index these tables as needed to support the joins required.  load from these stage tables.

Paul
---------------------------------------
Shoot Me! Shoot Me NOW!!!
                           - Daffy Duck

RE: Initial Load Strategies

Despite that you have identified referential integrity between several of the tabs, loading of data is best accomplished with a minimum of keys present. It's probably better to load the data, then build the relationships and key structure. So, each table can be loaded initially independent of the others.  If your data is clean, you should not get problems with the indexing and linkage. If you do, you would've had the same problem loading with the index in place so you still need to fix the data. But in the scenario I suggest, you have already loaded the data, you only need to build the relationships.  

-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright

RE: Initial Load Strategies

Hi JTamma,

have you looked at the access-paths for your queries? In Oracle you can ask the database for information on the execution plan and use that to identify why your query is taking so long (e.g. does it make full table scans, does it use a logical starting point or are you missing an index). I don't know if other RDBMS's have similar features.
But I think that in 5.5 hours you should be able to load much more than 5 million rows. (taken that your hardware is able to deal with the amount of data)

When you try the update solution, make sure to disable as much logging as possible as it would really slow the process.

Good luck,
Hans

RE: Initial Load Strategies

(OP)
Thanks to all of you for your suggestions. It looks like we can get extra storage to stage all tables as it is and I would be able to use some of these suggestions.

Hans63 - I checked my query and it is using proper index and is being driven by the table that has the filter on dates. Still it needs three nested loops. No one here has a clue how to make this faster now (without staging tables).

RE: Initial Load Strategies

Hi Jtamma,

As you probably know, nested loop are killing your performance. It can be very hard to find out why exactly the RDBMS is choosing this particular strategy. It may have to do with (still) a missing index, but it may also depend on some RDBMS parameters. Ihave been puzzling on similar situations for quite osme time. It made me feel like I don't understand at all how a RDBMS works.

Extra storage may provide you some breathing space, but does not imply you performance problem has gone.
really solving this calls for intimate knowledge of the specific datamodel, database parameters and some creativity.
From your previous posts I think you should be able to come up with a good solution.

Succes.

RE: Initial Load Strategies

(OP)
Problem here is DBAs think that it is the best strategy Oracle can choose in the situation and no one is working on ways for how to avoid nested loops. Till now I never bothered to look into the Optimizer world and now find myself digging into oracle performance manual.
Anyways Thanks Hans63 - for your moral booster comment. I will update the post for others benefit if some smarter solution comes up.

RE: Initial Load Strategies

Jtamma,

Try to let the DBA explain to you what a nested loop really does. Maybe he can see why nested loops on these amounts of records are bad. (On small sets, it is a viable solution.)

If that does not work, create a simple query, that you can easily change so that the query plan changes from nested loops to index scan (or even a full table scan), hash-join, or some other join. Maybe you can look at the asktom website for samples on this. http://asktom.oracle.com

Good luck with this situation.

RE: Initial Load Strategies

(OP)
Can't thank you enough Hans63. Your comment that Nested loop is the problem, gave me the confidence to go ahead and play around with my newly acquired knowledge on optimization. Even before I read your latest reply, I reduced time to 45 min from 6 hours with hash join and with Merge hint to 10 minutes.

Thanks.

Red Flag This Post

Please let us know here why this post is inappropriate. Reasons such as off-topic, duplicates, flames, illegal, vulgar, or students posting their homework.

Red Flag Submitted

Thank you for helping keep Tek-Tips Forums free from inappropriate posts.
The Tek-Tips staff will check this out and take appropriate action.

Reply To This Thread

Posting in the Tek-Tips forums is a member-only feature.

Click Here to join Tek-Tips and talk with other members!

Close Box

Join Tek-Tips® Today!

Join your peers on the Internet's largest technical computer professional community.
It's easy to join and it's free.

Here's Why Members Love Tek-Tips Forums:

Register now while it's still free!

Already a member? Close this window and log in.

Join Us             Close