Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chriss Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Merging Two Databases 1

Status
Not open for further replies.

bboffin

Programmer
Nov 26, 2002
553
GB
I have two databases with identical schemas. Each is used by the same application to run a document archiving system for two different companies.

Both companies have been acquired by a third company that wishes to merge the databases into a single database and operate as a single company.

In general all primary keys are called ID and are INT IDENTITY columns and most of the values overlap between the two databases.

I have a broad plan as to how to attack this problem:

1. One of the databases (DB1) will not be altered
2. Identify all cases where the ID values for each table overlap
3. Define an offset to the value of the ID in the 2nd database (DB2) that will take it well beyond the current highest value in DB1
4. Add these offsets to the ID values and all the foreign keys that refer to them in a test copy of DB2
5. Verify that the test copy of DB2 still operates correctly and fix any problems
6. Merge the test copy of DB2 with a test copy of DB1 and verify that the merged database now supports both companies' data correctly
7. Apply the process defined in step 4 to the live DB2 and allow it to run live unmerged for a period to be sure that all problems have been found
8. Finally merge the live versions of DB1 and DB2

This is a much simplifed version of what I will actually have to do but I am seeking advice as to whether this is the best approach and whether there are any tools available to assist.

I am particularly concerned as to the best way of doing steps 4 and 6 as foreign key contraints can make this very tricky. I am quite tempted to dump out all the data to flat files using bcp, adjust the values there and then bulk load all the data back into an empty database but I would appreciate any thoughts on this matter.

I'm not necessarily looking for a SQL only solution. I am quite happy to write utility applications to actually do the some of the work, unless someone can point me at some tools that will do the job for me.



Bob Boffin
 
I would start with a good solid list of all the relationships in the database. I would probably put these in a database with the table names and field names and an order by column (so that you can make sure the child tables that have child tables are fixed after the parent tables) and an updated_when column to fill in as each is fixed.

If you set up cascade update to all tables where there are foreign keys, the id field will update automatically when you change it.

Or alternatively. you could script all the foreign key relationships. Rename the primary key and drop it as an identity. Set up a new identity with a see of a number significantly higher than the database you aren't changing. Then use your table (above) to update the foriegn key fields to this value for all the records. This way you preserve the old values in the database in case you need them later (for instance if you have sent out reports or whatever with these values in the past you may need to look up the record that the report was talking about when researching a problem with a customer.) Once you are finished, then run the script to add back all the FK relationships. I'd probably add these old PK fileds to the database you are going to merge to as ID_Field_Old if you think that customers will be calling up and asking for things by order number and that was the old key.



"NOTHING is more important in a database than integrity." ESquared
 
Thanks for that. It's given me some ideas. I know that not all the necessary foreign keys are currently defined so the first option may not be applicable.

The second option looks much better and the idea of having the original ID value left around for some tables may be useful. Fortunately there are very few ID fields visible to the clients. You picked on just about the only one, the Order ID :)

Clients usually refer to items by by description or barcode neither of which are primary keys although barcodes, where assigned to an item, are unique and different between the two databases.



Bob Boffin
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top