
Comparing 2 database tables

Hi,
I have 2 databases with the same tables, loaded from 2 different systems. The table structures and values are expected to be the same. I have to write a Perl program that takes a table name as input, extracts that table from each of the 2 databases into 2 files, and then compares all columns of each row. The output should be an Excel file (preferably) listing every row and column where a discrepancy is present (if any). Please let me know how to do this using Perl. Any sample code would be helpful as well.

Thanks in advance.
 
You are correct that Perl can be used for this type of task; however, you're asking for an off-the-shelf solution to a "custom" problem. I'm afraid that concise answers to these types of questions don't exist.

Break the problem into small pieces and tackle them logically, identifying whether your issues are with the concept or the code. Once you get to that stage, someone will be able to point you in the right direction.

 
You should give some more information; it sounds like you have nothing as of yet. The question that comes to my mind is which database you are using. With Oracle you can run this query:

select * from table1
minus
select * from table2

This returns the rows in table1 that are not in table2; run it the other way round as well to catch differences in both directions.
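
If it is Oracle, you could even drive that query from Perl. A minimal sketch (the DSN and login are placeholders, and it assumes both tables are reachable from one connection, e.g. via a database link, which may not fit a 2-separate-systems setup); running the MINUS in both directions catches rows missing from either side:
Code:
use strict;
use warnings;
use DBI;

# placeholder connection details
my $dbh = DBI->connect('dbi:Oracle:MYDB', 'user', 'pass', { RaiseError => 1 });

# run the MINUS both ways so differences on either side show up
for my $pair ([ 'table1', 'table2' ], [ 'table2', 'table1' ]) {
    my ($t1, $t2) = @$pair;
    my $rows = $dbh->selectall_arrayref("SELECT * FROM $t1 MINUS SELECT * FROM $t2");
    printf "%d row(s) in %s but not in %s\n", scalar @$rows, $t1, $t2;
}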
 
parker's got it right. Sounds like you're asking someone to do it for you (which I doubt you are; that's just how it sounds). Give it a go, and come back when you run into a problem. Like parker said, it is definitely possible, and in fact not really anything too advanced, so there will be people able to answer individual questions. Your problem is just too vague.
 
Some tips:

1. First check which database management system you are using (Oracle, Access, Sybase, ...).

2. Establish a connection from your Perl script to the databases. Choose whichever interface is easiest for you to work with (DBI, ODBC, Win32::ODBC, ...); these modules are easy to use.

3. Test database connectivity with a simple select.

4. From the Perl script you can create 2 database connections to the databases at the same time; I have done it before.

5. You can get the table name as the first argument of your Perl script. When you run it:

perl myprogram.pl <TABLENAME>

$tablename=$ARGV[0]; # thus you can get the table name

6. Run a select against the table on the first database and save the result in a structure; it can be an array, @result1. Repeat the same for @result2.

7. Now you must compare the two arrays; check the array functions.

8. Build a final array holding the "different data".

9. You can write the "different data" to Excel using the Spreadsheet::WriteExcel::Big module, or save the data in CSV format, which Excel can open. (A rough sketch putting these steps together follows below.)
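
A rough sketch putting steps 2 to 9 together (just an outline, not a finished program: the DSNs, logins, and output file name are placeholders, and it assumes both selects return the same number of rows in the same order):
Code:
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Step 5: table name from the command line
my $tablename = $ARGV[0] or die "Usage: perl myprogram.pl <TABLENAME>\n";

# Steps 2-4: two connections at the same time (placeholder DSNs and logins)
my $dbh1 = DBI->connect('dbi:DB2:SYSTEM1', 'user1', 'pass1', { RaiseError => 1 });
my $dbh2 = DBI->connect('dbi:DB2:SYSTEM2', 'user2', 'pass2', { RaiseError => 1 });

# Step 6: the same ordered select against both databases
my $sql = "SELECT * FROM $tablename ORDER BY 1";
my @result1 = @{ $dbh1->selectall_arrayref($sql) };
my @result2 = @{ $dbh2->selectall_arrayref($sql) };

# Steps 7-8: collect the "different data"
my @different;
for my $i (0 .. $#result1) {
    for my $col (0 .. $#{ $result1[$i] }) {
        # NULL columns come back as undef, so map them to '' before comparing
        my $v1 = defined $result1[$i][$col] ? $result1[$i][$col] : '';
        my $v2 = defined $result2[$i][$col] ? $result2[$i][$col] : '';
        push @different, [ $i, $col, $v1, $v2 ] if $v1 ne $v2;
    }
}

# Step 9: CSV output, which Excel can open
open my $out, '>', 'differences.csv' or die "Can't write differences.csv: $!";
print $out "row,column,value1,value2\n";
print $out join(',', @$_), "\n" for @different;
close $out;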

So now, get to work on it!

Remember that the forum will help you, but you need to investigate and work very hard.

Good Luck!

 
Thanks all,
Now, I may have to compare around 400 columns between the 2 tables. Can anyone suggest any best-practice tips for that?

I initially thought of outputting the results of the 2 SQL queries (the same query, but on 2 different databases; all the databases are DB2 UDB) into 2 files, then comparing the 2 files and writing the output to another file. In that case each row becomes a string, and I then have to create an array in which each element is one column of the row (I have to split each record of the file on a delimiter).
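
Something like this is what I had in mind for the splitting part (just a sketch, assuming a pipe delimiter and a made-up file name):
Code:
open my $fh, '<', 'system1_dump.txt' or die $!;  # hypothetical dump file
while (my $line = <$fh>) {
    chomp $line;
    # the -1 keeps trailing empty columns instead of dropping them
    my @columns = split /\|/, $line, -1;
    # ... compare @columns with the matching record from the other file
}
close $fh;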

My concern is how effective and performant this will be.

As dmazzini suggested, it is maybe better to output the results of the SQLs directly into 2 arrays instead of files, and then compare the 2 arrays.

Please advise on the best practices in these types of cases.
 
Both systems are supposed to be identical, and I assume the tables are large?

If that's the case, processing either line by line or block by block is probably going to be more memory efficient (assuming both tables are ordered the same), rather than eating all the memory on the machine with 2 large tables.

Using DBI's fetchrow_array should make the actual comparisons quite easy:
Code:
@fields=("field1","field2", ... ,"fieldn");  # one name per column, in query order
while (@array1=$sth1->fetchrow_array) {
  @array2=$sth2->fetchrow_array;
  $id=$array1[0];
  $loop=0;
  foreach (@array1) {
    if ($_ ne $array2[$loop]) {
      print "$id|$fields[$loop]|$array1[$loop]|$array2[$loop]\n";
    }
    $loop++;
  }
}
fetchrow_hashref is easier still, because it gives you the name of the field rather than its ordinal position in the query, but from a processing perspective that's more work than is needed.
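
For illustration, the hashref variant of the same loop might look like this (a sketch; it assumes both statement handles return the same column names, and "ID" stands in for whatever your key column is called):
Code:
while (my $row1 = $sth1->fetchrow_hashref) {
    my $row2 = $sth2->fetchrow_hashref;
    for my $field (sort keys %$row1) {
        # map NULLs (undef) to '' before comparing
        my $v1 = defined $row1->{$field} ? $row1->{$field} : '';
        my $v2 = defined $row2->{$field} ? $row2->{$field} : '';
        print "$row1->{ID}|$field|$v1|$v2\n" if $v1 ne $v2;
    }
}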

What type of output have you been asked to provide in terms of an exception report? The IDs of the differing rows might be all that's required, depending on who asked the question, and then it's up to someone else to investigate.

DB2 with tables of over 400 columns; is it safe to assume that this is some form of fiscal audit policy?

HTH
--Paul

It's important in life to always strike a happy medium, so if you see someone with a crystal ball, and a smile on their face ... smack the fecker
 
Thanks Paul for your suggestion.

Your concern is justified: around 400 columns and millions of rows.

Is Perl efficient/performant enough to handle this, or should we think of C and UNIX shell scripts for a better result?

Since Perl is an interpreted language and C would be compiled to an executable, won't C be a faster option?

Thanks.
 
How often are you going to be running this query?
Daily, weekly, monthly?
How important is this audit to your business processes?
Is there a budget in place? Is this a job for dedicated hardware, or can you run it from a desktop?

Perl is more than able to handle the task; however, if, and only if, speed is a critical issue would I be looking at a compiled executable, and there may be issues there as well. I'm going to assume that this can't be run on either of the database servers, so network throughput is going to be a bottleneck. Looks like it could just be a case of horses for courses.

The DBI prepares/compiles its SQL queries; I don't know how that's done in C, or even which libraries to look for to accomplish it.

This is a Perl forum, so I'm probably a little biased ;-)

You can compile Perl into executables, though I've never done it or felt the need to; I think it's mostly to obfuscate code and manage releases rather than to gain speed, though there must be some performance gain.

Perl is going to be quicker to knock together, so it's probably best to prototype your algorithm in Perl first, and if speed is critical, port the algorithm to C.


HTH & Best Regards
--Paul

It's important in life to always strike a happy medium, so if you see someone with a crystal ball, and a smile on their face ... smack the fecker
 