Hi,
I have just started writing Perl programs. I want to compare two files, but they are huge (more than 4 million lines each) and I don't want an exact match. I have written a script, but it is very slow: it takes more than a few days to run. Perl should be much faster than that, so I think something is wrong in my program.
First I read both files into hashes. Then, for each line, I split the line into fields and apply my matching rules. For example, the first and second fields must match exactly, while the third field may differ by 1 (if the third field in the first file is 4, then 3, 4 or 5 in the second file is acceptable).
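To pin the rule down, here is a minimal Perl sketch of it as a predicate. It assumes both files are comma-separated with the three fields in the same positions and an integer third field; `fields_match` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: true when two comma-separated records satisfy the
# rule above (fields 1 and 2 equal, field 3 within +/-1).
# Assumes the same field order in both files and an integer third field.
sub fields_match {
    my ($line_a, $line_b) = @_;
    my @a = split /,/, $line_a;
    my @b = split /,/, $line_b;

    return 0 unless $a[0] eq $b[0];           # first field: exact match
    return 0 unless $a[1] eq $b[1];           # second field: exact match
    return abs($a[2] - $b[2]) <= 1 ? 1 : 0;   # third field: tolerance of 1
}

print fields_match('x,y,4', 'x,y,3') ? "match\n" : "no match\n";   # prints "match"
```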
My other problem: if two records in the second file match one record in the first file, I want to match only one of them. In other words, I want one-to-one matches.
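One way to get one-to-one matches, and to avoid comparing every line of one file with every line of the other (up to about 4 million times 4 million comparisons, each with a `split`), is to index the second file by its key fields. This sketch assumes the same layout as above (comma-separated, integer third field); `match_one_to_one` and the sample records are hypothetical. Each first-file record then needs at most three hash lookups (third field equal, minus 1, plus 1), and each matched second-file record is removed from its bucket so it cannot be paired twice.

```perl
use strict;
use warnings;

# Hypothetical sketch: one-to-one matching through a hash index.
# Assumes comma-separated lines whose third field is an integer.
sub match_one_to_one {
    my ($file1_lines, $file2_lines) = @_;

    my %index;    # "field1,field2,field3" => [ unmatched second-file lines ]
    for my $line (@$file2_lines) {
        my ($f1, $f2, $f3) = split /,/, $line;
        my $key = join ',', $f1, $f2, $f3 + 0;    # numify so "04" and "4" agree
        push @{ $index{$key} }, $line;
    }

    my (@pairs, @unmatched);
    RECORD: for my $line (@$file1_lines) {
        my ($f1, $f2, $f3) = split /,/, $line;
        # Try an exact third field first, then the -1 and +1 neighbours.
        for my $n ($f3 + 0, $f3 - 1, $f3 + 1) {
            my $bucket = $index{ join ',', $f1, $f2, $n };
            next unless $bucket && @$bucket;
            push @pairs, [ $line, pop @$bucket ];   # consume: used only once
            next RECORD;
        }
        push @unmatched, $line;
    }
    return (\@pairs, \@unmatched);
}

my ($pairs, $unmatched) = match_one_to_one(
    [ 'a,b,4', 'a,b,4', 'c,d,7' ],
    [ 'a,b,5', 'c,d,9' ],
);
printf "%d matched, %d unmatched\n", scalar @$pairs, scalar @$unmatched;
# prints "1 matched, 2 unmatched"
```

This matching is greedy: each first-file record takes the first free candidate, so when tolerance windows overlap it may not find the largest possible set of pairs (with third fields 5 then 4 in the first file against 4 and 6 in the second, the 5 takes the 4 and the first-file 4 is left unmatched). The index also keeps the whole second file in memory, as the original script already does.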
I need help solving this. Here is the script I have written so far; its comparison rules are not complete yet.
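#!/usr/bin/perl
use strict;
use warnings;

my (%lines_1, %lines_2);

# Read the first file. Each distinct line becomes a hash key; identical
# lines are stored once and the value counts how often the line occurs.
open(my $mci_1_3, '<', '/ictedrin/ICTPRD/CdrExtract/26_feb/mednet_2.txt')
    or die "open: $!";
while (<$mci_1_3>) {
    chomp;
    $lines_1{$_}++;
}
close($mci_1_3);
print "first file was read and pushed into a hash\n";

# Read the second file the same way.
open(my $itc_3_1, '<', '/ictedrin/ICTPRD/CdrExtract/26_feb/ITC_MEDNET_1.txt')
    or die "open: $!";
while (<$itc_3_1>) {
    chomp;
    $lines_2{$_}++;
}
close($itc_3_1);
print "second file was read and pushed into a hash\n";

my $same     = 0;
my $not_same = 0;
# Matched second-file lines are tracked separately: reusing the count as a
# marker (value 2) would also skip lines that simply occur twice.
my %used;

foreach my $key_1 (keys %lines_1) {    # once for each distinct line of the first file
    my $flag     = 0;
    my @fields_1 = split(/,/, $key_1);
    # This inner loop scans the second file for every line of the first one:
    # up to about 4 million x 4 million iterations for files this size.
    foreach my $key_2 (keys %lines_2) {
        next if $used{$key_2};
        my @fields_2 = split(/,/, $key_2);
        # Partial rule: compares two fields with leading characters stripped;
        # the third-field tolerance is not implemented yet.
        if (   substr($fields_1[1], 1) eq $fields_2[0]
            && substr($fields_1[2], 1) eq substr($fields_2[1], 2))
        {
            $used{$key_2} = 1;
            $same++;
            $flag = 1;
            print " they are same:\n $key_1 $key_2\n";
            print " ############ keys are :\n $lines_1{$key_1} $lines_2{$key_2}\n";
            last;    # stop at the first match (one-to-one)
        }
    }
    $not_same++ if $flag == 0;
}
print "we have $same same lines\n";
print "we have $not_same not same lines\n";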
Thanks in advance