Matching similar entries between two files

usheikh · Apr 25, 2006

Hi,

I am working on a script that will allow me to read from two files, compare the entries and output invalid entries.

TFILE contains entries such as:
11111_
11112_
11113_
11234_
12342_

FILE1 contains entries such as:
11111_A
11112_A
11113_A
11234_A
12342_A
11111_B
11112_B
11113_B
11234_B
12342_B

Now, for every entry in TFILE there should be an entry in FILE1 beginning with the same number but ending in B. If there is no such entry in FILE1, the TFILE entry is considered invalid and should be output to the screen with a message stating that it is missing.

So far I have the following, but it is not working:

#!/usr/bin/perl -w

use strict;

my $trecord;

open(TFILE, "<tfile.txt") || die "Cannot open: $!";
open (FILE1, "<aaa.txt");
while ($trecord = <TFILE>) {
while (<FILE1>){
if (( $trecord =~ m/($trecord)/) && ( $trecord !~ m/(_B)/)){
print $trecord;
print "\n";
}
}
}

close(FILE1);
close (TFILE);

Currently when I run this program I get a long list of the first entry from TFILE.

Any assistance would be appreciated. Thank you.

PaulTEG · Apr 25, 2006

You're looping a whole file read within a file read, not very efficient.

Code:

open(TFILE, "<tfile.txt") || die "Cannot open: $!";
my %hashofTfile;
while (<TFILE>) {
  chomp;
  $hashofTfile{$_."B"}+=1;
}
close TFILE;
open (FILE1, "<aaa.txt");
while (<FILE1>){
 chomp;
 if ( exists ($hashofTfile{$_}){
   print $trecord."\n";
   }
}
close FILE;

Not tested, but it should give you a better way to check ;-)

HTH
--Paul

Paul
------------------------------------
Spend an hour a week on CPAN, helps cure all known programming ailments ;-)

usheikh · Apr 25, 2006

Thank you, Paul. I have tried running it but get syntax errors:

syntax error at ./redwood3.pl line 39, near "){"

line 39: if ( exists ($hashofTfile{$_}){

syntax error at ./redwood3.pl line 42, near "}"
line 42: }

Can't seem to correct this error.

usheikh · Apr 25, 2006

I've tried removing the brackets but no luck. Any ideas?

thanks in advance!

usheikh · Apr 25, 2006

I've managed to rectify that error, there was a round bracket missing.

However, the script runs but all I get is a blank screen and then it returns to the prompt again. Any ideas on why this is happening?

usheikh · Apr 25, 2006

I don't quite understand this line that you have proposed. Why is it placed inside the while loop for TFILE? The entries containing B are actually inside FILE1. Was this intentional?

$hashofTfile{$_."B"}+=1;

Just in case I didn't explain myself clearly: there are two files, TFILE and FILE1. TFILE contains entries like 11111_. A corresponding entry should be in place in FILE1, e.g. 11111_B. So the first 6 characters are the same and the only difference is the trailing B. Suppose there is an entry 11123_ in TFILE but NO corresponding entry 11123_B in FILE1; then the user should be alerted to this on screen after the script has run.

I hope that makes sense

brigmar · Apr 25, 2006

Try:

Code:

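#!/usr/bin/perl -w
use strict;
my ($trecord, $found);
open(TFILE, "<tfile.txt") || die "Cannot open tfile.txt: $!";
open(FILE1, "<aaa.txt") || die "Cannot open aaa.txt: $!";
while ($trecord = <TFILE>) {
  chomp($trecord);
  seek(FILE1, 0, 0);   # rewind FILE1 for each TFILE record
  $found = 0;
  # test !$found first so no line is read and discarded after a match
  while (!$found && defined($_ = <FILE1>)) {
    chomp;
    $found = /^\Q$trecord\EB$/;   # same number, ending in B
  }
  print "$trecord\n" unless $found;   # no B entry: report it
}
close(FILE1);
close(TFILE);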

Again.. haven't run it, and it's not efficient (should really drop FILE1 into an array and loop thru that array).
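
A minimal sketch of the array approach mentioned above, assuming the sample data from the original post. The sub name missing_b_entries and the in-memory lists are hypothetical stand-ins for the two files, not something from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every TFILE entry with no "<entry>B" in FILE1.
# FILE1 is held in an array, so it only has to be read from disk once.
sub missing_b_entries {
    my ($tfile_entries, $file1_entries) = @_;
    my @missing;
    for my $trecord (@$tfile_entries) {
        my $found = 0;
        for my $entry (@$file1_entries) {
            if ($entry eq $trecord . 'B') {
                $found = 1;
                last;    # stop scanning as soon as a match turns up
            }
        }
        push @missing, $trecord unless $found;
    }
    return @missing;
}

# Sample data standing in for tfile.txt and aaa.txt (11123_ has no B entry).
my @tfile = qw(11111_ 11112_ 11123_);
my @file1 = qw(11111_A 11112_A 11111_B 11112_B);

print "$_\n" for missing_b_entries(\@tfile, \@file1);
```

This still compares every TFILE entry against every FILE1 entry, so a hash lookup will scale better on large files.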

fishiface · Apr 25, 2006

Code:

# read file1 into hash
open(FILE1, 'file1') or die "$0: file1: $!";
my %valid = map { chomp; $_ => 1 } <FILE1>;
close FILE1;

# check tfile against hash
open (TFILE, 'tfile') or die "$0: tfile: $!";
print map {$_,"\n"} grep { chomp; $valid{$_.'B'} } <TFILE>;
close TFILE;

This, based on Paul's code, does what I think you want. I've swapped the files around and collapsed his loops into maps which should be faster (fewer bytecodes).

The first map simply chomps each line and delivers key=>value pairs to the hash. In a similar style, I use grep to do the hash lookups and the map is just a cheap way of sticking the newlines back.
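
To see the two steps in isolation, here is a small sketch that runs the same map and grep over in-memory lists instead of file handles. The arrays @file1_lines and @tfile_lines are made-up sample data, not the real files.

```perl
use strict;
use warnings;

# Made-up sample data; each element ends in a newline, as a line read from a file would.
my @file1_lines = ("11111_A\n", "11111_B\n", "11112_A\n");
my @tfile_lines = ("11111_\n", "11112_\n");

# Step 1: chomp each line and hand a key => 1 pair to the hash.
my %valid = map { chomp; $_ => 1 } @file1_lines;

# Step 2: keep the TFILE entries whose "B" partner is a key in the hash.
my @matched = grep { chomp; $valid{ $_ . 'B' } } @tfile_lines;

# chomp removed the newlines, so put them back when printing.
print "$_\n" for @matched;    # prints 11111_
```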

Yours,

fish

"As soon as we started programming, we found to our surprise that it wasn't as easy to get programs right as we had thought. Debugging had to be discovered. I can remember the exact instant when I realized that a large part of my life from then on was going to be spent in finding mistakes in my own programs."
--Maur

PaulTEG · Apr 25, 2006

Hello ghoti, not seen you around for a while, have a star for map ;-)

Paul

usheikh · Apr 26, 2006

thanks guys for your replies.

Fish, I have run the program and am now getting output on the screen....great!!

However, the list returned shows the numbers that DO have a corresponding B in the other file, rather than those without a B. Is it possible to easily modify this? I haven't worked with map before so I'm not sure about this.

fishiface · Apr 26, 2006

Hi Paul, 'been stupidly busy (will be 'til summer) but still like to look in occasionally.

usheikh - simply reverse the test in grep by sticking a "!" in front of it, like this:

Code:

grep { chomp; !$valid{$_.'B'} } <TFILE>;
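
Put together, the negated test might look like the sketch below. It is only an illustration: the sub name missing_entries, the lexical file handles, and the throwaway temp files are assumptions added so that it runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: list the TFILE entries that have no "<entry>B" line in FILE1.
sub missing_entries {
    my ($tfile_name, $file1_name) = @_;

    # Read FILE1 into a lookup hash.
    open(my $file1, '<', $file1_name) or die "$0: $file1_name: $!";
    my %valid = map { chomp; $_ => 1 } <$file1>;
    close $file1;

    # The leading "!" keeps only the entries whose B partner is absent.
    open(my $tfile, '<', $tfile_name) or die "$0: $tfile_name: $!";
    my @missing = grep { chomp; !$valid{ $_ . 'B' } } <$tfile>;
    close $tfile;

    return @missing;
}

# Demo with throwaway files standing in for tfile and file1.
my ($t_fh, $t_name) = tempfile(UNLINK => 1);
print {$t_fh} "11111_\n11123_\n";
close $t_fh;

my ($f_fh, $f_name) = tempfile(UNLINK => 1);
print {$f_fh} "11111_A\n11111_B\n11123_A\n";
close $f_fh;

print "$_ is missing\n" for missing_entries($t_name, $f_name);
```

With this sample data only 11123_ is reported, since it has an A entry but no B entry.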

Yours,

f


usheikh · Apr 26, 2006

thanks so much!!! all working now.
