×
INTELLIGENT WORK FORUMS
FOR COMPUTER PROFESSIONALS

Log In

Come Join Us!

Are you a
Computer / IT professional?
Join Tek-Tips Forums!
  • Talk With Other Members
  • Be Notified Of Responses
    To Your Posts
  • Keyword Search
  • One-Click Access To Your
    Favorite Forums
  • Automated Signatures
    On Your Posts
  • Best Of All, It's Free!
  • Students Click Here

*Tek-Tips's functionality depends on members receiving e-mail. By joining you are opting in to receive e-mail.

Posting Guidelines

Promoting, selling, recruiting, coursework and thesis posting is forbidden.

Students Click Here

Jobs

Perl: How to perfectly match specific data between two files and do comparison?

Perl: How to perfectly match specific data between two files and do comparison?

Perl: How to perfectly match specific data between two files and do comparison?

(OP)
I have two files (File A & File B) in same format as below. I would like to match certain pattern of data from both files and do matching. My coding below used very long time to generate result. Other than that, It is wrong somewhere which cause incomplete extraction. Any alternative methods or improvement?


I extracted each line name and score from both files and stored them in two output files. Each output file contains extracted name and score. At first, if score in File A is negative value, do ignore the specific line extraction. Else if score in File A is positive value, match name of File A with File B. There will be three conditions and three result reports generated (pass.rpt, fail.rpt and noCheck.rpt).

For those matched names, it will proceed to compare. If File A score > 50 and File B score > 40, print matched name, score from File A (score A) and score from File B (score B) to pass.rpt and pass_counter($pc) plus one for each comparison. Else if <50 and <40, print matched name, score A and score B to fail.rpt and fail_counter($fc) plus one.

Last condition is for those negative values from File A. If names from both files matched, print name, scoreA and score B to noCheck.rpt and noCheck_counter plus one.


File A

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Report : students A
-science
-math
-language.
Date : Fri Jul 19 17:00:31 2013
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Name science math lang. score
--------------------------------------------------------------------------
----
Jane_let [0] (sa) 58.78 r 66.15 0.00 -33

Alfert_pipe (sa) 74.72 r 92.72 0.00 82

Olive_pipe [8] (sa) 64.28 f 25.40 0.00 58

mass/excel/i60 86.21 r 59.90 0.00 68

Anne_let (sa) 51.98 f 12.69 0.00 -39

yuki/099/pipe 76.52 r 94.32 0.00 -82

frey/let/sa/y589 47.79 f 99.00 0.00 78

alan/excel/sa/y589 97.00 f 96.00 0.00 -70

..
..

File B

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Report : students B
-science
-math
-language.
Date : Fri Jul 19 17:00:31 2013
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Name science math lang. score
--------------------------------------------------------------------------
----
Ash_let [9] (sa) 58.78 r 66.15 0.00 33

Alfert_pipe (sa) 74.72 r 92.72 0.00 57

Olive_pipe [8] (sa) 64.28 f 25.40 0.00 20

mass/excel/i60 86.21 r 59.90 0.00 16

Sam_let (sa) 51.98 f 12.69 0.00 -39

yuki/099/pipe 76.52 r 94.32 0.00 82

frey/let/sa/y589 47.79 f 99.00 0.00 30

alan/excel/sa/y589 67.00 f 96.00 0.00 -90

..
..

coding:


use Getopt::Long qw(:config no_ignore_case);
use Data::Dumper;
use POSIX qw(floor);
use strict;
use warning;

my $orig = '';
my $new = '';

GetOptions('orig=s' => \$orig, 'new=s' => \$new);

if (!$orig|!$new) {
print "\n\t Help: test.pl -orig <file A> -new <file B>\n";
exit;
}

open (PASS, ">pass.rpt") || die "ERROR: cannot open";
open (FAIL, ">fail.rpt") || die "ERROR: cannot open";
open (NC, ">noCheck.rpt") || die "ERROR: cannot open";
open (t1, ">t1") || die "ERROR: cannot open";
open (t2, ">t2") || die "ERROR: cannot open";

my (@array, $line, $end1, $slack1, $b1, $THIS, @arr1, @arr2, @tmp1, @tmp2, @emp, @emp2, $data1, $data2,$emp1,$emp2,$emp3,$emp4,$ep1,$s1,$ep2,$s2,$ncc,$pc,$fc);

$ncc = 0;
$pc = 0;
$fc = 0;

fileA_ext();
fileB_ext();
check();

#_______________________________________________________________________________________________
sub fileA_ext() {

if ($orig =~ /\S+\.gz$/) {
open (FileA,"gunzip -c $orig |") || die "ERROR: can't read $orig\n";
} else {
open (FileA,"$orig") || die "ERROR: can't read $orig\n";
}

while (@array = <FileA>) {

foreach $line(@array) {

if ($line =~ m/(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)/) {


if ($line !~ m/\((sa)\)/) {

@arr1 = @emp;
next if ($line =~ m/Name/);
$name1 = "$1";
$score1 = "$12";

my $data1 = join(";",$name1,$score1);
push (@arr1, $data1);

}

if ($line =~ m/\((sa)\)/) {

@arr1 = @emp2;
@tmp1 = @emp;
next if ($line =~ m/Name/);
push (@tmp1, $line);
#print t "@tmp1\n";

foreach $line (@tmp1) {

if ($line =~ m/(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)/) {

my $name2 = "$1";
substr($name2, -13) = '';
my $score2 = "$12";

my $data1 = join(";",$name2,$score2);
push (@arr1, $data1);
$name2 = $score2 ="";
#print "@arr1\n\n";
}
}
}
print t1 "@arr1\n\n";
}
}
}
close (FileA);
}

#____________________________________________________________________________________________


sub FileB_ext() {

if ($new =~ /\S+\.gz$/) {
open (FileB,"gunzip -c $new |") || die "ERROR: $THIS can't read $new\n";
} else {
open (FileB,"$new") || die "ERROR: $THIS can't read $new\n";
}

while (@array = <FileB>) {

foreach $line(@array) {

if ($line =~ m/(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)/) {
#print "$line\n";

if ($line !~ m/\((sa)\)/) {

@arr2 = @emp;
next if ($line =~ m/Name/);
my $name3 = "$1";
my $score3 = "$12";

my $data2 = join(";",$name3,$score3);
push (@arr2, $data2);

}

if ($line =~ m/\((sa)\)/) {

@arr2 = @emp2;
@tmp2 = @emp;
next if ($line =~ m/Name/);
push (@tmp2, $line);
#print t "@tmp2\n";

foreach $line (@tmp2) {

if ($line =~ m/(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)/) {

my $name4 = "$1";
substr($name4, -13) = '';
my $score4 = "$12";

my $data2 = join(";",$name4,$score4);
push (@arr2, $data2);
$name4 = $score4 ="";
#print "@arr2\n\n";
}
}
}
print t2 "@arr2\n\n";
}
}
}
close (FileB);
}


sub check() {

foreach $data1 (@arr1) {
if ($data1 ne ""){

if ($data1 =~ m/(.*)\;(.*)/) {
$ep1 = $emp1;
$s1 = $emp2;
my $ep1 = "$1";
my $s1 = "$2";
#print r "$ep1 $s1\n\n";

foreach $data2 (@arr2) {
if ($data2 ne "") {

if ($data2 =~ m/(.*)\;(.*)/) {
$ep2 = $emp3;
$s2 = $emp4;
my $ep2 = "$1";
my $s2 = "$2";
#print R "$ep2 $s2\n";


if ( $ep1 eq $ep2 && $s1 =~ m/-/g) {

$ncc++;
#print NC "Total match: $ncc\n\n";
print NC "$ep1 $s1 $s2\n";
}

if ( $ep1 eq $ep2 && $s1 !~ m/-/g && $s1 > 50 && $s2 > 40) {

$pc++;
print PASS "$ep1 $s1 $s2\n";
}

if ( $ep1 eq $ep2 && $s1 !~ m/-/g && $s1 < 50 && $s2 < 40) {

$fc++;
print FAIL "$ep1 $s1 $s2\n";
}


}
}
}
}
}
}
print NC "\nTotal match: $ncc\n\n";
print PASS "\nTotal match: $pc\n\n";
print FAIL "\nTotal match: $fc\n\n";


}


expected result:

pass.rpt
---------------

Name scoreA scoreB
Alfert_pipe (sa) 82 57

fail.rpt
--------------

Olive_pipe [8] (sa) 58 20

mass/excel/i60 68 16

frey/let/sa/y589 78 30

noCheck.rpt
-------------

yuki/099/pipe -82 82

alan/excel/sa/y589 -70 -90

RE: Perl: How to perfectly match specific data between two files and do comparison?

Please post your code (and example data) between [code]...[/code] tags (there's a 'code' button in the reply pane).
It is a long piece of code to check, can you focus on a specific problem and a specific part of your code? (after doing some debugging by printing intermediate results?)
But first of all you should clarify the structure of your files:
-is the heading always terminated by a line of dashes?
-are there only name records after the heading?
-is it possible that duplicate names appear in the same file and, if yes, how to manage them?
-it appears that your name field is everything that appears before a number with format ##.##: is this number always in that format? (i.e. exactly 2 figures before the dot and exactly 2 after it?)
-it appears that you are not interested in anything showing after the name field except the last number in the line: is this correct?

http://www.xcalcs.com : Online engineering calculations
http://www.megamag.it : Magnetic brakes for fun rides
http://www.levitans.com : Air bearing pads

Red Flag This Post

Please let us know here why this post is inappropriate. Reasons such as off-topic, duplicates, flames, illegal, vulgar, or students posting their homework.

Red Flag Submitted

Thank you for helping keep Tek-Tips Forums free from inappropriate posts.
The Tek-Tips staff will check this out and take appropriate action.

Reply To This Thread

Posting in the Tek-Tips forums is a member-only feature.

Click Here to join Tek-Tips and talk with other members! Already a Member? Login

Close Box

Join Tek-Tips® Today!

Join your peers on the Internet's largest technical computer professional community.
It's easy to join and it's free.

Here's Why Members Love Tek-Tips Forums:

Register now while it's still free!

Already a member? Close this window and log in.

Join Us             Close