Verify data from large files

Status
Not open for further replies.

jaytco

Vendor
Dec 13, 2001
88
US
I have two csv files, each more than 250K lines long, and would like to compare some data values between them. I tried reading file2 into an array and grepping it for each value of file1. That works, but it is very slow and inefficient. Both files can be sorted on the data value to be verified.

Here is what I am thinking about doing, but I'm not sure how to use the file handle to step to the next line. (This is just rough pseudocode.)
Code:
while <FILE1>
   $linefile1 = $_
   ($valF1a,$valF1b) = split(/,/, $linefile1)

   ##Not sure how to do <FILE2>
   <FILE2>
   $linefile2 = $_
   ($valF2a,$valF2b) = split(/,/, $linefile2)



   ## Here is the logic I would like to use
   if ($valF2a < $valF1a) {
      # Not sure if I have a match
      Read next line of <FILE2>

   } elsif ( $valF2a == $valF1a ) {
      # Had a match
      print "Match on $valF1a\n"
      Read next line of <FILE1>

   } elsif ($valF2a > $valF1a) {
      # There was no match for $valF1a
      print "$valF1a no match in file2\n"
      Read next line of <FILE1>
   }
 
I'm more apt to just throw all of the records into a database, but something like this maybe?

Code:
open (FILE1, "csv1.csv") or die "Can't open csv1.csv: $!";
open (FILE2, "csv2.csv") or die "Can't open csv2.csv: $!";
#start parsing file 1
while ($linefile1 = <FILE1>)
{
    chomp $linefile1;
    #grab the values of file 1's current line
    ($valF1a, $valF1b) = split(/,/, $linefile1);
    #rewind file 2; without this the inner loop only
    #finds data for the first line of file 1
    seek(FILE2, 0, 0);
    #Start parsing file 2
    while ($linefile2 = <FILE2>)
    {
        chomp $linefile2;
        #grab the values of file 2's current line
        ($valF2a, $valF2b) = split(/,/, $linefile2);
        #Verify all records from file 2 against
        #the single record in file 1.  This might
        #still run pretty long though
    }
    #done processing this line of file 1
    #move to next line
}

close (FILE2);
close (FILE1);
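
The nested loop above rereads file 2 for every line of file 1. Since both files can be sorted on the value being verified, a single merge pass along the lines of the logic in the question might be enough. What follows is only a rough sketch, assuming both inputs are sorted ascending and numerically on the first field. The helper names next_key and compare_sorted are made up for illustration, and in-memory strings stand in for the real csv files.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first comma-separated field of the
# next non-blank line, or undef once the handle is exhausted.
sub next_key {
    my ($fh) = @_;
    while (defined(my $line = <$fh>)) {
        chomp $line;
        next if $line eq '';
        my ($key) = split /,/, $line;
        return $key;
    }
    return;
}

# Walk both handles once. Assumes both inputs are sorted ascending,
# numerically, on the first field.
sub compare_sorted {
    my ($fh1, $fh2) = @_;
    my @report;
    my $k1 = next_key($fh1);
    my $k2 = next_key($fh2);
    while (defined $k1) {
        if (defined $k2 && $k2 < $k1) {
            # file 2 is behind file 1: step file 2 only
            $k2 = next_key($fh2);
        }
        elsif (defined $k2 && $k2 == $k1) {
            push @report, "Match on $k1";
            # step file 1 only, so a repeated key in file 1 still matches
            $k1 = next_key($fh1);
        }
        else {
            # file 2 is already past this value, or has run out
            push @report, "$k1 no match in file2";
            $k1 = next_key($fh1);
        }
    }
    return @report;
}

# Demo: in-memory data standing in for the two sorted csv files.
my $data1 = "1,a\n3,b\n5,c\n7,d\n";
my $data2 = "2,x\n3,y\n4,z\n7,w\n";
open my $fh1, '<', \$data1 or die "Can't open in-memory file 1: $!";
open my $fh2, '<', \$data2 or die "Can't open in-memory file 2: $!";
print "$_\n" for compare_sorted($fh1, $fh2);
```

Each file is read exactly once, so the cost grows with the combined length of the two files rather than with their product. For string keys, the comparisons would need to be lt and eq, with the files sorted the same way.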

- Rieekan
 
Have you tried using a hash? For example (sorry, pseudocode only; I don't have time to work on a full script):
Code:
use strict;
use warnings;

my (%one, %two);
my @missing;

open FILE1, "array1"
	or die "Can't open file, $!";
while (<FILE1>) {
	chomp;
	$one{$_} = " ";
}

open FILE2, "array2"
	or die "Can't open file, $!";
while (<FILE2>) {
	chomp;
	$two{$_} = " ";
}

foreach my $element (keys %two){
	push @missing, $element if !exists $one{$element};
}
foreach my $element (keys %one){
	push @missing, $element if !exists $two{$element};
}

foreach my $element (@missing){
	print "\t$element\n";
}
Would like to know if it works!!
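
The script above compares whole lines. If only the first field of each line is the value to verify, the same hash idea could be keyed on that field instead. This is a rough sketch under that assumption, not a tested solution for the real files. The helper names load_keys and missing_keys are invented for the example, and the in-memory strings stand in for the csv files. Neither file needs to be sorted for this approach.

```perl
use strict;
use warnings;

# Hypothetical helper: build a lookup of the first field of every line.
sub load_keys {
    my ($fh) = @_;
    my %seen;
    while (my $line = <$fh>) {
        chomp $line;
        my ($key) = split /,/, $line;
        next unless defined $key && length $key;   # skip blank lines
        $seen{$key} = 1;
    }
    return \%seen;
}

# Hypothetical helper: list the first-field values read from $fh that
# are absent from the lookup, in the order they appear in the file.
sub missing_keys {
    my ($fh, $lookup) = @_;
    my @missing;
    while (my $line = <$fh>) {
        chomp $line;
        my ($key) = split /,/, $line;
        next unless defined $key && length $key;
        push @missing, $key unless exists $lookup->{$key};
    }
    return @missing;
}

# Demo: unsorted in-memory data standing in for the two csv files.
my $data1 = "7,d\n1,a\n5,c\n3,b\n";
my $data2 = "3,y\n7,w\n2,x\n";
open my $fh2, '<', \$data2 or die "Can't open in-memory file 2: $!";
my $in_file2 = load_keys($fh2);
open my $fh1, '<', \$data1 or die "Can't open in-memory file 1: $!";
print "$_ no match in file2\n" for missing_keys($fh1, $in_file2);
```

Only the keys of one file are held in memory and the other file is streamed, so each value costs a single hash lookup instead of a grep over the whole array.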

Rob Waite
 