Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations bkrike on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Need Urgent Help with Perl Lookup & Substitution Routine 1

Status
Not open for further replies.

EricSilver

Technical User
Mar 18, 2008
19
US
Hello,

I am a new user having a problem getting what should be a simple routine to work.

What I am doing is opening an address file, and a state name file, so I can change state abbreviations, i.e. “AZ” to full state names, i.e. “Arizona.”

After the address file is opened, the state name file is opened. This file consists of three fields: 1.( Unique identifier; 2.) State Abbreviation; 3.) Full state name.

The routine compares the address file state abbreviation data value to the abbreviation field value in the state name file. If it matches, the Address File abbreviation data element is changed to the State Name File full name data element.

If the Address File abbreviation is not in the state name file, I want the routine to print “error” in place of the full state name. Unfortunately, that part is not working. Instead of printiong "ERROR" for one Address file record, it prints "ERROR" for all of them. Any assistance would be appreciated. Here is what I have:


## FILE LOCATIONS
$file='/File.txt'; ## THE ADDRESS FILE
$maplocation='/Map.txt'; ## THE STATE NAME LOOKUP FILE
$file2='File2.txt'; ## THE MODIFIED ADDRESS FILE


## OPEN ADDRESS FILE AND ADD CONTENTS TO AN ARRAY
open(FILE,"<$file")||die "Could not open $file";
@file=<FILE>;
close FILE;

## FOR EACH RECORD IN THE ADDRESS FILE, DO THE FOLLOWING
foreach $line (@file) {
@data=split(/t/,$line);

## CREATE VARIABLES CORRESPONDING TO ADDRESS FILE DATA
## (This step is not really necessary)
$d0=$data[0];
$d1=$data[1];
$d2=$data[2];
$d3=$data[3];
$d4=$data[4];
$d5=$data[5];
$d6=$data[6];
$d7=$data[7
$d8=$data[8];
$d10=$data[10];

## OPEN STATE NAMES FILE AND ADD CONTENTS TO AN ARRAY
open(MAP,"<$maplocation");
@entries = <MAP>;
close MAP;

## FOR EACH RECORD IN THE ADDRESS FILE, DO THE FOLLOWING
foreach $line2 (@entries) {
@fields=split(/,/,$line2);

## COMPARE ADDRESS FILE STATE ABBREVIATION DATA TO STATE FILE ABBREVIATION DATA. (THIS CODE WOKS PERFECTLY)

if ($d8 eq $fields[1]) {$d8=$fields[2]};

## $d8 is the address file abbreviation value; $fields[1]
## is the State File abbreviation value; and $fields[2] is
## the state file full name value.

## IF ADDRESS FILE STATE ABBREVIATION IS NOT PRESENT IN STATE NAME FILE, PRINT ERROR (THIS CODE FAILS):

if ($d8 eq $fields[1]) {$d8=$fields[2]} else {$d8=”error”};
}

## INSTEAD OF PRINTING "ERROR" FOR ONE RECORD, IT PRINTS ERROR FOR ALL OF THEM.

## WRITE OUTPUT TO FILE
$line= ”$d0”.”$d1”.”$d2”.”$d3”.”$d4”.”$d5”.”$d6”.”$d7”.”$d8”.”$d9”.”$d10”."\n";

};

open(DATA,">$file2");
print DATA (@file);
close DATA;
 
instead of:

Code:
if ($d8 eq $fields[1]) {$d8=$fields[2]};

## $d8 is the address file abbreviation value; $fields[1]
## is the State File abbreviation value; and $fields[2] is
## the state file full name value.

## IF ADDRESS FILE STATE ABBREVIATION IS NOT PRESENT IN STATE NAME FILE, PRINT ERROR  (THIS CODE FAILS):

if ($d8 eq $fields[1]) {$d8=$fields[2]} else {$d8=”error”};
}

you can use the ternary operator to compare and assign all at once:

Code:
$d8 = ($d8 eq $fields[1]) ? $fields[2] : 'error';

Please note, all questions posted here have the same priority: none. There is no urgent assistance provided to urgent problems.



------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
I'd suggest using a hash to store your state abbreviations
Code:
open(MAP,"<$file")||die "Could not open $file";
	while (<MAP>) {
		chomp;
		($code, $state, $desc)=spilt /\t/, $_;
		$states{$state}=$desc;
	}
close MAP;
and then when you want to check if it exists,
Code:
if (exists ($states{$lookup}) ) {
} else {
   print "Error: $lookup is not listed as a state\n";
}

Paul
------------------------------------
Spend an hour a week on CPAN, helps cure all known programming ailments ;-)
 
If that does not work the problem might be the way you are comparing the data using "eq" which means both strings have to be exactly equal: upper/lower-case and spaces (and everything else) included.

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
Thanks for all the good replies. And I will refrain form the use of the word 'urgent'. :)

I tried doing this:
$d8 = ($d8 eq $fields[1]) ? $fields[2] : 'error';

and it changed every Address file state field to "error."

There should not be any case sensitivity issues since the state abbreviation cases match (Uppercase) in both files.

It also works fine in the negative:
if (!$d8 eq $fields[1]) {$d8=$fields[2]};

In the above, the state abbreviations are left alone.

I am mystified by why it fails when including 'else' in the code. I suspect it has something to do with having one "for" loop nested within the other. If I change the code to this:

if ($d8 eq $fields[1]) {$d8=$fields[2]} else {($d8 eq $fields[0])};

where $fields[0] is the unique identifier for each record of the State Names file, then every address record's state field will be populated with the $fields[0] value for the *Last* record in the State Names file (Wyoming).

With a straight if-then comparison, the two files are in synch; throw in the 'else' and the comparison appears to be based only on the last record of the State Names file.
 
There are some errors in your code: please be sure to copy and paste a working version and put it between [tt][ignore]
Code:
[/ignore][/tt] tags.
There is nothing misterious with your problem: you should get out of the inner for loop with [tt]last;[/tt] as soon as you find the matching state.
Your code has this and other efficiency and logic issues: come back if you want advice on how to write it in a neater and more efficient way. The suggestion by PaulTEG above is one first important point.

Franco
: Online engineering calculations
: Magnetic brakes for fun rides
: Air bearing pads
 
I am mystified by why it fails when including 'else' in the code.
The reason is that your code is doing this (pseudocode):
Code:
for each line in the address file {
   for each line in the state file {
      if it matches, swap it, else put error
   }
}
The problem is, if it finds a match it just ploughs on to the next iteration of the inner loop and overwrites the matched value with "error". The only matched values that will survive to appear in the final result are addresses which match the last line in the state file.

At the very least, you should change your script to exit the inner loop once a match is found. There's also no reason to re-populate the @entries array from scratch every time you go around the outer loop. Paul's suggestion of using a hash will save you having to loop through the state values at all.

-- Chris Hunt
Webmaster & Tragedian
Extra Connections Ltd
 
Thanks once again.

As background, my coding experience is essentially with Paradox/ObjectPal and in this very situation an “if match found then quit loop” command would have been used without thought. I’ll need to look at Perl from the same perspective.

I will redo the coding later today and report back. I truly appreciate the feedback. Thanks to everyone.
 
Building on PaulTEG's recommendation about hashes
Code:
use strict;
use warnings;

my %states;
my $col = 8;
my $map = "map.txt";

open(MAP, $map) or die "Can't open $map: $!"; 

while (<MAP>) {
  chomp;
  my (undef, $abbrev, $state) = split;
  $states{$abbrev} = $state;
}

close(MAP); 

while (<>) {
  chomp;
  my @columns = split /\t/;
  my $abbrev = uc($columns[$col]);
  $columns[$col] = exists $states{$abbrev} ? $states{$abbrev} : 'Error';
  print join("\t", @columns), "\n";
}

Steve

[small]"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object::perlDesignPatterns)[/small]
 
Thanks again for all the good help. Below is simplified code for what I have done. I have used the hash as suggested, eliminating the inner 'for' loop, but am still not getting results.

It is correctly generating the desired "error" value for bogus state abbreviations, however, it populates every valid state field with "Wyoming" (the last record in State Names File).

Code:
open(MAP,"<$maplocation")||die "Could not open $maplocation";
    while (<MAP>) {
        chomp;
        ($code, $statename, $desc)=split /::/, $_;
        $states{$statename}=$desc;
    }
close MAP;


open(FILE,"<$file")||die "Could not open $file";
@file=<FILE>;
close FILE;
foreach $line (@file) {
@data=split(/\t/,$line);      

if (exists ($states{$data[8]}) ) { $data[8]=$desc } else { $data[8]=”Error” }; 

$line= ”$data[0]”.”$data[1]”.”$data[2]”.”$data[3]”.”$data[4]”.”$data[5]”.”$data[6]”.”$data[7]”.”$data[10]”.” \n";

}; 

open(DATA,">$file2");		
print DATA (@file);
close DATA;
 
change this line:

Code:
if (exists ($states{$data[8]}) ) { $data[8]=$desc } else { $data[8]=”Error” };

to:

Code:
if (exists $states{$data[8]} ) { $data[8]=$states{$data[8]} else { $data[8]=”Error” };

or what I prefer:

Code:
$data[8] = (exists $states{$data[8]}) ? $states{$data[8]} : 'Error';

You are mistakenly assigning $desc to $data[8], which must be the last value from the maplocation file, "Wyoming".



------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
Eric

This is why you should always use strict; at the start of your file. If you look back at my example, you can see that when I read the map file, I have been forced to declare the variables with my because of the strict pragma. Because they are declared inside the {} of the while loop, their scope is limited to the loop, so they can't be 'seen' by the rest of the program. So if I had tried to use $state outside of the loop as you did with $desc, the interpreter would have complained about it.



Steve

[small]"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object::perlDesignPatterns)[/small]
 
I seem to remember reading somewhere that unless you use exists Perl will create a new hash entry on the fly for the 'not found' key, and set the value to undef. Perhaps someone can either corroborate this or shoot me down in flames...

Steve

[small]"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object::perlDesignPatterns)[/small]
 
Yes, that's what it'll do, Steve, but so what? [tt]undef || "Error"[/tt] will enumerate to [tt]"Error"[/tt], which is what the OP wanted.

Maybe it's cos I'm an Oracle programmer used to the NVL() function, but it just seems cleaner to me that way than faffing around with [tt]exists[/tt].

The end result is the same though.


-- Chris Hunt
Webmaster & Tragedian
Extra Connections Ltd
 
In this trivial example, probably yes. But (resorts to theory to defend his position) I don't like the idea of an ostensibly read-only lookup adding another key to the hash, even if it does have an undefined value. Things like this can cause hard to find bugs in bigger programs.

TIMTOWTDI

Steve

[small]"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object::perlDesignPatterns)[/small]
 
I don't like the idea of an ostensibly read-only lookup adding another key to the hash
That's not how it works. Nothing gets added to the hash, you just get an undef back if you ask for a non-existant value. Look, I'll prove it:
Code:
use strict;
use warnings;

my %test;
my $dummy;

$test{"one"} = "one";

print "number of items in the hash = " . keys(%test) . "\n";

# Let's see if this adds a row...
$dummy = $test{"two"} || "error";

print "dummy = $dummy, number of items in the hash = " . keys(%test) . "\n";
If you run that, you'll see that the number of values in %test remains constant at 1, even after you've supposedly "added" the undef value implied at $test{"two"}.

Of course, you should code in a manner you feel comfortable with, but believing that hash values are being created when they aren't might cause bugs in bigger programs.

-- Chris Hunt
Webmaster & Tragedian
Extra Connections Ltd
 
Yes, I'd run a simpler but just as damning example myself. But I think I've found what made me think this
perfunc said:
exists EXPR

Given an expression that specifies a hash element or array element, returns true if the specified element in the hash or array has ever been initialized, even if the corresponding value is undefined. [red]The element is not autovivified if it doesn't exist.[/red]
Based on this caveat, I'd kind of assumed that if you didn't use exist the element would be autovivified. That'll teach me...

Steve

[small]"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object::perlDesignPatterns)[/small]
 

Appears the 'while' loop is matching the state abbreviation, but not associating it with the full state name, which suggests I need another array and/or 'for' loop within the while, which seems more convoluted than my original code. But if it works, I'll use it. Any thoughts?
 
I think autoviv only occurs when a reference is used and the reference is more than one level deep in the data structure. In which case no even "exist" will prevent the autovivification from occuring:

Code:
use strict;
use warnings;
use Data::Dumper;

use strict;
use warnings;

my $test;
my $dummy;

$test->{"one"}{1} = "one";

if (exists $test->{two}{2}) {}; 

print Dumper \$test;

$test->{two}{2} will jump into existance in the above code. So the simpler '||' assignment should work just fine for a single level hash or array.

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top