Need Urgent Help with Perl Lookup & Substitution Routine

EricSilver · Apr 6, 2008

Hello,

I am a new user having a problem getting what should be a simple routine to work.

What I am doing is opening an address file and a state name file so I can change state abbreviations, e.g. “AZ”, to full state names, e.g. “Arizona”.

After the address file is opened, the state name file is opened. This file consists of three fields: 1.) Unique identifier; 2.) State abbreviation; 3.) Full state name.

The routine compares the address file state abbreviation data value to the abbreviation field value in the state name file. If it matches, the Address File abbreviation data element is changed to the State Name File full name data element.

If the Address File abbreviation is not in the state name file, I want the routine to print "error" in place of the full state name. Unfortunately, that part is not working. Instead of printing "ERROR" for one Address File record, it prints "ERROR" for all of them. Any assistance would be appreciated. Here is what I have:

## FILE LOCATIONS
$file='/File.txt'; ## THE ADDRESS FILE
$maplocation='/Map.txt'; ## THE STATE NAME LOOKUP FILE
$file2='File2.txt'; ## THE MODIFIED ADDRESS FILE

## OPEN ADDRESS FILE AND ADD CONTENTS TO AN ARRAY
open(FILE,"<$file")||die "Could not open $file";
@file=<FILE>;
close FILE;

## FOR EACH RECORD IN THE ADDRESS FILE, DO THE FOLLOWING
foreach $line (@file) {
@data=split(/\t/,$line);

## CREATE VARIABLES CORRESPONDING TO ADDRESS FILE DATA
## (This step is not really necessary)
$d0=$data[0];
$d1=$data[1];
$d2=$data[2];
$d3=$data[3];
$d4=$data[4];
$d5=$data[5];
$d6=$data[6];
$d7=$data[7];
$d8=$data[8];
$d9=$data[9];
$d10=$data[10];

## OPEN STATE NAMES FILE AND ADD CONTENTS TO AN ARRAY
open(MAP,"<$maplocation");
@entries = <MAP>;
close MAP;

## FOR EACH RECORD IN THE STATE NAME FILE, DO THE FOLLOWING
foreach $line2 (@entries) {
@fields=split(/,/,$line2);

## COMPARE ADDRESS FILE STATE ABBREVIATION DATA TO STATE FILE ABBREVIATION DATA. (THIS CODE WORKS PERFECTLY)

if ($d8 eq $fields[1]) {$d8=$fields[2]};

## $d8 is the address file abbreviation value; $fields[1]
## is the State File abbreviation value; and $fields[2] is
## the state file full name value.

## IF ADDRESS FILE STATE ABBREVIATION IS NOT PRESENT IN STATE NAME FILE, PRINT ERROR (THIS CODE FAILS):

if ($d8 eq $fields[1]) {$d8=$fields[2]} else {$d8="error"};
}

## INSTEAD OF PRINTING "ERROR" FOR ONE RECORD, IT PRINTS ERROR FOR ALL OF THEM.

## WRITE OUTPUT TO FILE
$line= "$d0"."$d1"."$d2"."$d3"."$d4"."$d5"."$d6"."$d7"."$d8"."$d9"."$d10"."\n";

};

open(DATA,">$file2");
print DATA (@file);
close DATA;
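
A note on why every record comes out as "error": the else branch runs once for each line of the state name file that does not match. The first non-matching line already overwrites $d8 with "error", and even a successful match is overwritten by the next non-matching line. One possible way around this, sketched here on in-memory data instead of the real files (the sample lines and the sub name expand_state are made up for illustration), is to assume failure up front and only replace it when a line really matches:

```perl
use strict;
use warnings;

# Hypothetical sample of the state name file: id,abbreviation,full name
my @entries = ("1,AL,Alabama\n", "2,AZ,Arizona\n", "3,CA,California\n");

# Return the full name for an abbreviation, or "error" if no line matches.
sub expand_state {
    my ($abbrev) = @_;
    my $result = 'error';              # assume failure until a match is found
    foreach my $entry (@entries) {
        chomp(my $copy = $entry);      # work on a copy, strip the newline
        my @fields = split /,/, $copy;
        if ($abbrev eq $fields[1]) {
            $result = $fields[2];
            last;                      # stop at the first match
        }
    }
    return $result;
}

print expand_state('AZ'), "\n";   # Arizona
print expand_state('ZZ'), "\n";   # error
```

The decision to report an error is made only after the whole lookup list has been checked, never inside the comparison itself.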

EricSilver · Apr 8, 2008

... what I prefer:
$data[8] = (exists $states{$data[8]}) ? $states{$data[8]} : 'Error';

You are mistakenly assigning $desc to $data[8], which must be the last value from the maplocation file, "Wyoming".

Works perfectly now! Thanks so much for all the good feedback.

I will eventually need to apply this lookup logic to additional files, but I do not anticipate too many problems.
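
For anyone reading this later, here is a minimal, self-contained sketch of the hash-based lookup quoted above. The sample data is invented: it assumes map lines laid out as id::abbreviation::full name and address records with "||" between fields and the state in field 8. Adjust the delimiters to whatever your files really use.

```perl
use strict;
use warnings;

# Hypothetical map lines in the id::abbreviation::full name layout
my @map_lines = ("1::AL::Alabama\n", "2::AZ::Arizona\n", "50::WY::Wyoming\n");

# Build the lookup hash once, before touching any address records.
my %states;
foreach my $entry (@map_lines) {
    chomp(my $copy = $entry);
    my ($code, $abbrev, $fullname) = split /::/, $copy;
    $states{$abbrev} = $fullname;
}

# Hypothetical address records: nine "||"-separated fields, state in field 8.
my @records = (
    "a||b||c||d||e||f||g||h||AZ",
    "a||b||c||d||e||f||g||h||ZZ",
);

my @converted;
foreach my $record (@records) {
    my @data = split /\|\|/, $record;
    # Known abbreviation: use the full name. Unknown: flag this record only.
    $data[8] = exists $states{$data[8]} ? $states{$data[8]} : 'Error';
    push @converted, join('||', @data);
}
print "$_\n" for @converted;
```

Because the map file is read once into %states, each address record costs a single hash lookup instead of a pass over the whole state list.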

EricSilver · Apr 9, 2008

Out of curiosity, would this also work with wildcards?

For example, what if I wanted all state abbreviations that begin with "A" (AZ, AL, AK, AR) to return "Arizona" as the $data[8] value?

Would this code accommodate the use of "." or "^" and other wildcard characters?

Code:

$data[8] = (exists $states{$data[8]}) ? $states{$data[8]} : 'Error';

EricSilver · Apr 9, 2008

Actually, let me clarify that.

If a state abbreviation field had one or more extra characters, e.g. "AZ" was written as "AZX" in the lookup file, and I wanted to make sure it was interpreted as "AZ" using a wildcard, could that work?

I don't see how "^AZ" could be incorporated into this code.

Code:

$data[8] = (exists $states{$data[8]}) ? $states{$data[8]} : 'Error';

KevinADC · Apr 9, 2008

It sounds possible but I guess the trick is to make it work for all states. Probably the best time to do that though would be while reading in the file that has the state abbreviations, not afterwards.

------------------------------------------
- Kevin, perl coder unexceptional!

EricSilver · Apr 9, 2008

It sounds possible but I guess the trick is to make it work for all states. Probably the best time to do that though would be while reading in the file that has the state abbreviations, not afterwards.

Exactly what I am thinking, between lines 5 and 6 below.

Code:

open(MAP,"<$maplocation")||die "Could not open $maplocation";
    while (<MAP>) {
        chomp;
        ($code, $abbrev, $fullname)=split /::/, $_;
        $states{$abbrev}=$fullname;
    }
close MAP;

KevinADC · Apr 9, 2008

If the only problem is extra characters and not something else:

Code:

        ($code, $abbrev, $fullname)=split /::/, $_;
        $abbrev = substr($abbrev,0,2); 
        $states{$abbrev}=$fullname;


EricSilver · Apr 9, 2008

By "something else" what do you mean? More than one or two extra characters?

For example, would the code you just submitted apply if the $abbrev field was: "The quick brown fox" and the comparison field was: "The quick brown fox chases cars"?

stevexff · Apr 9, 2008

A hash uses key-value pairs. It works by taking the key, running it through a hashing algorithm to produce a number, and then using that number as an index to store the value at a memory location. This explains why hash lookups are so fast, and also why the keys function doesn't guarantee what order they will be returned in. It also means that they don't support any kind of wildcarding as you must have the whole key to support the hash lookup.

Your best bet is to standardise the key in some way before you store it in the hash, and also before you look it up. So for example you could take the first two characters only (as KevinADC's example shows) and convert them to upper case (as I did in my original post) to make the process more robust.

Steve

"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object::PerlDesignPatterns)
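
The standardising step described above could look something like the sketch below. The sub name normalize_key and the sample data are invented for illustration. It assumes the first two characters are always the real state code, which is exactly the assumption that has to hold for this approach to be safe.

```perl
use strict;
use warnings;

# Reduce a raw abbreviation to a canonical hash key:
# trim surrounding whitespace, keep the first two characters, upper-case them.
sub normalize_key {
    my ($raw) = @_;
    $raw =~ s/^\s+|\s+$//g;
    return uc(substr($raw, 0, 2));
}

# Hypothetical lookup data with inconsistent abbreviations
my %raw_map = ('AZX' => 'Arizona', 'ca' => 'California');

# Normalize once when storing...
my %states;
$states{ normalize_key($_) } = $raw_map{$_} for keys %raw_map;

# ...and again, with the same sub, when looking up.
for my $input ('az', 'AZ123', ' Ca ', 'NY') {
    my $key = normalize_key($input);
    print "$input => ", (exists $states{$key} ? $states{$key} : 'Error'), "\n";
}
```

Using the same sub on both sides means the hash only ever sees whole, canonical keys, so the exact-match nature of a hash lookup stops being a problem.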

KevinADC · Apr 9, 2008

By "something else" what do you mean? More than one or two extra characters?

By "something else" I meant that you do not need to do anything except get the first two letters of the state abbreviation. The code I posted just returns the first two characters of the $abbrev variable, so if those first two characters cannot safely be used to create the hash key, my suggestion would not work. But if you just have stuff like AZX instead of AZ then you will be fine.

I would also convert the state "keys" to all lower case or all upper case, as Steve mentions above, to normalize the hash keys so you always know what you are working with: AZ or az for example, instead of some being AZ and some being az or whatever.


EricSilver · Apr 10, 2008

I plan to get back to it later today, using something like this (looks OK here, but won't know for sure until I try it):

Code:

open(MAP,"<$maplocation")||die "Could not open $maplocation";
    while (<MAP>) {
    chomp;
    ($code, $abbrev, $fullname)=split /::/, $_;

     $ab = "$abbrev";
     $ab =~ s/$ab/$ab#WILDCARD CHAR#/; #Everything after the characters in $ab is ignored/considered valid
     $abbrev = $ab;
     $states{$abbrev}=$fullname;
    }
close MAP;

EricSilver · Apr 10, 2008

Is there a means of editing/deleting posts? In my previous post, I have this backwards:

Code:

$ab = "$abbrev";
$ab =~ s/$ab/$ab#WILDCARD CHAR#/;

Since $abbrev is the reference value, what I have there is wrong. The submitted value ( $data[8] ) is what needs to be wildcarded.

KevinADC · Apr 10, 2008

You can't delete or edit posts. Make sure to use the "Preview Post" button and check your posts for any errors or changes before finally clicking on the submit button. In the preview screen there is an "Edit Post" button you can use to make edits.


EricSilver · Apr 10, 2008

OK, what am I missing here? The state abbreviations in the lookup file are all uppercase, 2 characters, except Arizona, which, for testing purposes, I have formatted like this:

AZABBCCDDDGGGEE

The below code works fine for all states, except Arizona which is generating an error.

Code:

open(MAP,"<$maplocation")||die "Could not open $maplocation";
    while (<MAP>) {
        chomp;
($code, $abbrev, $fullname)=split /::/, $_;
$states{$abbrev}=$fullname;
 }
close MAP;

open(COLS,"<$file")||die "Could not open $file";
@file=<COLS>;
close COLS;
foreach $line (@file) {
@data=split(/\|\|/,$line);      

########################### WILDCARD

$d8 ="$data[8]";
$d8 =~ s/$abbrev.*//g; 

########################### WILDCARD

$data[8] = (exists $states{$d8}) ? $states{$data[8]} : 'Error';

KevinADC · Apr 10, 2008

I thought we already cleared up this part of your question.

Code:

        ($code, $abbrev, $fullname)=split /::/, $_;
        $abbrev = substr($abbrev,0,2);
        $states{$abbrev}=$fullname;


EricSilver · Apr 10, 2008

What I mean is, what if the field length is variable, e.g. the full state name field, where each text string would be a different length, and the ($abbrev,0,2) would not apply? Is there a wildcard that could be used in such situations?

Here is my previous question from April 9th:

For example, would the code you just submitted apply if the $abbrev field was: "The quick brown fox" and the comparison field was: "The quick brown fox chases cars"?

EricSilver · Apr 10, 2008

Some further clarification:

Say I have a lookup field that contains:
"The Box is Blue" ($abbrev)

I would want to match it exactly, but I know some comparison data ($data[8]) will contain:

"The Box is Blue and Large"

Therefore, I would want $abbrev to be "$abbrev.*" (The Box is Blue.*) so if there are erroneous characters in the comparison string, no error will be generated so long as everything before the ".*" matches.

prex1 · Apr 11, 2008

EricSilver, you are messing up things, making your point and position unclear.
Let me try to clear up your question, to check if I'm correct:
-you have a maplocation lookup file where the state abbreviations are all correct: 2 uppercase letters everywhere, and that's fine
-now you have an address file where the state abbreviations may not exactly correspond to those in the lookup
-here you should first decide to what extent you assume they do not correspond (this was the point of Kevin's question above): you may have lowercase letters (simple to solve), the two-letter code embedded in a longer string with extra characters before and after (with possible multiple correspondences), or something else
-let's assume, as you confirmed above, that you take the first two letters as correct (except for the case), and that you expect only extra characters to the right (of any length and type)
Kevin gave you already the answer for this, except that he used it for the lookup file, because you told us first that the extra characters were in the lookup file.
Now, if my clarification above is correct, you simply have to do something like this (derived from your code and untested):

Code:

open(MAP,"<$maplocation")||die "Could not open $maplocation";
while (<MAP>) {
  chomp;
  ($code, $abbrev, $fullname)=split /::/, $_;
  $states{$abbrev}=$fullname;
}
close MAP;

open(COLS,"<$file")||die "Could not open $file";
@file=<COLS>;
close COLS;
foreach $line (@file) {
  @data=split(/\|\|/,$line);      
  $d8=uc(substr($data[8],0,2));
  $data[8]=(exists $states{$d8})?$states{$d8}:'Error';
}

As already noted above, you cannot use wildcards with the exists function: for more complex corrections to the abbreviations in the address file (e.g. extra characters before and after) you would need to check all the keys in %states, possibly using a regexp.

Franco
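
For illustration only, the "check all the keys" idea for codes buried inside a longer string might be sketched as follows. The sub name candidate_states and the sample values are invented, and treating anything other than exactly one hit as an error is just one possible policy.

```perl
use strict;
use warnings;

my %states = (AZ => 'Arizona', CA => 'California', NY => 'New York');

# Return every state name whose abbreviation occurs anywhere in $value.
# Keys are sorted so the result order is repeatable.
sub candidate_states {
    my ($value) = @_;
    my @hits;
    foreach my $abbrev (sort keys %states) {
        # \Q...\E makes the key match literally; /i ignores case
        push @hits, $states{$abbrev} if $value =~ /\Q$abbrev\E/i;
    }
    return @hits;
}

foreach my $value ('xxAZyy', 'canyon', '12345') {
    my @hits = candidate_states($value);
    # Accept only an unambiguous match; none or several is an error.
    my $result = @hits == 1 ? $hits[0] : 'Error';
    print "$value => $result\n";
}
```

The value 'canyon' contains both "ca" and "ny", which shows the multiple-correspondence problem mentioned above. Scanning every key for every record is also much slower than a single exists test, so this is best kept for values that fail the plain lookup.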


KevinADC · Apr 11, 2008

I am dropping out of this thread, it has eaten its own tail a couple of times now and continues to just turn circles. All the best.


EricSilver · Apr 11, 2008

My apologies for the confusion; I am asking two separate, but similar questions in the same thread, and not taking into account that some of the "wildcard" functionality I want will occur by default. Thank you Kevin for your good help to this point. Franco, your understanding of the state lookup and abbreviation files is correct. That part already works perfectly.

I understand what to do if the source file values are all 2 characters long and the state lookup file values are not: use substr(xxx,0,2).

But if the source values are not all 2 characters long, things begin to fog up for me.

Example:

Source Values   Lookup Values
-------------   -------------
AZ              AZ
AZ123           AZ
CAXYZ           CA
NY123           NY
NYCCVB          NYC
NYCVB           NYCVB
MICH6789        MICH678

The current substr(x,0,2) code would work fine for the first four source/lookup values on the above list, but will generate errors for the last three. For those, I need to change the substr length in order to match them correctly.

Right now, I am wondering if it is possible for the substr length to be a variable, i.e.,

$data[8] = substr($abbrev,0,$var);

I could then insert code which, before conducting each lookup, counts the length of the target lookup value and makes that length the $var value.

EricSilver · Apr 11, 2008

To answer part of my own question, substr length can be a variable, so doing an on-the-fly length change, just before the lookup, is where I will focus my energies, and hopefully get the result I need.
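
One way the variable-length idea could be sketched: rather than working out the right length in advance, try the whole source value first and shorten it one character at a time until a lookup key matches. The sub name longest_prefix_key is invented, the keys are the lookup values from the example table in the previous post, and the stored names are placeholders.

```perl
use strict;
use warnings;

# Lookup values from the example table; the stored names are placeholders.
my %lookup = map { $_ => "name for $_" } qw(AZ CA NY NYC NYCVB MICH678);

# Try the whole source value first, then drop one trailing character at a
# time, returning the longest lookup key that is a prefix of the value.
sub longest_prefix_key {
    my ($value) = @_;
    for (my $len = length $value; $len > 0; $len--) {
        my $candidate = substr($value, 0, $len);
        return $candidate if exists $lookup{$candidate};
    }
    return;   # no lookup key is a prefix of this value
}

for my $source (qw(AZ AZ123 CAXYZ NY123 NYCCVB NYCVB MICH6789 ZZ9)) {
    my $key = longest_prefix_key($source);
    print "$source => ", (defined $key ? $key : 'Error'), "\n";
}
```

Starting from the longest candidate matters: NYCVB has to be tested as a whole before it is shortened to NYC or NY, otherwise the shorter key would win. Each attempt is still an exact hash lookup, so no wildcard support is needed.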
