puns0steel
Technical User
- Jun 12, 2008
- 4
I'm brand new to Perl, so any help would be great! I'm using ActiveState on XP. I'm trying to extract only the zip codes from an HTML file and write them to another file, separated by line breaks or commas, so I can paste them into a spreadsheet.
Here's the code I'm using (I got it from a friend):
open(INFILE, '<', "alldata.html") or die("Could not open input file.\n");
open(OUTFILE, '>', "justzipcodes.html") or die ("Could not open output file.\n");
my $line;
while ($line = <INFILE>)
{
if ($line =~ /\b\d{5}(?:[-\s]\d{4})?\b/)
{
print OUTFILE $line;
}
}
close(OUTFILE);
close(INFILE);
It writes to the output file, but it includes the whole line of data that contains the zip code, everything is still hyperlinked, and nothing separates the entries, not even line breaks. I'd like just the plain 5-digit zip codes, with no links or anything else.
I also need to get rid of duplicates, but I'm guessing that's the next step.
Please help, thanks!
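One possible way to get there, as a rough sketch rather than a definitive answer: the script above prints `$line` whenever the pattern matches, which is why whole lines (links included) end up in the output. Capturing the five digits and printing only the capture, one per line, should give spreadsheet-friendly output, and a hash can drop duplicates along the way. The sub name `extract_zips`, the crude tag-stripping regex, and taking the file names from the command line are all assumptions made for illustration; a real HTML parser would be more robust than the regex used here.

```perl
use strict;
use warnings;

# Hypothetical helper: return the unique 5-digit zip codes found in a
# chunk of HTML, in order of first appearance.
sub extract_zips {
    my ($html) = @_;

    # Crude tag removal (an assumption: fine for simple markup, not a
    # real HTML parser). This keeps digits inside href="..." from matching.
    (my $text = $html) =~ s/<[^>]*>/ /g;

    # Same pattern as the original script, but the first five digits are
    # captured; with /g in list context the match returns every capture.
    my %seen;
    return grep { !$seen{$_}++ } $text =~ /\b(\d{5})(?:[-\s]\d{4})?\b/g;
}

# Run only when both file names are given on the command line.
if (@ARGV == 2) {
    my ($in_name, $out_name) = @ARGV;

    open(my $in, '<', $in_name) or die "Could not open $in_name: $!\n";
    my $html = do { local $/; <$in> };    # slurp the whole file
    close($in);

    open(my $out, '>', $out_name) or die "Could not open $out_name: $!\n";
    print {$out} "$_\n" for extract_zips($html);    # one zip code per line
    close($out);
}
```

Saved under a hypothetical name such as `zips.pl`, it could be run as `perl zips.pl alldata.html justzipcodes.txt`. Writing to a `.txt` file instead of `.html` is deliberate, since the output is no longer HTML; swapping the `print` line for `print {$out} join(",", extract_zips($html)), "\n";` would give one comma-separated line instead.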