Pattern Matching

Status
Not open for further replies.

pedros007

Hi,

I am trying to extract a list of URLs from a website and store them in a file. I have written a regular expression that extracts all of the a href links from the site and stores them in a file, but I would like to format them better.

Current format when they are extracted:

E.g.
<a href="http://www.karting.co.uk">Karting UK</a>

The format I would like:
"Karting UK", "www.karting.co.uk"

Is this possible? If so, do you know the best way to do it? I have a couple of the Sams guides to Perl, but they don't really cover formatting.

Thank you

Pete
 
I hate regexes; they give me nightmares, and people always correct me on them. But here I go! This is untested.

Code:
# $1 is the URL, $2 is the link text (everything up to the next tag)
$string =~ m/<a href="([^"]+)">([^<]+)<\/a>/gi;
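
To pull every link out of a page rather than just testing for a match, a pattern like this can be run in a while loop with /g, reading $1 and $2 on each pass. A minimal sketch, assuming the links are written exactly as `<a href="...">text</a>` with no other attributes or nested tags; the name extract_links and the sample page are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: returns one [text, url] pair per link found in $html.
sub extract_links {
    my ($html) = @_;
    my @links;
    # In scalar context, /g resumes after the previous match on each pass;
    # /i also accepts <A HREF="..."> written in upper case.
    while ($html =~ m/<a href="([^"]+)">([^<]+)<\/a>/gi) {
        push @links, [$2, $1];
    }
    return @links;
}

# Sample input, invented for this example.
my $page = '<p><a href="http://www.karting.co.uk">Karting UK</a> and '
         . '<a href="http://www.example.com">Example</a></p>';

# Print each link as "text", "url"
print qq("$_->[0]", "$_->[1]"\n) for extract_links($page);
```

For real-world pages with extra attributes or messy markup, an HTML parsing module is likely to hold up better than a hand-written pattern.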
 
I think this does what you want.
Code:
#!perl
use strict;
use warnings;

while (<DATA>) {
    chomp;
    # Capture the address after "//" up to the closing quote, then the link text up to the next "<"
    my ($addr, $co) = m|//([^"]+)">(.*?)<|;
    print qq("$co", "$addr"\n);
}

__DATA__
<a href="http://www.karting.co.uk">Karting UK</a>
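
Since the original question was about storing the formatted list in a file, here is one way the same pattern could be wrapped up to read links from one file and write the formatted lines to another. This is only a sketch: format_link and save_links are made-up names, the file names come from the command line, and it assumes one link per input line in the same shape as the sample above.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one <a href="http://...">text</a> line into
# '"text", "address"', or return nothing if the line holds no link.
sub format_link {
    my ($line) = @_;
    my ($addr, $text) = $line =~ m|//([^"]+)">(.*?)<| or return;
    return qq("$text", "$addr");
}

# Hypothetical helper: read links from $in_file, write formatted lines to $out_file.
sub save_links {
    my ($in_file, $out_file) = @_;
    open my $in,  '<', $in_file  or die "Cannot read $in_file: $!";
    open my $out, '>', $out_file or die "Cannot write $out_file: $!";
    while (my $line = <$in>) {
        my $formatted = format_link($line);
        print {$out} "$formatted\n" if defined $formatted;
    }
    close $out or die "Cannot close $out_file: $!";
    close $in;
}

# Only runs when two file names are given on the command line.
save_links(@ARGV) if @ARGV == 2;
```

Saved under a name of your choosing, it would be run with the input and output file names as its two arguments.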
 