Advanced matching

Status
Not open for further replies.

tom79

Programmer
Apr 15, 2007
2
NL
Hi, I'm trying to write a script that visits web pages containing thumbnails that link to movies, and fetches the thumbnail URLs. Here is what I came up with:

$source =~ /href[^>]*\.$match[^>]*>[^<]*<[\s]*img[^>]*src[\s]*=["'\s]*([^"'\s>]*)/i;
$iurl = $1;

...where $match would be one of several movie extensions, like mpg/wmv/avi. This works well, except that if there are any HTML tags between the <a href> tag and the <img> tag, it won't match the image URL. Basically, the pattern only looks at the tag immediately following the <a href> tag, and if that tag is anything other than <img>, the match fails.

Does anyone have any idea how I could fix this? Any help would be greatly appreciated!

Thanks,
Tom
 
Post an example of what you are having trouble matching.

------------------------------------------
- Kevin, perl coder unexceptional!
 
Hi, for example:

<a href="clip.mpg"><font color="Red"><img src="thumbnail.jpg"></font></a>

Thanks,
Tom
 
HTML is notoriously difficult to parse using regexps. I'd go with a proper, tag-aware parser for this, something like HTML::TokeParser::Simple. It'll make things much easier (and more reliable) for you.
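
For what it's worth, here is a rough sketch of how that could look with HTML::TokeParser::Simple (a CPAN module, so it has to be installed first). The sub name thumbnail_urls and the $in_movie_link flag are just names made up for this example, and the href test assumes the extension sits at the end of the URL, optionally followed by a query string or fragment, which is a little stricter than the original regex. Treat it as a starting point rather than a finished solution.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical helper: return the src of every <img> that appears inside
# an <a> whose href ends in one of the given extensions. Any tags nested
# between the <a> and the <img> (such as <font>) are simply skipped over.
sub thumbnail_urls {
    my ($source, @extensions) = @_;

    # Build an alternation like "mpg|wmv|avi" from the extension list.
    my $ext_re = join '|', map { quotemeta } @extensions;

    my $parser = HTML::TokeParser::Simple->new(string => $source);

    my @urls;
    my $in_movie_link = 0;    # true while we are inside a matching <a>
    while (my $token = $parser->get_token) {
        if ($token->is_start_tag('a')) {
            my $href = $token->get_attr('href');
            # Assumption: the extension ends the URL path.
            $in_movie_link =
                (defined($href) && $href =~ /\.(?:$ext_re)(?:[?#]|$)/i) ? 1 : 0;
        }
        elsif ($token->is_end_tag('a')) {
            $in_movie_link = 0;
        }
        elsif ($in_movie_link && $token->is_start_tag('img')) {
            my $src = $token->get_attr('src');
            push @urls, $src if defined $src;
        }
    }
    return @urls;
}

# The example from earlier in the thread.
my $html = '<a href="clip.mpg"><font color="Red"><img src="thumbnail.jpg"></font></a>';
print "$_\n" for thumbnail_urls($html, qw(mpg wmv avi));
```

With the sample markup above this should print thumbnail.jpg. Because the parser walks tokens instead of matching one long pattern, it does not matter how many tags sit between the link and the image.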
 