Tek-Tips is the largest IT community on the Internet today!


match regex multiple times per line

Status
Not open for further replies.

RedRobotHero

Technical User
May 2, 2003
I'm trying to harvest links from an HTML page. (Not for anything nefarious, I assure you.) This is the code I've used to match the links:
Code:
foreach (@page) {
    if (/href=\s*['"]?([^\s'"]*)['"]?/) {
        push @links, $1;
    }
}

My trouble is that this only matches once per line. How would I get all of the links on a line, instead of just the first?

(If all else fails, I'm going to take the lines and split them on the whitespace. But somehow I think this must be a common enough problem that there's a simpler way of doing it.)
 
You can use the gm modifiers

so

/href=\s*['"]?([^\s'"]*)['"]?/gm

That means 'global, find all matches' (g) and 'treat the string as multiple lines' (m), so ^ and $ can also match at each embedded newline.

If that doesn't work, use 's' instead of 'm' (treat the string as a single line, so . also matches a newline). I always forget which one does what.

John-
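For reference, here is a small sketch of what each modifier actually does, using a made-up two-line string. Only g matters for the href pattern: m changes where ^ and $ may match, s lets . match a newline, and the href pattern uses none of ^, $ or a bare dot.

```perl
use strict;
use warnings;

# Made-up sample string with an embedded newline.
my $text = "first line\nsecond line";

# g: in list context, return every match rather than just the first.
my @words = $text =~ /(\w+) line/g;            # ("first", "second")

# m: ^ and $ may also match just after/before an embedded newline.
my @starts_plain = $text =~ /^(\w+)/g;         # ("first")
my @starts_m     = $text =~ /^(\w+)/gm;        # ("first", "second")

# s: . may also match a newline.
my $dot_plain = ($text =~ /line.second/)  ? 1 : 0;   # 0, . stops at "\n"
my $dot_s     = ($text =~ /line.second/s) ? 1 : 0;   # 1

print "@words | @starts_plain | @starts_m | $dot_plain $dot_s\n";
```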
 
raklet,

Sounds like a "Hero".

I think it's important to let people write their own "unreliable" code, learn from their own mistakes, and readily realise the potential of reusable code, fragments, libraries, objects, modules, or even require/use files as needed.

But in a new language, walking means you can cover some distance, running means you can cover some ground (g^2* > d^2), but running implies/infers/requires walking if you know what I mean ... (Long Day)

--Paul

PS: Got bitten in the ass once over reusable code and NetAdmins who were 'just' doing their job; still waiting for the invoice to clear.
 
I'll admit it's probably a bad habit of mine not to use more modules. It's laziness, I guess. I can't bring myself to spend 20 times as long teaching myself how to use a module as it took to write the five lines of code that did the very specialized thing I needed.

I think I'll force myself to use the module in this case, because it could pay off in the long run.

But I'd still like an answer to the question, for the sake of the more general principle of it, stripped of the HTML context.
 
Did you read my post?

John-
 
Yes, I must admit I spend a lot of time listening to "Hero".

Paul, I heard a rumor that you were "Axweildr" in another life. Is that true? If so, why did you disappear?

 
Sorry, siberian. I read your post, but it doesn't work. That would be a good approach if I were using the form
Code:
s/XXX/XXX/gm
but not when I am doing
Code:
if (/XXX/gm) { action }
All the matches are replaced in the first case, but in the second the action is still executed only once.
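One way to keep the original loop structure is to swap the if for a while. In scalar context each evaluation of m//g resumes from pos() and finds the next match, so the loop body runs once per link instead of once per line. A minimal sketch; the sample @page lines are made up for illustration.

```perl
use strict;
use warnings;

# Made-up sample lines standing in for @page.
my @page = (
    q{<a href="one.html">1</a> <a href='two.html'>2</a>},
    q{<a href=three.html title="3">3</a>},
);

my @links;
foreach (@page) {
    # In scalar context m//g starts at pos($_) and moves pos past each
    # match, so re-evaluating it in a while loop visits every match on
    # the line; the loop ends when no further match is found.
    while (/href=\s*['"]?([^\s'"]*)['"]?/g) {
        push @links, $1;
    }
}

print "$_\n" for @links;
```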
 
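Sure, what you are doing won't work.

The reason is this.

An if statement evaluates its condition once, as TRUE or FALSE. In that scalar context a //g match only looks for the next match, so the body runs once no matter how many links are on the line.

The simplest way to get all the matches at once is to evaluate the match in list context, for example by assigning it to an array. In scalar context it returns only true or false, not a count.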
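To make the context point concrete, here is a sketch of the same pattern evaluated three ways on a made-up line. A plain scalar match gives true or false; if you actually want the number of matches, the usual Perl idiom is the "() =" assignment at the end.

```perl
use strict;
use warnings;

# Made-up line with two links on it.
my $line = q{<a href="a.html">A</a> <a href="b.html">B</a>};
my $re   = qr/href=\s*['"]?([^\s'"]*)['"]?/;

# Scalar (boolean) context: the match only reports success or failure.
my $found = ($line =~ $re) ? 1 : 0;          # 1

# List context with /g: every captured group from every match.
my @all = $line =~ /$re/g;                    # ("a.html", "b.html")

# Counting idiom: "() =" forces list context, and a list assignment
# used in scalar context evaluates to the number of elements on its
# right-hand side, i.e. the number of captures (here one per match).
my $count = () = $line =~ /$re/g;             # 2

print "$found | @all | $count\n";
```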


So you should be doing this:

@links = /href=\s*['"]?([^\s'"]*)['"]?/g;   # list context returns every $1 on the line

if (@links) {
    # do stuff
}
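Folding the list-context match back into the original foreach loop gives roughly the sketch below (the sample lines are made up). The match is evaluated in list context because it is an argument to push, so each line contributes all of its links, and a line with no links contributes nothing.

```perl
use strict;
use warnings;

# Made-up sample lines standing in for the asker's @page.
my @page = (
    q{<p><a href="x.html">x</a> and <a href="y.html">y</a></p>},
    q{<p>no links here</p>},
    q{<img src="pic.png"><a href='z.html'>z</a>},
);

my @links;
foreach (@page) {
    # push supplies list context, so /g returns every $1 on the line;
    # a failed match returns an empty list and adds nothing.
    push @links, /href=\s*['"]?([^\s'"]*)['"]?/g;
}

if (@links) {    # true when at least one link was collected
    print "$_\n" for @links;
}
```

Using push rather than a plain "@links = ..." assignment matters inside the loop: the assignment would overwrite @links on every line and keep only the last line's links.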

 