RegExp Remove Tag and its Attributes, but not the containing text.

Status
Not open for further replies.

Aarem

Programmer
Oct 11, 2004
69
US
Hi. Is there a regexp that can erase the <b style="whatever"> and the </b> in the following code, but leave bagels and blarneystones? I need this for my own context ad service.

<b style="whatever">bagels and blarneystones</b>
 
There is a regex to do this, but if you search for HTML Parser, you'll find a few ready-made modules for this purpose.

HTH
--Paul

Nancy Griffith - songstress extraordinaire,
and composer of the snipers anthem "From a distance ...
 
By all means go with Paul's suggestion, but here is a way to do it with an re:
Code:
my $stuff = '<b style="whatever">bagels and blarneystones</b>';
$stuff =~ s/<.*?>//g;
print "$stuff\n";
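
The substitution above strips every tag in the string. If only the <b ...> and </b> tags should go, as in the original question, something along these lines might do. This is a minimal sketch, not a full HTML parser: strip_b_tags is a hypothetical helper name, and it assumes no attribute value contains a literal > character.

```perl
use strict;
use warnings;

# Hypothetical helper: remove only <b ...> and </b>, keep all other markup.
sub strip_b_tags {
    my ($html) = @_;
    # </?    optional slash, so opening and closing tags both match
    # b\b    the tag name b followed by a word boundary (skips <br>, <body>)
    # [^>]*  any attributes, up to the nearest >
    $html =~ s{</?b\b[^>]*>}{}gi;
    return $html;
}

my $stuff = '<b style="whatever">bagels and blarneystones</b>';
print strip_b_tags($stuff), "\n";
```

The \b after the tag name is what keeps <br> and <body> intact; the /i modifier also catches <B>.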

 
The first one won't match if there's a newline within the tag (without using the /s modifier, '.' doesn't match \n). I believe the second's more efficient too, though I can't remember why.
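
The newline point can be checked with a short sketch. The negated character class <[^>]*> used here is an assumption about what the second regex looked like, since it is not shown in the thread; the variable names are illustrative only.

```perl
use strict;
use warnings;

# A tag whose attributes continue on a second line.
my $html = "<b\nstyle=\"whatever\">bagels and blarneystones</b>";

# Without /s, '.' cannot cross the \n, so the opening tag is left in place.
(my $dot = $html) =~ s/<.*?>//g;

# With /s, '.' matches \n as well, so both tags are removed.
(my $dot_s = $html) =~ s/<.*?>//gs;

# [^>] matches any character except >, newline included, with no modifier.
(my $negclass = $html) =~ s/<[^>]*>//g;

print "dot:      [$dot]\n";
print "dot + /s: [$dot_s]\n";
print "negclass: [$negclass]\n";
```

Only the first result still contains the opening tag; the other two leave just the text.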
 
less greedy??
--Paul

 
ishnid, I think you're right about the second regex being more efficient. I don't remember exactly why either (and I'm not about to go wading through docs or Mastering Regular Expressions right now). Aarem, if you're going to go with one or the other, the second is probably better.
 
This is a good reference on this and backs up what I suspected. In particular, see the paragraph that begins "Tracking is another problem with both of them". Apparently "Mastering Regular Expressions" deals with this on page 226.
 
The first one is too greedy if the ? is left off (<.*>): it will remove everything between the first < and the last >, including the bagels and blarneystones. The second one works as expected, as it only matches up to the closing > of a tag.
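
To make the greedy versus non-greedy difference concrete, here is a small sketch run against the string from the original question. The over-matching applies to a fully greedy <.*>; the <.*?> posted earlier is non-greedy and stops at the nearest >. The <[^>]*> form is again an assumed stand-in for the second regex.

```perl
use strict;
use warnings;

my $stuff = '<b style="whatever">bagels and blarneystones</b>';

# Greedy: .* runs from the first < to the last >, taking the text with it.
(my $greedy = $stuff) =~ s/<.*>//g;

# Non-greedy: .*? stops at the nearest >, so only the tags are removed.
(my $lazy = $stuff) =~ s/<.*?>//g;

# Negated class: [^>]* can never cross a >, so it also stops at the tag end.
(my $negcls = $stuff) =~ s/<[^>]*>//g;

print "greedy: [$greedy]\n";
print "lazy:   [$lazy]\n";
print "negcls: [$negcls]\n";
```

The greedy result is an empty string, while the other two print bagels and blarneystones.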
 
Thanks for the reference, ishnid. Shame on me. I'll try to avoid expressions like this in the future.

 