pattern matching between <ol> and <ul> tags

Status
Not open for further replies.

loosecannon1

Programmer
Feb 19, 2002
55
US
Hello All -

I have some text (paragraphs, tables, ordered and unordered lists) stored in the variable $text_block. I need to convert every "\n" into "<br>". The kicker is that the conversion only needs to take place within <ol> or <ul> tags.


I've been unsuccessful at determining, with conditional statements and regexes, whether a "\n" is inside a list (or possibly a nested list) or not.

I'm really stumped and would appreciate *any* suggestions.
 
HTML::Parser can do it, but it takes a bit of reading and setting up. You can tell it to only capture OL and UL tags, and just perform the replacement within them.

----------------------------------------------------------------------------------
...but I'm just a C man trying to see the light
 
Thanks for the link...any ideas on how to do this with a regex?
 
Depending on how you nest:
Code:
use strict;
use warnings;

$_ = <<'END';
<h1>HEADER</h1>
<ul>
	<li>list one</li>
	<li>list two</li>
</ul>
Just a line.
<ol>
	<li>Me got number</li>
	<ol>
		<li>and another number</li>
	</ol>
</ol>

<ul>
	<li>Me got bullet</li>
	<ol>
		<li>and number</li>
	</ol>
</ul>
<h4>FOOTER</h4>

END

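print;    # show the text before the substitution
# /g: every list, /e: call filter() for the replacement, /s: let . match "\n"
s/<(ul|ol)>(.*?)<\/\1>/filter($1,$2)/ges;
print;    # show the text after the substitution

# Rebuild one list with every newline inside it turned into <br>.
sub filter
{
	my $tag = shift;
	my $inside = shift;
	$inside =~ s/\n/<br>/g;
	return "<$tag>$inside</$tag>";
}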
This works for the first list (simple case) and the third list (mixed case), but not the second (single-type, nested): the regex matches the nearest closing tag, not the matching closing tag. To find the matching closing tag, you'd essentially have to write a simplified version of HTML::Parser, and if that's the case, just use the module.

Search for "(??{ code })" in the perlre documentation. It's possible something like that could be used to make a one-liner, but it would be easier to just use the Parser module, which would also avoid the matching shortcomings of the above method.
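
For what it's worth, the "simplified parser" idea can be sketched without a module if all you care about is list nesting. This is not the (??{ code }) approach, just a plain depth counter, and it is only a rough sketch: br_inside_lists is a made-up name, and it assumes the list tags are ordinary <ul>/<ol> tags (no ">" inside attributes, no list tags hidden in comments or scripts).

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): turn "\n" into "<br>" only
# while at least one <ul>/<ol> is open, tracking nesting with a counter.
sub br_inside_lists {
    my ($html) = @_;
    my $depth = 0;    # how many lists are currently open
    my $out   = '';

    # The capturing group makes split return the list tags as pieces too.
    for my $piece (split /(<\/?(?:ul|ol)\b[^>]*>)/i, $html) {
        if ($piece =~ /^<(\/?)(?:ul|ol)\b[^>]*>$/i) {
            # "/" means a closing tag; a stray one never goes below zero.
            if ($1) { $depth-- if $depth > 0 }
            else    { $depth++ }
        }
        elsif ($depth > 0) {
            # Plain text (or non-list markup) inside a list.
            $piece =~ s/\n/<br>/g;
        }
        $out .= $piece;
    }
    return $out;
}

my $text_block = "intro\n<ol>\n<li>one</li>\n<ol>\n<li>two</li>\n</ol>\n</ol>\noutro\n";
print br_inside_lists($text_block);
```

Because the counter only drops back to zero at the outermost closing tag, the single-type nested case gets converted all the way through, while "intro" and "outro" keep their newlines. For anything messier than that, the module is still the safer bet.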

 
When you put things into groups by using ( ) in a regex, you create backreferences to what was matched. Usually, you use them outside the regex or in the substitution part as $1 for the first group, $2 for the second, etc. You can use them inside the regex itself, too, but there you write \1, \2, etc. instead.
Code:
s/<(ul|ol)>(.*?)<\/\1>/filter($1,$2)/ges;
   1       2
The \1 matches whatever was matched by the first group, which is either ul or ol, so the regex stops at the first closing tag of the same type as the one that was opened. Notice that outside of the match part of the regex, you use $ instead of \.
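
A small way to see this in action, as a sketch only (first_match is a made-up helper, and the three input strings are invented for illustration): the same pattern is run against a matched pair, a mismatched pair, and a nested pair.

```perl
use strict;
use warnings;

# Same pattern as above: \1 must repeat whatever group 1 captured.
my $re = qr/<(ul|ol)>(.*?)<\/\1>/s;

# Hypothetical helper: "tag:inside" for the first match, else "no match".
sub first_match {
    my ($html) = @_;
    return $html =~ $re ? "$1:$2" : 'no match';
}

print first_match('<ol>a</ol>'), "\n";             # ol:a
print first_match('<ol>a</ul>'), "\n";             # no match
print first_match('<ol>a<ol>b</ol>c</ol>'), "\n";  # ol:a<ol>b
```

The last line shows the nesting problem: the match ends at the inner </ol>, so the "c</ol>" that follows is left outside the captured text.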

 