pattern matching between <ol> and <ul> tags 1

loosecannon1 · Jul 7, 2003

Hello All -

I have some text (paragraphs, tables, ordered and unordered lists) stored in the variable $text_block. I need to convert every "\n" into "<br>". The kicker is that the conversion only needs to take place within <ol> or <ul> tags.

I've been unsuccessful at determining, with conditional statements and regexes, whether a given "\n" is inside a list (possibly a nested list) or not.

I'm really stumped and would appreciate *any* suggestions

icrf · Jul 7, 2003

HTML::Parse

http://search.cpan.org/author/SBURKE/HTML-Tree-3.17/lib/HTML/Parse.pm

It can do it, but takes a bit of reading and setting up. You can tell it to only capture OL and UL tags, and just perform the replacement within.

----------------------------------------------------------------------------------
...but I'm just a C man trying to see the light

loosecannon1 · Jul 8, 2003

Thanks for the link...any ideas on how to do this with a regex?

icrf · Jul 8, 2003

Depending on how you nest:

Code:

use strict;
use warnings;

$_ = <<'END';
<h1>HEADER</h1>
<ul>
	<li>list one</li>
	<li>list two</li>
</ul>
Just a line.
<ol>
	<li>Me got number</li>
	<ol>
		<li>and another number</li>
	</ol>
</ol>

<ul>
	<li>Me got bullet</li>
	<ol>
		<li>and number</li>
	</ol>
</ul>
<h4>FOOTER</h4>

END

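print;	# show the text before the substitution
# Replace each <ul>...</ul> or <ol>...</ol> with whatever filter() returns.
# /e evaluates the replacement as code, /s lets . match newlines, /g repeats.
s/<(ul|ol)>(.*?)<\/\1>/filter($1,$2)/ges;
print;	# show the text after the substitution

# Rebuild the list with every newline inside it turned into <br>.
sub filter
{
	my $tag = shift;
	my $inside = shift;
	$inside =~ s/\n/<br>/g;
	return "<$tag>$inside</$tag>";
}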

It will work for the first list (simple case) and the third list (mixed case), but not the second (single-type, nested). It matches the nearest closing tag, not the matching closing tag. To match the closing tag, you'd essentially have to make a simplified version of HTML::Parser, and if that's the case, just use the module.

Search for "(??{ code })" in here:

http://www.perldoc.com/perl5.8.0/pod/perlre.html

It's possible something like that could be used to make a one-liner, but it'd be easier to just use the Parser module, which would also account for the matching shortcomings of the above method.
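For what it's worth, if the module isn't an option, one way around the nearest-closing-tag problem is to stop pairing tags with a regex and count nesting depth instead. This is only a rough sketch, not the (??{ code }) approach and not a replacement for a real parser: br_inside_lists is a made-up name, and it assumes the list tags are well formed and never appear inside comments or attribute values.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "\n" into "<br>" only while inside at least one
# <ul> or <ol>, by counting nesting depth rather than pairing tags by regex.
sub br_inside_lists {
    my ($html) = @_;
    my $depth = 0;
    my $out   = '';

    # The capturing group makes split return the list tags as their own pieces.
    for my $piece (split /(<\/?(?:ul|ol)\b[^>]*>)/i, $html) {
        if ($piece =~ /^<(\/?)(?:ul|ol)\b/i) {
            # A list tag: closing tags lower the depth, opening tags raise it.
            if ($1) { $depth-- if $depth > 0 } else { $depth++ }
        }
        elsif ($depth > 0) {
            # Ordinary text that sits inside some list.
            $piece =~ s/\n/<br>/g;
        }
        $out .= $piece;
    }
    return $out;
}
```

Calling br_inside_lists($text_block) should then handle the nested single-type list as well, since a closing tag only ends the conversion when the depth drops back to zero.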

----------------------------------------------------------------------------------
...but I'm just a C man trying to see the light

loosecannon1 · Jul 8, 2003

Thanks for your time and direction.

loosecannon1 · Jul 8, 2003

What does the "\1" mean?

icrf · Jul 9, 2003

When you put things into groups by using ( ) in a regex, you make back references to what was matched. Usually, you use it outside the regex or in the substitution part as $1 for the first group, $2 for the second, etc. You can use it inside the regex itself, too, but you say \1, \2, etc instead.

Code:

s/<(ul|ol)>(.*?)<\/\1>/filter($1,$2)/ges;
   1       2

The \1 matches whatever the first group matched, which is either ul or ol, so the pattern stops at the first closing tag of the same type as the one that was opened. Notice that outside the match part of the regex, you use $ instead of \.
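Here is a small illustration of the difference, using a made-up sample string (not from the original text): the first tag pair is deliberately mismatched so you can see \1 rejecting it, and the bracketed replacement is there only to show $1 and $2.

```perl
use strict;
use warnings;

# Made-up sample: the first pair is mismatched, the second is a proper list.
my $text = "<ul>a</ol> <ol>b</ol>";

# Inside the pattern, \1 must repeat what group 1 captured, so "<ul>a</ol>"
# is skipped. In the replacement, the same captures are read as $1 and $2.
(my $copy = $text) =~ s/<(ul|ol)>(.*?)<\/\1>/[$1: $2]/g;

print "$copy\n";
```

Only the <ol>b</ol> part is rewritten, because it is the only place where the closing tag repeats the opening one.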

----------------------------------------------------------------------------------
...but I'm just a C man trying to see the light
