mpalmer12345
Programmer
I'd like to create an ultra-simple HTML parser/replacement tool that looks for any and all alphanumeric text between > and <\ and blanks it out with x's. So a string containing
$text = "<TITLE>This is a title<\TITLE>";
would become
<TITLE>xxxx xx x xxxxx<\TITLE>.
$text = "<TITLE>This is a title<\\TITLE>";
$text =~ s/(.*?)<(.*?)>(.*?)<\\(.*)>/$1<$2>/;
$tex2 = $3;
$tex3 = $4;
$tex2 =~ s/\w/x/g;
$text .= "$tex2<\\$tex3>";
This code works for the $text sample above, but it doesn't work for
$text = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<TITLE>Index of /</TITLE>';
and I can't figure out why. Maybe the newlines are screwing it up?
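A possible explanation, offered as a sketch rather than a definitive answer: the pattern requires a literal backslash (`<\\`), but the closing tag in the second sample is written with a forward slash (`</TITLE>`), so the substitution never matches. Newlines matter as well, because `.` does not match a newline unless the `/s` modifier is used, and the pattern handles only one tag pair in any case. The version below sidesteps all three by replacing word characters in every run of text between `>` and `<`. The helper name `mask_text` is made up for illustration, and the approach assumes simple tag-like input with no `<` or `>` inside attribute values, comments, or scripts.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every word character found between
# a '>' and the next '<' with 'x'. Not a real HTML parser.
sub mask_text {
    my ($html) = @_;
    # [^<]* matches newlines too (unlike a plain '.'), so multi-line
    # input is handled. /g repeats for every text run; /e evaluates
    # the replacement as Perl code.
    $html =~ s{>([^<]*)<}{
        my $t = $1;          # copy the captured text
        $t =~ s/\w/x/g;      # blank out letters, digits, underscores
        ">$t<";              # put the delimiters back
    }ge;
    return $html;
}

my $text = <<'END';
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<TITLE>Index of /</TITLE>
END

print mask_text($text);
```

With this input, the last line should come out as `<TITLE>xxxxx xx /</TITLE>`, while the DOCTYPE and the tag names stay untouched because they sit inside `<...>` rather than between `>` and `<`. For anything beyond simple cases, a real parser such as the CPAN module HTML::Parser is likely the safer choice.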