"For" substitutions


Kirsle

I'm wondering if there is a way to search and replace in a scalar on a "for each match" basis. For example, say you have an HTML page that gets parsed and printed to the browser, and the page can <include> other HTML pages:

Code:
<html>
<head>
<title>My Website</title>
</head>
<body>

<include file="top.html">
<include file="leftnav.html">

main page content goes here,
oh, and we wanted a web poll!<p>

<include file="poll.html">

<include file="footer.html">

</body>
</html>

The only way I know to parse this is a while() loop that checks whether the page still contains an <include> tag, reads the file it names, and then substitutes that one <include> tag out, so that each <include> tag on the page is handled in turn:

Code:
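open (PAGE, "<", "index.html") or die "open index.html: $!";
my @html = <PAGE>;
close (PAGE);
chomp @html;

my $src = join ("\n",@html);

# Keep looping for as long as any <include> tag remains in the page
while ($src =~ /<include file="(.+?)">/i) {
   my $file = $1;

   open (INC, "<", $file) or die "open $file: $!";
   my @data = <INC>;
   close (INC);
   chomp @data;

   my $include = join ("\n",@data);

   # No /g here: only the first remaining tag (the one just matched) is replaced
   $src =~ s/<include file="(.+?)">/$include/i;
}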

This method seems inefficient to me. Sometimes, when I've programmed this with more complicated regular expressions, the s/// part doesn't seem to work and the program gets caught in an infinite loop. Is there a way to have a "for" kind of loop that visits each occurrence of the regexp once and then quits, without cycling back through?

I.e., if I didn't s/// the match out, the while loop would loop forever. Is there a way to loop through the scalar once for each time the regexp matches and then quit?

I tried something like:
Code:
for ($src =~ /<include file="(.+?)">/) {
}

But that block was only called once for the first <include> tag.

Thanks.

-------------
Kirsle.net | Kirsle's Programs and Projects
 
The biggest problem I see in your approach is the needless while loop. The substitution operator can already visit every occurrence of a pattern: that is what the 'g' (global) modifier does. It finds each occurrence in turn, and you can do whatever work you need in the replacement code. It also appears that you want includes to be recursive. In that case, I would parse each included file on its own as it is read in, instead of splicing it into the page and scanning the whole document over again. It would look something like this:

Code:
my $page = 'index.html';
print parseIncludes($page);

sub parseIncludes {
	my ($file, @parents) = @_;

	# Read in contents (3-arg open, so the name is never treated as a mode or pipe)
	open(my $fh, '<', $file) or die "open $file: $!";
	my @html = <$fh>;
	close($fh) or die "close $file: $!";
	chomp @html;
	my $src = join "\n", @html;

	# Global search and replace.
	$src =~ s{<include file="(.+?)">}{
		my $include = $1;
		
		# Detect recursive includes (checking the current file as well as its ancestors)
		if (grep {$include eq $_} @parents, $file) {
			warn "Recursive include of $include detected";
			"[NOT INCLUDING $include]";
		
		# Parse new file
		} else {
			parseIncludes($include, @parents, $file);
		}
	}eg;

	return $src;
}

I would also advise you to consider a more standard technology, such as server-side includes (shtml) or Template::Toolkit, instead of a regular expression. But this should at least give you what you describe, I think. Good luck.
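
On the literal question of looping once per match without substituting anything, here is a minimal sketch. It assumes a made-up $src string and is not taken from the code above. With the /g modifier, a match in a while condition (scalar context) resumes after the previous match and ends the loop when none are left. In a for loop (list context) it returns every captured group at once. The original for attempt most likely ran only once because, without /g, a list-context match returns just the captures of the first match.

```perl
use strict;
use warnings;

# Hypothetical page source, for illustration only
my $src = qq{<include file="top.html">\ntext\n<include file="poll.html">\n};

# Scalar context with /g: each test resumes after the previous match,
# so the body runs once per tag and the loop ends when no tag is left.
my @seen_while;
while ($src =~ /<include file="([^"]+)">/gi) {
    push @seen_while, $1;
}

# List context with /g: the match returns every captured file name at once.
my @seen_for;
for my $file ($src =~ /<include file="([^"]+)">/gi) {
    push @seen_for, $file;
}

print "while: @seen_while\n";
print "for:   @seen_for\n";
```

Neither loop modifies $src, so there is nothing to cycle back through. If the tags also need to be replaced, s///g with a code replacement is still the simpler route.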
 
untested code:

Code:
open (PAGE, "<", "index.html") or die "open index.html: $!";
my @html = <PAGE>;
close (PAGE);
chomp @html;

my $src = join ("\n",@html);

$src =~ s/<include file="([^"]+)">/include($1)/esig;

sub include {
   my $file = shift;
   open (INC, "<", $file) or die "open $file: $!";
   my @data = <INC>;
   close (INC);
   chomp @data;
   return(join ("\n",@data));
}

- Kevin, perl coder unexceptional!
 
Ahh... so the "e" switch on the substitution tells it to execute the replacement half as code? I had seen code inside regexps before but wasn't sure how it was done. Thanks for the help!
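
To make the effect of the "e" switch concrete, here is a small sketch comparing the same substitution with and without it. The fake_include helper and the sample string are hypothetical stand-ins, so nothing is read from disk.

```perl
use strict;
use warnings;

# Hypothetical helper: stands in for reading a file from disk
sub fake_include {
    my ($name) = @_;
    return "[contents of $name]";
}

my $src = 'a <include file="x.html"> b <include file="y.html"> c';

# Without /e the replacement is an interpolated string: $1 is filled in,
# but fake_include is never called.
(my $plain = $src) =~ s/<include file="([^"]+)">/fake_include($1)/g;

# With /e the replacement is evaluated as Perl code for each match,
# and its return value becomes the replacement text.
(my $evaled = $src) =~ s/<include file="([^"]+)">/fake_include($1)/eg;

print "$plain\n";   # a fake_include(x.html) b fake_include(y.html) c
print "$evaled\n";  # a [contents of x.html] b [contents of y.html] c
```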

-------------
Kirsle.net | Kirsle's Programs and Projects
 