clearing the memory

Status
Not open for further replies.

richardko

Hi, I am trying to parse a bunch of HTML files in a directory and I need help clearing the memory.
I get the same output from "$table_html" for all the files.

It seems like $table_html doesn't get emptied out. I tried the delete and delete_content() methods, but they did not work.
Any ideas?


Code:
foreach $file (@dir_contents)
{
        if(!(($file eq ".") || ($file eq "..")))
        {
                $filename = "/var/www/cgi-bin/pages/$file";
                $te->parse_file($filename);                
                $table = $te->first_table_found;
                $table_tree = $table->tree;                
                $table_html = $table_tree->as_HTML;
                print $table_html."\n";            
         }
}
 
my $table_html = $table_tree->as_HTML;

------------------------------------------
- Kevin, perl coder unexceptional!
 
Thanks for the reply. However, defining
Code:
my $table_html = $table_tree->as_HTML;
still gives me the same result. I think the problem lies with $table->tree being the same every time but I am not sure.
 
Your problem lies in your parser "$te". Without knowing anything about what it is or how it was created, we can't possibly suggest how to flush the buffer. I would expect there to be some way of clearing its buffer, though.

If all else fails, look into recreating the parser each time.

- Miller
 
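maybe re-create the parser object for each file, inside the loop:

my $te = HTML::TableExtract->new();   # assuming $te is an HTML::TableExtract object (the thread never says)
$te->parse_file($filename);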

------------------------------------------
- Kevin, perl coder unexceptional!
 
redeclaring the object does seem to work. Thanks for the response.
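
For later readers, here is a minimal sketch of the "new parser for each file" fix. The thread never says how $te was built; the calls first_table_found and tree suggest HTML::TableExtract in tree mode, so that is assumed here. Presumably a reused parser keeps the tables it found in earlier files, which would explain why the first file's table kept coming back. The helper name first_table_html and the use of glob to list the files are illustrative and not from the original posts.

```perl
use strict;
use warnings;
use HTML::TableExtract qw(tree);   # assumption: tree mode, so that $table->tree is available

# Hypothetical helper: return the first table of one file as HTML,
# or nothing if the file contains no table.
sub first_table_html {
    my ($filename) = @_;

    # A fresh parser per file, so no tables from earlier files linger.
    my $te = HTML::TableExtract->new();
    $te->parse_file($filename);

    my $table = $te->first_table_found or return;
    return $table->tree->as_HTML;
}

my $dir = "/var/www/cgi-bin/pages";

# glob("$dir/*") never returns "." or "..", and -f skips subdirectories.
foreach my $filename (grep { -f $_ } glob("$dir/*")) {
    my $table_html = first_table_html($filename);
    print $table_html, "\n" if defined $table_html;
}
```

Declaring everything with my inside the helper also means that $table and its tree go out of scope after each file instead of living on in package variables.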
 
Glad we could help.

Just as a note on coding style: I personally do not like to put a large if block directly inside a loop. Most often you can invert or move the logic so that the block disappears entirely, which makes the code a lot more readable.

For example, take a look at the following examples:

Using next to specifically handle those special cases.
Code:
foreach $file (@dir_contents) {
	next if $file eq "." || $file eq "..";

	$filename = "/var/www/cgi-bin/pages/$file";
	$te->parse_file($filename);
	$table = $te->first_table_found;
	$table_tree = $table->tree;
	$table_html = $table_tree->as_HTML;
	print $table_html."\n";
}

Using a regex instead:
Code:
foreach $file (@dir_contents) {
	next if $file =~ /^\.+$/;

	$filename = "/var/www/cgi-bin/pages/$file";
	$te->parse_file($filename);
	$table = $te->first_table_found;
	$table_tree = $table->tree;
	$table_html = $table_tree->as_HTML;
	print $table_html."\n";
}

Moving the filtering into the foreach itself by adding a grep:
Code:
foreach $file (grep {!/^\.+$/} @dir_contents) {
	$filename = "/var/www/cgi-bin/pages/$file";
	$te->parse_file($filename);
	$table = $te->first_table_found;
	$table_tree = $table->tree;
	$table_html = $table_tree->as_HTML;
	print $table_html."\n";
}

As I hope you can see, each of the above coding styles is a lot more readable than having a giant if statement. Another option would be to add the grep to the statement that creates @dir_contents itself, as this is a common type of filtering.
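
A minimal sketch of that last idea, filtering at the point where @dir_contents is built. The thread does not show how the list was created, so opendir/readdir is an assumption, and the sub name list_entries is made up for illustration. The pattern here is a little stricter than /^\.+$/: it matches only "." and "..", so a file that happens to be named "..." is kept.

```perl
use strict;
use warnings;

# Hypothetical helper: return every entry of $dir except "." and "..".
sub list_entries {
    my ($dir) = @_;

    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    # \A and \z anchor the whole name; \.\.? matches one or two dots only.
    my @dir_contents = grep { !/\A\.\.?\z/ } readdir($dh);
    closedir($dh);

    return @dir_contents;
}
```

With that in place the loop needs no filtering at all: foreach my $file (list_entries("/var/www/cgi-bin/pages")) { ... }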

- Miller
 
Useful tips from Miller. While we are on the subject of grepping out things we don't want, what happens if @dir_contents contains other directories?

Steve

"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object::PerlDesignPatterns)
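
Steve's question goes unanswered in the thread. One possible approach, sketched under the same assumption that the list comes from readdir, is to keep only plain files with the -f file test. The sub name list_files is hypothetical. Because "." and ".." are directories, this test drops them as well.

```perl
use strict;
use warnings;

# Hypothetical helper: return only the plain files in $dir.
sub list_files {
    my ($dir) = @_;

    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    # readdir returns bare names, so the directory must be prepended
    # before -f can test the entry.
    my @files = grep { -f "$dir/$_" } readdir($dh);
    closedir($dh);

    return @files;
}
```

Note that -f follows symbolic links, so a link that points at a regular file is kept.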
 
On the subject of grep I have another dilemma regarding html i need to replace.
Part of the html contains "img src" tag like this:
Code:
<img alt="pad" border="0" height="5" src="http://some.example.com/Img/trans_1x1.gif" width="1" />

The html contains more than one "img src" tags. I need to replace the
Code:
http://some.example.com/Img/

with a different URL. I am not even sure where to start looking in regex to match all the way from http to the last slash "/". I could do that once, but I need to replace the URL wherever it occurs, not just once. I am guessing a backreference is the way to go on this one.
Any idea how to tell regex to match everything up to the last slash?
thanks
 
richardko said:
On the subject of grep I have another dilemma regarding html i need to replace.
Part of the html contains "img src" tag like this:

It is certainly possible to create complicated regexes to parse and translate HTML, but it is generally a waste of your time. Instead, learn one of the HTML parsers available via CPAN, and make it do the translation.

It's a much better investment of your time (let alone ours) than making single-use regexes simply as a programming challenge.

- Miller
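
As a rough illustration of the parser route, here is a sketch using HTML::TreeBuilder from CPAN. Miller does not name a specific module, so that choice is an assumption, as are the sub name rewrite_img_prefix and the replacement URL in the usage note. The parser finds every img element, and a plain anchored substitution swaps the known prefix in its src attribute, so no "match up to the last slash" pattern is needed.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: replace $old_prefix with $new_prefix at the start
# of the src attribute of every <img> in $html; returns the new HTML.
sub rewrite_img_prefix {
    my ($html, $old_prefix, $new_prefix) = @_;

    my $tree = HTML::TreeBuilder->new_from_content($html);

    foreach my $img ($tree->look_down(_tag => 'img')) {
        my $src = $img->attr('src');
        next unless defined $src;

        # \Q...\E makes the dots in the prefix match literally;
        # \A ties the match to the start of the attribute value.
        if ($src =~ s/\A\Q$old_prefix\E/$new_prefix/) {
            $img->attr('src', $src);
        }
    }

    my $result = $tree->as_HTML;
    $tree->delete;    # release the parse tree
    return $result;
}
```

A call might look like rewrite_img_prefix($table_html, "http://some.example.com/Img/", "http://new.example.org/images/"), where the second URL is only a placeholder. Be aware that as_HTML serializes the whole tree, so a fragment comes back wrapped in html and body elements.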
 