clearing the memory

richardko · Mar 4, 2007

Hi, I am trying to parse a bunch of HTML files in a directory, and I need help clearing the memory.
I get the same output from "$table_html" for all the files.

It seems like $table_html doesn't get emptied out. I tried the delete and delete_content() methods, but they did not work.
Any ideas?

Code:

foreach $file (@dir_contents)
{
        if(!(($file eq ".") || ($file eq "..")))
        {
                $filename = "/var/www/cgi-bin/pages/$file";
                $te->parse_file($filename);                
                $table = $te->first_table_found;
                $table_tree = $table->tree;                
                $table_html = $table_tree->as_HTML;
                print $table_html."\n";            
         }
}

KevinADC · Mar 4, 2007

my $table_html = $table_tree->as_HTML;

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]

richardko · Mar 5, 2007

Thanks for the reply. However, defining

Code:

my $table_html = $table_tree->as_HTML;

still gives me the same result. I think the problem lies with $table->tree being the same every time but I am not sure.

MillerH · Mar 5, 2007

Your problem lies in your parser, "$te". Without knowing anything about what it is or how it was created, we can't possibly suggest how to flush the buffer. I would expect there to be some way of clearing its buffer, though.

If all else fails, look into recreating the parser each time.
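
As a rough sketch of that last suggestion: the thread never shows how $te is created, so the StatefulParser package below is a hypothetical stand-in for any parser that accumulates results across parse calls. It only illustrates why a fresh object per file avoids stale output; it is not the API of any particular module.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a parser that keeps every table it has seen.
package StatefulParser;

sub new { my ($class) = @_; return bless { tables => [] }, $class }

# "Parsing" here just records the input; a real parser would build a tree.
sub parse {
    my ($self, $html) = @_;
    push @{ $self->{tables} }, $html;
    return $self;
}

# Returns the first table ever seen by this object, not the latest one.
sub first_table_found { my ($self) = @_; return $self->{tables}[0] }

package main;

my @pages = ('<table>one</table>', '<table>two</table>');

# Reusing one parser: the second page still reports the first table.
my $shared = StatefulParser->new;
my @stale  = map { $shared->parse($_)->first_table_found } @pages;

# Creating a new parser inside the loop: each page reports its own table.
my @fresh;
for my $page (@pages) {
    my $te = StatefulParser->new;    # fresh state for every file
    $te->parse($page);
    push @fresh, $te->first_table_found;
}

print "reused:    @stale\n";
print "recreated: @fresh\n";
```

In the original loop, the equivalent change would be to move whatever statement constructs $te inside the foreach, declared with my.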

- Miller

KevinADC · Mar 5, 2007

maybe:

my $te->parse_file($filename);

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]

richardko · Mar 5, 2007

Redeclaring the object does seem to work. Thanks for the response.

MillerH · Mar 5, 2007

Glad we could help.

Just a note on coding style. I personally do not like to include a large if block directly within a loop. Most often you can invert or move the logic in such a way as to remove the block entirely and make the code a lot more readable.

For example, take a look at the following:

Using next to specifically handle those special cases.

Code:

foreach $file (@dir_contents) {
	next if $file eq "." || $file eq "..";

	$filename = "/var/www/cgi-bin/pages/$file";
	$te->parse_file($filename);
	$table = $te->first_table_found;
	$table_tree = $table->tree;
	$table_html = $table_tree->as_HTML;
	print $table_html."\n";
}

Using a regex instead:

Code:

foreach $file (@dir_contents) {
	next if $file =~ /^\.+$/;

	$filename = "/var/www/cgi-bin/pages/$file";
	$te->parse_file($filename);
	$table = $te->first_table_found;
	$table_tree = $table->tree;
	$table_html = $table_tree->as_HTML;
	print $table_html."\n";
}

Moving the filtering into the loop header by adding a grep:

Code:

foreach $file (grep {!/^\.+$/} @dir_contents) {
	$filename = "/var/www/cgi-bin/pages/$file";
	$te->parse_file($filename);
	$table = $te->first_table_found;
	$table_tree = $table->tree;
	$table_html = $table_tree->as_HTML;
	print $table_html."\n";
}

As I hope you can see, each of the above coding styles is a lot more readable than having a giant if statement. Another option would be to add the grep to the statement creating @dir_contents itself, as this is a common type of filtering that is needed.
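
That last idea might look something like the sketch below. The thread does not show how @dir_contents is built, so the opendir/readdir call is an assumption, and the temporary directory and file names exist only to make the example runnable.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Assumption: a throwaway directory stands in for /var/www/cgi-bin/pages.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(a.html b.html)) {
    open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    close $fh;
}

# Filter out "." and ".." at the point where the list is created,
# so the processing loop needs no special cases.
opendir my $dh, $dir or die "Cannot open $dir: $!";
my @dir_contents = sort grep { $_ ne '.' && $_ ne '..' } readdir($dh);
closedir $dh;

foreach my $file (@dir_contents) {
    print "$dir/$file\n";
}
```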

- Miller

stevexff · Mar 5, 2007

Useful tips from Miller. While we are on the subject of grepping out things we don't want, what happens if @dir_contents contains other directories?
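
One possible way to handle that, sketched under the same assumptions as the original loop, is to test each entry with -f so that subdirectories are skipped. The scratch directory and the names in it are made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Assumption: a scratch directory with one page and one subdirectory.
my $dir = tempdir(CLEANUP => 1);
mkdir "$dir/archive" or die "Cannot create $dir/archive: $!";
open my $fh, '>', "$dir/page.html" or die "Cannot create $dir/page.html: $!";
close $fh;

opendir my $dh, $dir or die "Cannot open $dir: $!";
# -f needs the full path, because readdir returns bare names.
my @files = grep { -f "$dir/$_" } readdir($dh);
closedir $dh;

print "$_\n" for @files;
```

Because "." and ".." are directories themselves, the -f test drops them as well.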

Steve

"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object::PerlDesignPatterns)

richardko · Mar 7, 2007

On the subject of grep, I have another dilemma regarding HTML I need to replace.
Part of the HTML contains an "img src" tag like this:

Code:

<img alt="pad" border="0" height="5" src="http://some.example.com/Img/trans_1x1.gif" width="1" />

The HTML contains more than one "img src" tag. I need to replace the

Code:

http://some.example.com/Img/

with a different url. (something like

http://website.com/images)

I am not even sure where to start with a regex that matches everything from http up to the last slash "/". I need to do the replacement not just once but wherever this occurs. I am guessing a backreference is the way to go on this one.
Any idea how to tell a regex to match everything up to the last slash?
Thanks

MillerH · Mar 8, 2007

richardko said:
On the subject of grep, I have another dilemma regarding HTML I need to replace.
Part of the HTML contains an "img src" tag like this:

It is certainly possible to create complicated regexes to parse and translate HTML, but it is generally a waste of your time. Instead, learn one of the HTML parsers available via CPAN, and make it do the translation.

It's a much better investment of your time (let alone ours) than writing single-use regexes simply as a programming challenge.
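
That said, when the prefix to replace is a known literal string, as in the question, a plain global substitution is a much smaller tool than a general HTML-matching regex and may be enough. A minimal sketch, assuming the old and new prefixes from the question; the second image tag and the trailing slash on the new URL are made up for illustration:

```perl
use strict;
use warnings;

my $old = 'http://some.example.com/Img/';
my $new = 'http://website.com/images/';   # trailing slash is an assumption

my $html = <<'HTML';
<img alt="pad" border="0" height="5" src="http://some.example.com/Img/trans_1x1.gif" width="1" />
<img alt="logo" src="http://some.example.com/Img/logo.gif" />
HTML

# \Q...\E quotes the dots and slashes so the prefix is matched literally;
# /g replaces every occurrence, not just the first.
(my $fixed = $html) =~ s/\Q$old\E/$new/g;

print $fixed;
```

This does not understand HTML at all: it would also rewrite the same string inside comments or body text, which is one reason a parser is the safer choice for anything less regular.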

- Miller
