Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations TouchToneTommy on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Module re-write.. for those that really bored :)

Status
Not open for further replies.

travs69

MIS
Dec 21, 2006
1,431
US
So.. here is the module HtmlTableParser. Is there any way to condense it (I don't care if it stays as a module or it is just done directly in HTML::parser) into as few char's as possible (new lines don't count) regardless of the readability as long as the functionality stays the same.

There really is no goal in this other than fun.. (plus the forum has been kind of boring lately).



Code:
package HTML::TableContentParser;
use HTML::Parser;
@ISA = qw(HTML::Parser);
use strict;
our $VERSION = 0.13;
our $DEBUG = 0;
# The tags we're interested in.
my @tag_names = qw(table tr td th caption);
sub start
{
            my ($self, $tag, $attr, $attrseq, $origtext) = @_;
            $tag = lc($tag);
# Store the incoming details in the current 'object'.
            if ($tag eq 'table') {
                        my $table = $attr;
                        push @{$self->{STORE}->{tables}}, $table;
                        $self->{STORE}->{current_table} = $table;
            } elsif ($tag eq 'th') {
                        my $th = $attr;
                        push @{$self->{STORE}->{current_table}->{headers}}, $th;
                        $self->{STORE}->{current_header} = $th;
                        $self->{STORE}->{current_element} = $th;
            } elsif ($tag eq 'tr') {
                        my $tr = $attr;
                        push @{$self->{STORE}->{current_table}->{rows}}, $tr;

                        $self->{STORE}->{current_row} = $tr;

                        $self->{STORE}->{current_element} = $tr;

 

            } elsif ($tag eq 'td') {

                        my $td = $attr;

                        push @{$self->{STORE}->{current_row}->{cells}}, $td;
                        $self->{STORE}->{current_data_cell} = $td;
                        $self->{STORE}->{current_element} = $td;
            } elsif ($tag eq 'caption') {
                        my $cap = $attr;
                    $self->{STORE}->{current_table}->{caption} = $cap;
                     $self->{STORE}->{current_element} = $cap;
            } else {
## Found a non-table related tag. Push it into the currently-defined td
## or th (if one exists).
                        my $elem = $self->{STORE}->{current_element};
                        if ($elem) {
                                    $self->debug('TEXT(tag) = ', $origtext) if $DEBUG;
                                    $elem->{data} .= $origtext;
                       }
          }
            $self->debug($origtext) if $DEBUG;
}
sub text
{
            my ($self, $text) = @_;
            my $elem = $self->{STORE}->{current_element};
            if (!$elem) {
                        return undef;
            }
            $self->debug('TEXT = ', $text) if $DEBUG;
            $elem->{data} .= $text;
}

sub end
{
            my ($self, $tag, $origtext) = @_;
            $tag = lc($tag);
# Turn off the current object
            if ($tag eq 'table') {
                        $self->{STORE}->{current_table} = undef;
                        $self->{STORE}->{current_row} = undef;
                        $self->{STORE}->{current_data_cell} = undef;
                        $self->{STORE}->{current_header} = undef;
                        $self->{STORE}->{current_element} = undef;
            } elsif ($tag eq 'th') {
                        $self->{STORE}->{current_row} = undef;
                        $self->{STORE}->{current_data_cell} = undef;
                        $self->{STORE}->{current_header} = undef;
                        $self->{STORE}->{current_element} = undef;
            } elsif ($tag eq 'tr') {
                        $self->{STORE}->{current_row} = undef;
                        $self->{STORE}->{current_data_cell} = undef;
                        $self->{STORE}->{current_header} = undef;
                        $self->{STORE}->{current_element} = undef;
            } elsif ($tag eq 'td') {
                        $self->{STORE}->{current_data_cell} = undef;
                        $self->{STORE}->{current_header} = undef;
                        $self->{STORE}->{current_element} = undef;
            } elsif ($tag eq 'caption') {
                        $self->{STORE}->{current_element} = undef;
            } else {
## Found a non-table related close tag. Push it into the currently-defined
## td or th (if one exists).
                        my $elem = $self->{STORE}->{current_element};
                        if ($elem) {
                                    $self->debug('TEXT(tag) = ', $origtext) if $DEBUG;
                                    $elem->{data} .= $origtext;
                        }
            }
            $self->debug($origtext) if $DEBUG;
}
sub parse
{
            my ($self, $data) = @_;
            $self->{STORE} = undef;
# Ensure the following keys exist
            $self->{STORE}->{current_data_cell} = undef;
            $self->{STORE}->{current_row} = undef;
            $self->{STORE}->{current_table} = undef;
            $self->SUPER::parse($data);
            return $self->{STORE}->{tables};
}
sub debug
{
            my ($self) = shift;
            my $class = ref($self);
            warn "$class: ", join('', @_), "\n";
}

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[noevil]
Travis - Those who say it cannot be done are usually interrupted by someone else doing it; Give the wrong symptoms, get the wrong solutions;
 
Hey Travs,

While golfing can be fun, I much prefer making code easier to read rather than harder. Here's my first attempt at simplifying your logic:

Code:
[url=http://perldoc.perl.org/functions/package.html][black][b]package[/b][/black][/url] [green]HTML::TableContentParser[/green][red];[/red]

[url=http://perldoc.perl.org/functions/use.html][black][b]use[/b][/black][/url] [green]HTML::Parser[/green][red];[/red]
[blue]@ISA[/blue] = [red]qw([/red][purple]HTML::Parser[/purple][red])[/red][red];[/red]

[black][b]use[/b][/black] [green]strict[/green][red];[/red]

[url=http://perldoc.perl.org/functions/our.html][black][b]our[/b][/black][/url] [blue]$VERSION[/blue] = [fuchsia]0.13[/fuchsia][red];[/red]
[black][b]our[/b][/black] [blue]$DEBUG[/blue] = [fuchsia]0[/fuchsia][red];[/red]

[gray][i]# The tags we're interested in.[/i][/gray]
[url=http://perldoc.perl.org/functions/my.html][black][b]my[/b][/black][/url] [blue]@tag_names[/blue] = [red]qw([/red][purple]table tr td th caption[/purple][red])[/red][red];[/red]

[black][b]my[/b][/black] [blue]%START_STORE[/blue] = [red]([/red]
	[purple]table[/purple]   => [red][[/red][red]qw([/red][purple]@ tables[/purple][red])[/red][red]][/red],
	[red]tr      =[/red][purple]> [qw(@ current_table headers)],[/purple]
[purple]	th      [/purple][red]=[/red][purple]> [qw(@ current_table rows)],[/purple]
[purple]	td      [/purple][red]=[/red]> [red][[/red][red]qw([/red][purple]@ current_row cells[/purple][red])[/red][red]][/red],
	[purple]caption[/purple] => [red][[/red][red]qw([/red][purple]$ current_table caption[/purple][red])[/red][red]][/red],
[red])[/red][red];[/red]

[black][b]my[/b][/black] [blue]%START_CURRENT[/blue] = [red]([/red]
	[purple]table[/purple]   => [red][[/red][red]qw([/red][purple]table[/purple][red])[/red][red]][/red],
	[red]tr      =[/red][purple]> [qw(row element)],[/purple]
[purple]	th      [/purple][red]=[/red][purple]> [qw(header element)],[/purple]
[purple]	td      [/purple][red]=[/red]> [red][[/red][red]qw([/red][purple]data_cell element[/purple][red])[/red][red]][/red],
	[purple]caption[/purple] => [red][[/red][red]qw([/red][purple]element[/purple][red])[/red][red]][/red],
[red])[/red][red];[/red]

[url=http://perldoc.perl.org/functions/sub.html][black][b]sub[/b][/black][/url] [maroon]start[/maroon] [red]{[/red]
	[black][b]my[/b][/black] [red]([/red][blue]$self[/blue], [blue]$tag[/blue], [blue]$attr[/blue], [blue]$attrseq[/blue], [blue]$origtext[/blue][red])[/red] = [blue]@_[/blue][red];[/red]
	[blue]$tag[/blue] = [url=http://perldoc.perl.org/functions/lc.html][black][b]lc[/b][/black][/url][red]([/red][blue]$tag[/blue][red])[/red][red];[/red]
	
	[gray][i]# Store the incoming details in the current 'object'.[/i][/gray]
	[olive][b]if[/b][/olive] [red]([/red][url=http://perldoc.perl.org/functions/exists.html][black][b]exists[/b][/black][/url] [blue]$START_STORE[/blue][red]{[/red][blue]$tag[/blue][red]}[/red][red])[/red] [red]{[/red]
		[black][b]my[/b][/black] [red]([/red][blue]$type[/blue], [blue]@levels[/blue][red])[/red] = [blue]@[/blue][red]{[/red][blue]$START_STORE[/blue][red]{[/red][blue]$tag[/blue][red]}[/red][red]}[/red][red];[/red]
		[black][b]my[/b][/black] [blue]$store[/blue] = [blue]$self[/blue]->[red]{[/red]STORE[red]}[/red][red];[/red]
		[blue]$store[/blue] = [blue]$store[/blue]->[red]{[/red][blue]$_[/blue][red]}[/red] [olive][b]foreach[/b][/olive] [red]([/red][blue]@levels[/blue][red])[/red][red];[/red]
		[olive][b]if[/b][/olive] [red]([/red][blue]$type[/blue] eq [red]'[/red][purple]@[/purple][red]'[/red][red])[/red] [red]{[/red]
			[url=http://perldoc.perl.org/functions/push.html][black][b]push[/b][/black][/url] [blue]@$store[/blue], [blue]$attr[/blue][red];[/red]
		[red]}[/red] [olive][b]else[/b][/olive] [red]{[/red]
			[blue]$store[/blue] = [blue]$attr[/blue][red];[/red]
		[red]}[/red]
		
		[blue]$self[/blue]->[red]{[/red]STORE[red]}[/red]->[red]{[/red][red]"[/red][purple]current_[blue]$_[/blue][/purple][red]"[/red][red]}[/red] = [blue]$attr[/blue] [olive][b]foreach[/b][/olive] [red]([/red][blue]@[/blue][red]{[/red][blue]$START_CURRENT[/blue][red]{[/red][blue]$tag[/blue][red]}[/red][red]}[/red][red])[/red][red];[/red]
		
	[red]}[/red] [olive][b]else[/b][/olive] [red]{[/red]
		[gray][i]## Found a non-table related tag. Push it into the currently-defined td[/i][/gray]
		[gray][i]## or th (if one exists).[/i][/gray]
		[black][b]my[/b][/black] [blue]$elem[/blue] = [blue]$self[/blue]->[red]{[/red]STORE[red]}[/red]->[red]{[/red]current_element[red]}[/red][red];[/red]
		[olive][b]if[/b][/olive] [red]([/red][blue]$elem[/blue][red])[/red] [red]{[/red]
			[blue]$self[/blue]->[maroon]debug[/maroon][red]([/red][red]'[/red][purple]TEXT(tag) = [/purple][red]'[/red], [blue]$origtext[/blue][red])[/red] [olive][b]if[/b][/olive] [blue]$DEBUG[/blue][red];[/red]
			[blue]$elem[/blue]->[red]{[/red]data[red]}[/red] .= [blue]$origtext[/blue][red];[/red]
		[red]}[/red]
	[red]}[/red]
	[blue]$self[/blue]->[maroon]debug[/maroon][red]([/red][blue]$origtext[/blue][red])[/red] [olive][b]if[/b][/olive] [blue]$DEBUG[/blue][red];[/red]
[red]}[/red]

[black][b]sub[/b][/black] [maroon]text[/maroon] [red]{[/red]
	[black][b]my[/b][/black] [red]([/red][blue]$self[/blue], [blue]$text[/blue][red])[/red] = [blue]@_[/blue][red];[/red]
	[black][b]my[/b][/black] [blue]$elem[/blue] = [blue]$self[/blue]->[red]{[/red]STORE[red]}[/red]->[red]{[/red]current_element[red]}[/red]
		or [url=http://perldoc.perl.org/functions/return.html][black][b]return[/b][/black][/url] [url=http://perldoc.perl.org/functions/undef.html][black][b]undef[/b][/black][/url][red];[/red]

	[blue]$self[/blue]->[maroon]debug[/maroon][red]([/red][red]'[/red][purple]TEXT = [/purple][red]'[/red], [blue]$text[/blue][red])[/red] [olive][b]if[/b][/olive] [blue]$DEBUG[/blue][red];[/red]
	[blue]$elem[/blue]->[red]{[/red]data[red]}[/red] .= [blue]$text[/blue][red];[/red]
[red]}[/red]

[black][b]my[/b][/black] [blue]%END_STORE[/blue] = [red]([/red]
	[purple]table[/purple]   => [red][[/red][red]qw([/red][purple]table row data_cell header element[/purple][red])[/red][red]][/red],
	[red]tr      =[/red][purple]> [qw(      row data_cell header element)],[/purple]
[purple]	th      [/purple][red]=[/red][purple]> [qw(      row data_cell header element)],[/purple]
[purple]	td      [/purple][red]=[/red]> [red][[/red][red]qw([/red][purple]      row data_cell header element[/purple][red])[/red][red]][/red],
	[purple]caption[/purple] => [red][[/red][red]qw([/red][purple]                           element[/purple][red])[/red][red]][/red],
[red])[/red][red];[/red]

[black][b]sub[/b][/black] [maroon]end[/maroon] [red]{[/red]
	[black][b]my[/b][/black] [red]([/red][blue]$self[/blue], [blue]$tag[/blue], [blue]$origtext[/blue][red])[/red] = [blue]@_[/blue][red];[/red]
	[blue]$tag[/blue] = [black][b]lc[/b][/black][red]([/red][blue]$tag[/blue][red])[/red][red];[/red]
	
	[gray][i]# Turn off the current object[/i][/gray]
	[olive][b]if[/b][/olive] [red]([/red][black][b]exists[/b][/black] [blue]$END_STORE[/blue][red]{[/red][blue]$tag[/blue][red]}[/red][red])[/red] [red]{[/red]
		[blue]$self[/blue]->[red]{[/red]STORE[red]}[/red]->[red]{[/red][red]"[/red][purple]current_[blue]$_[/blue][/purple][red]"[/red][red]}[/red] = [black][b]undef[/b][/black] [olive][b]for[/b][/olive] [red]([/red][blue]@[/blue][red]{[/red][blue]$END_STORE[/blue][red]{[/red][blue]$tag[/blue][red]}[/red][red]}[/red][red])[/red][red];[/red]
		
	[red]}[/red] [olive][b]else[/b][/olive] [red]{[/red]
		[gray][i]## Found a non-table related close tag. Push it into the currently-defined[/i][/gray]
		[gray][i]## td or th (if one exists).[/i][/gray]
		[black][b]my[/b][/black] [blue]$elem[/blue] = [blue]$self[/blue]->[red]{[/red]STORE[red]}[/red]->[red]{[/red]current_element[red]}[/red][red];[/red]
		[olive][b]if[/b][/olive] [red]([/red][blue]$elem[/blue][red])[/red] [red]{[/red]
			[blue]$self[/blue]->[maroon]debug[/maroon][red]([/red][red]'[/red][purple]TEXT(tag) = [/purple][red]'[/red], [blue]$origtext[/blue][red])[/red] [olive][b]if[/b][/olive] [blue]$DEBUG[/blue][red];[/red]
			[blue]$elem[/blue]->[red]{[/red]data[red]}[/red] .= [blue]$origtext[/blue][red];[/red]
		[red]}[/red]
	[red]}[/red]
	[blue]$self[/blue]->[maroon]debug[/maroon][red]([/red][blue]$origtext[/blue][red])[/red] [olive][b]if[/b][/olive] [blue]$DEBUG[/blue][red];[/red]
[red]}[/red]

[black][b]sub[/b][/black] [maroon]parse[/maroon] [red]{[/red]
	[black][b]my[/b][/black] [red]([/red][blue]$self[/blue], [blue]$data[/blue][red])[/red] = [blue]@_[/blue][red];[/red]
	
	[gray][i]# Reset and Initialize[/i][/gray]
	[blue]$self[/blue]->[red]{[/red]STORE[red]}[/red] = [red]{[/red][red]}[/red][red];[/red]
	[blue]$self[/blue]->[red]{[/red]STORE[red]}[/red]->[red]{[/red][red]"[/red][purple]current_[blue]$_[/blue][/purple][red]"[/red][red]}[/red] = [black][b]undef[/b][/black] [olive][b]for[/b][/olive] [red]qw([/red][purple]table row data_cell[/purple][red])[/red][red];[/red]
	
	[blue]$self[/blue]->[maroon]SUPER::parse[/maroon][red]([/red][blue]$data[/blue][red])[/red][red];[/red]
	
	[black][b]return[/b][/black] [blue]$self[/blue]->[red]{[/red]STORE[red]}[/red]->[red]{[/red]tables[red]}[/red][red];[/red]
[red]}[/red]

[black][b]sub[/b][/black] [maroon]debug[/maroon] [red]{[/red]
	[black][b]my[/b][/black] [red]([/red][blue]$self[/blue][red])[/red] = [url=http://perldoc.perl.org/functions/shift.html][black][b]shift[/b][/black][/url][red];[/red]
	[black][b]my[/b][/black] [blue]$class[/blue] = [url=http://perldoc.perl.org/functions/ref.html][black][b]ref[/b][/black][/url][red]([/red][blue]$self[/blue][red])[/red][red];[/red]
	[url=http://perldoc.perl.org/functions/warn.html][black][b]warn[/b][/black][/url] [red]"[/red][purple][blue]$class[/blue]: [/purple][red]"[/red], [url=http://perldoc.perl.org/functions/join.html][black][b]join[/b][/black][/url][red]([/red][red]'[/red][purple][/purple][red]'[/red], [blue]@_[/blue][red])[/red], [red]"[/red][purple][purple][b]\n[/b][/purple][/purple][red]"[/red][red];[/red]
[red]}[/red]
[tt]------------------------------------------------------------
Pragmas (perl 5.8.8) used :
[ul]
[li]strict - Perl pragma to restrict unsafe constructs[/li]
[/ul]
Other Modules used :
[ul]
[li]HTML::parser[/li]
[li]HTML::TableContentParser[/li]
[/ul]
[/tt]

- Miller
 
You know.. I have that bookmarked.. and I just forget to use it.. can't Kevin sell it to tek-tips or something and make it the default!!!

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[noevil]
Travis - Those who say it cannot be done are usually interrupted by someone else doing it; Give the wrong symptoms, get the wrong solutions;
 
If you preview your posts, it's pretty easy to remember I think. :)

Btw, is there a way to send a private message to someone on this website?

- M
 
Tek-Tips has no plans to add syntax highlighting and there is no way to send a PM on this forum.

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
Then how do I send y'all private messages? How do we make fun of other people without being overtly offensive?

Gawd, that's just silly.

[bandito]

- M
 
I agree.. I would think PM's would be nice..

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[noevil]
Travis - Those who say it cannot be done are usually interrupted by someone else doing it; Give the wrong symptoms, get the wrong solutions;
 
To make fun of me try this
Code:
$server = 'gmail';
$domain = 'com';
$tektipsusername = 'travs69';
my $email = $tektipsusername . '@' . $server . '.' . $domain;
print "$email\n";



~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[noevil]
Travis - Those who say it cannot be done are usually interrupted by someone else doing it; Give the wrong symptoms, get the wrong solutions;
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top