parsing html

dinger2121 · May 1, 2008

Hello,
I am trying to parse this bit of html -

LEGAL</td>
<td headers="col2_1" style="width:13%; text-align:right" >
151</td>

using this line -

m!\$field\\</td>(\S*)\<td headers="col2_1" style="width:13%; text-align:right" >(\S*)\(.+?)/</td>!is)

The script is not currently finding anything. Can anyone see where I might be off?

Thanks

ishnid · May 1, 2008

Parsing HTML with regexps is rarely a good idea. Especially if you're not entirely comfortable with them (you're preceding the '<' character with a backslash despite it not having any special meaning within a regexp, and you're using \S instead of \s to match whitespace).

Have a look on CPAN for HTML::TokeParser or HTML::TokeParser::Simple for parsing HTML. Those will be far more robust to minor future changes in the HTML code (what if they change the width to 14% instead?), changes that would likely break your regexp.
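
To make the two regexp points concrete, here is a minimal sketch of a pattern that does match the fragment from the first post. It is only an illustration, not a recommendation to stay with regexps: it uses lowercase \s for whitespace, leaves '<' unescaped, and skips the tag attributes with [^>]* so a changed width does not matter. The variable names and the \Q...\E quoting of $field are my own assumptions.

```perl
use strict;
use warnings;

# Fragment copied from the first post.
my $html = <<'HTML';
LEGAL</td>
<td headers="col2_1" style="width:13%; text-align:right" >
151</td>
HTML

my $field = 'LEGAL';

# \s* (lowercase) absorbs the newlines around the tags.
# [^>]* skips whatever attributes the <td> carries.
# ([^<]*?) captures the cell text without trailing whitespace.
my ($value) = $html =~ m{\Q$field\E\s*</td>\s*<td[^>]*>\s*([^<]*?)\s*</td>}is;

print defined $value ? "$field: $value\n" : "no match\n";
```

This still depends on the value cell directly following the label cell, which is one more reason a real parser is the safer choice.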

dinger2121 · May 2, 2008

Thank you. I will look at HTML::TokeParser::Simple.
I am new to Perl, just trying some things out.

Thanks again

dinger2121 · May 2, 2008

I would like to quickly explain what I would like to accomplish in hopes that someone will affirm that I should be using HTML::TokeParser::Simple.

I have a page that has multiple sections like the following -

<td headers="col1_1" style="width:21%" >
LETTER</td>
<td headers="col2_1" style="width:13%; text-align:right" >
4,889</td>
<td headers="col3_1" style="width:13%; text-align:right" >
1.0</td>
<td headers="col4_1" style="width:13%; text-align:right" >
</td>
<td headers="col5_1" style="width:13%; text-align:right" >
</td>
<td headers="col6_1" style="width:13%; text-align:right" >
4889.0</td>
</tr>

I need to extract the number (in this case the 4,889) from each table row where the first cell (in this case LETTER) matches one of one or two given values. I will then write that number to a text file.
Can anyone suggest a better method to accomplish this?

Thanks again
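
For what it is worth, the task described above could be sketched roughly as follows with plain HTML::TokeParser, which HTML::TokeParser::Simple builds on. The sample rows are taken from the posts in this thread; the extract_counts helper, the %wanted list, and the assumption that the label cell always has a headers value starting with "col1_" and the number cell one starting with "col2_" are mine, so treat it as a starting point and not a finished script.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Two sample rows built from the fragments posted in this thread.
my $html = <<'HTML';
<tr>
<td headers="col1_1" style="width:21%" >
LETTER</td>
<td headers="col2_1" style="width:13%; text-align:right" >
4,889</td>
<td headers="col3_1" style="width:13%; text-align:right" >
1.0</td>
</tr>
<tr>
<td headers="col1_1" style="width:21%" >
LEGAL</td>
<td headers="col2_1" style="width:13%; text-align:right" >
151</td>
</tr>
HTML

# Hypothetical list of row labels to keep.
my %wanted = (LETTER => 1);

# Returns a hash ref mapping each wanted label to the text of its col2 cell.
sub extract_counts {
    my ($html, $wanted) = @_;
    my $parser = HTML::TokeParser->new(\$html) or die "Cannot parse HTML";
    my (%count, $label);
    while (my $tag = $parser->get_tag('td')) {
        my $headers = $tag->[1]{headers} // '';      # attributes of this <td>
        my $text    = $parser->get_trimmed_text('/td');
        if ($headers =~ /^col1_/) {
            $label = $text;                          # first cell of a row
        }
        elsif ($headers =~ /^col2_/ && defined $label && $wanted->{$label}) {
            $count{$label} = $text;                  # the number we are after
        }
    }
    return \%count;
}

my $counts = extract_counts($html, \%wanted);
print "$_\t$counts->{$_}\n" for sort keys %$counts;
```

Because the cells are identified by their headers attribute and not by the style string, a change in width should not affect the result. Writing to a text file would only need the print redirected to a filehandle opened for writing.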
