Reg.exp help

Status
Not open for further replies.

Masali

Programmer
Jun 19, 2002
143
SE
I am trying to match a table that contains several tags but always contains the sentence "Hello World".

I have been trying to match this with:
Code:
<table .*>[\w\s\t\r\n\d\D]*?Hello World[\w\s\t\r\n\d\D]*?(?!table)<\/table>

on the HTML below:

Code:
<html>

<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<title>Example</title>
</head>

<body topmargin="0" leftmargin="0" bgcolor="#CC3399">

<div align="center">
	<table id="table3" style="border: 2px solid #000000" cellSpacing="1" width="800" bgColor="#ffffff" border="1">
		<tr>
			<td><center><img border="0" src="logo.gif" width="800" height="190">
			<table id="table4" style="font-size: 16px; font-family: Verdana, Arial, Helvetica, sans-serif" border="0">
				<tr>
					<td><hr width="190" color="#000000" SIZE="1"></td>
					<td></td>
					<td><hr width="190" color="#000000" SIZE="1"></td>
				</tr>
			</table>
			<br>
			<font size="+2"><a href="a.html">A</a> ~
			<a href="b.html">B</a></font><br>
			<br>
&nbsp;<table style="border: 2px dotted #7b68ee" width="760" bgColor="#ffffff" border="0" id="table5">
				<tr>
					<td align="middle"><br>
					<font size="+1">Hello World<br>
					<a href="test">
					<img src="logo.gif" border="0"></a></font> 
					<p><br>
					<br>
&nbsp;</td>
				</tr>
			</table>
			</center></td>
		</tr>
	</table>
</div>

</body>

</html>

but it matches too much:

Code:
<table id="table3" style="border: 2px solid #000000" cellSpacing="1" width="800" bgColor="#ffffff" border="1">
        <tr>
            <td><center><img border="0" src="logo.gif" width="800" height="190">
            <table id="table4" style="font-size: 16px; font-family: Verdana, Arial, Helvetica, sans-serif" border="0">
                <tr>
                    <td><hr width="190" color="#000000" SIZE="1"></td>
                    <td></td>
                    <td><hr width="190" color="#000000" SIZE="1"></td>
                </tr>
            </table>
            <br>
            <font size="+2"><a href="a.html">A</a> ~
            <a href="b.html">B</a></font><br>
            <br>
&nbsp;<table style="border: 2px dotted #7b68ee" width="760" bgColor="#ffffff" border="0" id="table5">
                <tr>
                    <td align="middle"><br>
                    <font size="+1">Hello World<br>
                    <a href="test">
                    <img src="logo.gif" border="0"></a></font> 
                    <p><br>
                    <br>
&nbsp;</td>
                </tr>
            </table>

The table I want to match is:

Code:
<table style="border: 2px dotted #7b68ee" width="760" bgColor="#ffffff" border="0" id="table5">
                <tr>
                    <td align="middle"><br>
                    <font size="+1">Hello World<br>
                    <a href="test">
                    <img src="logo.gif" border="0"></a></font> 
                    <p><br>
                    <br>
&nbsp;</td>
                </tr>
            </table>

What am I doing wrong with the regexp?

I would really appreciate some help from a regexp guru!

/Masali
 
Well, I'm not a regexp guru, but I recently had to wrestle with greedy regular expressions that matched too much, so it's fresh in my mind. How about something like this?
Code:
<table[^(<table)]*Hello World[^(<table)]*</table>
The key is to allow anything that would be inside a table except another table. I'm not sure whether the "anything but" clause should check for the opening tag or the closing tag; in my code (which looks for text in [brackets]), either one works. I'm also not sure that parentheses are what tells regexp to apply the "^" to several characters instead of just one, since in my code it's a single character ("[^\[]" or "[^\]]"). In the common regex flavors they are not: inside square brackets, parentheses are literal, so "[^(<table)]" excludes each of the characters ( < t a b l e ) individually, and a negative lookahead such as "(?!<table)" is the usual way to rule out a multi-character sequence. But try it out and see.
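
The "anything except another table" idea can be tried out with a short script. This is only a sketch: it assumes Python's re module (the thread never names a regex engine), SAMPLE is a trimmed, hypothetical stand-in for the page in the question, and the names ORIGINAL, CHAR_CLASS, TEMPERED and first_match are made up for illustration. It compares the pattern from the question, the bracket pattern above, and a variant that uses a negative lookahead before every character.

```python
import re

# Trimmed, hypothetical stand-in for the page in the question: an outer
# table, an unrelated inner table, and the inner table that holds the text.
SAMPLE = """<table id="table3">
<tr><td>
<table id="table4"><tr><td></td></tr></table>
<table id="table5">
<tr><td><font size="+1">Hello World<br></font></td></tr>
</table>
</td></tr>
</table>"""

# Pattern from the question. The engine reports the leftmost match, so the
# lazy "*?" cannot move the start past the first "<table "; it only keeps
# the end of the match short. "(?!table)" is tested right before "<", so it
# never rejects anything.
ORIGINAL = re.compile(
    r"<table .*>[\w\s\t\r\n\d\D]*?Hello World[\w\s\t\r\n\d\D]*?(?!table)<\/table>"
)

# Bracket pattern from the reply. [^(<table)] is a character class: it
# excludes the single characters ( < t a b l e ), so it stops at the first
# "t", "a", "b", "l" or "e" inside the opening tag.
CHAR_CLASS = re.compile(r"<table[^(<table)]*Hello World[^(<table)]*</table>")

# Lookahead variant: every character consumed between the opening tag and
# the closing tag must not begin a "<table" or "</table" tag.
TEMPERED = re.compile(
    r"<table\b[^>]*>"           # opening tag of the candidate table
    r"(?:(?!</?table\b).)*?"    # text that opens or closes no table
    r"Hello World"
    r"(?:(?!</?table\b).)*?"
    r"</table>",
    re.IGNORECASE | re.DOTALL,
)


def first_match(pattern, html):
    """Return the text of the first match, or None if there is none."""
    m = pattern.search(html)
    return m.group(0) if m else None


if __name__ == "__main__":
    for name, pattern in [("original", ORIGINAL),
                          ("char class", CHAR_CLASS),
                          ("tempered", TEMPERED)]:
        print(name, "->", first_match(pattern, SAMPLE))
```

On this sample the pattern from the question starts at table3, the bracket pattern finds nothing, and the lookahead variant returns only table5. The lookahead variant has a limit of its own: it only finds a table that has no other table nested inside it, so it would miss "Hello World" in a table that itself contains tables.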

However, if you want to also check that your table is well-formed (having proper tr's and td's and such), it would be much more complex. I'm just assuming all you want is "Hello World" in one and only one table.
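
Once nesting matters, one alternative to a bigger regexp is to let an HTML parser keep track of which tables are open. The sketch below swaps in a different technique from the regexp above: it assumes Python and its standard html.parser module, the class and function names are hypothetical, and SAMPLE_HTML is a trimmed stand-in for the page in the question. It does not validate tr's and td's; it only reports the attributes of the innermost table around the text.

```python
from html.parser import HTMLParser

# Trimmed, hypothetical stand-in for the page in the question.
SAMPLE_HTML = """<table id="table3">
<tr><td>
<table id="table4"><tr><td></td></tr></table>
<table id="table5">
<tr><td><font size="+1">Hello World<br></font></td></tr>
</table>
</td></tr>
</table>"""


class InnermostTableFinder(HTMLParser):
    """Record the attributes of the innermost <table> around a given text."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.open_tables = []  # attributes of the open tables, outermost first
        self.hits = []         # one entry per occurrence of the needle

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling this.
        if tag == "table":
            self.open_tables.append(dict(attrs))

    def handle_endtag(self, tag):
        if tag == "table" and self.open_tables:
            self.open_tables.pop()

    def handle_data(self, data):
        # Text found while a table is open belongs to the innermost one.
        if self.needle in data and self.open_tables:
            self.hits.append(self.open_tables[-1])


def tables_containing(html, needle):
    """Return the attribute dicts of the innermost tables that hold needle."""
    finder = InnermostTableFinder(needle)
    finder.feed(html)
    finder.close()
    return finder.hits


if __name__ == "__main__":
    print(tables_containing(SAMPLE_HTML, "Hello World"))
```

Because the parser tracks open and close tags itself, this keeps working when the table that holds the text has other tables nested inside it, which the lookahead regexp cannot handle.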
 