Monitor web site for software updates.

Status
Not open for further replies.
Nov 9, 2005
4
US
Hey All,

Here's the site that I'm trying to monitor.
I tried using the following shell script, but "diff" doesn't play nicely with the small changes that the includes and PHP scripts output.

Code:
#!/bin/bash

SAVE=/var/spool/iss/archive
TEMP=/tmp/RSNS.php.$$  
UPDATE=/var/spool/iss/update
MIMELITE=/home/iss/bin/mimelite.pl
PAGE=http://www.iss.net/db_data/xpu/RSNS.php

wget -k -E -N -O $TEMP $PAGE
echo "finished retrieving files..."

diff -l $TEMP $SAVE > $UPDATE
echo "new vs. old analysis complete..."

if [ -s $UPDATE ]
	then
   perl $MIMELITE
fi
echo "emailing any updated content..."

rm $SAVE
mv $TEMP $SAVE
echo "archiving... ;)"

My main reason for wanting to use Perl is its ability to parse specific HTML tags with regexps. Take this HTML snippet from the page, for example:

Code:
<noscript>
<h4 style="color: #CC0000;">
This site is best experienced with a JavaScript enabled browser.
Either your browser does not support JavaScript or JavaScript is turned off.
You will need to use our <a href="http://www.iss.net/sitemap.php">site index</a> to navigate our web site.
If you encounter any problems please contact the
<a href="https://www.iss.net/webForm.php?to=Webmaster">Webmaster</a>.
</h4>
<img src="http://shared.iss.net/images/template/nsb.gif" width="1" height="1" alt="nsb" border="0">
</noscript>
</td>
</tr>
</table>
<!--
<table width="100%" cellpadding="0" cellspacing="0" border="0">
<tr valign="top">
	<td width="191"><img src="http://shared.iss.net/images/spacer.gif" width="191" height="1"></td>
	<td>
	<table width="573" cellpadding="0" cellspacing="0" border="0" align="center">
	<tr><td>-->
<h2>Network Sensor XPUs</h2>
<table class="description" width="90%" cellspacing="1" cellpadding="15" align="center"><TR><TD WIDTH="10%" ALIGN="center" class="descriptionHeader"><b>XPU Number</b></TD><TD WIDTH="10%" ALIGN="center" class="descriptionHeader"><b>Date</b></TD><TD ALIGN="center" class="descriptionHeader"><b>Description</b></TD><TR>
<TD ALIGN=CENTER class="description" ALIGN="center"><A HREF="http://www.iss.net/db_data/xpu/RSNSNS_24.22.php"><b>NS_24.22</b></A></TD><TD class="description" ALIGN="center">11-23-2005</TD><TD class="description">This critical Content Update XPU (X-Press Update?), featuring
Virtual Patch? technology, contains 1 new event and provides
protection for an Internet Explorer vulnerability that allows
remote code execution. For more information about this
vulnerability, see the following X-Force Alert:
http://xforce.iss.net/xforce/alerts/id/209</TD>
</TR>
<TR>
<TD ALIGN=CENTER class="description" ALIGN="center"><A HREF="http://www.iss.net/db_data/xpu/RSNSNS_24.21.php"><b>NS_24.21</b></A></TD><TD class="description" ALIGN="center">11-08-2005</TD><TD class="description">This Content Update XPU (X-Press Update?), featuring Virtual
Patch? technology, contains 12 new events, for vulnerabilities in
Microsoft's Windows operating system listed in MS Bulletin
MS05-053 and a vulnerability in Kaspersky anti-virus, Mozilla and
Oracle. Also in this XPU are 13 security content updates and 10
new blocking responses.</TD>
</TR>
<TR>
<TD ALIGN=CENTER class="description" ALIGN="center"><A HREF="http://www.iss.net/db_data/xpu/RSNSNS_24.20.php"><b>NS_24.20</b></A></TD><TD class="description" ALIGN="center">10-21-2005</TD><TD class="description">CRITICAL: This Content Update XPU (X-Press Update?), featuring
Virtual Patch? technology, contains 4 new events, for exploits
released in the wild that target, CA Unicenter, HP-UX Line Printer
Daemon, Veritas Netbackup, and the RSA secure webagent. Also in
this XPU are 3 security content updates and 3 new blocking responses.</TD>
</TR>

I'm only concerned with anything between <h2>Network Sensor XPUs</h2> and the FIRST </TR> that occurs after that. I tried using a regexp to parse it out ($html =~ m|<h2>(.+)</TR>|s;) but that ended up matching everything from <h2> to the very LAST </TR>. :( Any help is greatly appreciated.

Thanks!

- binaryechoes
 
The simplest way to do this with a regex would be to use the non-greedy operator. Try something like:
Code:
$html =~ m|<h2>(.+?)</TR>|s;
You might also want to look into the HTML parsing modules on CPAN; they could make getting the info you want out of the web page a lot easier.
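
To see why the one extra character matters, here is a minimal sketch run against a made-up, shortened copy of the table. The sample HTML and the names $greedy and $lazy are assumptions for illustration, not the real page. A greedy .+ grabs as much as it can and only backs off to the LAST </TR>, while .+? stops at the FIRST one.

```perl
use strict;
use warnings;

# Hypothetical, shortened stand-in for the real page.
my $html = <<'HTML';
<h2>Network Sensor XPUs</h2>
<table><TR><TD>XPU Number</TD><TR>
<TD>NS_24.22</TD><TD>11-23-2005</TD>
</TR>
<TR>
<TD>NS_24.21</TD><TD>11-08-2005</TD>
</TR>
HTML

# Greedy: .+ runs to the end of the string, then backtracks to the LAST </TR>.
my ($greedy) = $html =~ m|<h2>(.+)</TR>|s;

# Non-greedy: .+? expands one character at a time and stops at the FIRST </TR>.
my ($lazy) = $html =~ m|<h2>(.+?)</TR>|s;

print "greedy capture includes NS_24.21\n" if $greedy =~ /NS_24\.21/;
print "non-greedy capture stops before NS_24.21\n" unless $lazy =~ /NS_24\.21/;
```

Note that the bare <TR> at the end of the header row does not end the match; only a closing </TR> does.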
 
Thanks rharsh! One freakin' character away... So, here's what I came up with. The problem now is that I can't get LWP::Simple to retain the "full" URL in the output: it links to /RSNSNS_24.22.php instead of the full URL. Does anyone have any suggestions as to the best method to insert text into a file with Perl, specifically after <A HREF=" in the HTML? I tried using one of the methods described on this site but couldn't get it to work.

Code:
#!/usr/bin/perl 

use LWP::Simple;
use File::Compare;
use File::Copy;
use MIME::Lite;

$page = get("http://www.iss.net/db_data/xpu/RSNS.php");
$html = $page;
$temp = "/var/spool/iss/test/temp";
$archive = "/var/spool/iss/test/archive";
$mimelite = "/var/spool/iss/test/mimelite.pl";

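#sanitize output: keep only the text between </h2> and the first </TR>
$html =~ m|</h2>(.+?)</TR>|s or die "XPU table not found in page\n";

open OUTPUT, ">", $temp or die "Cannot write $temp: $!\n";
print OUTPUT "$1\n";
close OUTPUT;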

if (compare("$temp","$archive") == 1) {
### Create the multipart "container":
    $msg = MIME::Lite->new( 
                 To      =>'user@domain.com',
                 Subject =>'ISS Sensor Updates',
                 Type    =>'multipart/related'
		 );

    $msg->attach(Type => 'text/html',
                 Data => "New ISS Sensor updates are available.<br>"
                 );

    $msg->attach(Type     =>'text/html',
                 Path     => $temp,
                 Filename =>'updates',
		 Disposition => 'inline'
		 );

    $msg->send();
}

unlink $archive;
move($temp, $archive);

__END__
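
One way to get full links without editing the file afterwards is to rewrite the HREFs in the captured text before printing it to OUTPUT. A rough sketch, assuming the relative links start with "/" and belong under http://www.iss.net/db_data/xpu (a guess based on the absolute links in the HTML snippet earlier in the thread). absolutize_links is a hypothetical helper name, not part of LWP::Simple.

```perl
use strict;
use warnings;

# Assumption: relative links on the page live under this directory.
my $base = 'http://www.iss.net/db_data/xpu';

# Hypothetical helper: prefix $base to every HREF value that starts with "/".
# Links that are already absolute (http://...) are left untouched.
sub absolutize_links {
    my ($html, $base) = @_;
    $html =~ s{(<A\s+HREF=")(/[^"]*)}{$1$base$2}gi;
    return $html;
}

my $row = '<TD><A HREF="/RSNSNS_24.22.php"><b>NS_24.22</b></A></TD>';
print absolutize_links($row, $base), "\n";
```

In the script above this would amount to printing absolutize_links($1, $base) to OUTPUT instead of $1.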
 
In regards to Tie::File, here's my problem: it doesn't insert the text from $URL into the HTML file ($temp). Here's my code that uses Tie::File.

Code:
for (@lines) { 
   if (/<A HREF="/) {
     $_ .= $URL;
    last;
    }
   }

Any help is greatly appreciated.

Thanks!!!
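
A likely cause: the loop above appends $URL to the END of the matching line, because .= is string concatenation, so nothing lands right after <A HREF=". A substitution does the insert in place. A minimal sketch, assuming @lines is tied to the file with Tie::File; the temporary file and the value of $URL are stand-ins so the example runs on its own.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Hypothetical stand-ins for $temp and $URL from the posts above.
my ($fh, $temp) = tempfile();
print $fh qq{<TD><A HREF="/RSNSNS_24.22.php"><b>NS_24.22</b></A></TD>\n};
close $fh;
my $URL = 'http://www.iss.net/db_data/xpu';

# Each element of @lines is one line of the file; changes are written back.
tie my @lines, 'Tie::File', $temp or die "Cannot tie $temp: $!";

for (@lines) {
    # Re-emit the matched prefix ($1) and put $URL directly after it.
    # Stop after the first line that was changed, as in the original loop.
    last if s/(<A HREF=")/$1$URL/;
}

untie @lines;
```

Drop the "last" and add the /g modifier if every link in the file should be rewritten rather than only the first.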
 