I am parsing a page from Google that shows when the page was last cached. That part works fine; the problem is that I am also taking the source code from the cached page and trying to show the user what their page looked like when Google saw it.
On some pages it works (like mine!), but for lazy coders who used partial (relative) URLs in their links and images, it fails because the browser looks for their files on MY server (rrrr).
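The failure happens because a browser resolves a relative URL like `page.html` against the address of the document it is showing, and for your copy that address is your own server. If rewriting every link gets fiddly, one alternative worth considering is HTML's `<base href="...">` element. Placed in the `<head>`, it tells the browser to resolve relative links, images, stylesheets and scripts against another URL. Below is a minimal module-free sketch. It assumes `$site` is the original page's full URL and that the cached HTML has a `<head>` tag but no `<base>` of its own; `add_base_tag` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: insert <base href="$site"> right after the opening
# <head> tag so the browser resolves relative links, images, stylesheets
# and scripts against $site rather than against our server.
# Assumes the page has a <head> element and no <base> element of its own.
sub add_base_tag {
    my ($html, $site) = @_;

    # Escape characters that would break out of the attribute value.
    (my $safe = $site) =~ s/&/&amp;/g;
    $safe =~ s/"/&quot;/g;

    # Only the first <head ...> tag is touched (not <header>); case-insensitive.
    $html =~ s{(<head\b[^>]*>)}{$1<base href="$safe">}i;
    return $html;
}

my $page = '<html><head><title>t</title></head>'
         . '<body><a href="page.html">click here</a></body></html>';
print add_base_tag($page, 'http://www.example.com/'), "\n";
```

There is one side effect to keep in mind. With a `<base>` in place, in-page links such as `href="#top"` also resolve against the original site, so they would navigate away from your copy.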
What I tried to do was come up with a few regexes that force the full URL onto any links (I'll deal with images once this is working). The script runs without errors, but unfortunately the source code doesn't change. There is a link `<a href="page.html">click here</a>`, and it won't build the domain onto it.
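What the regexes are reaching for is ordinary relative-URL resolution. Absolute URLs stay as they are. A link starting with `//` takes the page's scheme, and one starting with `/` goes after the scheme and host. Anything else is relative to the directory of the current page, which is not necessarily the site root. Below is a module-free sketch of those cases. It assumes `$page_url` is the full URL of the original page. `resolve_link` is a hypothetical name, and the sketch deliberately skips collapsing `../` segments and handling query-only links such as `?id=2`.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a link found on the page at $page_url into an
# absolute URL. Does not collapse "../" or "./" segments.
sub resolve_link {
    my ($page_url, $link) = @_;

    # Already absolute (http:, https:, mailto:, javascript: ...) or an
    # in-page anchor: leave it untouched.
    return $link if $link =~ m{^[a-z][a-z0-9+.\-]*:}i or $link =~ m{^#};

    # Split the page URL into scheme, "//host[:port]" and path.
    my ($scheme, $origin, $path) =
        $page_url =~ m{^([a-z][a-z0-9+.\-]*):(//[^/?#]*)([^?#]*)}i
        or return $link;    # page URL is not absolute; give up

    # Protocol-relative: //cdn.example.com/x.png
    return "$scheme:$link" if $link =~ m{^//};

    # Root-relative: /images/logo.png
    return "$scheme:$origin$link" if $link =~ m{^/};

    # Document-relative: directory of the current page + link
    (my $dir = $path) =~ s{[^/]*$}{};    # strip the file name
    $dir = '/' if $dir eq '';            # e.g. http://host with no path
    return "$scheme:$origin$dir$link";
}

my $base = 'http://www.example.com/dir/index.html';
print resolve_link($base, $_), "\n"
    for 'page.html', '/about.html', 'http://other.example/x', '#top';
# http://www.example.com/dir/page.html
# http://www.example.com/about.html
# http://other.example/x
# #top
```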
Any help with fixing this (without pushing me towards a module)?
<code>
use LWP::UserAgent;

# $url holds the address of the page being checked (set earlier in the script)
my $url_no_slash   = $url;
my $url_with_slash = "$url/";
$url =~ s/^http//;

##########
# LWP
##########
my $ua = LWP::UserAgent->new();
$ua->agent("");
my $parse_url = "...";    # Google's cached copy of the page
my $content   = $ua->get($parse_url)->content();
my @content   = split(/\n/, $content);

foreach my $key (@content)
{
    if ($key =~ m/<a href="([^"]+)"> /gi)
    {
        my $link = $1;
        print "TEST: $1\n";
        if ($link !~ m/$url/i)
        {
            # If our $link doesn't contain our
            # original url, we need to build it
            print "test1\n";
            if ($link =~ m/^\//)
            {
                # If our $link begins with a slash
                # we'll add the full url without a
                # trailing slash
                print "test2\n";
                $key =~ s/$link/$url_no_slash/;
            }
            else
            {
                # Our $link doesn't begin with a slash
                # so we'll have to add it ourselves
                print "test3\n";
                $key =~ s/$link/$url_with_slash/;
            }
        }
    }
}
</code>
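Several things in the loop above seem to stop it from changing anything.

The pattern `m/<a href="([^"]+)"> /gi` requires a space after the closing `>`, so `<a href="page.html">click here</a>` never matches. That alone would explain why the script runs cleanly but the source is unchanged.

Even when the pattern does match, `s/$link/$url_with_slash/` replaces the link with the URL instead of putting the URL in front of it. Also, `$link` is interpolated as a regex, so characters like `.`, `?` or `(` act as metacharacters unless quoted with `\Q...\E`.

Working line by line only examines the first link on each line. As far as the snippet shows, the edited `@content` is also never joined back together.

Below is a module-free sketch that instead runs one global substitution over the whole page. It assumes `$site` is the site root with no trailing slash (like `$url_no_slash`) and that attribute values use double quotes. `absolutize_links` and `rewrite_attr` are hypothetical names, and the sketch also covers `src`, so images come along too.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite every href="..." / src="..." in $html so that
# relative URLs point at $site. Works on the whole page at once, so several
# links on one line (or a tag split across lines) are all handled.
sub absolutize_links {
    my ($html, $site) = @_;
    # $1 = attribute name, $2 = the "=" with any surrounding spaces,
    # $3 = the double-quoted value. /e calls rewrite_attr for each match.
    $html =~ s{\b(href|src)(\s*=\s*)"([^"]*)"}{rewrite_attr($1, $2, $3, $site)}gie;
    return $html;
}

sub rewrite_attr {
    my ($attr, $eq, $link, $site) = @_;    # copy before running other regexes

    # Leave empty values, absolute URLs (http:, mailto: ...), protocol-relative
    # URLs (//host/...) and in-page anchors (#top) alone.
    unless ($link eq '' or $link =~ m{^(?:[a-z][a-z0-9+.\-]*:|//|#)}i) {
        # Same two cases as the original loop:
        #   "/page.html" -> "$site/page.html" (slash already there)
        #   "page.html"  -> "$site/page.html" (slash added)
        $link = $link =~ m{^/} ? "$site$link" : "$site/$link";
    }
    return qq{$attr$eq"$link"};
}

my $site = 'http://www.example.com';
my $html = '<a href="page.html">click here</a> <img src="/logo.png">'
         . ' <a href="http://other.example/">elsewhere</a>';
print absolutize_links($html, $site), "\n";
# <a href="http://www.example.com/page.html">click here</a> <img src="http://www.example.com/logo.png"> <a href="http://other.example/">elsewhere</a>
```

In the script above, something like `print absolutize_links($content, $url_no_slash);` could stand in for the split/foreach loop, assuming `$url_no_slash` holds the scheme plus host. For pages that live in a subdirectory, links without a leading slash are relative to that directory rather than to the site root, so they would need the page's directory in place of `$site`.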