redstarfcs1
Programmer
How to read metatags without any perl module?
#!/usr/bin/perl
undef $/;
$_ = <DATA>;
$/ = "\n";
@metatags = m/(<meta[^>]+>)/ig;
foreach (@metatags) {
($name, $content) = m/name="?([^"]+)"? content="?([^"]+)"?/;
print "$name\t$content\n";
}
__DATA__
<HEAD>
<TITLE>Stamp Collecting World</TITLE>
<META name="description" content="Everything you wanted to know about stamps, from prices to history.">
<META name="keywords" content="stamps, stamp collecting, stamp history, prices, stamps for sale">
</HEAD>
#!perl
use strict;
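$_ = do { local $/; <DATA> };           # slurp the file; local restores $/ afterwards
my $len = index(lc($_), '</head>');
$_ = substr $_, 0, $len if $len >= 0;   # keep only the head section, if one was found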
my @metatags = m/(<meta[^>]+>)/ig;
foreach (@metatags) {
my ($name, $content) = m/name="?([^"]+)"? content="?([^"]+)"?/;
print "$name\t$content\n";
}
__DATA__
<HEAD>
<TITLE>Stamp Collecting World</TITLE>
<META name="description" content="Everything you wanted to know about stamps, from prices to history.">
<META name="keywords" content="stamps, stamp collecting, stamp history, prices, stamps for sale">
</HEAD>
<BODY>
bunch of stuff down here we shouldn't need to parse
</BODY>
</HTML>
($sep, $name, $sep, $content) = m/name=(["']?)(.+)\1 content=(["']?)(.+)\3/i;
# non-greedy match up to the same quote character that opened the value
($sep, $name, $sep, $content) = m/name=(["']?)(.+?)\1 content=(["']?)(.+?)\3/i;
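A note on backreferences in patterns like these: inside a bracketed character class, Perl reads `\1` as an octal escape (the character with code 1), not as a backreference, so a class such as `[^\1]` cannot mean "anything except the opening quote". A non-greedy `.+?` followed by `\1` is one way to get that effect. The sketch below assumes both values are quoted and that `name` comes before `content`; `name_and_content` is a hypothetical helper name used only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pull name and content out of a single <meta> tag.
# Assumes both values are quoted and that name precedes content.
sub name_and_content {
    my ($tag) = @_;
    # \1 and \3 refer back to whichever quote character opened each value,
    # so an apostrophe inside a double-quoted value does not end the match.
    my (undef, $name, undef, $content) =
        $tag =~ /name=(["'])(.+?)\1\s+content=(["'])(.+?)\3/i;
    return ($name, $content);
}

my ($name, $content) = name_and_content(q{<META name='author' content="O'Brien">});
print "$name\t$content\n";
```

If the tag does not match, both returned values are undefined, so a caller would want to check for that before printing.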
local $/;   # undefine the input record separator so <DATA> returns the whole file
my ($text) = <DATA>;
my ($headers) = $text =~ m[<head>(.*)</head>]is;
while ($headers =~ m!<META\s+(\w+)\s*=\s*['"]([^'"]+)['"]\s+(\w+)\s*=\s*['"]([^'"]+)['"]!isg) {
print "$2 - $4", "\n";
}
#!/usr/bin/perl -w
use strict;
use HTML::TokeParser::Simple;
# create a new parser
my $p = HTML::TokeParser::Simple->new( 'test.html' );
# just want to keep track of how many meta tags we've seen
my $tagno = 1;
# loop through all the tokens in the page
while ( my $t = $p->get_token ) {
    # if we've found a meta tag
    if ( $t->is_start_tag( 'meta' ) ) {
        # print some information
        print 'Tag #'.$tagno++."\n";
        # get the attributes from the tag
        my $attrs = $t->get_attr;
        # print the attributes
        print "$_ - $attrs->{ $_ }\n" for ( keys %$attrs );
        # print a line to distinguish it from the next tag
        print '-' x 30, "\n";
    }
}
Just as a sad conclusion, Duncan: I have struggled a lot with this kind of thing. Some people prefer single quotes around HTML attribute values, and others write attribute names in different letter case.
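The variations mentioned here (single or double quotes, mixed-case attribute names) plus either attribute order can still be handled without a module by first collecting every attribute of a tag into a hash keyed by the lowercased name. This is only a minimal sketch, assuming no `>` appears inside an attribute value; `parse_meta_tags` and the sample HTML are hypothetical and not taken from any post in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns a list of [name, content] pairs found in $html.
# Assumes no '>' appears inside an attribute value.
sub parse_meta_tags {
    my ($html) = @_;
    my @found;
    # Capture the attribute text of each <meta ...> tag, ignoring case.
    for my $tag ($html =~ /<meta\b([^>]*)>/gi) {
        my %attr;
        # Collect attr=value pairs: double-quoted, single-quoted, or bare.
        while ($tag =~ /([\w-]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/g) {
            my $value = defined $2 ? $2 : defined $3 ? $3 : $4;
            $attr{ lc $1 } = $value;    # lowercase the key: Name, NAME, name
        }
        # Keep only tags that carry both a name and a content attribute.
        push @found, [ $attr{name}, $attr{content} ]
            if defined $attr{name} && defined $attr{content};
    }
    return @found;
}

my $html = <<'HTML';
<META Name="name1" Content='first value'>
<meta content="second value" name='name2'>
HTML

print "$_->[0]\t$_->[1]\n" for parse_meta_tags($html);
```

Because the attributes go into a hash before anything is printed, the order in which `name` and `content` appear inside the tag no longer matters. A real HTML parser remains the safer choice for pages you do not control.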
How to read metatags without any perl module?
Use eyes.
#!/usr/bin/perl
undef $/;
$_ = <DATA>;
$/ = "\n";
@metatags = m/(<meta[^>]+>)/ig;
foreach (@metatags) {
if (s/(name|content)=['"]?([^'"]+)['"]? (name|content)=['"]?([^'"]+)['"]?/if(lc($1)eq"name"){$hash{name}=$2;$hash{content}=$4}else{$hash{name}=$4;$hash{content}=$2}/ie) {
print $hash{name} . "\t" . $hash{content} . "\n";
}
}
__DATA__
<META Name="name1" Content='Everything you wanted to know about stamps, from prices to history.'>
blah blah blah
<META Content="Everything you wanted to know about stamps, from prices to history." Name='name2'>
blah blah
<META NAME='name3' CONTENT="stamps, stamp collecting, stamp history, prices, stamps for sale">
blah blah blah blah
<META CONTENT='stamps, stamp collecting, stamp history, prices, stamps for sale' NAME="name4">
#!/usr/bin/perl
undef $/;
$_ = <DATA>;
$/ = "\n";
@metatags = m/(<meta[^>]+>)/ig;
foreach (@metatags) {
if (s/(name|content)=['"]?([^'"]+)['"]? (name|content)=['"]?([^'"]+)['"]?/$hash{lc($1)}=$2;$hash{lc($3)}=$4/ie) {
print $hash{name} . "\t" . $hash{content} . "\n";
}
}
__DATA__
<META Name="name1" Content='Everything you wanted to know about stamps, from prices to history.'>
blah blah blah
<META Content="Everything you wanted to know about stamps, from prices to history." Name='name2'>
blah blah
<META NAME='name3' CONTENT="stamps, stamp collecting, stamp history, prices, stamps for sale">
blah blah blah blah
<META CONTENT='stamps, stamp collecting, stamp history, prices, stamps for sale' NAME="name4">