While you guys were making progress on your solution(s), I worked on my own little solution. Since I got it working, I'll let you take a look, even though it's too late:
[tt]
BEGIN {
# now works with different style HTML e.g.
# <A HREF = "
# <a href="
}
/[Hh][Rr][Ee][Ff]/ {
# a line contains 'href='
for ( i=1; i<=NF; i++ ) {
# for each field in that line
if ( $i ~ /^[Hh][Rr][Ee][Ff]$/ ) {
lastfield = $i
print "1 = " $i
continue
}
if ( (lastfield ~ /^[Hh][Rr][Ee][Ff]$/) && ($i == "=") ) {
lastfield = $i
print "2 = " $i
continue
}
if ( lastfield == "=" ) {
lastfield = ""
print "3 = " $i
split( $i, temp_array, "\"" )
link = temp_array[2]
link_array[link]++
}
if ( $i ~ /[Hh][Rr][Ee][Ff]=/ ) {
split( $i, temp_array_1, ">" )
split( temp_array_1[1], temp_array_2, "=" )
gsub( "\"", "", temp_array_2[2] )
link = temp_array_2[2]
link_array[link]++
}
}
}
END {
print "SUMMARY"
for ( link in link_array )
printf( "%5d times link \"%s\"\n", link_array[link], link )
}
[/tt]
For your other question I have a solution too, because I had a similar problem.
Let's say you have 10 HTML documents you want to run your script on; simply use a shell script/command line like:
[tt]
find ./ -name '*.html' -exec cat {} \; | awk -f MYAWK.awk
[/tt]
This will summarize all 10 files (assuming they all have the file ending "html"), giving you ONE summary.
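As a side note, the cat is not strictly needed, since awk reads all files named on its command line as one continuous input. A minimal sketch, assuming the awk program is saved as MYAWK.awk as above; summarize_all is just a made-up helper name for illustration:

```shell
# Hypothetical helper: summarize_all DIR SCRIPT
# Hands every *.html file under DIR to the awk program SCRIPT,
# so the END block prints one combined summary.
summarize_all() {
    find "$1" -name '*.html' -exec awk -f "$2" {} +
}

# Usage, assuming the script name from this post:
# summarize_all . MYAWK.awk
```

With a very large number of files, find may split the list over several awk runs and you would get more than one summary; for 10 files that should not be an issue.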
If you want a per file summary, then use:
[tt]
find ./ -name '*.html' | while read -r filename
do
cat "$filename" | awk -f MYAWK.awk
done
[/tt]
This will give you TEN summaries.
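If you want to see which summary belongs to which file, the loop can print the file name first. This is only a sketch under the same assumption (awk program saved as MYAWK.awk); summarize_each and the "== name ==" header format are my own invention, not anything awk or find requires:

```shell
# Hypothetical helper: summarize_each DIR SCRIPT
# Runs the awk program SCRIPT once per *.html file under DIR and
# prints a header line with the file name before each summary.
summarize_each() {
    find "$1" -name '*.html' -exec sh -c '
        script=$1
        shift
        for f in "$@"; do
            printf "== %s ==\n" "$f"
            awk -f "$script" "$f"
        done
    ' sh "$2" {} +
}

# Usage, assuming the script name from this post:
# summarize_each . MYAWK.awk
```

Passing the file names as arguments to sh (instead of piping them into read) also keeps names with odd characters intact.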