How do I make hyperlinks out of urls in plain text? by icrf
Posted: 27 Mar 04 (Edited 27 Dec 04)

Taken and edited from this thread: Thread219-751655

CODE

use strict;
use warnings;

#URI specification from RFC 2396 : http://www.ietf.org/rfc/rfc2396.txt
my $reserved = ';?:@&=+\\$,';            #removed / from spec
my $unreserved = q{\\-_!~*'()a-zA-Z0-9}; #removed . from spec
my $escaped = '%[a-fA-F0-9]{2}';

#a single allowed character or a complete escape sequence
my $allowed_all = qr{(?:[$unreserved$reserved./]|(?:$escaped))};
my $allowed_limited = qr/(?:[$unreserved$reserved]|(?:$escaped))/;

#optionally, could add gopher: mailto: news: telnet:
my $start = qr{(?:http:|www\.|ftp:)};    #dot escaped so it only matches a literal .

#easy one is where it starts with something predictable from $start
my $easy = qr/$start$allowed_all+/;

#attempts to match one or more "label." pieces, then a 2 or 3 character top level domain
#examples: com edu uk jp
#there are some exotic TLDs that this wouldn't catch, like .museum or .info,
#but they're still pretty rare, so I'll ignore them
#the trailing (?!...) rejects a match that would stop partway through a longer word, like Q.Letter
my $hard = qr/(?:$allowed_limited+\.)+$allowed_limited{2,3}(?:\/$allowed_all*)?(?!$allowed_all)/;

#make a little test suite of link-like and not-link-like lines to run it on
$_ = <<'EOF';
It needs to recognize links in the following patterns:

subdomain.mydomain.com/page1.html
http://subdomain.mydomain.com/page1.html
www.mydomain.com/page1.html
http://www.mydomain.com/page1.html
subdomain.mydomain.com

Any ideas?

But, certain other combinations are being filtered into hyperlinks as well, such as:

>.<
Q.Letter

Some combinations don't, like the following:
!.!
A.B

Any thoughts?
EOF

#find an easy one or a hard one
s/($easy|$hard)/<a href="$1">$1<\/a>/g;
print;
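
The comments in the script note that longer top level domains such as .museum or .info are not caught. One possible way to pick up a few of them is to try a short list of named TLDs before falling back to the 2 or 3 character rule. This is only a sketch and not part of the original script: $long_tld, $tld, $hard_ext and find_links are hypothetical names, and the list of TLDs is an assumption you would adjust yourself.

```perl
use strict;
use warnings;

# Character classes copied from the script above.
my $reserved   = ';?:@&=+\\$,';
my $unreserved = q{\\-_!~*'()a-zA-Z0-9};
my $escaped    = '%[a-fA-F0-9]{2}';
my $allowed_all     = qr{(?:[$unreserved$reserved./]|(?:$escaped))};
my $allowed_limited = qr/(?:[$unreserved$reserved]|(?:$escaped))/;

# Hypothetical, hand-picked list of longer TLDs (an assumption, extend as needed).
my $long_tld = qr/(?:museum|info)/;

# Named TLDs are tried first, then the original 2 or 3 character rule.
my $tld = qr/(?:$long_tld|(?:$allowed_limited){2,3})/;

# Same shape as $hard, with $tld in place of the fixed-length TLD.
my $hard_ext = qr/(?:$allowed_limited+\.)+$tld(?:\/$allowed_all*)?(?!$allowed_all)/;

# Returns every match of $hard_ext found in the text, in order.
sub find_links {
    my ($text) = @_;
    return $text =~ /($hard_ext)/g;
}

print "$_\n" for find_links("visit example.info/about and nothing.museum today");
```

Because the negative lookahead is kept, a line like Q.Letter is still left alone: neither named TLD matches, and the 2 or 3 character rule would stop partway through the word.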

The last line of the script wraps each match in an HTML anchor tag, but you can substitute whatever wrapper you wish. There's a module on CPAN called HTML::FromText that handles the most common cases, as well as a slew of other things. I'd suggest using it unless it misses links you care about, which this script is meant to catch.
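
Since the wrapper is up to you, one option is to pass it in as a callback. The sketch below is not part of the original script: linkify and html_anchor are hypothetical names, and prefixing scheme-less matches with http:// is an assumption on my part (a bare www.mydomain.com in an href would otherwise be resolved as a relative link).

```perl
use strict;
use warnings;

# Patterns copied from the script above, with two small changes:
# the dot in "www." is escaped, and the TLD quantifier is grouped explicitly.
my $reserved   = ';?:@&=+\\$,';
my $unreserved = q{\\-_!~*'()a-zA-Z0-9};
my $escaped    = '%[a-fA-F0-9]{2}';
my $allowed_all     = qr{(?:[$unreserved$reserved./]|(?:$escaped))};
my $allowed_limited = qr/(?:[$unreserved$reserved]|(?:$escaped))/;
my $start = qr{(?:http:|www\.|ftp:)};
my $easy  = qr/$start$allowed_all+/;
my $hard  = qr/(?:$allowed_limited+\.)+(?:$allowed_limited){2,3}(?:\/$allowed_all*)?(?!$allowed_all)/;

# Hypothetical helper: hands every match to the callback and splices
# the callback's return value into the text in place of the match.
sub linkify {
    my ($text, $wrap) = @_;
    $text =~ s/($easy|$hard)/$wrap->($1)/ge;
    return $text;
}

# Example wrapper: keep the visible text as matched, but make sure the
# href starts with a scheme (assumes http:// when none is present).
sub html_anchor {
    my ($url) = @_;
    my $href = $url =~ m{^(?:http|ftp):} ? $url : "http://$url";
    return qq{<a href="$href">$url</a>};
}

print linkify("See www.mydomain.com/page1.html for details\n", \&html_anchor);
```

This prints: See <a href="http://www.mydomain.com/page1.html">www.mydomain.com/page1.html</a> for details. Any other code reference can be passed as the second argument if you want a different wrapper.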

For completeness, here's the DS thread where it was originally created and discussed.

Feedback is welcome.
