
Mmm.. "regex" for accented characters.

Status
Not open for further replies.

youradds

Programmer
Jun 27, 2001
817
GB
Hi,

Got a bit of a weird issue here;

$_ => été
$desc => "test 123 un été à Tanger. élargir elargir ete"

$desc =~ s|\b\Q$_|<a href="$hit->{LinkURL}" title="$hit->{TheText}" target="$hit->{Target}">$_</a>|sg;

($hit is a hash from earlier on in my code :))

If I change $_ to "un" (i.e. just normal letters), then the regex works absolutely perfectly. However, with the accented characters, it screws up and doesn't do the search + replace :(

Anyone got any suggestions?

TIA

Andy
 
The problem is with the assertion [tt]\b[/tt]. Unless [tt]use locale;[/tt] is in effect, Perl does not treat accented characters as word ([tt]\w[/tt]) characters, so the word boundaries fall here: [tt]\bun é\bt\bé à \bTanger[/tt]. There is no boundary between the space and the leading [tt]é[/tt], which is why [tt]\bété[/tt] never matches.
Try adding [tt]use locale;[/tt] in the block containing those lines.
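
The boundary behaviour described above can be reproduced without depending on which locales are installed. The following is only a sketch, and it rests on two assumptions that are not from the thread: that the text is stored as Latin-1 bytes, and that Perl 5.14 or later is available. It uses the [tt]/u[/tt] regex modifier in place of [tt]use locale;[/tt] to switch the accented bytes into the [tt]\w[/tt] class, so it illustrates the same effect rather than the exact fix suggested here.

```perl
use strict;
use warnings;

# Assumption: "été" is held as Latin-1 bytes (the thread never states the encoding).
my $word = "\xE9t\xE9";
my $desc = "un $word ete";

# Default byte semantics: \xE9 is not \w, so there is no \b in front of the word.
my $plain = ($desc =~ /\b\Q$word\E/) ? 1 : 0;

# The only word character in "été" is the "t", so \b sits on both sides of it.
my $inner = ($word =~ /\bt\b/) ? 1 : 0;

# /u (Perl 5.14+) applies Unicode rules, under which \xE9 is a word character.
my $unicode = ($desc =~ /\b\Q$word\E/u) ? 1 : 0;

print "default: $plain, inner t: $inner, with /u: $unicode\n";
# prints "default: 0, inner t: 1, with /u: 1"
```

The script deliberately avoids a [tt]use v5.12;[/tt] (or later) line, because that would enable the [tt]unicode_strings[/tt] feature and hide the default behaviour being shown.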

Franco
 
Hi,

Thanks for the reply. Still doesn't work though :(

Code:
sub GetAdvertsForLink {

    my $desc   = $_[0];
    my $link_id = $_[1];

    my $cat_tbl = $DB->table('CatLinks');
       $cat_tbl->select_options('LIMIT 1');

    my $cat_id = $cat_tbl->select( ['CategoryID'] , { LinkID => $link_id } )->fetchrow;

    my $cond = new GT::SQL::Condition;
       $cond->add('CatIDs','LIKE',"%,$cat_id,%");
       $cond->add('CatIDs','LIKE',"%,$cat_id");
       $cond->add('CatIDs','LIKE',"$cat_id,%");
       $cond->add('CatIDs','LIKE',"$cat_id");
       $cond->bool('OR');

    my @words;
   
    print $IN->header;

#  print qq|GOT CAT ID: $cat_id|;

#   use Unicode::MapUTF8 qw(to_utf8 from_utf8 utf8_supported_charset); 

   my $sth = $DB->table('SponsorLinkText')->select( $cond ) || die $GT::SQL::error;
   while (my $hit = $sth->fetchrow_hashref) {

        my (@words) = split /,/, $hit->{Words};
        $hit->{Target} ||= '_blank';
        foreach (@words) {

           print qq|Looking for "$_" in "$desc" <br />\n|;


            if ($hit->{TheText}) {
                $desc =~ s|\b$_|<a href="$hit->{LinkURL}" title="$hit->{TheText}" target="$hit->{Target}">$_</a>|sg;
            } else {
                $desc =~ s|\b$_|<a href="$hit->{LinkURL}" target="$hit->{Target}">$_</a>|sg;
            }
        }
   }

   return $desc;
}

..and the text at the top of the file:

Code:
use strict;
use CGI::Carp qw(fatalsToBrowser);
use GT::Base;
use GT::Plugins qw/STOP CONTINUE/;
use Links qw/:objects/;
use Data::Dumper;
use locale;

(This is for a script, so it's actually a "plugin" that just gets executed in Perl. The header that comes before all that lot is:)

Code:
package Plugins::SponsorText;

(not sure if that makes a difference though =))

The above code prints out:

Code:
Looking for "elargir" in "test 123 un été à Tanger. élargir elargir ete" <br>
Looking for "ete" in "test 123 un été à Tanger. élargir <a href="http://www.sudimedia.com" title="123" target="_parent">elargir</a> ete" <br>
Looking for "été" in "test 123 un été à Tanger. élargir <a href="http://www.sudimedia.com" title="123" target="_parent">elargir</a> <a href="http://www.sudimedia.com" title="123" target="_parent">ete</a>" <br>
Looking for "élargir" in "test 123 un été à Tanger. élargir <a href="http://www.sudimedia.com" title="123" target="_parent">elargir</a> <a href="http://www.sudimedia.com" title="123" target="_parent">ete</a>" <br>
Looking for "un" in "test 123 un été à Tanger. élargir <a href="http://www.sudimedia.com" title="123" target="_parent">elargir</a> <a href="http://www.sudimedia.com" title="123" target="_parent">ete</a>" <br>

(as you can see, the normal words are done - just not the accented ones :()

TIA!

Andy
 
It works for me with [tt]use locale;[/tt] and doesn't without it. Try putting [tt]use locale;[/tt] after the [tt]sub[/tt] statement, or write a small test script (as I did). I don't know how Perl manages locale settings, though.
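
A small test script along those lines might look like the sketch below. It is not the script used for the result above, just one possible shape. The locale names in [tt]@candidates[/tt] and the helper names [tt]pick_locale[/tt] and [tt]boundary_found[/tt] are assumptions made for illustration; locale names vary by system ([tt]locale -a[/tt] lists the installed ones), and the text is again assumed to be Latin-1 bytes.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Hypothetical candidate names; adjust them to what the system actually has.
my @candidates = ('fr_FR.ISO-8859-1', 'fr_FR.ISO8859-1', 'fr_FR');

# Returns the first locale the system accepts, or undef if none is installed.
sub pick_locale {
    for my $name (@_) {
        my $got = setlocale(LC_CTYPE, $name);
        return $got if defined $got;
    }
    return;
}

# Returns 1 if \b is found in front of the Latin-1 bytes for "été", else 0.
sub boundary_found {
    use locale;    # \w and \b now follow the current LC_CTYPE locale
    my $word = "\xE9t\xE9";
    my $desc = "un $word ete";
    return ($desc =~ /\b\Q$word\E/) ? 1 : 0;
}

my $locale = pick_locale(@candidates);
print "LC_CTYPE locale: ",
    (defined $locale ? $locale : "none of the candidates is available"), "\n";
print "boundary found under 'use locale': ", boundary_found(), "\n";
```

If none of the candidate locales is installed, the process stays in whatever locale it started with, which may explain why [tt]use locale;[/tt] helps on one machine and not on another.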

Franco
 
Hi,

Mmm.. odd :/

Will try it with a test script tomorrow, as it's definitely not working ATM.

Thanks for the help though :)

Cheers

Andy
 