Character Encoding Issue

Status
Not open for further replies.

bigbalbossa (Programmer, US), Mar 21, 2002
All,

I've run into an issue I have no experience with. I'm viewing a file in UltraEdit that contains one name, and UE shows it as follows (the status bar says U8-DOS, i.e. UTF-8 with DOS line endings):

LILIANA MAGAÑA FRANCO

When I 'cat' this file on a Unix box, it looks like this:

LILIANA MAGAÃA FRANCO

When I 'more' this file on a Unix box, it looks like this:

LILIANA MAGAM-CM-^QA FRANCO

This is obviously an encoding issue, but does anybody have any ideas how to fix it, either with Unix tools or Perl?

I've tried sed and tr, but can't get it right.
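The two garbled lines are consistent with the file holding Ñ (U+00D1) as the UTF-8 byte pair C3 91: a Latin-1 display shows C3 as Ã and 0x91 as an unprintable control code, while 'more' here prints the same bytes in the M- notation that 'cat -v' uses (M- marks a byte with the high bit set, and ^Q is 0x11). As a minimal sketch, the hypothetical cat_v helper below mimics that notation so you can check the bytes yourself:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: render bytes in 'cat -v' style. A byte >= 0x80 gets an "M-"
# prefix; the remaining 7 bits print as a control (^X), DEL (^?) or a plain character.
sub cat_v {
    my ($bytes) = @_;
    my $out = '';
    for my $byte (map { ord($_) } split //, $bytes) {
        my $c = $byte;
        if ($c >= 0x80) { $out .= 'M-'; $c -= 0x80; }
        if    ($c < 0x20)  { $out .= '^' . chr($c + 0x40); }
        elsif ($c == 0x7F) { $out .= '^?'; }
        else               { $out .= chr($c); }
    }
    return $out;
}

my $bytes = encode('UTF-8', "MAGA\x{D1}A");   # "MAGAÑA" as stored in a UTF-8 file
print join(' ', map { sprintf '%02X', ord($_) } split //, $bytes), "\n";   # 4D 41 47 41 C3 91 41
print cat_v($bytes), "\n";                                                 # MAGAM-CM-^QA
```

If your own file shows different byte values, it is probably not UTF-8, and any decoding step would need a different encoding name.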
 
Hi,

Try this: save the code below as ripout.pl on your Unix server, then run it:

perl ripout.pl <filename>

It writes the result back to the same file name. Try it and see if it works.

Code:
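use strict;
use warnings;

my $file = shift or die "usage: perl ripout.pl <filename>\n";

# Slurp the whole file as raw bytes ($/ undefined = read everything at once)
open my $in, '<:raw', $file or die "Can't open $file: $!\n";
my $content = do { local $/; <$in> };
close $in;

# Delete carriage returns: DOS CRLF line endings become Unix LF (encoding untouched)
$content =~ s/\r//g;

# Overwrite the original file with the cleaned content
open my $out, '>:raw', $file or die "Can't write $file: $!\n";
print {$out} $content;
close $out or die "Can't close $file: $!\n";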


dmazzini
GSM System and Telecomm Consultant
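Note that ripout.pl only deletes carriage returns, so it fixes the DOS line endings but not the encoding of the name. A small sketch with a hypothetical strip_cr helper (the same substitution, applied to an in-memory string rather than a file) shows the C3 91 bytes for Ñ passing through untouched:

```perl
use strict;
use warnings;

# Hypothetical helper doing what ripout.pl does to the file contents: delete every CR.
sub strip_cr {
    my ($bytes) = @_;
    (my $copy = $bytes) =~ s/\r//g;
    return $copy;
}

my $dos_line  = "MAGA\xC3\x91A\r\n";   # UTF-8 bytes for "MAGAÑA" with a CRLF ending
my $unix_line = strip_cr($dos_line);

print length($dos_line), ' -> ', length($unix_line), " bytes\n";   # 9 -> 8 bytes
print join(' ', map { sprintf '%02X', ord($_) } split //, $unix_line), "\n";   # 4D 41 47 41 C3 91 41 0A
```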

 
Thanks, dmazzini. I finally figured out a way to do this using the Encode and Unicode::Normalize modules:

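Code:
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(NFD);

open(my $fh, '<:raw', $ARGV[0]) || die "Can't open file $ARGV[0]: $!\n";
while (my $line = <$fh>) {
    $line =~ s/\r?\n\z//;       ## strip the DOS (CRLF) or Unix (LF) line ending
    for ( $line ) {             ## the variable we work on, aliased as $_
        ## convert to Unicode first: this file is UTF-8
        ## (if your data comes in Latin-1, use 'iso-8859-1' instead)
        $_ = decode( 'UTF-8', $_ );
        $_ = NFD( $_ );         ## decompose, e.g. N-tilde -> N + combining tilde
        s/\pM//g;               ## strip combining characters
        s/[^\0-\x7F]//g;        ## clear everything else that is not ASCII
    }
    print "STR = $line\n";
}
close $fh;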
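To see why the decode step has to come first, here is a minimal sketch that runs the same decompose / strip / filter steps through a hypothetical to_ascii helper, once on decoded text and once on the raw bytes. Without decoding, NFD sees C3 91 as Ã followed by a control character, so the Ñ turns into an A instead of an N:

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(NFD);

# Hypothetical helper: the decompose / strip / filter steps, applied to one string.
sub to_ascii {
    my ($text) = @_;
    $text = NFD($text);          # decompose, e.g. U+00D1 -> "N" + U+0303
    $text =~ s/\pM//g;           # drop combining marks such as the tilde
    $text =~ s/[^\0-\x7F]//g;    # drop whatever is still outside ASCII
    return $text;
}

my $raw = "LILIANA MAGA\xC3\x91A FRANCO";   # raw UTF-8 bytes, as read from the file

print to_ascii(decode('UTF-8', $raw)), "\n";   # LILIANA MAGANA FRANCO
print to_ascii($raw), "\n";                    # LILIANA MAGAAA FRANCO (bytes treated as Latin-1)
```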
 
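Or:

Code:
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

open(my $fh, '<:encoding(UTF-8)', $ARGV[0]) || die "Can't open file $ARGV[0]: $!\n";
while (<$fh>) {
    s/\r?\n\z//;      ##  strip DOS (CRLF) or Unix (LF) line ending
    $_ = NFD( $_ );   ##  decompose (the :encoding layer already decoded the UTF-8)
    s/\pM//g;         ##  strip combining characters
    s/[^\0-\x7F]//g;  ##  clear everything else
    print "STR = $_\n";
}
close $fh;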


------------------------------------------
- Kevin, perl coder unexceptional!
 