
encoding of <96> throws error.

Status
Not open for further replies.

SirCharles

Programmer
Joined
Jun 10, 2002
Messages
212
Location
US
Encoding of <96> throws an error. This is an ndash.


I'm attempting to use sqlldr. Before the text is loaded, I'm encoding every character outside a small safe set with the following substitution:

s/([^A-Za-z0-9_',])/sprintf("%%%02X",ord($1))/ego;

This throws the following error:

Malformed UTF-8 character (unexpected continuation byte 0x96, with no preceding start byte) in substitution iterator at test.pl line 65, <IN> line 1.
 
I'm not up to snuff on regexes, but don't you have to include <96> in your search string? Also, I think 96 is a grave accent rather than an ndash.
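
The two readings of "96" can be checked directly. This is only a small sketch, and it assumes the input file is Windows-1252 (the thread never says which encoding the data is in): decimal 96 is the ASCII grave accent, while the byte 0x96 from the error message is decimal 150, which Windows-1252 maps to the en dash.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Decimal 96 (hex 0x60) is the ASCII grave accent.
my $grave = chr(96);

# Hex 0x96 (decimal 150) is not ASCII. Assuming the data is Windows-1252,
# decoding that single byte gives U+2013, the en dash.
my $ndash = decode('cp1252', "\x{96}");

printf "decimal 96 -> U+%04X\n", ord($grave);
printf "byte 0x96  -> U+%04X\n", ord($ndash);
```

If the data is in some other single-byte encoding, 0x96 may map to a different character, so the decode step is the part to adjust.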

There's always a better way. The fun is trying to find it!
 
Take this with a grain of salt; it is totally an assumption.

I thought that encoding is very document-oriented: Word docs, HTML, and ClarisWorks all use similar but different encodings in their docs. For example:

doc -> &#34; html -> &quot; (both are quotation marks)
Now, if you are converting HTML, your substitution would throw an error on &quot; because it does not see " there; it sees &quot;.

Once again, take this with salt; I am just throwing it out there.
 
Sorry about that; this is what I was getting at about the encoding:
ASCII: &#34;
HTML: &quot;
 
Solution:
Used utf8::encode($_) to remove the cause of the malformed-character error,
and encode_entities($_) (from HTML::Entities) to convert the ndashes to HTML format.
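
For anyone landing here later, the following is a rough sketch of how those two calls might fit together, not the exact code from test.pl. It assumes the raw input is Windows-1252, and the sample string and the variable names ($raw, $text, $bytes, $escaped) are made up for illustration. HTML::Entities comes from CPAN (HTML-Parser distribution), so it is loaded at runtime here and the rest still runs without it.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: the raw input is Windows-1252, where byte 0x96 is an en dash.
my $raw  = "pages 10\x{96}20";
my $text = decode('cp1252', $raw);   # character string containing U+2013

# Route 1: HTML entities. HTML::Entities is a CPAN module, so it is
# required inside eval; $html stays undef if the module is missing.
my $html = eval {
    require HTML::Entities;
    HTML::Entities::encode_entities($text);
};
print "entities: $html\n" if defined $html;

# Route 2: percent-encode with the substitution from the first post.
# utf8::encode converts the characters to UTF-8 bytes in place, so
# ord() inside the substitution only ever sees values 0..255.
my $bytes = $text;
utf8::encode($bytes);
(my $escaped = $bytes) =~ s/([^A-Za-z0-9_',])/sprintf("%%%02X", ord($1))/eg;
print "escaped:  $escaped\n";
```

Decoding first means both routes start from a well-formed character string, which is one way to avoid the malformed UTF-8 complaint. Whether the entity form or the percent-encoded form is the right one to hand to sqlldr depends on how the data is read back afterwards.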
 