UTF-8, Unicode, HTML & CSV - no consistency?


(OP)
Hi,

I seem to be going round in circles trying to understand when I need to encode to UTF-8 to get special characters to show correctly (specifically the GBP symbol, £).

I have a reporting class that formats monetary values with the 'Locale::Currency::Format' module...

CODE

$my_value = currency_format('gbp', $my_value, FMT_SYMBOL); 

My understanding is that this module automatically converts the currency symbol to Unicode "\x{00A3}", which according to the codepoint.net site (if I'm reading it correctly) is UTF-16 (00A3).

But when I tried to decode it as UTF-16, the Encode module just bombs with

Quote:

"UTF-16:Unrecognised BOM 2249 at C:/Perl/site/lib/Encode.pm line 175."
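A short sketch may help separate the two ideas being mixed here. U+00A3 is a code point (a character number), not an encoding; UTF-8 and UTF-16 are two different ways of turning that one character into bytes. The variable names below are illustrative only and are not taken from the reporting class. As far as I know, Encode's plain 'UTF-16' decoder expects the input to start with a byte order mark, which would explain the "Unrecognised BOM" message when it is handed something that is not UTF-16 bytes at all; 'UTF-16BE' and 'UTF-16LE' are the BOM-less variants.

```perl
use strict;
use warnings;
use Encode qw(encode);

# One character: code point U+00A3 (POUND SIGN). No encoding chosen yet.
my $pound = "\x{00A3}";

# The same character serialised to bytes in two different encodings.
my $utf8_bytes  = encode('UTF-8',    $pound);
my $utf16_bytes = encode('UTF-16BE', $pound);

printf "characters:     %d\n", length $pound;
printf "UTF-8 bytes:    %s\n", unpack 'H*', $utf8_bytes;    # c2a3
printf "UTF-16BE bytes: %s\n", unpack 'H*', $utf16_bytes;   # 00a3
```

The "00A3" shown on code point sites looks like UTF-16 only because, for characters in this range, the UTF-16BE bytes happen to match the code point number.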

So I tried simply encoding to UTF-8 and outputting to HTML and it displays correctly

CODE

$self->_encode($my_string, 'UTF-8') 
Great. However, if I then try to output to a CSV text file, the browser just hangs and no file is downloaded.

So I removed this from the output code:

CODE

$iof->binmode(":encoding(UTF-8)"); 
and I get the CSV output, but with funny characters...

Quote:

£
So I removed the encode to UTF-8 but kept the binmode output encoding of UTF-8, and that outputs

Quote:

\xA3
Which according to codepoint.net is Perl.

So I removed both the encoding and the binmode output formatting and bingo, I get a GBP pound sign in my CSV.

However, if I remove the encode to UTF-8 before outputting the HTML I get

Quote:


I'm baffled. Do I or don't I need to encode before outputting? What encoding is my string currently in, what am I meant to convert it to, and when?
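The usual rule of thumb is to keep strings as characters inside the program and encode exactly once, at the point of output: either call encode() yourself or put an :encoding layer on the filehandle, but not both. The sketch below is not the original reporting code; it writes to in-memory filehandles (an assumption made purely so the bytes can be inspected) and compares encoding once, encoding twice, and not encoding at all.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $pound = "\x{00A3}";    # character string, U+00A3

# 1. Encode once: characters go through an :encoding layer.
my $once = '';
open my $fh1, '>:encoding(UTF-8)', \$once or die $!;
print {$fh1} $pound;
close $fh1;

# 2. Encode twice: already-encoded bytes go through the layer again.
my $twice = '';
open my $fh2, '>:encoding(UTF-8)', \$twice or die $!;
print {$fh2} encode('UTF-8', $pound);
close $fh2;

# 3. No encoding at all: the character is written as a single raw byte.
my $raw = '';
open my $fh3, '>:raw', \$raw or die $!;
print {$fh3} $pound;
close $fh3;

printf "once:  %s\n", unpack 'H*', $once;     # c2a3     valid UTF-8 for the pound sign
printf "twice: %s\n", unpack 'H*', $twice;    # c382c2a3 double-encoded UTF-8
printf "raw:   %s\n", unpack 'H*', $raw;      # a3       Latin-1 / Windows-1252 byte
```

This is probably why the results look inconsistent: what shows up depends on which encoding the reader assumes. A page declared as UTF-8 will not accept the lone byte A3, while a program that assumes Windows-1252 (as Excel often does for CSV files on Windows) will show A3 as a pound sign and show UTF-8 bytes as extra "funny characters".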

Your help is appreciated.
1DMF

"In complete darkness we are all the same, it is only our knowledge and wisdom that separates us, don't let your eyes deceive you."

"If a shortcut was meant to be easy, it wouldn't be a shortcut, it would be the way!"
Free Electronic Dance Music

RE: UTF-8, Unicode, HTML & CSV - no consistency?

Sounds like the hassle I had a while back with a similar issue.

I had two separate websites with the same ISP, and I used a common Perl template on both sites.
One site displayed £ signs correctly and the other displayed a question mark in a diamond. The ISP said it was an encoding problem, and I insisted it was a server setup problem; otherwise both sites would have done exactly the same thing.

They kept coming back to the same solution, which was to use the &pound; entity, but I wanted to get to the root cause. I got no joy from them, so I moved to another ISP, and I am pleased to report I have not encountered the problem since.

Keith
www.studiosoft.co.uk

RE: UTF-8, Unicode, HTML & CSV - no consistency?

(OP)
Hi Keith,

Well the IIS server is ours, so if you think it is a configuration issue, do you have any idea what needs changing?

What I don't understand is the inconsistency of needing and not needing to encode and the differing types of symbol you end up with.

Perhaps it depends on the application opening the content, i.e. Internet Explorer vs Excel?

On the local development server (part of Catalyst), when you encode to UTF-8 with the Encode module and output via IO::File using binmode ':encoding(UTF-8)', the CSV is delivered and the pound sign shows correctly.

However, the same mechanism fails under IIS 7.5: the download freezes, the document shows zero bytes, and eventually it fails to be delivered.

I feel this could be an IIS 7.5 file delivery issue, but I'm not sure why or what is causing it. Normal static CSV/XLS files work fine, so perhaps IIS is having trouble with Catalyst and the way it prints to STDOUT using UTF-8. It only seems to affect this dynamically created CSV file delivery, though; as mentioned, outputting HTML as UTF-8 works perfectly fine.
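One way to narrow this down is to build the CSV as a single byte string, encoded once, and check its size before handing it to the web server. The sketch below uses made-up report rows and plain Perl only (no Catalyst or IIS calls). It also prepends a UTF-8 byte order mark, which Excel commonly uses to recognise a CSV file as UTF-8. Whether a mismatch between character count and byte count has anything to do with the IIS hang is only a guess; the sketch just makes the difference visible.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical report rows; every value is a character string.
my @rows = (
    [ 'Item',   'Price' ],
    [ 'Widget', "\x{00A3}9.99" ],
);

# Naive CSV join for illustration only (no quoting or escaping).
my $csv_chars = join '', map { join(',', @$_) . "\r\n" } @rows;

# Encode exactly once, then prepend the UTF-8 byte order mark (EF BB BF).
my $csv_bytes = "\xEF\xBB\xBF" . encode('UTF-8', $csv_chars);

# The two lengths differ: the pound sign is one character but two bytes,
# and the byte order mark adds three more bytes.
printf "characters: %d\n", length $csv_chars;
printf "bytes:      %d\n", length $csv_bytes;
```

If a Content-Length header is sent with the download, it would need to be the byte count, and the bytes should then be written to a handle that has no further :encoding layer on it.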


