UTF8 jungle and .NET 1

Status
Not open for further replies.

mrdance

Programmer
Apr 17, 2001
308
SE
I really need help understanding and sorting out my problem with "foreign" characters. Please read this if you know something about this.

I have written some web scripts in PHP that a few translators use. They are forms that post to a script, which inserts the text into a MySQL 4.1.11 database on Windows, with UTF8 as the character set.

After insertion, I use a VB.Net program which selects the "sentences/words" (using the MySQL provider with charset=UTF8 in the connection). I then use the XMLTextWriter to write the content to a file using UTF8 encoding.

My program, the one that needed to be translated, uses this language file by reading it in with UTF8 encoding.

Here are some questions and the things I don't understand.

1. Can I display the strings directly in a label, for example, or do I have to convert the UTF8 into the codepage of the language?

I have tried different approaches: converting, using different fonts, etc. I don't know if the error lies in the database, the files, or in my program showing the strings. I have examples for you, a Russian and a Japanese file:


2. Please tell me if these files look OK and are usable, or if the error is in the files.

I have succeeded with Swedish characters when I converted UTF8 to codepage 1252, which Swedish belongs to. On the other hand, my whole OS and National Settings are in Swedish, so I don't know whether that approach really works in general.

Please give me all your ideas and forgive me if this text is long! Thanks again! / Henrik

--- neteject.com - Internet Solutions ---
 
Once you read your value from the database, it gets stored in a .NET string, which is UTF-16 Unicode. You can then do anything you want with it, including displaying it in a label. No encoding or codepage mess needed -- the database provider does the conversion for you.
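
To illustrate the point that a decoded Unicode string needs no per-language codepage, here is a minimal sketch in Python (not the VB.Net code from the question). The sample strings and the helper names `roundtrip_utf8` and `fits_codepage` are assumptions made up for this illustration.

```python
# Hypothetical sample strings, one per language mentioned in the thread.
samples = {
    "swedish": "å ä ö",
    "russian": "Добавить задание",
    "japanese": "日本語",
}


def roundtrip_utf8(text):
    """Encode to UTF-8 bytes and decode again; lossless for any Unicode text."""
    return text.encode("utf-8").decode("utf-8")


def fits_codepage(text, codepage):
    """Return True if every character of text exists in the given codepage."""
    try:
        text.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


for name, text in samples.items():
    # UTF-8 keeps every language intact, with no per-language conversion.
    print(name, roundtrip_utf8(text) == text)

# A single-byte codepage only covers its own alphabet.
print(fits_codepage(samples["swedish"], "cp1252"))
print(fits_codepage(samples["russian"], "cp1252"))
print(fits_codepage(samples["russian"], "cp1251"))
```

This is also a plausible reason why converting to codepage 1252 appeared to work for Swedish only: that codepage has no Cyrillic or Japanese characters at all.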

Chip H.




____________________________________________________________________
If you want to get the best response to a question, please read FAQ222-2244 first
 
It does not convert the characters in the form for me, and I wonder why. From what I have read so far, this file is correct (Swedish language file):


Do you have any idea why I see the "raw" format, like "Ã¥" instead of "å" in my form? Does it have anything to do with national settings or project settings? I have tried different fonts said to support Unicode.

--- neteject.com - Internet Solutions ---
 
This indicates that the two bytes of the character are being read as two separate characters, and not decoded into a single character. I would start at the beginning -- I'm not a MySQL expert, but you should be able to load one of its database files in a hex editor and find your row. Make sure the character is stored as two bytes (or more -- UTF-8 is a multi-byte encoding.)

Next, make sure the managed database provider is reading & converting it correctly. When you inspect the value of that column, you should have a string which is the UTF-16 equivalent of the bytes you saw earlier: a single two-byte UTF-16 value for that character. Inspecting it in the watch window should show the single character.

And just follow it through the calls from there.
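
As a rough illustration of that check, here is a small Python sketch (Python rather than VB.Net, purely for demonstration). It assumes the stored value is the single character "å"; the helper name `hex_bytes` is made up for this example.

```python
def hex_bytes(raw):
    """Format bytes the way a hex editor would show them."""
    return " ".join(f"{b:02x}" for b in raw)


# What should be on disk if the database really stores UTF-8.
stored = "å".encode("utf-8")
print(hex_bytes(stored))

# Decoding the bytes as UTF-8 gives back one character.
correct = stored.decode("utf-8")
print(len(correct), correct)

# Decoding the same bytes as Windows-1252 gives two characters,
# which is the "raw" form reported in the question.
wrong = stored.decode("cp1252")
print(len(wrong), wrong)

# If no bytes were lost, the wrong decode can be reversed.
print(wrong.encode("cp1252").decode("utf-8") == "å")
```

If the string already has two characters at some step, the wrong decode happened at or before that step.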

Chip H.


 
I really appreciate your help!

Now the West European languages look OK. The cause was an encoding setting that had been hiding from me.

I am now working on Greek, Russian and Japanese, and I am focusing on Russian right now. On the webpage my Russian characters look OK (with the Cyrillic character set). In forms and in the file I download, I see the Russian characters in West European encoding, for example: "Äîáàâèòü çàäàíèå" when it should be: "Добавить задание".

I don't know how I can get it right in the file from the beginning. I have tried encoding the XML in 1251, encoding the MySQL characters in 1251, and doing nothing (which resulted in my default character set, West European encoding).

Any ideas?

--- neteject.com - Internet Solutions ---
 
Sorry - all that's left is to start at the source of the data and work your way forward, making sure the encoding is correct at each step.
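
One hint for that walk-through: the sample "Äîáàâèòü çàäàíèå" is consistent with Windows-1251 (Cyrillic) bytes being decoded as Windows-1252 (West European) somewhere along the way. The Python sketch below only demonstrates that reading; the helper name `reinterpret` is an assumption for this example, and it does not show where in the PHP, MySQL or VB.Net chain the mix-up happens.

```python
def reinterpret(text, wrong_encoding, right_encoding):
    """Undo a wrong decode: recover the original bytes, then decode them properly."""
    return text.encode(wrong_encoding).decode(right_encoding)


garbled = "Äîáàâèòü çàäàíèå"

# Treat the garbled text as cp1252 to get the bytes back, then read them as cp1251.
repaired = reinterpret(garbled, "cp1252", "cp1251")
print(repaired)

# The same phrase stored as UTF-8 uses two bytes per Cyrillic letter,
# so a UTF-8 file and a cp1251 file of this text differ in length.
print(len(repaired.encode("cp1251")), len(repaired.encode("utf-8")))
```

If this reading is right, the text is leaving some step as 1251 bytes rather than UTF-8, so that step is the one to look for.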

Chip H.


 