How can I serialize to a specific encoding?

Status
Not open for further replies.

mrdance (Programmer; joined Apr 17, 2001; 308 messages; location: SE)
How can I make the StreamWriter in my XML-serializing function below use the Russian encoding "cp1251"?

Public Sub SaveProject(ByVal p As Project)
    Dim objStreamWriter As New StreamWriter(Application.StartupPath & "\" & p.FileName)
    Dim x As New XmlSerializer(p.GetType)
    Try
        x.Serialize(objStreamWriter, p)
    Catch ex As Exception
        MsgBox(ex.ToString)
    End Try
    objStreamWriter.Close()
End Sub

--- neteject.com - Internet Solutions ---
 
As far as I know, there is no encoder provided for codepage 1251 in the .NET Framework. To write your own, you would override the System.Text.Encoder abstract class, implementing your own version of its public methods.

The two methods that you care most about are GetByteCount() and GetBytes(). In GetByteCount() you would tell the caller how many bytes would be taken up by the string they pass you. In GetBytes() you would actually do the conversion, converting from an array of chars (.NET Unicode characters) to an array of bytes (the equivalent cp1251 characters).
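To make the roles of these two methods concrete, they can be tried on an existing Encoding object. This is only a sketch: it assumes code page 1251 can be obtained through Encoding.GetEncoding(1251), as suggested further down this thread (on .NET Core and later that would also require registering the code pages provider, which is not shown). The module name and the sample word are made up for illustration.

```vbnet
Imports System
Imports System.Text

Module EncodingMethodsSketch
    Sub Main()
        ' Assumption: code page 1251 is available on this runtime.
        Dim cp1251 As Encoding = Encoding.GetEncoding(1251)

        ' Hypothetical sample: the six Cyrillic letters of "Привет",
        ' built from code points so the source file encoding does not matter.
        Dim sample As String = ChrW(&H41F) & ChrW(&H440) & ChrW(&H438) & _
                               ChrW(&H432) & ChrW(&H435) & ChrW(&H442)

        ' GetByteCount reports how many bytes the encoded string will need.
        Dim needed As Integer = cp1251.GetByteCount(sample)

        ' GetBytes does the actual conversion from Unicode chars to cp1251 bytes.
        Dim bytes As Byte() = cp1251.GetBytes(sample)

        Console.WriteLine("{0} chars -> {1} bytes", sample.Length, needed)
        Console.WriteLine(BitConverter.ToString(bytes))   ' CF-F0-E8-E2-E5-F2
    End Sub
End Module
```

cp1251 is a single-byte code page, so each of the six characters becomes exactly one byte.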

Chip H.


____________________________________________________________________
If you want to get the best response to a question, please read FAQ222-2244 first
 
OK, I don't really know anything about encoding. I have some Russian words in a MySQL table; I fetch them and then write an XML file with the content. It seems like somebody must have "overridden" that class before. I have no idea how the encoding works. I'll have to search the net :)

--- neteject.com - Internet Solutions ---
 
Can't you just use this in the serialize function in some way?

Dim encoding As System.Text.Encoding = System.Text.Encoding.GetEncoding(1251)

--- neteject.com - Internet Solutions ---
 
I think I found a way for it to work:
Code:
// Encoding.GetEncoding takes an integer, which
// corresponds to your codepage.
Encoding CP1251Enc = Encoding.GetEncoding(1251);
XmlTextWriter xtw = new XmlTextWriter("c:\\test.xml", CP1251Enc);

xtw.WriteStartDocument();
xtw.WriteStartElement("root");
xtw.WriteElementString("FirstChild", Customer.Name);
xtw.WriteElementString("SecondChild", XmlConvert.ToString(Customer.Balance));
xtw.WriteEndElement();
xtw.Close();
Note that the XmlConvert class allows you to write native .NET datatypes to an XML file. So if you read a Customer record from the database, store it in a Customer object (with strong datatyping), you can write it to the XML file with strong datatypes as well.
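For the SaveProject routine from the first post, the same encoding object could probably be handed straight to the StreamWriter, keeping the XmlSerializer. A minimal VB.NET sketch, assuming code page 1251 is available; the Project class here is a hypothetical stand-in (the real one is not shown in the thread), and the folder parameter replaces Application.StartupPath only to keep the snippet self-contained.

```vbnet
Imports System.IO
Imports System.Text
Imports System.Xml.Serialization

' Hypothetical stand-in for the Project class from the question.
Public Class Project
    Public FileName As String
    Public Title As String
End Class

Public Module ProjectStorage
    Public Sub SaveProject(ByVal p As Project, ByVal folder As String)
        Dim cp1251 As Encoding = Encoding.GetEncoding(1251)

        ' StreamWriter(path, append, encoding): False overwrites an existing file.
        Dim writer As New StreamWriter(Path.Combine(folder, p.FileName), False, cp1251)
        Try
            Dim serializer As New XmlSerializer(p.GetType())
            ' The encoding name in the XML declaration is taken from the writer.
            serializer.Serialize(writer, p)
        Finally
            ' Close in Finally so the file is released even if Serialize throws.
            writer.Close()
        End Try
    End Sub
End Module
```

Because the XML declaration then names the encoding, a reader that opens the file as a stream (rather than through a TextReader with a different encoding) should be able to decode it without extra hints.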

Chip H.


 
Thanks to mrdance, who was a little faster on the submit button!

I'm pretty sure that the XmlSerializer will only give you so-so results. I've not had much luck with it, as in my experience it won't recurse through any nested objects you might have in your class.

Chip H.


 
Hi again Chiph,

I have been struggling with this encoding thing. This code seems to work, but I know too little about the result:

Dim encoding As System.Text.Encoding = System.Text.Encoding.GetEncoding(1251)
Dim i As Integer
For i = 0 To LangDic.Count - 1
    Dim l As Language
    l = CType(LangDic(i), Language)
    l.DateCreated = Now
    ' XmlTextWriter writes the file in the given encoding
    Dim xml As New XmlTextWriter(l.FileName, encoding)
    xml.Formatting = Formatting.Indented
    Try
        Dim saveXML As New XmlSerializer(l.GetType())
        saveXML.Serialize(xml, l)
    Catch ex As Exception
        MsgBox(ex.ToString)
    Finally
        ' Always release the file, even if serialization fails
        xml.Close()
    End Try
Next

Using different encodings will give me different results(!) but the results don't show up correctly in my application. I use the following procedure:

1. A translator enters a translation through a web script, and the translation is inserted into a MySQL database. When the translator looks at the translation, it looks OK in the browser.
2. I use an application, with the above code, to connect to MySQL, load the columns into a class, and serialize it. This works fine.
3. I use the created XML file in my application, deserializing it. This works fine. The problem (and I lack knowledge here) is that the text is displayed exactly as it appears in the XML file. It seems I have to decode or convert the text into the right characters somehow.

I tried the following. I opened a text file from another application that contains Russian. It looked just like my XML file (that is, not like Russian). I pasted a line from my XML file into that file and opened it in Word. Word detected a different encoding and asked whether I wanted to view it in a Cyrillic encoding or something similar. Then even my words looked Russian. So it seems the XML file now has the correct encoding, and it is my application that must handle it. How can I do that? Do I set it on the font of a button, for example, or do I need a special deserialization step?

Any more help would be appreciated!

--- neteject.com - Internet Solutions ---
 
You need a Unicode-aware editor (Notepad is OK, Word is probably bad). I suggest Unipad. Once you have something which can properly display Unicode data, you need to step through your code and find out where the data is being corrupted.

In the internationalization area, there are two main things that can go wrong -- encoding and fonts. Unipad has a Unicode font, so it can display almost anything. Other applications do not come with Unicode-aware fonts, and will usually (but not always) display question marks for characters they don't recognize.

Chip H.


 
Update.

I managed to find a Russian font that could display the Russian UTF-8 XML file. By the way, I tried saving the XML file with "Windows-1251" encoding, but those letters made no sense at all. Using the strings from the UTF-8 file instead let me show the words in a Russian font. This does not, however, solve the problem with message boxes. So either the encoding of the XML file is wrong, or I have to convert the strings or set the encoding in VB.NET.

There must be a string converter for different charsets. I have seen that Chilkat has such a product, but I think this problem can't be too hard to solve.
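For what it is worth, the framework's own Encoding class can act as such a converter: strings in memory are always Unicode, and Encoding turns bytes into strings or re-encodes bytes from one charset to another. A rough sketch, assuming code page 1251 is available; the module name and the sample bytes are invented for illustration.

```vbnet
Imports System
Imports System.Text

Module CharsetConversionSketch
    Sub Main()
        ' Assumption: code page 1251 is available on this runtime.
        Dim cp1251 As Encoding = Encoding.GetEncoding(1251)

        ' Hypothetical input: the cp1251 bytes of a three-letter Russian word
        ' (U+041C, U+0438, U+0440).
        Dim rawBytes As Byte() = New Byte() {&HCC, &HE8, &HF0}

        ' Bytes -> String: decode with the encoding the bytes were written in.
        Dim word As String = cp1251.GetString(rawBytes)

        ' Bytes -> bytes: re-encode the same text as UTF-8.
        Dim utf8Bytes As Byte() = Encoding.Convert(cp1251, Encoding.UTF8, rawBytes)

        Console.WriteLine(word.Length)        ' 3
        Console.WriteLine(utf8Bytes.Length)   ' 6
    End Sub
End Module
```

If the decoded string holds the right characters at this point, whatever still looks wrong on screen is more likely a font or display issue than an encoding one, in line with the two failure areas mentioned earlier in the thread.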

Any more ideas?

--- neteject.com - Internet Solutions ---
 