CStdioFile and Unicode

Status
Not open for further replies.

wilsonian

Programmer
Aug 25, 2003
8
NZ
Hi there,

I'm trying to read from a Unicode file. The file contains the word "hello" followed by a tab and then a schwa (the Unicode character that looks like an upside-down e).

It is saved as Unicode, and it has the BOM 0xFF 0xFE at the start.
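To make the file layout concrete, here is a small sketch in standard C++ (not MFC) of the bytes such a file would hold. It assumes the schwa is the lowercase one (U+0259), that the file is UTF-16 little-endian as the 0xFF 0xFE BOM suggests, and that there is no trailing newline. encodeUtf16LeWithBom and demoEncode are made-up names for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Encode UTF-16 code units as little-endian bytes, preceded by the BOM (U+FEFF).
std::vector<std::uint8_t> encodeUtf16LeWithBom(const std::u16string& text) {
    std::vector<std::uint8_t> bytes;
    bytes.push_back(0xFF);  // BOM, low byte first
    bytes.push_back(0xFE);  // BOM, high byte
    for (char16_t unit : text) {
        bytes.push_back(static_cast<std::uint8_t>(unit & 0xFF));         // low byte
        bytes.push_back(static_cast<std::uint8_t>((unit >> 8) & 0xFF));  // high byte
    }
    return bytes;
}

// Usage check: "hello", a tab, and a schwa (assumed to be U+0259).
void demoEncode() {
    const std::vector<std::uint8_t> expected = {
        0xFF, 0xFE,              // BOM
        0x68, 0x00, 0x65, 0x00,  // h e
        0x6C, 0x00, 0x6C, 0x00,  // l l
        0x6F, 0x00, 0x09, 0x00,  // o, tab
        0x59, 0x02               // schwa
    };
    assert(encodeUtf16LeWithBom(u"hello\t\u0259") == expected);
}
```

Every ASCII letter is followed by a 0x00 byte in this layout, which matters as soon as anything tries to read the file as single-byte text.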

Now I do the following in my program:

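#ifdef _UNICODE
CStdioFile c;
// _T() makes the file name a wide literal in Unicode builds.
c.Open(_T("hello.txt"), CFile::modeRead | CFile::typeBinary);
CString stuff;
c.ReadString(stuff);  // reads the first line of the file into stuff
c.Close();
#endif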

The string that ends up in stuff looks like this in the debugger:

"?hello ?"

If I put it in a CStatic or CEdit the ?s are replaced by squares.

The first ? is the BOM, the second is the schwa.

So what's going on here? Why is the BOM making it into my string, and why are non-ANSI characters appearing as ?s?

Thank you.
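On the BOM question: the byte order mark is itself a code unit (U+FEFF), so a reader that does not treat it specially will return it as the first character of the string. Below is a minimal sketch, in standard C++ rather than MFC, of dropping it after the read. stripLeadingBom and demoStripBom are hypothetical helpers, not part of MFC.

```cpp
#include <cassert>
#include <string>

// Remove one leading byte order mark (U+FEFF) if present;
// otherwise return the text unchanged.
std::u16string stripLeadingBom(std::u16string text) {
    const char16_t bom = static_cast<char16_t>(0xFEFF);
    if (!text.empty() && text.front() == bom) {
        text.erase(0, 1);
    }
    return text;
}

// Usage check: the BOM is removed, text without a BOM is left alone.
void demoStripBom() {
    assert(stripLeadingBom(u"\uFEFFhello") == u"hello");
    assert(stripLeadingBom(u"hello") == u"hello");
}
```

With a CString the same idea would presumably be a comparison of the first character against 0xFEFF followed by Delete(0); that variant is untested here.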

 
I think you should change CFile::typeBinary to CFile::typeText.

Ion Filipski

ICQ: 95034075
AIM: IonFilipski
filipski@excite.com
 
I'm 90% sure I tried this and it did something weird; however, I will try it again on Friday when I am next at work.

Could you explain to me the purpose of typeBinary and typeText? I couldn't make it out from MSDN. Why would typeBinary let ASCII characters work but not others?

Thanks for your help.
 
Binary means you work with bytes, whereas text means you work with characters. For example, it does not make sense to use Unicode in binary mode, because Unicode is a convention about text characters.

Ion Filipski
 
Right, I tried it using typeText, and the result was that stuff contained ÿþH (if that doesn't come out, it's y-umlaut, thorn, capital H), i.e. the BOM bytes shown as ANSI characters plus a capital H. To me this looks very much like the program thinks the file is in ANSI format, whereas with typeBinary it at least seemed to be reading 2-byte characters.
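One plausible reading of that result: if the bytes are taken as single-byte text, the two BOM bytes become two characters, the first letter follows, and the 0x00 high byte of that letter ends the string. The sketch below, in standard C++ rather than MFC, contrasts that with decoding the same bytes as UTF-16 little-endian and skipping the BOM. decodeUtf16Le, readAsNarrow and demoDecode are made-up names, and the sample bytes assume a lowercase "hello", a tab, and a schwa at U+0259.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decode raw bytes as UTF-16LE code units, skipping a leading FF FE BOM.
// A trailing odd byte, if any, is ignored.
std::u16string decodeUtf16Le(const std::vector<std::uint8_t>& bytes) {
    std::size_t pos = 0;
    if (bytes.size() >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE) {
        pos = 2;  // skip the BOM
    }
    std::u16string text;
    for (; pos + 1 < bytes.size(); pos += 2) {
        // low byte first, then high byte
        text.push_back(static_cast<char16_t>(bytes[pos] | (bytes[pos + 1] << 8)));
    }
    return text;
}

// Treat the same bytes as a NUL-terminated single-byte string:
// everything after the first 0x00 byte is lost.
std::string readAsNarrow(const std::vector<std::uint8_t>& bytes) {
    std::string text;
    for (std::uint8_t b : bytes) {
        if (b == 0) break;
        text.push_back(static_cast<char>(b));
    }
    return text;
}

// Usage check on the assumed contents of hello.txt.
void demoDecode() {
    const std::vector<std::uint8_t> fileBytes = {
        0xFF, 0xFE, 0x68, 0x00, 0x65, 0x00, 0x6C, 0x00,
        0x6C, 0x00, 0x6F, 0x00, 0x09, 0x00, 0x59, 0x02
    };
    assert(decodeUtf16Le(fileBytes) == u"hello\t\u0259");
    assert(readAsNarrow(fileBytes).size() == 3);  // 0xFF, 0xFE, 'h'
}
```

Reading the file as raw bytes and decoding it explicitly sidesteps the question of what the runtime does in text mode, at the cost of handling line splitting yourself.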
 