XML parser will not read 'illegal' characters in vb.net

Status
Not open for further replies.

fostom

Programmer
Jul 3, 2002
31
NO
Hi..

I am trying to parse an XML file in VB.NET (Imports System.Xml). I have tried both XmlDocument and XmlTextReader on files that contain illegal characters, such as leader_string=''. The parser jumps out of the Try...Catch, refusing to read this string.

Does anyone know how I can parse an XML file even if it contains 'characters' like the  ?

Maybe I can open the file with a StreamReader first, then search for and delete the , but how do I search for the ? What is  anyway?

Thanks in advance..

Tommy

 
If the characters fall outside the character set of the encoding declared at the top of the XML file (with no declaration and no byte-order mark, the parser assumes UTF-8), then the document is not well-formed, and the error you're getting is correct.

That means you won't be able to read it using any of the built-in XML readers. You would have to read it as a text stream and parse out the XML element tags yourself (yuck!). Searching for illegal characters and replacing them with spaces is pretty fruitless -- as soon as you filter out x, they'll start sending you y, and you have to update your code to filter out the new illegal character (rinse, repeat).
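
For anyone who still has to salvage such a file, here is a minimal sketch of the text-stream route. It does not chase individual characters. It keeps only what the XML 1.0 Char production allows (#x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, and valid surrogate pairs) and then hands the cleaned text to XmlDocument. The names StripInvalidXmlChars and LoadLenient are hypothetical, U+0001 is only a stand-in for whatever character the real file contains, and the file is assumed to be UTF-8. This is a workaround under those assumptions, not a substitute for getting a valid document from the sender.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml
Imports Microsoft.VisualBasic

Module XmlSanitizer

    ' Hypothetical helper: returns a copy of the input without any
    ' character that falls outside the XML 1.0 Char production.
    Function StripInvalidXmlChars(ByVal input As String) As String
        Dim sb As New StringBuilder(input.Length)
        Dim i As Integer = 0
        While i < input.Length
            Dim c As Char = input.Chars(i)
            If Char.IsHighSurrogate(c) AndAlso i + 1 < input.Length _
               AndAlso Char.IsLowSurrogate(input.Chars(i + 1)) Then
                ' A valid surrogate pair encodes #x10000-#x10FFFF: keep both halves.
                sb.Append(c)
                sb.Append(input.Chars(i + 1))
                i += 2
            Else
                Dim code As Integer = AscW(c)
                If code = &H9 OrElse code = &HA OrElse code = &HD OrElse _
                   (code >= &H20 AndAlso code <= &HD7FF) OrElse _
                   (code >= &HE000 AndAlso code <= &HFFFD) Then
                    sb.Append(c)
                End If
                ' Anything else (control characters, lone surrogates,
                ' #xFFFE, #xFFFF) is dropped.
                i += 1
            End If
        End While
        Return sb.ToString()
    End Function

    ' Hypothetical helper: reads the file as text (assumed UTF-8),
    ' strips invalid characters, then parses the cleaned string.
    Function LoadLenient(ByVal path As String) As XmlDocument
        Dim raw As String
        Using reader As New StreamReader(path, Encoding.UTF8)
            raw = reader.ReadToEnd()
        End Using
        Dim doc As New XmlDocument()
        doc.LoadXml(StripInvalidXmlChars(raw))
        Return doc
    End Function

    Sub Main()
        ' U+0001 stands in for the illegal character in the attribute value.
        Dim dirty As String = "<root leader_string=""a" & ChrW(1) & "b""/>"
        Dim doc As New XmlDocument()
        doc.LoadXml(StripInvalidXmlChars(dirty))
        Console.WriteLine(doc.DocumentElement.GetAttribute("leader_string")) ' prints "ab"
    End Sub

End Module
```

Two caveats. The filter silently discards data, so the result may not be what the sender meant. It also does not touch numeric character references such as &#x1; written out as text, which XML 1.0 rejects as well. If the real problem is a mismatch between the declared encoding and the actual bytes, decoding as UTF-8 will substitute replacement characters rather than recover the original text.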

The best advice is to tell whoever is giving you the file not to do that, as it violates the W3C XML specification. They need to send you a document that complies with the encoding you and they have agreed on.
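
When reporting the problem back to the sender, it helps to say exactly where the parser gave up. The sketch below assumes the failure surfaces as an XmlException, which carries LineNumber and LinePosition properties. The name DescribeXmlError is hypothetical, and other failures (a missing file, for example) are deliberately not handled here.

```vbnet
Imports System
Imports System.Xml

Module XmlDiagnostics

    ' Hypothetical helper: returns Nothing when the file parses cleanly,
    ' otherwise a message naming the line and position of the first error.
    Function DescribeXmlError(ByVal path As String) As String
        Try
            Dim doc As New XmlDocument()
            doc.Load(path)
            Return Nothing
        Catch ex As XmlException
            ' Only XML well-formedness errors are caught; I/O errors propagate.
            Return String.Format("{0}: line {1}, position {2}: {3}", _
                                 path, ex.LineNumber, ex.LinePosition, ex.Message)
        End Try
    End Function

    Sub Main(ByVal args As String())
        If args.Length = 0 Then
            Console.WriteLine("Usage: pass the path of the XML file to check")
            Return
        End If
        Dim message As String = DescribeXmlError(args(0))
        If message Is Nothing Then
            Console.WriteLine("Document is well-formed.")
        Else
            Console.WriteLine(message)
        End If
    End Sub

End Module
```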

Chip H.


 