How to disable MSXML output escaping?


(OP)
I'm writing a simple XML file using the MSXML client for consumption by another program. The problem is that MSXML escapes certain characters in the text of each of my elements. While this might be okay in itself, the resulting escape sequences are rejected by the other program upon load.

More or less, I need MSXML to leave my text alone, so I can escape the text properly to suit the program that will read my file. Any ideas on making this happen?

RE: How to disable MSXML output escaping?

Can you provide some examples on the source data and how it's being escaped by MSXML?

RE: How to disable MSXML output escaping?

Hi Glenn9999,

If you cannot set the XML generator to stop producing unwanted characters in XML, you can always add a post-processing step. In this step, you can eliminate the unwanted characters with regular expressions. Then you pass the corrected XML to the consumer program.
To correct the XML file, you can use ready-made tools, such as sed, or even write yourself a script, for example in VBScript.

RE: How to disable MSXML output escaping?

(OP)
I'm using IXMLDOMDocument in a Delphi program if that helps to clarify things.

>Can you provide some examples on the source data and how it's being escaped by MSXML?

A simple example is the text string "Henry & June" (without the quotes). If I accept the MSXML formatting ("Henry &amp; June"), the consumer program comes back with "unknown XML object" where the & appears in the text. In studying the output the consumer program makes, I figured out that it only accepts the numeric form (e.g. "Henry &#38; June"). The problem is that if I present that exact string to MSXML, it helpfully comes in and escapes the &, giving me "Henry &amp;#38; June", which produces an even more royal mess than what I already have at this point.

Also, MSXML doesn't escape certain characters I need to have done (namely <, >, ", and ' for what I'm aware of right now, mainly what's going to muck me up in the text presentation), so being able to escape characters myself would be ideal.

>you can always add a post-processing step.

This is what I'm fearing. I don't know whether reading the raw XML file into a TMemo or TRichEdit and string searching it will mangle the XML beyond what is expected. Something I'll have to try.
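The double-escaping described above, and one possible post-processing fix, can be sketched in Python. This is only an illustration under stated assumptions: it uses the standard library's xml.etree.ElementTree as a stand-in for MSXML (it is not MSXML), and named_to_numeric is a hypothetical helper, not part of any library. It also assumes the serialized file contains no CDATA sections, comments, or processing instructions, where a literal "&amp;" would not be an entity reference.

```python
import re
import xml.etree.ElementTree as ET

# Code points of the five predefined XML entities.
PREDEFINED = {"amp": 38, "lt": 60, "gt": 62, "quot": 34, "apos": 39}


def named_to_numeric(xml_text):
    """Rewrite &amp; &lt; &gt; &quot; &apos; as decimal character references.

    Hypothetical helper: runs on already-serialized, well-formed XML.
    """
    return re.sub(
        r"&(amp|lt|gt|quot|apos);",
        lambda m: "&#%d;" % PREDEFINED[m.group(1)],
        xml_text,
    )


bar = ET.Element("bar")

# Pre-escaping the text yourself: the serializer escapes the "&" again.
bar.text = "Henry &#38; June"
double_escaped = ET.tostring(bar, encoding="unicode")

# Passing the raw text and rewriting the serialized output afterwards.
bar.text = "Henry & June"
serialized = ET.tostring(bar, encoding="unicode")
fixed = named_to_numeric(serialized)

print(double_escaped)
print(serialized)
print(fixed)
```

Working on the serialized string rather than on the DOM text is the point of the design: the serializer always has the last word on escaping, so any custom form has to be applied after it.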


RE: How to disable MSXML output escaping?

Hi Glenn9999,

For example, if your original file glenn9999.txt contains this text:

CODE

... foo Henry &amp;#38; June bar baz ... 

then the command

CODE

$ sed 's/&amp;/\&/g;s/#38;//g' glenn9999.txt > glenn9999_correct.txt 

creates the corrected file glenn9999_correct.txt, which contains this text with the unwanted characters/strings removed:

CODE

... foo Henry & June bar baz ... 

If you are interested, sed is available for Windows too:
http://gnuwin32.sourceforge.net/packages/sed.htm
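For readers without sed, the same two substitutions can be sketched in Python. The function names are hypothetical. The first function mirrors the sed script step by step; the second is a variant, added here as an assumption about what the consumer wants, that only undoes the double escaping and keeps the numeric reference.

```python
def sed_equivalent(text):
    """Apply the same two replacements as the sed script, in the same order."""
    text = text.replace("&amp;", "&")   # s/&amp;/\&/g
    text = text.replace("#38;", "")     # s/#38;//g
    return text


def undo_double_escape(text):
    """Variant (assumption): turn '&amp;#' back into '&#' and stop there."""
    return text.replace("&amp;#", "&#")


sample = "... foo Henry &amp;#38; June bar baz ..."
print(sed_equivalent(sample))
print(undo_double_escape(sample))
```

Note that the first form leaves a bare & in the file, which a strict XML parser will reject, so which variant is appropriate depends on what the consuming program accepts.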

RE: How to disable MSXML output escaping?

(OP)
I ended up doing what mikrom suggested in the code itself and got proper output now.

But if anyone knows how to shut off escaping within MSXML itself (a more elegant solution), I'm open to know how.

RE: How to disable MSXML output escaping?

Yes, it seems a bit unusual, but sometimes when the system does not want to do what we need, we are forced to use other means to achieve our goal.

RE: How to disable MSXML output escaping?

Quote (glenn9999)

But if anyone knows how to shut off escaping within MSXML itself (a more elegant solution), I'm open to know how.

That would be the 'create incorrect XML' setting?

I think you are going to find a post-process step, as already suggested, to be the easiest fix.

The only alternative I have been able to devise within the context of an XML processor (MSXML and libxml) is to create an XSLT transform, specify text as the output method (xsl:output), and apply the transform to the XML document as the last step. You will jump through a lot of hoops to produce almost correct XML. You can use called templates to create open and close tags in the output stream, and you must take care with output escaping. But I have created such a beast to deal with an XML consumer process that could not ingest certain aspects of well-formed XML; I had to keep it within the XML (and XSLT) realm.

Tom Morrison
Consultant
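The idea behind the text output method, emitting the tags as plain text so that escaping is entirely under your control, can be illustrated without XSLT. The sketch below is not the transform described above; it swaps in a hand-written Python serializer to show the same principle. escape_numeric and write_element are hypothetical names, the tag name is assumed to be valid and is not checked, and attributes and nesting are left out.

```python
# Numeric character references for the five characters discussed in the thread.
NUMERIC_REFS = {
    "&": "&#38;",
    "<": "&#60;",
    ">": "&#62;",
    '"': "&#34;",
    "'": "&#39;",
}


def escape_numeric(text):
    """Escape &, <, >, double quote and apostrophe as numeric references."""
    return "".join(NUMERIC_REFS.get(ch, ch) for ch in text)


def write_element(tag, text):
    """Emit one element as plain text; the tag name is not validated."""
    return "<{0}>{1}</{0}>".format(tag, escape_numeric(text))


print(write_element("bar", "\"<'Henry & June'>\""))
```

Because every special character is replaced by a numeric reference, the output is still well-formed XML for any conforming parser, while also matching the digits-only form the consumer in this thread reportedly accepts.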

RE: How to disable MSXML output escaping?

(OP)
>That would be the 'create incorrect XML' setting?

Nice to see you around! Anyway, MSXML didn't escape 4 of the standard XML characters that need to be escaped, from my understanding (", ', >, <), so I would say it's already doing "create incorrect XML". But I digress. That particular project is done. I'm trying to read back other files now besides the ones I create and found another wrinkle, but hopefully I'll figure out what's going on. If not, I'll probably ask here again.

RE: How to disable MSXML output escaping?

I am glad to see some activity here in the XML forum! Please feel free to come back and ask for help.

Tom Morrison
Consultant

RE: How to disable MSXML output escaping?

Glenn9999

Finally got time to have a look at this.

I created an XML document like this.

CODE --> XML

<?xml version="1.0" encoding="utf-8"?>
<foo>
	<bar>Henry &#38; June</bar>
</foo> 

After loading it through MSXML, selectNodes("foo/bar").item(0).text returns Henry & June, as expected.

When saved using MSXML's save() method, the resulting document is exactly this, again as expected:

CODE --> XML

<?xml version="1.0" encoding="utf-8"?>
<foo>
	<bar>Henry &amp; June</bar>
</foo> 

Then, I created a new element, using MSXML's createElement("bar"), set its text to "<>', inserted it as foo's child with the appendChild() method, and saved the result again.

This was the result:

CODE --> XML

<?xml version="1.0" encoding="utf-8"?>
<foo>
	<bar>Henry &amp; June</bar>
	<bar>"&lt;&gt;'</bar></foo> 

Finally, I created a new attribute with createAttribute("attr"), set its value to "<>', and set it as an attribute of foo with the attributes.setNamedItem() method. When saved, it ended up as:

CODE --> XML

<?xml version="1.0" encoding="utf-8"?>
<foo attr="&quot;&lt;&gt;'">
	<bar>Henry &amp; June</bar>
	<bar>"&lt;&gt;'</bar></foo> 

This seems OK to me and results in well-formed XML that any XML parser should be able to read. Are you doing things differently?

Note: using MSXML2.DOMDocument.6.0, as I normally do.
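For anyone who wants to repeat the experiment without MSXML, roughly the same steps can be sketched with Python's standard xml.etree.ElementTree. This is a different parser, so it only illustrates the same escaping pattern; whitespace and the XML declaration in the output will not match what MSXML's save() writes.

```python
import xml.etree.ElementTree as ET

# Load a document whose text uses a numeric character reference.
foo = ET.fromstring("<foo><bar>Henry &#38; June</bar></foo>")
loaded_text = foo.find("bar").text  # the reference is decoded on load

# Serializing writes the ampersand back as a named entity.
after_load = ET.tostring(foo, encoding="unicode")

# Add an element and an attribute that both contain "<>'
extra = ET.SubElement(foo, "bar")
extra.text = "\"<>'"
foo.set("attr", "\"<>'")
after_edit = ET.tostring(foo, encoding="unicode")

print(loaded_text)
print(after_load)
print(after_edit)
```

As in the MSXML results above, the double quote is escaped only inside the attribute value, and the apostrophe is left alone in both places.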

RE: How to disable MSXML output escaping?

(OP)
Like I said, I got what I was wanting to do here completed to satisfaction. But for learning sake (really why I'm doing all of this, this is my first serious project with XML outside of reading certain very small snippets of it for utility sake)...

>Are you doing things differently?

No. I did a similar write test and ended up with below as output (I'll admit the input on the other thing didn't have > and < but note " and ' remain unescaped).

<?xml version="1.0" encoding="UTF-8"?>
<base-tag>"&lt;'Henry &amp; June'&gt;"</base-tag>

Note, the presence of " caused me a major problem when it came to reading the data in the consumer program.

For those keeping score, I didn't have to do anything with the output as I had it above to read it back.

RE: How to disable MSXML output escaping?

OK, Glenn9999, the important thing is that you got your problem solved, but for the record, escaping " and ' isn't required except inside the value of an attribute that is delimited by that same quote or apostrophe. That is, attr="&quot;'" and attr='&apos;"' are perfectly fine, as is <element>"'</element>

Based on the info you gave us, and not having access to the actual data being passed between the two systems, I would say it is the other side of the processing that fails to comply with XML encoding rules. MSXML is working OK as it is (including reading and processing numeric character references).
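The quoting rule above can be checked with any conforming parser. A minimal sketch using Python's standard library (again a stand-in, not MSXML) parses the three forms from the post and shows that xml.sax.saxutils follows the same rule when writing:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Unescaped quotes are fine in element content.
element_text = ET.fromstring("<element>\"'</element>").text

# Inside an attribute, only the delimiter character has to be escaped.
double_quoted = ET.fromstring("<e attr=\"&quot;'\"/>").get("attr")
single_quoted = ET.fromstring("<e attr='&apos;\"'/>").get("attr")

# escape() is meant for element content and leaves both quotes untouched;
# quoteattr() picks a delimiter and escapes only what conflicts with it.
content_form = escape("\"'")
attribute_form = quoteattr("\"'")

print(element_text, double_quoted, single_quoted)
print(content_form, attribute_form)
```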
