xml.etree.ElementTree - xml.parsers.expat.ExpatError: undefined entity

(OP)
From http://effbot.org/elementtree/elementtree-xmlparse...
I thought that just populating the entity dict of the XMLParser instance would be sufficient,
but evidently it's not enough.

What am I missing?

I know that I can replace the named entities with their Unicode equivalents, but that would mess up my output.
Among other things, I have been asked to check whether the idxname matches the surname.
If it doesn't, I need to flag an error giving the line and column numbers
so that the error can be found and fixed in the XML file.
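
The idxname check described above could look roughly like this. It is only a sketch: `fold_accents` and `idxname_matches` are hypothetical names, and it assumes that "matches" means the idxname equals the surname with its accents stripped, which the post does not spell out.

```python
import unicodedata


def fold_accents(text):
    """Strip combining marks after NFKD decomposition (n-tilde becomes n)."""
    decomposed = unicodedata.normalize('NFKD', text)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))


def idxname_matches(surname, idxname):
    """Assumed rule: idxname should equal the accent-folded surname."""
    return fold_accents(surname) == idxname


print(idxname_matches(u'Mu\xf1oz', u'Munoz'))  # True
print(idxname_matches(u'Mu\xf1oz', u'Munos'))  # False
```

This only covers the comparison itself. Reporting the line and column of a mismatch is a separate problem, since the parsed elements do not carry their source positions.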

(Oh, and yes, I am stuck with Python 2.5.4.)

Thanks in advance.

Justin

source code

CODE

from xml.etree import ElementTree
from htmlentitydefs import name2codepoint
from StringIO import StringIO
import unicodedata

def getParser():
    xp = ElementTree.XMLParser()
    for k, v in name2codepoint.iteritems():
        xp.entity[k] = unichr(v)

    return xp

test = '''<surnamegrp>
<surname print="yes">Mu&ntilde;oz</surname>
<idxname>Munoz</idxname>
</surnamegrp>'''

if __name__ == '__main__':
    print 'ntilde' in name2codepoint # True
    xp = getParser()
    print 'ntilde' in xp.entity # True
    print unicodedata.name(xp.entity['ntilde']) # LATIN SMALL LETTER N WITH TILDE

##    xp.feed(test)
##    e = xp.close()
    b = StringIO(test)
    t = ElementTree.parse(b, xp) 

output

CODE

C:\Users\justin\Desktop>parserTest.py
True
True
LATIN SMALL LETTER N WITH TILDE
Traceback (most recent call last):
  File "C:\Users\justin\Desktop\parserTest.py", line 27, in <module>
    t = ElementTree.parse(b, xp)
  File "C:\Python25\lib\xml\etree\ElementTree.py", line 862, in parse
    tree.parse(source, parser)
  File "C:\Python25\lib\xml\etree\ElementTree.py", line 586, in parse
    parser.feed(data)
  File "C:\Python25\lib\xml\etree\ElementTree.py", line 1245, in feed
    self._parser.Parse(data, 0)
xml.parsers.expat.ExpatError: undefined entity: line 2, column 23 

CODE

>>> import sys
>>> sys.version
'2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit (Intel)]'

RE: xml.etree.ElementTree - xml.parsers.expat.ExpatError: undefined entity

(OP)
Apparently, if I add a DOCTYPE, it works:

CODE

<!DOCTYPE nul SYSTEM "nul.dtd">
<surnamegrp>
<surname print="yes">Mu&ntilde;oz</surname>
<idxname>Munoz</idxname>
</surnamegrp> 
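
A likely explanation, offered as a sketch rather than a confirmed diagnosis: expat treats an undeclared entity as a fatal well-formedness error unless the document points at an external DTD subset. Once a DOCTYPE with a SYSTEM identifier is present, expat hands the unresolved reference to ElementTree's default handler, and that handler is what consults the parser's entity dict. The snippet below is written for Python 3 with import fallbacks for Python 2. `make_parser`, `parse_text`, `BODY` and `WITH_DOCTYPE` are illustrative names, and `nul.dtd` is just the placeholder from the post (the file is never opened).

```python
from xml.etree import ElementTree

try:
    from html.entities import name2codepoint  # Python 3
except ImportError:
    from htmlentitydefs import name2codepoint  # Python 2, as in the post

try:
    unichr  # Python 2 builtin
except NameError:
    unichr = chr  # Python 3


def make_parser():
    """Return an XMLParser whose entity dict knows the HTML named entities."""
    parser = ElementTree.XMLParser()
    for name, codepoint in name2codepoint.items():
        parser.entity[name] = unichr(codepoint)
    return parser


def parse_text(xml_text):
    """Parse a string with a fresh entity-aware parser and return the root."""
    parser = make_parser()
    parser.feed(xml_text)
    return parser.close()


BODY = (
    '<surnamegrp>\n'
    '<surname print="yes">Mu&ntilde;oz</surname>\n'
    '<idxname>Munoz</idxname>\n'
    '</surnamegrp>'
)

# The DOCTYPE only needs to announce an external subset; the name is arbitrary.
WITH_DOCTYPE = '<!DOCTYPE nul SYSTEM "nul.dtd">\n' + BODY

root = parse_text(WITH_DOCTYPE)
surname = root.find('surname').text
print(surname == u'Mu\xf1oz')  # True
```

Because the DOCTYPE line is prepended to the text, any line numbers reported by the parser will be one higher than in the original file, which matters if positions are being reported back to whoever maintains the XML.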
