xml.etree.ElementTree - xml.parsers.expat.ExpatError: undefined entity
(OP)
from http://effbot.org/elementtree/elementtree-xmlparse...
I thought that just setting the entity dict of the XMLParser instance would be sufficient,
but evidently it's not enough.
What am I missing?
I know that I can replace the named entities with their Unicode equivalents, but that would mess up my
output.
I am asked, among other things, to check whether the idxname matches the surname.
If it does not, I am to flag an error giving the line and column numbers
so that they can find and fix the error in the XML file.
(Oh, and yes, I am stuck with Python 2.5.4.)
Thanks in advance.
Justin
source code
CODE
from xml.etree import ElementTree
from htmlentitydefs import name2codepoint
from StringIO import StringIO
import unicodedata

def getParser():
    xp = ElementTree.XMLParser()
    for k, v in name2codepoint.iteritems():
        xp.entity[k] = unichr(v)
    return xp

test = '''<surnamegrp>
<surname print="yes">Mu&ntilde;oz</surname>
<idxname>Munoz</idxname>
</surnamegrp>'''

if __name__ == '__main__':
    print 'ntilde' in name2codepoint # True
    xp = getParser()
    print 'ntilde' in xp.entity # True
    print unicodedata.name(xp.entity['ntilde']) # LATIN SMALL LETTER N WITH TILDE

##    xp.feed(test)
##    e = xp.close()

    b = StringIO(test)
    t = ElementTree.parse(b, xp)
output
CODE
C:\Users\justin\Desktop>parserTest.py
True
True
LATIN SMALL LETTER N WITH TILDE
Traceback (most recent call last):
  File "C:\Users\justin\Desktop\parserTest.py", line 27, in <module>
    t = ElementTree.parse(b, xp)
  File "C:\Python25\lib\xml\etree\ElementTree.py", line 862, in parse
    tree.parse(source, parser)
  File "C:\Python25\lib\xml\etree\ElementTree.py", line 586, in parse
    parser.feed(data)
  File "C:\Python25\lib\xml\etree\ElementTree.py", line 1245, in feed
    self._parser.Parse(data, 0)
xml.parsers.expat.ExpatError: undefined entity: line 2, column 23
[code]
>>> import sys
>>> sys.version
'2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit (Intel)]'
[/code]
RE: xml.etree.ElementTree - xml.parsers.expat.ExpatError: undefined entity
CODE
<!DOCTYPE nul SYSTEM "nul.dtd">
<surnamegrp>
<surname print="yes">Mu&ntilde;oz</surname>
<idxname>Munoz</idxname>
</surnamegrp>
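The DOCTYPE line is what makes the difference. As far as I understand it, expat treats an unknown entity as a fatal error when the document declares no external DTD, so the parser's entity dict is never consulted. Once a DOCTYPE with a SYSTEM identifier is present, the undefined entity is handed to ElementTree's default handler, which looks the name up in `entity`. Below is a minimal sketch of that idea, not a definitive implementation. The helper names `parse_with_html_entities` and `strip_accents` are made up for illustration, the `nul.dtd` name is taken from the snippet above and is never fetched, and treating "matches" as "equal once accents are stripped" is only an assumption about the check described in the original post.

```python
from xml.etree import ElementTree
import unicodedata

try:
    # Python 2 module name, as used in the original post
    from htmlentitydefs import name2codepoint
except ImportError:
    # Python 3 equivalents
    from html.entities import name2codepoint
    unichr = chr

# Dummy external DTD declaration; the file itself is never loaded.
DOCTYPE = '<!DOCTYPE nul SYSTEM "nul.dtd">'


def parse_with_html_entities(xml_text):
    """Hypothetical helper: parse xml_text with the HTML named entities defined."""
    parser = ElementTree.XMLParser()
    for name, codepoint in name2codepoint.items():
        parser.entity[name] = unichr(codepoint)
    # No newline after the DOCTYPE, so line numbers in any later expat
    # error still match the original text (columns on line 1 do shift).
    parser.feed(DOCTYPE + xml_text)
    return parser.close()


def strip_accents(text):
    """Hypothetical helper: drop combining marks after NFD decomposition."""
    decomposed = unicodedata.normalize('NFD', text)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))


test = '''<surnamegrp>
<surname print="yes">Mu&ntilde;oz</surname>
<idxname>Munoz</idxname>
</surnamegrp>'''

root = parse_with_html_entities(test)
surname = root.findtext('surname')
idxname = root.findtext('idxname')

# Assumed rule: idxname should equal the surname with accents removed.
names_match = strip_accents(surname) == idxname
```

Because the DOCTYPE is prepended in memory, the XML files on disk stay untouched, and the named entities in them are left as they are. Note that the standard-library ElementTree does not store line or column positions on elements, so reporting where a mismatch occurs would need a separate mechanism.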