Hi all
I'm trying to parse an XML file that looks like the sample under "Code:" at the end of this post.
The real file is > 800,000 lines long.
I wrote a parser to grab a list of the info1 and info2 data in each of XML::Twig, XML::SAX and XML::Parser.
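To make the task concrete, here is a minimal sketch of this kind of extraction with XML::Parser, assuming the structure of the sample file. It is not the actual script; the variable names (@records, %current, $field) and the inline test document are purely illustrative.

```perl
use strict;
use warnings;
use XML::Parser;

# Illustrative sketch only: collect the text of every <info1>/<info2> pair.
my (@records, %current, $field);

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub {
            my ($expat, $name) = @_;
            %current = () if $name eq 'data';    # new record begins
            if ($name eq 'info1' || $name eq 'info2') {
                $field = $name;
                $current{$field} = '';
            }
        },
        # Char can fire several times for one text node, so append.
        Char => sub {
            my ($expat, $text) = @_;
            $current{$field} .= $text if defined $field;
        },
        End => sub {
            my ($expat, $name) = @_;
            undef $field if defined $field && $name eq $field;
            push @records, { %current } if $name eq 'data';
        },
    },
);

# Cut-down copy of the sample document (assumption: same nesting as the real file).
my $xml = <<'XML';
<?xml version="1.0" standalone="yes" ?>
<abc><def><important>
<data><info1>info 1</info1><info2>info 2</info2></data>
<data><info1>info</info1><info2>info</info2></data>
</important></def></abc>
XML

$parser->parse($xml);    # for a file on disk, parsefile($path) would be used instead
print "$_->{info1}\t$_->{info2}\n" for @records;
```

Only the current record is held in memory between events; nothing is built for the rest of the document.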
After a bit of searching I am led to believe SAX is the way to go. XML::Twig looked interesting but seems to choke on the size of the document, even though it is supposed to be able to "simulate" stream processing rather than building a full tree.
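For reference, the stream-style mode in XML::Twig is usually driven by twig_handlers together with purge. The following is a minimal sketch under that assumption, not the code that was actually benchmarked; @pairs and the inline test document are illustrative names and data.

```perl
use strict;
use warnings;
use XML::Twig;

# Illustrative sketch only: one [info1, info2] pair per <data> element.
my @pairs;

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once each <data> element has been completely parsed.
        data => sub {
            my ($t, $data) = @_;
            push @pairs, [ $data->first_child_text('info1'),
                           $data->first_child_text('info2') ];
            $t->purge;    # release everything parsed so far
        },
    },
);

# Cut-down copy of the sample document (assumption: same nesting as the real file).
my $xml = <<'XML';
<?xml version="1.0" standalone="yes" ?>
<abc><def><important>
<data><info1>info 1</info1><info2>info 2</info2></data>
<data><info1>info</info1><info2>info</info2></data>
</important></def></abc>
XML

$twig->parse($xml);    # for a file on disk, parsefile($path) would be used instead
print join("\t", @$_), "\n" for @pairs;
```

If the handler never calls purge, the whole tree is kept, which may be one reason a large document appears to choke.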
XML::Parser was the fastest, SAX about 50% slower and Twig several times slower.
Anyway, my question is this.
What recommendations do people have for XML parsing? I'm happy to move with the times and move away from XML::Parser, but a 50% slowdown seems a bit much.
I loaded XML::SAX::ExpatXS, which seems to be the fastest SAX parser, but it's still a lot slower than XML::Parser.
Any suggestions?
Thanks
~ Michael
Code:
<?xml version="1.0" standalone="yes" ?>
<abc>
  <def>
    <ghi>
      <jkl>information</jkl>
    </ghi>
    <important>
      <data>
        <info1>info 1</info1>
        <info2>info 2</info2>
      </data>
      <data>
        <info1>info</info1>
        <info2>info</info2>
      </data>
    </important>
  </def>
</abc>