Problem parsing my XML File with XML::Simple

Status
Not open for further replies.

mrdrlove

Programmer
Aug 27, 2008
1
DE
Hello world,

I used XML::Simple to parse my XML file. The XML file is not in good syntax, but I have no way to correct the syntax at the moment. Here is an example of the XML file:

Code:
<?xml version="1.0" encoding="iso-8859-1"?>
<hosts>
<network_objects_object>host2<DAG>false</DAG>
<color><![CDATA[blue]]></color>
<comments><![CDATA[This is a sample host2]]></comments>
</network_objects_object>
<network_objects_object>host3<DAG>false</DAG>
<color><![CDATA[blue]]></color>
<comments><![CDATA[This is a sample host3]]></comments>
</network_objects_object>
</hosts>

As you can see, the value host2 is not well defined.
When I parse this kind of XML, XML::Simple returns an array of network_objects_object entries. Through this array I can access the different child elements (DAG, color, comments), but I cannot find the value host2 or host3.
Is there a way to use XML::Simple correctly to get this information? Or is there an XML module that can handle this kind of mad XML file?

As a first step, I am only using the following code:
Code:
#!/usr/bin/perl
use strict;
use warnings;

use XML::Simple;
use Data::Dumper;

print "START\n";
my $xs = XML::Simple->new();
#my $ref = $xs->XMLin('../sample/sample2.xml');
my $ref = $xs->XMLin('../sample/sample2.xml', KeepRoot => 1);

# Dump the parsed structure to inspect what XML::Simple returned
print Dumper($ref);
As you can see, I also tried the KeepRoot option, without success.

Thx.
ciao
mr_drlove
 
[0]
>The XML-File is not in good syntax.
It is in good syntax.
>...the value host2 is not good defined.
It is well defined.

I just want to make the point that it is a legitimate XML document, so you shouldn't be that mad about it. The content model of network_objects_object is of the mixed type (text interleaved with child elements), which can make retrieving data from it troublesome. Hence, for data-oriented documents or fragments of them, it is reasonable to try to avoid that kind of mixed content model.

[1] >Or is there a XML-Module which can handle these kind of mad XML-Files?

I would suggest the XML::XPath family of modules, which more closely reflects the driving idea of the underlying technologies, XPath in particular. Here is a no-nonsense script to show you that it can be done while concentrating on the core concepts.
[tt]
#!/usr/bin/perl
use strict;
use warnings;

use XML::XPath;
use XML::XPath::XMLParser;

my $xp = XML::XPath->new(filename => '../sample/sample2.xml');

# Every network_objects_object element under the root
my $nodeset = $xp->find('/hosts/network_objects_object');
my $i = 0;
foreach my $node ($nodeset->get_nodelist) {
    # Direct text children that are not whitespace only (e.g. "host2")
    my $txtnodeset = $node->find('./text()[normalize-space()!=""]');
    my $j = 0;
    foreach my $txtnode ($txtnodeset->get_nodelist) {
        print "($i,$j):", $node->getTagName(), ' - ', trim($txtnode->toString()), "\n";
        $j++;
    }
    $i++;
}

# Helper grabbed arbitrarily from a reference: strips leading and trailing whitespace
sub trim {
    my $string = shift;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}
[/tt]
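
If you also need the child elements next to the loose text, the same XPath idea can be extended. The following is only a sketch under assumptions: the sub name host_records and the hash keys (name, DAG, color, comments) are made up for illustration, the sample XML from the question is inlined instead of being read from ../sample/sample2.xml, and it relies on the xml => constructor option plus the find and findvalue methods of XML::XPath.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use XML::XPath;

# Sample document copied from the question, inlined to keep the sketch self-contained.
my $sample_xml = <<'END_XML';
<?xml version="1.0" encoding="iso-8859-1"?>
<hosts>
<network_objects_object>host2<DAG>false</DAG>
<color><![CDATA[blue]]></color>
<comments><![CDATA[This is a sample host2]]></comments>
</network_objects_object>
<network_objects_object>host3<DAG>false</DAG>
<color><![CDATA[blue]]></color>
<comments><![CDATA[This is a sample host3]]></comments>
</network_objects_object>
</hosts>
END_XML

# Hypothetical helper: returns one hash reference per network_objects_object.
# The loose text of the mixed content becomes "name"; the child elements
# become ordinary fields.
sub host_records {
    my ($xml) = @_;
    my $xp = XML::XPath->new(xml => $xml);
    my @records;
    foreach my $node ($xp->find('/hosts/network_objects_object')->get_nodelist) {
        # First direct text child that is not whitespace only
        my ($txt) = $xp->find('./text()[normalize-space()!=""]', $node)->get_nodelist;
        my $name = defined $txt ? $txt->string_value : '';
        $name =~ s/^\s+|\s+$//g;

        # Interpolation forces plain strings, whatever findvalue hands back
        my $dag      = $xp->findvalue('DAG', $node);
        my $color    = $xp->findvalue('color', $node);
        my $comments = $xp->findvalue('comments', $node);
        push @records, {
            name     => $name,
            DAG      => "$dag",
            color    => "$color",
            comments => "$comments",
        };
    }
    return @records;
}

foreach my $rec (host_records($sample_xml)) {
    print "$rec->{name}: DAG=$rec->{DAG}, color=$rec->{color}, comments=$rec->{comments}\n";
}
```

Each record keeps the host name next to its fields, so the rest of the program no longer has to care that the source uses mixed content.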
 