Windows OLE w/ MSXML

Status
Not open for further replies.

goBoating

Programmer
Feb 8, 2000
1,606
US
hello all,

I don't know whether to put this in the Perl section or the VB section. I am trying to validate XML on the Windows platforms using OLE to talk to the MSXML libs. The sequence is:
1 - create a schemaCache object.
2 - populate that schemaCache w/ the necessary schemata.
3 - create an XML Document object containing the XML to be validated.

With 'validateOnParse' set, the validation happens in step 3 as the XML doc is loaded.

The problem is that I am getting errors that I cannot figure out how to track down. The OLE layer makes the use of MSXML opaque... as far as I can tell.

Code:
#!perl
# working test script to explore MSXML usage.
##################################################
use strict;
use Win32::OLE;
use XML::XPath;
$| = 1;


my $contentType;
my $inputXML = $ARGV[0];
if ($inputXML =~ /OVAL/) { $contentType = 'OVAL'; }
# Verify that the input XML file exists.
unless (-e $inputXML) { print  "File $inputXML does not exist."; exit; }

# Load the root node.
my $nodes = XML::XPath->new($inputXML);
my $nodeSet = $nodes->find("/");

# Ensure that we only find one node.
if ($nodeSet->size < 1) {
	print  "Root node not found during validation of $inputXML.";
	exit;
	}
elsif ($nodeSet->size > 1) {
	print  "More than one root node found during validation of $inputXML.";
	exit;
	}
print "Root node exists\n";
# Get the root node.
my $rootNode = $nodeSet->get_node(1);

# Get the node as a string.
my $rootNodeString = $rootNode->toString(1);

my $schemaLocations;
my %schemaLocations;
if ($rootNodeString =~ /schemaLocation\s*=\s*"(.*?)"/i) { $schemaLocations = $1; }
else { print  "Could not find schemaLocation in $inputXML."; exit; }
print "schemaLocation found.\n";
# print "$schemaLocations\n";

# Each pair is: non-space run, space(s), non-space run, optional space(s).
my $num_schemata = 0;
while ($schemaLocations =~ /\s*([^\s]+?)\s+([^\s]+)\s*/g) { $schemaLocations{$1} = $2; $num_schemata++; }

if ($num_schemata > 0) { print "Successfully built schema locations hash.\n"; }
else { die "SchemaLocations hash is empty.\n"; }

# Ensure we have an even, non-zero number of strings.

# Verify that all of the files exist in our schema directory.
foreach my $uri (keys(%schemaLocations) )
{
	my $sl = $schemaLocations{$uri};
	$sl =~ s/.*[\\\/](.*$)/$1/gis;

	if ($contentType eq 'OVAL') { $sl = '../Schemas/oval_5.7/'. $sl; }
	else { $sl = './Schemas/'. $sl; }

	if (!(-e $sl)) { print  "Could not find schema " . $sl. ".\n"; }
	else { $schemaLocations{$uri} = $sl; } # Now, points to local copy of the schema 
}

# Create a schema cache. MSXML 6.0 must be installed, 3.0 will not work.
my $schemaCache = Win32::OLE->new("MSXML2.XMLSchemaCache.6.0");
unless ($schemaCache) { die "MSXML 6.0 is not installed."; }
print "MSXML 6.0 is installed and schemaCache created.\n";

# Set properties of the schema cache.
$schemaCache->{async} = "false";
$schemaCache->{validateOnParse} = "1";
$schemaCache->{validateOnLoad} = "1";

# Load each schema into the cache.
foreach my $uri (keys(%schemaLocations)) { $schemaCache->add($uri, $schemaLocations{$uri});  }
$schemaCache->add('garbage');


# Create a DOM object. MSXML 6.0 must be installed, 3.0 will not work.
my $xmlDoc = Win32::OLE->new('MSXML2.DOMDocument.6.0');
unless ($xmlDoc) { die "MSXML 6.0 is not installed."; }

print "Setting DOM object properties.\n";
$xmlDoc->{async} = "false";
$xmlDoc->{validateOnParse} = "true";
$xmlDoc->{schemas} = $schemaCache;

# All required schema references have been addressed.
# Remove 'schemaLocation' from the root element to absolutely prevent
# any possible redundant use by the MSXML libs.
my $xmlString;
open(XML,"<$inputXML") or die "Failed to open XML file, $!\n";
while (<XML>) { $xmlString .= $_; }
close XML;
if ($xmlString =~ s/xsi:schemaLocation\s*=\s*".*?"\s*//is) { print "Nulled out schemaLocation.\n"; }
else { print "Failed to null out schemaLocation\n"; }

print "Validating the input XML file against the cached schemas.\n";
unless ($xmlDoc->LoadXML("$xmlString")) 
{
	my $Rs = $xmlDoc->{parseError}->{reason}; 
	my $Ln = $xmlDoc->{parseError}->{line};
	my $Ps = $xmlDoc->{parseError}->{linePos};
	my $Tx = $xmlDoc->{parseError}->{srcText};
	print "Did not load at line $Ln, pos $Ps, reason: $Rs, text: $Tx\n";

	exit 1;
}

print "XML load successful, XML file is validated.\n";
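
The comment "Ensure we have an even, non-zero number of strings" in the script above has no code behind it. A minimal sketch of one way to do that check, assuming the xsi:schemaLocation value is a whitespace-separated list of namespace/location pairs. The helper name parse_schema_locations and the sample URIs are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: split an xsi:schemaLocation value into
# namespace => location pairs, refusing empty or odd-length lists.
sub parse_schema_locations {
    my ($value) = @_;
    my @tokens = split ' ', $value;    # split on runs of whitespace, ignoring leading space
    die "schemaLocation is empty.\n" unless @tokens;
    die "schemaLocation has an odd number of entries.\n" if @tokens % 2;
    my %pairs = @tokens;               # even-length list becomes key/value pairs
    return %pairs;
}

# Sample value for illustration only.
my %found = parse_schema_locations(
    "http://example.com/ns-a a.xsd\n   http://example.com/ns-b b.xsd"
);
print "$_ => $found{$_}\n" for sort keys %found;
```

Splitting on whitespace instead of matching pairs with a regex makes a dangling namespace with no location show up as an error, where the regex would silently skip it.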

I'm getting validation errors on XML that I am able to validate cleanly via Altova XMLSpy.

Questions.
1 - Is there a way to know if and why the addition of a schema to the schemaCache does not work?
2 - Is there a way to dump the contents of the schemaCache so I can see exactly what the validation is using?
3 - Is there a better way to do this? I have to stay within the bounds of those resources that come on all modern Windows OSes. Ergo, the use of MSXML. Otherwise, I'd be going straight to XML::LibXML.

Further, the XML doc that I'm validating is HUGE and contains sensitive content. If need be, I'll spend some time sanitizing a copy and post it. But, I'd rather not, unless I really have to.

'hope this helps

If you are new to Tek-Tips, please use descriptive titles, check the FAQs, and beware the evil typo.
 
>$schemaCache->add('garbage');
What is the story of that?
 
I was deliberately trying to make the 'add' method complain by feeding it 'garbage', so to speak.

I have found a way to get OLE to give a little more detail about errors by setting the 'Warn' option. Like:

Code:
use Win32::OLE;
Win32::OLE->Option(Warn => 2);

Without that option set, OLE was just handing back null on an error. It also hands back null on success. Useless :-/ So, with 'Warn' set to 2, I'm getting back some useful error statements.
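
For anyone who would rather check each call than rely on warnings, Win32::OLE also keeps the most recent error available through Win32::OLE->LastError(). A rough sketch, assuming MSXML 6.0 is installed. The namespace and path in %schemaLocations are placeholders, not the real schemata:

```perl
use strict;
use warnings;
use Win32::OLE;

# Keep OLE quiet and inspect errors explicitly instead.
Win32::OLE->Option(Warn => 0);

my $schemaCache = Win32::OLE->new('MSXML2.XMLSchemaCache.6.0')
    or die "Could not create schema cache: " . Win32::OLE->LastError() . "\n";

# Placeholder namespace => local path pair, for illustration only.
my %schemaLocations = (
    'http://example.com/ns' => './Schemas/example.xsd',
);

foreach my $uri (sort keys %schemaLocations) {
    Win32::OLE->LastError(0);    # clear any earlier error before the call
    $schemaCache->add($uri, $schemaLocations{$uri});

    # LastError is a dual value: a number in numeric context, a message as a string.
    my $err = Win32::OLE->LastError();
    if ($err + 0 != 0) { print "add failed for $uri: $err\n"; }
    else               { print "added $uri\n"; }
}
```

Checking after every add ties a failure to the schema that caused it, which helps when several schemata import one another.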


 
Ok, you don't need that kind of error to debug the script, do you?!

You might try setting resolveExternals to true, if that matters for your XML document:
$xmlDoc->{resolveExternals} = 1;
Otherwise, I would put more trust in the MSXML2 error indicators. Also, what error reason does it show?
 
Thanks tsuji,

Once I figured out that the OLE layer was not passing the errors through, and fixed that, I am now seeing the XML validation errors that occur when MSXML tries to load each schema into the schemaCache.

As it turns out, I'm dependent on XML schemata that are owned/maintained by someone else. And, the XML validation issues found/reported by MSXML are genuine.

So, thanks for thinking through this with me. The problem is solved as far as I can solve it.



 
okay, no problem. (That is just a way of saying, I do have some for your perusal.)

I still have some comments, though, as to the script itself.

[0] I understand very well it is a RAD or work-in-progress, but there are genuine problems that need addressing.

[1] I would reiterate: set $xmlDoc's resolveExternals property to true, as it defaults to false and, in most cases, you need it true for validation.

[2] The genuine problems also include the $schemaCache settings. Apparently the OLE layer does a less than perfect job of reporting errors. async and validateOnParse are _not_ properties of $schemaCache. Setting them is an error.

[2.1] validateOnLoad is one of its properties. However, its default is true, hence setting it true is not necessary, though fine.

[3] I would suggest there is no need to "nullify" the schemaLocation attribute. It won't interfere with the validation, as msxml2.domdocument won't pick up that piece of data automatically. Besides, from the perspective of security/authenticity, stripping it can have adverse effects if the XML documents have a signature or encryption attached to them. (I understand they most certainly do not in this case.) Furthermore, dropping that step makes the script run faster... so it is a win-win situation.
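
Putting points [1] and [2] together, the setup might look something like the sketch below. It assumes MSXML 6.0 is installed and uses a placeholder namespace and schema path. The loop in the middle lists what the cache actually holds through its length and namespaceURI members, which is one way to see what the validation will use.

```perl
use strict;
use warnings;
use Win32::OLE;

# Report OLE errors (via Carp::cluck) instead of failing silently.
Win32::OLE->Option(Warn => 2);

my $schemaCache = Win32::OLE->new('MSXML2.XMLSchemaCache.6.0')
    or die "MSXML 6.0 is not installed.\n";

# validateOnLoad belongs to the cache and already defaults to true;
# async and validateOnParse are document properties, so they are not set here.
$schemaCache->add('http://example.com/ns', './Schemas/example.xsd');    # placeholder pair

# List the namespaces currently held in the cache.
my $count = $schemaCache->{length} || 0;
for my $i (0 .. $count - 1) {
    print "cached namespace [$i]: ", $schemaCache->namespaceURI($i), "\n";
}

my $xmlDoc = Win32::OLE->new('MSXML2.DOMDocument.6.0')
    or die "MSXML 6.0 is not installed.\n";
$xmlDoc->{async}            = 0;    # load synchronously
$xmlDoc->{validateOnParse}  = 1;    # validate while loading
$xmlDoc->{resolveExternals} = 1;    # defaults to false in MSXML 6.0
$xmlDoc->{schemas}          = $schemaCache;
```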
 
I understand and agree with all.

Those changes are in place and full error checking is now in place, also.

'Well on the way to having this little chore knocked out.

Thanks a ton.


 
Thanks, goBoating, and glad to hear the progress being made.
 