
EOF pattern matching

Status
Not open for further replies.

mThomas

Instructor
May 3, 2001
404
US
I have an XML file which appears to be generated from a Pivot Table. What I need to do is find an instance of something in each <z:row> element. If the something exists, I remove the element and push it to a separate file. After every element with the something in it has been removed from the original file, the original file is sent to a different department and my work with that portion of the file is done.

Here is a snippet of the file:

Code:
<rs:data>
<z:row City='Eastover' State='NC' Due_Date='2005-10-11T00:00:00'
		 Due_Time='1899-12-30T15:00:00' Name='Eastover Central Recreation center'
		 Description='Construction of the Eastover Central Recreation Center Project'
		 Estimated_Cost='0' Pre_Bid='True' Mandatory='False' Pre_Bid_Date='2005-09-29T00:00:00'
		 Pre_Bid_Time='1899-12-30T15:00:00' Building='False' Region='0'
		 Owner='0' Architect='0' Engineer='0' Other='0' Specs='True'
		 Plans='True' Processed='1899-12-30T00:00:00' Firm='MVcdc'
		 OtherProjectNumber='' Other1='' Other2='' CDC='True' CDCNumber='NC Q000000001'
		 Job_Key='' CDCURL='http://www.aeplans.com/aeplans/cdc/transfer.asp?ID=LQD3492;27/OXefe'
		 CDCVERIFYURL='http://www.aeplans.com/aeplans/cdc/verify.asp?ID=LQD3492;27/OXefe'/>

There are multiple such row elements. I can pattern match for the "something", which is CDC='False'. If that occurs, I want to remove the element from the XML file.

I was thinking one way would be to loop through the file and push everything between a < and a > into a variable. I would then check if the something existed. If the something exists then print the variable to a new file. If the something does not exist, then print the variable to a different file. I would end up with two separate files. One file with all the row elements without the something and a file with all the row elements with the something.

I am trying to use a regular expression to match all occurrences of any text/characters/numbers between the < and >. Here is part of the code I’m trying. This is just trying to see if I can grab all of the code between each < and > in the file.

Code:
$xml_file="cdc10-31-2005-8-56-20.xml";
open(XML, $xml_file) || die("Could not open file!");
@xml=<XML>;
close(XML);

foreach (@xml) {

$file = m/<(.+)>/;

print "$1\n";

}

This only returns rs:data for each line. That in and of itself confused me, as there is only one <rs:data> and one </rs:data> in the file. I am presuming this is a root tag.

Is there a way to tell my pattern match to match until it gets to a > even if there are new lines and tabs? This part of my code $file = m/<(.+)>/; does not do that.

I hope I have given enough information.

As always, any help will be greatly appreciated.

mike
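
On the rs:data puzzle and the multi-line question, here is a minimal sketch under two assumptions: the whole file fits in memory, and no attribute value contains a literal >. In the loop above, $1 is printed whether or not the match succeeded, and a failed match leaves $1 holding the capture from the last successful one. That would explain rs:data repeating, since in the snippet no single line of a <z:row> element has both a < and a >. Also, . does not match a newline unless the /s modifier is used, and the file is being tested one line at a time. Reading the file into one string and using [^>]+ instead of .+ sidesteps both. The sample rows below are made up for illustration.

```perl
use strict;
use warnings;

# Stand-in for the slurped file. With a real file you could read it in one go:
#   my $xml_text = do { local $/; open my $fh, '<', $xml_file or die $!; <$fh> };
my $xml_text = <<'END_XML';
<rs:data>
<z:row City='Eastover' State='NC'
		 Plans='True' CDC='True'
		 Job_Key=''/>
<z:row City='Raleigh' State='NC'
		 Plans='True' CDC='False'
		 Job_Key=''/>
</rs:data>
END_XML

# [^>]+ matches any character except >, newlines and tabs included,
# so each tag is captured whole. With /g in list context the match
# returns every capture, and nothing is read from $1 after a failed match.
my @tags = $xml_text =~ /<([^>]+)>/g;

print scalar(@tags), " tags found\n";
print "---\n$_\n" for @tags;
```

When matching line by line is unavoidable, testing the result first, as in print "$1\n" if /<(.+)>/;, avoids printing a stale $1.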
 
Have a look on there's a plethora of XML parsers out there

HTH
--Paul

Spend an hour a week on CPAN, helps cure all known programming ailments ;-)
 
I have a couple of XML IDEs. I was hoping not to have to use one and to just do simple pattern matching.

My pattern matching question is as much about future reference as it is about this project.

tia,

mike
 
This shouldn't be too difficult if each element were all on one line. Will it matter if the <z:row> tag is transformed into a single line instead of being broken over several lines? Then you could do something like:

Code:
if (/^<z:row/ and /CDC='False'/ and /\/>/) {
   # do something with the matching row
}
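
One way the single-line transformation could be done, as a sketch only: inside each tag, collapse every run of whitespace that contains a newline into one space, then apply the test above line by line. This assumes the file is already in a single string, that no attribute value contains a > or a meaningful newline, and the sample data is made up.

```perl
use strict;
use warnings;

# Made-up sample standing in for the slurped file contents.
my $xml_text = "<rs:data>\n"
             . "<z:row City='A'\n\t\t CDC='False'\n\t\t Job_Key=''/>\n"
             . "</rs:data>\n";

# For each <...> tag, replace whitespace runs containing a newline with one space.
(my $flat = $xml_text) =~ s{(<[^>]+>)}{ (my $tag = $1) =~ s/\s*\n\s*/ /g; $tag }ge;

# Each element now sits on one line, so the per-line test applies.
my @matched;
for (split /\n/, $flat) {
    if (/^<z:row/ and /CDC='False'/ and /\/>/) {
        push @matched, $_;
    }
}

print "matched: $_\n" for @matched;
```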
 
You could use the module XML::Simple.

There are good articles. Search Google for:

Perl and XML


dmazzini
GSM System and Telecomm Consultant

 
I tried using XML::Simple and it would probably work. However, considering I don't really have to read the file as XML, here is what I did.

Code:
# more code here

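$in_row = 0;      # true while we are inside a <z:row ...> element
$variable = "";

foreach(@xml){
	
	# a new <z:row element starts: begin collecting its lines
	if($_ =~ /<z:row/) { 

		$in_row = 1;
		$variable = "";

	} # end if z row
				
	# append every line of the current element, including the first
	if($in_row == 1) { $variable = $variable.$_;  }

	# the closing > ends the element
	if($in_row == 1 and $_ =~ />/) {

		push @onearray, $variable;
		$variable = "";
		$in_row = 0;

	} # end if />

}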

foreach(@onearray) {

if($_ =~ /CDC='False'/) {push @false, $_;}
if($_ =~ /CDC='True'/) {push @true, $_;}

}

# more code here

It's not sweet and there is probably a regex that would have done a sweeter job, but it works.

mike
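
For future reference, a regex-only version might look like the sketch below. It assumes the file has been read into a single string, that no attribute value contains a >, and that any row without CDC='False' belongs in the "true" pile. The sample data is made up.

```perl
use strict;
use warnings;

# Made-up sample standing in for the slurped file contents.
my $xml_text = "<rs:data>\n"
             . "<z:row City='A'\n\t\t CDC='True'/>\n"
             . "<z:row City='B'\n\t\t CDC='False'/>\n"
             . "</rs:data>\n";

# Every complete <z:row ...> element, newlines and tabs included.
my @rows  = $xml_text =~ /(<z:row\b[^>]*>)/g;
my @false = grep {  /CDC='False'/ } @rows;
my @true  = grep { !/CDC='False'/ } @rows;

# Copy of the original text with the CDC='False' elements removed.
# The lookahead only succeeds if CDC='False' appears before the tag's closing >.
(my $kept = $xml_text) =~ s/<z:row\b(?=[^>]*CDC='False')[^>]*>\n?//g;

print "Removed rows:\n";
print "$_\n" for @false;
print "Remaining file:\n$kept";
```

@false can then be printed to the separate file and $kept written back out as the file for the other department.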
 
That looks OK. You could maybe shorten the last foreach loop a bit. If those are the only two conditions, you really only need to check for one of them, since only two values are possible (boolean):

Code:
(/CDC='False'/ ? push @false,$_ : push @true,$_) for (@onearray);

The ternary operator is perfect for boolean conditions:

condition ? do this if true : do this if false
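
Since the ternary is an expression, it can also just pick the destination array so that push is written once. A small sketch with made-up rows follows. One difference from the two separate if tests: a row with neither CDC='False' nor CDC='True' ends up in @true here.

```perl
use strict;
use warnings;

# Made-up rows standing in for the collected <z:row> elements.
my @onearray = (
    "<z:row City='A' CDC='True'/>",
    "<z:row City='B' CDC='False'/>",
    "<z:row City='C' CDC='True'/>",
);
my (@false, @true);

for my $row (@onearray) {
    # The ternary returns a reference to whichever array should receive the row.
    my $target = $row =~ /CDC='False'/ ? \@false : \@true;
    push @$target, $row;
}

printf "%d false, %d true\n", scalar @false, scalar @true;   # prints: 1 false, 2 true
```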
 