EOF pattern matching

mThomas · Oct 31, 2005

I have an XML file that appears to have been generated from a Pivot Table. I need to find a particular string (the "something") in each <z:row> element. If an element contains it, I want to remove that element and write it to a separate file. Once every element containing the something has been removed from the original file, the original file is sent to a different department and my work with that portion of the file is done.

Here is a snippet of the file:

Code:

<rs:data>
<z:row City='Eastover' State='NC' Due_Date='2005-10-11T00:00:00'
		 Due_Time='1899-12-30T15:00:00' Name='Eastover Central Recreation center'
		 Description='Construction of the Eastover Central Recreation Center Project'
		 Estimated_Cost='0' Pre_Bid='True' Mandatory='False' Pre_Bid_Date='2005-09-29T00:00:00'
		 Pre_Bid_Time='1899-12-30T15:00:00' Building='False' Region='0'
		 Owner='0' Architect='0' Engineer='0' Other='0' Specs='True'
		 Plans='True' Processed='1899-12-30T00:00:00' Firm='MVcdc'
		 OtherProjectNumber='' Other1='' Other2='' CDC='True' CDCNumber='NC Q000000001'
		 Job_Key='' CDCURL='http://www.aeplans.com/aeplans/cdc/transfer.asp?ID=LQD3492;27/OXefe'
		 CDCVERIFYURL='http://www.aeplans.com/aeplans/cdc/verify.asp?ID=LQD3492;27/OXefe'/>

There are multiple such row elements. I can pattern match for the "something", which is CDC='False'. If it occurs, I want to remove that element from the XML file.

I was thinking one way would be to loop through the file and put everything between a < and the next > into a variable, then check whether the something exists. If it does, print the variable to one file; if it does not, print the variable to a different file. I would end up with two separate files: one with all the row elements that contain the something and one with all the row elements that don't.
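A minimal Perl sketch of that two-file plan, assuming no attribute value contains a literal > of its own. The two-row sample is made up, and in-memory filehandles stand in for the real output files (the file name mentioned in the comment is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical sample: two z:row elements, each split across lines
my $xml = "<rs:data>\n"
        . "<z:row City='A'\n\t\t CDC='True'/>\n"
        . "<z:row City='B'\n\t\t CDC='False'/>\n"
        . "</rs:data>\n";

# In-memory "files" (Perl 5.8+); in real use open actual paths instead,
# e.g. open(my $fh, '>', 'cdc_false.xml') with a hypothetical name
my ($with, $without) = ('', '');
open(my $with_fh,    '>', \$with)    or die "Cannot open buffer: $!";
open(my $without_fh, '>', \$without) or die "Cannot open buffer: $!";

# Capture everything from each < up to the next >; [^>] also matches
# newlines and tabs, so a multi-line element comes back in one piece
while ($xml =~ /(<[^>]*>)/g) {
    my $element = $1;
    if ($element =~ /CDC='False'/) {
        print {$with_fh} "$element\n";       # elements containing the "something"
    }
    else {
        print {$without_fh} "$element\n";    # everything else
    }
}

close($with_fh);
close($without_fh);
```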

I am trying to use a regular expression to match any text between each < and >. Here is part of the code I'm trying; for now it only tries to see whether I can grab everything between each < and > in the file.

Code:

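use strict;
use warnings;

my $xml_file = "cdc10-31-2005-8-56-20.xml";
open(my $xml_fh, '<', $xml_file) or die("Could not open $xml_file: $!");
my @xml = <$xml_fh>;    # one line of the file per array element
close($xml_fh);

foreach (@xml) {
    # try to capture everything between a < and a > on this line
    my $file = m/<(.+)>/;

    print "$1\n";
}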

This only returns rs:data for each line. That in itself confused me, as there is only one <rs:data> and one </rs:data> in the file. I am presuming this is the root tag.
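The repeated rs:data is most likely a stale capture: $1 is only updated by a successful match. With the file read line by line, each z:row line holds only a < or only a > (or neither), so the match fails and print "$1\n" repeats the last successful capture, which is rs:data. A small sketch with made-up lines that uses $1 only inside an if:

```perl
use strict;
use warnings;

# Hypothetical lines, shaped like what @xml = <XML> returns for this file
my @lines = ("<rs:data>\n",
             "<z:row City='Eastover' State='NC'\n",
             "\t\t CDC='True'/>\n");

my @captured;
foreach my $line (@lines) {
    # Only the first line has both a < and a >; the row's first line has
    # no > and its last line has no <, so those matches fail
    if ($line =~ m/<(.+)>/) {
        push @captured, $1;    # $1 is trustworthy only after a successful match
    }
}

print "$_\n" for @captured;    # only rs:data is captured
```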

Is there a way to tell my pattern to keep matching until it reaches a >, even if there are newlines and tabs in between? This part of my code, $file = m/<(.+)>/;, does not do that.
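Two things get in the way: reading the file into @xml splits each element across several array entries, and . does not match a newline unless the /s modifier is used. A sketch that slurps the whole file into one string by localizing $/ and then shows two equivalent patterns. The sample contents are made up and an in-memory handle stands in for the real file:

```perl
use strict;
use warnings;

# Hypothetical file contents; an in-memory handle stands in for the real file
my $contents = "<rs:data>\n<z:row City='Eastover'\n\t\t CDC='True'/>\n</rs:data>\n";
open(my $fh, '<', \$contents) or die "Cannot open: $!";

# With $/ (the input record separator) undefined, <$fh> returns the whole file
my $xml = do { local $/; <$fh> };
close($fh);

# Option 1: /s lets . match newlines, and +? stops at the first > rather than the last
my @tags_s = $xml =~ /<(.+?)>/gs;

# Option 2: [^>] matches any character except >, newlines and tabs included
my @tags_class = $xml =~ /<([^>]*)>/g;

print "$_\n---\n" for @tags_class;
```

The negated character class is usually the clearer choice here, since it states directly that a match may not run past the next >.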

I hope I have given enough information.

As always, any help will be greatly appreciated.

mike

PaulTEG · Oct 31, 2005

Have a look at http://search.cpan.org; there's a plethora of XML parsers out there.

HTH
--Paul

Spend an hour a week on CPAN, helps cure all known programming ailments ;-)

mThomas · Oct 31, 2005

I have a couple of XML IDEs. I was hoping to avoid using one and just do some simple pattern matching.

My pattern matching question is as much about future reference as it is about this project.

tia,

mike

KevinADC · Oct 31, 2005

This shouldn't be too difficult if each element were all on one line. Would it matter if each <z:row> tag were transformed into a single line instead of being broken over several lines? Then you could do something like:

Code:

if (/^<z:row/ and /CDC='False'/ and /\/>/) {
   # do something with the matching element here
}
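One way to get each element onto a single line first, sketched under the assumption that no attribute value contains a newline or a > of its own (either would be altered or split). The sample rows are hypothetical; the if test is the one above, written with straight quotes:

```perl
use strict;
use warnings;

# Hypothetical sample with z:row elements spread over several lines
my $xml = "<rs:data>\n"
        . "<z:row City='A'\n\t\t CDC='True'/>\n"
        . "<z:row City='B'\n\t\t CDC='False'/>\n"
        . "</rs:data>\n";

# 1. Replace every newline (plus surrounding tabs/spaces) with one space
(my $flat = $xml) =~ s/\s*\n\s*/ /g;
# 2. Break the line between adjacent tags, so each element sits on its own line
$flat =~ s/>\s*</>\n</g;

my @false_rows;
for (split /\n/, $flat) {
    if (/^<z:row/ and /CDC='False'/ and /\/>/) {
        push @false_rows, $_;
    }
}

print "$_\n" for @false_rows;
```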

dmazzini · Oct 31, 2005

You could use the XML::Simple module.

There are good articles out there; search Google for "Perl and XML".

dmazzini
GSM System and Telecomm Consultant

mThomas · Nov 1, 2005

I tried XML::Simple, and it would probably work, but since I don't really need to read the file as XML, here is what I did instead.

Code:

# more code here

my @onearray;         # one complete element per entry
my $variable = "";    # the element currently being assembled

foreach (@xml) {

    # A new <z:row starts a new element
    $variable = "" if /<z:row/;

    # Each element is spread over several lines, so keep appending
    $variable .= $_;

    # The line holding the > ends the element: store it and start over
    if (/>/) {
        push @onearray, $variable;
        $variable = "";
    }

}

my (@false, @true);

foreach (@onearray) {

    if (/CDC='False'/) { push @false, $_; }
    if (/CDC='True'/)  { push @true,  $_; }

}

# more code here

It's not sweet and there is probably a regex that would have done a sweeter job, but it works.
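One regex-based alternative, assuming the rows are self-closing <z:row .../> elements, no attribute value contains a >, and the whole file has first been read into one string (for example with local $/). Sketch with a made-up two-row sample:

```perl
use strict;
use warnings;

# Hypothetical sample in the same shape as the file
my $xml = "<rs:data>\n"
        . "<z:row City='A'\n\t\t CDC='True'/>\n"
        . "<z:row City='B'\n\t\t CDC='False'/>\n"
        . "</rs:data>\n";

# One self-closing z:row whose attributes include CDC='False'.
# [^>]* can cross newlines and tabs but never the > that ends an element,
# so a match cannot spill over into the next row.
my $false_row = qr{<z:row\b[^>]*\bCDC='False'[^>]*/>};

my @removed = $xml =~ /($false_row)/g;   # copies for the separate file
$xml =~ s/$false_row\s*//g;              # what is left goes to the other department

print $xml;
```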

mike

KevinADC · Nov 1, 2005

That looks OK. You could shorten the last foreach loop a bit: if CDC='False' and CDC='True' are the only two possibilities, you only need to check for one of them, because anything that fails the test must be the other (boolean):

Code:

(/CDC='False'/ ? push(@false, $_) : push(@true, $_)) for @onearray;

The ternary operator is perfect for boolean conditions:

condition ? do this if true : do this otherwise
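The ternary can also pick the target array instead of choosing between two pushes. A sketch with a made-up @onearray; note that every element without CDC='False' lands in @true, including non-row entries such as <rs:data> that the original two if tests would have skipped:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the @onearray built earlier
my @onearray = ("<z:row CDC='True'/>", "<z:row CDC='False'/>", "<z:row CDC='True'/>");
my (@false, @true);

# The ternary returns a reference to the array that should receive $_
push @{ /CDC='False'/ ? \@false : \@true }, $_ for @onearray;

print scalar(@false), " false, ", scalar(@true), " true\n";
```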
