I have an XML file which appears to be generated from a Pivot Table. What I need to do is find an instance of something in each <z:row> element. If the something exists, I need to remove the element and push it to a separate file. Once every element containing the something has been removed from the original file, the original file is sent to a different department and my work with that portion of the file is done.
Here is a snippet of the file:
Code:
<rs:data>
<z:row City='Eastover' State='NC' Due_Date='2005-10-11T00:00:00'
Due_Time='1899-12-30T15:00:00' Name='Eastover Central Recreation center'
Description='Construction of the Eastover Central Recreation Center Project'
Estimated_Cost='0' Pre_Bid='True' Mandatory='False' Pre_Bid_Date='2005-09-29T00:00:00'
Pre_Bid_Time='1899-12-30T15:00:00' Building='False' Region='0'
Owner='0' Architect='0' Engineer='0' Other='0' Specs='True'
Plans='True' Processed='1899-12-30T00:00:00' Firm='MVcdc'
OtherProjectNumber='' Other1='' Other2='' CDC='True' CDCNumber='NC Q000000001'
Job_Key='' CDCURL='http://www.aeplans.com/aeplans/cdc/transfer.asp?ID=LQD3492;27/OXefe'
CDCVERIFYURL='http://www.aeplans.com/aeplans/cdc/verify.asp?ID=LQD3492;27/OXefe'/>
There are multiple such row elements. I can pattern match for the "something", which is CDC='False'. If that occurs, then I want to remove the element from the XML file.
I was thinking one way would be to loop through the file and push everything between a < and a > into a variable. I would then check whether the something exists. If it does, print the variable to a new file. If it does not, print the variable to a different file. I would end up with two separate files: one with all the row elements without the something and one with all the row elements with the something.
I am trying to use a regular expression to match all occurrences of any text/characters/numbers between the < and >. Here is part of the code I’m trying. This is just trying to see if I can grab all of the code between each < and > in the file.
Code:
$xml_file="cdc10-31-2005-8-56-20.xml";
open(XML, $xml_file) || die("Could not open file!");
@xml=<XML>;
close(XML);
foreach (@xml) {
$file = m/<(.+)>/;
print "$1\n";
}
This only returns rs:data for each line. That in and of itself confused me, as there is only one <rs:data> and one </rs:data> in the file. I am presuming this is a root tag.
Is there a way to tell my pattern match to match until it gets to a > even if there are new lines and tabs? This part of my code $file = m/<(.+)>/; does not do that.
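One direction I have been wondering about, in case it helps frame the question: since . does not match a newline by default and my loop only ever sees one line at a time, the pattern may never get the chance to span a whole element. Also, $1 keeps its value from the last successful match when a later match fails, which might be why rs:data is printed over and over. Below is a rough sketch of the idea, not tested against the real file. It assumes the whole document is held in one string, that no attribute value contains a literal >, and that every row starts with <z:row. The name partition_rows and the second sample row are made up for illustration.

```perl
use strict;
use warnings;

# Split the <z:row .../> elements found in an XML string into two lists:
# rows that contain CDC='False' and rows that do not.
# partition_rows is a hypothetical name used only for this sketch.
sub partition_rows {
    my ($xml) = @_;
    my (@flagged, @kept);

    # [^>]* matches any character except '>', including newlines and tabs,
    # so a single element is captured even when it spans several lines.
    while ($xml =~ /(<z:row\b[^>]*>)/g) {
        my $row = $1;
        if ($row =~ /\bCDC='False'/) {
            push @flagged, $row;
        } else {
            push @kept, $row;
        }
    }
    return (\@flagged, \@kept);
}

# Inline sample standing in for the real file (the second row is invented).
# For the real file, the contents could be slurped into one string instead:
#   open(my $fh, '<', $xml_file) or die "Could not open $xml_file: $!";
#   my $xml = do { local $/; <$fh> };
my $sample = <<'END_XML';
<rs:data>
<z:row City='Eastover' State='NC'
	Firm='MVcdc' CDC='True'
	CDCNumber='NC Q000000001'/>
<z:row City='Sampletown' State='NC'
	Firm='Other' CDC='False'
	CDCNumber=''/>
</rs:data>
END_XML

my ($flagged, $kept) = partition_rows($sample);
print scalar(@$flagged), " row(s) with CDC='False', ", scalar(@$kept), " without\n";
```

Each list could then be printed to its own output file. I realise a proper XML parser would probably be more robust than a regular expression if the attribute values can ever contain a >.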
I hope I have given enough information.
As always, any help will be greatly appreciated.
mike