Writing my own simple xml parsing

Status
Not open for further replies.

jmreinwald

Technical User
Jun 3, 2002
46
US
Hi,

I've taken a look at some of the typical XML parsers and am looking for an easy way to parse some very specific XML data my company provides, with very little overhead.

Strings are definitely my weak point, so I need to know how to grab the contents between, say, <topic> and </topic>. It has to be general enough, so I also need to know how to count the number of <topic> tags in addition to the extraction noted above. Luckily, I do not need to deal with attributes.

Ideas?
 
Question:
Are the <topic> </topic> elements never nested?

If so, I suggest a regular expression and preg_match_all.
preg_match_all pulls all matches into an array.
Code:
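# Non-greedy (.*?) stops at the nearest </topic>; the s modifier lets . match newlines
$pattern = '/<topic>(.*?)<\/topic>/s';
$numMatches = preg_match_all($pattern, $XMLtext, $matches);
This way you get the number of items in $numMatches and the extracted information in $matches. Run print_r($matches) to see how it is built.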
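
To make the shape of $matches concrete, here is a minimal sketch. The sample string is made up for illustration (the real feed is not shown in this thread), and it assumes the default PREG_PATTERN_ORDER layout, where index 0 holds the full matches and index 1 holds the captured contents.

```php
<?php
// Hypothetical sample input; the actual company feed is not shown in the thread.
$XMLtext = "<topic>First</topic>\n<topic>Second</topic>";

// Non-greedy (.*?) stops at the nearest </topic>; /s lets . span newlines.
$pattern = '/<topic>(.*?)<\/topic>/s';
$numMatches = preg_match_all($pattern, $XMLtext, $matches);

echo $numMatches, "\n";  // how many <topic> elements were found
print_r($matches[0]);    // full matches, tags included
print_r($matches[1]);    // captured contents only
```

The return value answers the "how many tags" question, and $matches[1] is the list of extracted contents.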
 
Sorry, I realized a moment ago that the structure would be of some help... It will be:

<cXmlTopicList> (root, so only one, obviously)
-<Topic>
-<TopicName>
-<TopicUrl>

The <Topic> tag is, for my app's purposes, useless: it is only a parent tag and contains attributes which I'm not using. What I need to extract is the content of <TopicName> and <TopicUrl>.

Sorry, I was trying to be generic in my original post but now realize how counterproductive that was!
 
I'd write a recursive parsing function that understands which tags need a corresponding closing tag, pass an iterator down through the calls, and return the matching tag to the caller when it is found.

That being said, I'd really just use an XML parser. The overhead isn't that great.
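
For comparison, here is a rough sketch of the parser route using PHP's bundled SimpleXML extension. It assumes PHP 5 or later with SimpleXML enabled, and the sample document, attribute names, and URLs are invented to match the structure described above, not taken from the real feed.

```php
<?php
// Hypothetical sample shaped like the structure described in the thread.
$xml = <<<XML
<cXmlTopicList>
  <Topic id="1">
    <TopicName>First topic</TopicName>
    <TopicUrl>http://example.com/first</TopicUrl>
  </Topic>
  <Topic id="2">
    <TopicName>Second topic</TopicName>
    <TopicUrl>http://example.com/second</TopicUrl>
  </Topic>
</cXmlTopicList>
XML;

// simplexml_load_string() returns false when the input is not well-formed.
$list = simplexml_load_string($xml);
if ($list === false) {
    die("Could not parse the XML\n");
}

// Collect the name and URL of every <Topic>; attributes are simply ignored.
$topics = array();
foreach ($list->Topic as $topic) {
    $topics[] = array(
        'name' => (string) $topic->TopicName,
        'url'  => (string) $topic->TopicUrl,
    );
}

echo count($topics), " topics\n";
print_r($topics);
```

The attributes on <Topic> need no special handling here, which is one less thing to get wrong than with hand-written string matching.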
 
So, what about this:
Code:
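# Non-greedy (.*?) stops at the first closing tag; [^>]* skips the attributes on <Topic>
$pattern = '/<Topic\b[^>]*>(.*?)<\/Topic>/s';
$numMatches = preg_match_all($pattern, $text, $matches);
# $matches[1] has the extracted content of each topic
$topics = array();
foreach ($matches[1] as $value) {
	# Skip any topic that is missing a name or a URL
	if (preg_match('/<TopicName>(.*?)<\/TopicName>/s', $value, $name)
		&& preg_match('/<TopicUrl>(.*?)<\/TopicUrl>/s', $value, $url)) {
		$topics[] = array('name' => trim($name[1]), 'url' => trim($url[1]));
	}
}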

That will get all the URLs and names using regular expressions.
 