Writing my own simple xml parsing

jmreinwald · Sep 24, 2004

Hi,

I've taken a look at some of the typical XML parsers and am looking for an easy way to parse some very specific XML data my company provides, with very little overhead.

Strings are definitely my weak point, so I need to know how to grab the contents between, say, <topic> and </topic>. It has to be general enough that I can count the number of <topic> tags as well as extract their contents. Luckily, I do not need to deal with attributes.

Ideas?

DRJ478 · Sep 24, 2004

Question:
Is <topic> ... </topic> never nested?

If so, I suggest a regular expression and preg_match_all.
preg_match_all pulls all matches into an array.

Code:

# non-greedy (.*?) stops at the first </topic>; /s lets . match newlines
$pattern = "/<topic>(.*?)<\/topic>/s";
$numMatches = preg_match_all($pattern,$XMLtext,$matches);

This way you get the number of items and the extracted information in $matches. print_r($matches) to see how it is built.
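To show how the result array is laid out, here is a small self-contained sketch. The sample text in $XMLtext is made up for illustration, and the pattern uses a non-greedy (.*?) with the /s modifier so that each match stops at the nearest closing tag.

```php
<?php
// Made-up sample input: two non-nested <topic> elements.
$XMLtext = "<topic>first</topic>\n<topic>second</topic>";

// (.*?) is non-greedy; /s lets . match newlines inside an element.
$pattern = "/<topic>(.*?)<\/topic>/s";
$numMatches = preg_match_all($pattern, $XMLtext, $matches);

// $matches[0] holds the full matches including the tags,
// $matches[1] holds only the text captured by the parentheses.
echo $numMatches, "\n";   // number of <topic> elements found
print_r($matches[1]);     // the extracted contents
```

With the default flags, $matches[0] is the list of whole matches and $matches[1] is the list of captured contents, so count($matches[1]) equals $numMatches.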

jmreinwald · Sep 24, 2004

Sorry, I realized a moment ago that the structure would be of some help... It will be:

<cXmlTopicList> (root, so only one, obviously)
-<Topic>
-<TopicName>
-<TopicUrl>

The <Topic> tag is, for all purposes in my app, useless, as it only is a parent tag and contains attributes which I'm not using. The thing I need to extract is the <TopicName> and the <TopicUrl>.

Sorry, I was trying to be generic in my original post but now realize how counterproductive that was!

ericbrunson · Sep 24, 2004

I'd write a recursive parsing function that understands which tags need a corresponding closing tag, pass an iterator down through the recursive calls, and return the matching tag to the caller when it is found.

That being said, I'd really just use an XML parser. The overhead isn't that great.
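For comparison, here is a minimal sketch of the parser route, assuming PHP 5 or later with the SimpleXML extension available. The sample document follows the structure described earlier in the thread; the id attribute, the topic names, and the URLs are invented for illustration.

```php
<?php
// Made-up sample data in the cXmlTopicList > Topic > TopicName/TopicUrl layout.
$xmlText = <<<XML
<cXmlTopicList>
  <Topic id="1">
    <TopicName>First topic</TopicName>
    <TopicUrl>http://example.com/first</TopicUrl>
  </Topic>
  <Topic id="2">
    <TopicName>Second topic</TopicName>
    <TopicUrl>http://example.com/second</TopicUrl>
  </Topic>
</cXmlTopicList>
XML;

// simplexml_load_string() returns false if the text is not well-formed XML.
$list = simplexml_load_string($xmlText);
if ($list === false) {
    die("Could not parse XML\n");
}

// Collect name/url pairs; attributes on <Topic> are simply ignored.
$topics = array();
foreach ($list->Topic as $topic) {
    $topics[] = array(
        'name' => (string) $topic->TopicName,
        'url'  => (string) $topic->TopicUrl,
    );
}

echo count($topics), " topics found\n";
print_r($topics);
```

The parser deals with attributes, whitespace, and entities such as &amp; on its own, which a hand-written pattern would have to handle case by case.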

DRJ478 · Sep 24, 2004

So, what about this:

Code:

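# [^>]* skips the attributes on <Topic>; \b keeps it from matching <TopicName>
$pattern = "/<Topic\b[^>]*>(.*?)<\/Topic>/s";
$numMatches = preg_match_all($pattern,$text,$matches);
# $matches[1] has the content of each <Topic> element
$topics = array();
foreach($matches[1] as $value){
	# non-greedy (.*?) stops at the first closing tag
	if(preg_match("/<TopicName>(.*?)<\/TopicName>/s",$value,$name) &&
	   preg_match("/<TopicUrl>(.*?)<\/TopicUrl>/s",$value,$url)){
		$topics[] = array('name' => trim($name[1]), 'url' => trim($url[1]));
	}
}

That will collect all the names and URLs in $topics using regular expressions.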
