I'm parsing XML manually, for a specific task, and I've got a regex that works 90% of the time. I'd like some help figuring out how to modify my regex to get that last 10%.
So first off, this is my regex:
/<$element\s*(.*?)(\/>|>(.*?)<\/$element>)/si
which matches everything in between the tags of a given element, including any attributes that might be applied to that element, i.e. for:
<subject fg="black">This is my subject</subject>
I can pull out the 'This is my subject' part AND the 'fg="black"' part for further processing.
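To make the capture groups concrete, here is a rough Python translation of the regex above. This is only a sketch: my actual code interpolates `$element` directly, and the helper name `match_element` is made up for illustration.

```python
import re

def match_element(element, xml):
    """Return (attribute_string, inner_text) for the first <element> found, or None."""
    # Same shape as the regex above; re.escape stands in for the $element interpolation.
    name = re.escape(element)
    pattern = re.compile(
        r"<" + name + r"\s*(.*?)(/>|>(.*?)</" + name + r">)",
        re.S | re.I,  # equivalent of the /si flags
    )
    m = pattern.search(xml)
    if m is None:
        return None
    # Group 1 is the raw attribute string, group 3 the text between the tags
    # (None for a self-closing element).
    return m.group(1), m.group(3)

attrs, text = match_element("subject", '<subject fg="black">This is my subject</subject>')
print(attrs)
print(text)
```

Group 1 is the attribute string I then process further, and that is the step where things go wrong.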
The problem I'm running into is with attributes whose names contain another attribute's name, e.g.:
<subject name="subject" namefg="orange" fg="black">This is my subject</subject>
I get messed up on the 'namefg' attribute, as it is also recognized as 'name' and not just 'namefg'.
So basically my regex needs to not match on partial attributes.
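To illustrate the kind of partial match I mean, here is a small Python sketch. It is not my actual attribute code; `naive_attr` and `anchored_attr` are hypothetical helpers, and I'm assuming double-quoted values only. The naive lookup lets the name match anywhere, so asking for `fg` hits `namefg` first. The second version is one direction I've considered: require start-of-string or whitespace before the name and `=` right after it.

```python
import re

ATTRS = 'name="subject" namefg="orange" fg="black"'

def naive_attr(name, attrs):
    # Hypothetical naive lookup: the name may match inside a longer attribute name.
    m = re.search(re.escape(name) + r'="(.*?)"', attrs, re.S | re.I)
    return m.group(1) if m else None

def anchored_attr(name, attrs):
    # Require start of string or whitespace before the name and '=' after it,
    # so 'fg' cannot match inside 'namefg' and 'name' cannot match 'namefg'.
    # Assumes double-quoted values; single quotes are not handled.
    m = re.search(r'(?:^|\s)' + re.escape(name) + r'\s*=\s*"(.*?)"', attrs, re.S | re.I)
    return m.group(1) if m else None

print(naive_attr("fg", ATTRS))       # picks up the value of namefg
print(anchored_attr("fg", ATTRS))
print(anchored_attr("name", ATTRS))
print(anchored_attr("namefg", ATTRS))
```

I'm not sure the anchored version covers every case (for instance a value that itself contains something like ` fg="..."`), which is part of why I'm asking.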
If that makes sense. Regexes give me headaches.
