RegExp: Replace SRC attribute from <iframe>
(OP)
Hello,
I want to parse some HTML code with PHP and REPLACE the SRC attribute of a <iframe>.
Example INPUT:
<iframe marginwidth="0" src="http://www.yahoo.com" marginheight="0" width="120" height="240" scrolling="no" frameborder="0" ></iframe>
OUTPUT should be for example:
<iframe marginwidth="0" src="http://www.google.com" marginheight="0" width="120" height="240" scrolling="no" frameborder="0" ></iframe>
The real problem is that the SRC parameter can have double quotes (src="http://www.google.com"), single quotes (src='http://www.google.com') or even no quotes at all (src=http://www.google.com).
Also it can be located within other attributes (like the code above) or at the end just before the closing tag > , for example:
<iframe width="120" height="240" src="http://www.google.com"></iframe>
The code should be output exactly as input just with the SRC parameter replaced.
I tried VARIOUS regular expressions, but each one works only in some cases. For example I tried:
$pattern = "/<iframe ([^>]*)src=['\"]*([^'\s\">]*)['\"\s]+([^>]*)>(.*)<\/iframe>/Uis";
$pattern = "/<iframe ([^>]*)src=(.*)\s([^>]*)>(.*)<\/iframe>/Uis";
and others...
Any suggestions?
Many thanks,
c4n
RE: RegExp: Replace SRC attribute from <iframe>
I believe this will do the job.
Let me know if it's not just right.
(Any improvements welcome, also)
CODE
$test = <<<END
<iframe width="120" height="240" src="http://www.google.com"></iframe>
END;
$pattern = "/(<iframe )([^src]*)(src=)([\'\"]?)([^>\s\'\"]+)([\'\"]?)/e";
$tmp = preg_replace($pattern,
'"\\1\\2\\3\\4".str_replace("\\5", "http://www.yahoo.com", "\\5")."\\6"',
$test);
echo stripslashes($tmp);
Thanks,
-Lrnmore
RE: RegExp: Replace SRC attribute from <iframe>
That does the job, many thanks.
I made two small changes:
1. added i modifier to make the pattern case-insensitive. Now it also matches <IFRAME , <Iframe etc.
2. placed stripslashes() within the preg_replace() call, so that only the replaced part is put through stripslashes() rather than the entire HTML code. I believe this will save CPU/memory if many HTML pages are parsed.
This is what I'll be using:
CODE
$tmp = preg_replace($pattern,'stripslashes("\\1\\2\\3\\4".str_replace("\\5", "http://www.yahoo.com", "\\5")."\\6")',$test);
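A note for anyone reading this on a newer PHP: the /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so the call above no longer runs there. A rough equivalent with preg_replace_callback() might look like the sketch below. The sample input in $test is assumed (it is the test string from the earlier post), and the pattern is the one from this thread minus the e. Since the callback receives the raw matched text, no stripslashes() should be needed.

```php
<?php
// Assumed sample input, same as the test string used earlier in the thread.
$test = '<iframe width="120" height="240" src="http://www.google.com"></iframe>';

// Pattern from the thread, with the "e" modifier dropped.
$pattern = "/(<iframe )([^src]*)(src=)([\'\"]?)([^>\s\'\"]+)([\'\"]?)/i";

$tmp = preg_replace_callback(
    $pattern,
    function (array $m): string {
        // $m[1]..$m[4]: everything up to and including the opening quote.
        // $m[5]: the old URL, which is dropped. $m[6]: the closing quote.
        return $m[1] . $m[2] . $m[3] . $m[4] . 'http://www.yahoo.com' . $m[6];
    },
    $test
);

echo $tmp, "\n";
```

With /e, PHP escaped quotes inside the backreferences before evaluating the replacement, which is why stripslashes() was needed in the first place; the callback form avoids that step.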
Again thanks for your help!
Best regards,
c4n
RE: RegExp: Replace SRC attribute from <iframe>
I found a little problem with this part of the RE:
([^src]*)
This doesn't match "src" exactly. If you input a code like this:
CODE
$test = <<<END
<iframe scrolling="0" src="http://www.google.com"></iframe>
END;
it breaks on scrolling="" (note the "scr", which is not "src", but it breaks anyway). Similarly, it breaks if you have name="s" before the src, or any attribute with an "s", "r" or "c" in it.
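To make the failure visible: [^src] is a negated character class, so it means "any single character that is not s, r or c", not "anything except the string src". The small check below is only a sketch. The second pattern, with [^>]*? in place of the class, is an assumed alternative and not something posted in this thread.

```php
<?php
// Input taken from the post above.
$html = '<iframe scrolling="0" src="http://www.google.com"></iframe>';

// [^src]* matches a run of characters that are none of "s", "r", "c".
// It stops at the "s" of scrolling, and "src=" does not follow there.
$withClass = preg_match('/(<iframe )([^src]*)(src=)/i', $html);

// Assumed alternative: lazily skip any characters that do not close the tag.
$withLazy = preg_match('/(<iframe )([^>]*?)(src=)/i', $html);

echo $withClass, "\n"; // 0 (no match)
echo $withLazy, "\n";  // 1 (match)
```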
This seems to work ok:
CODE
$test = <<<END
<iframe scrolling="0" src="http://www.google.com"></iframe>
END;
$pattern = "/(<iframe )(.*)(src=)([\'\"]?)([^>\s\'\"]+)([\'\"]?)/ie";
$tmp = preg_replace($pattern,'stripslashes("\\1\\2\\3\\4".str_replace("\\5", "http://www.yahoo.com", "\\5")."\\6")',$test);
echo $tmp;
I just changed ([^src]*) to (.*) and it seems to work ok with the tests I did. Any thoughts?
Thanks again, your code did help a lot!
c4n
RE: RegExp: Replace SRC attribute from <iframe>
Good job with the improvements.
The only thing I can see now is that there might be a new line in the tag.
So would you want to use the "s" modifier?
CODE
RE: RegExp: Replace SRC attribute from <iframe>
1. If you have the literal that you want to replace, use it in the regex. The /e modifier in this case, with a str_replace in the replacement, is redundant and can easily be eliminated. I would not recommend looking for something and then, in the replacement, looking for it again in order to replace it.
2. Another solution - very clean and elegant - is to load the document into a DOM and manipulate it that way. Find all the iframe tags (getElementsByTagName()), inspect the src attributes and replace the ones you want changed.
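For the DOM route in point 2, a minimal sketch using PHP's bundled DOM extension could look like this. The function name replaceIframeSrcDom() is made up for illustration, and the two LIBXML_HTML_* flags are an assumption that the input is a small fragment rather than a full page. One caveat: the DOM re-serializes the markup, so the output is not guaranteed to be character-for-character identical to the input (quoting and whitespace inside tags can change).

```php
<?php
// Hypothetical helper: set the src of every <iframe> in $html to $newUrl.
function replaceIframeSrcDom(string $html, string $newUrl): string
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // The flags ask libxml not to add <html>/<body> wrappers or a doctype.
    $doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('iframe') as $iframe) {
        if ($iframe->hasAttribute('src')) {
            $iframe->setAttribute('src', $newUrl);
        }
    }

    return $doc->saveHTML();
}

echo replaceIframeSrcDom(
    '<iframe width="120" height="240" src="http://www.yahoo.com"></iframe>',
    'http://www.ask.com'
);
```

If the output has to match the input exactly apart from the URL, as asked in the first post, the regex route is probably the better fit.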
RE: RegExp: Replace SRC attribute from <iframe>
I see what you mean.
You can remove the str_replace, because it's already captured.
CODE
$test = <<<END
<iframe width="120"
height="240" src="http://www.yahoo.com"></iframe>
END;
$nwURL = "http://www.ask.com";
$pattern = "/(<iframe )(.*)(src=)([\'\"]?)([^>\s\'\"]+)([\'\"]?)/ies";
$tmp = preg_replace($pattern,
'"\\1\\2\\3\\4".$nwURL."\\6"',
$test);
echo stripslashes($tmp);
Thanks,
-Lrnmore
RE: RegExp: Replace SRC attribute from <iframe>
Removing the /e pattern modifier now will probably give a little speed gain (how much really?) since the replacement does not need to be parsed as PHP code.
Some of the subpatterns could also be consolidated and we'd come up with:
CODE
$test = <<<END
<iframe width="120"
height="240" src="http://www.yahoo.com"></iframe>
END;
$nwURL = "http://www.ask.com";
$pattern = "/(<iframe .*src=([\'\"]?))([^>\s\'\"]+)[\'\"]?/is";
$tmp = preg_replace($pattern,
"\\1".$nwURL,
$test);
echo stripslashes($tmp);
RE: RegExp: Replace SRC attribute from <iframe>
Thanks for your valuable posts. Just a little issue with the last code posted: you probably didn't test it, DRJ478, otherwise I am sure you would have figured it out.
The above code changes the test <iframe> to
CODE
Note that the http://www.ask.com is missing the final quote.
This works ok:
CODE
$test = <<<END
<iframe width="120" scrolling="0" height="240" src="http://www.yahoo.com" name="test"></iframe>
END;
$nwURL = "http://www.ask.com";
$pattern = "/(<iframe .*src=([\'\"]?))([^>\s\'\"]+)([\'\"]?)/is";
$tmp = preg_replace($pattern,"\\1".$nwURL."\\4",$test);
echo stripslashes($tmp);
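One thing that may still bite with the pattern above: .* is greedy and, with the s modifier, can run past the end of the iframe tag to the last src= anywhere in the document (a later <img src=...>, for instance). A possible tightening is sketched below. The function name replaceIframeSrc() is made up, and the pattern is a suggestion, not something tested beyond the cases shown. It stays inside the tag with [^>]*?, requires whitespace before src, and uses a backreference so the closing quote matches the opening one (or is absent when the value is unquoted).

```php
<?php
// Hypothetical helper: replace the src value of every <iframe> tag,
// leaving the rest of the input untouched.
function replaceIframeSrc(string $html, string $newUrl): string
{
    // Group 1: from "<iframe" up to and including "src=".
    // Group 2: the opening quote (", ' or nothing); \2 requires the same to close.
    $pattern = '/(<iframe\b[^>]*?\ssrc\s*=\s*)(["\']?)[^>\s"\']+\2/i';

    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($newUrl): string {
            return $m[1] . $m[2] . $newUrl . $m[2];
        },
        $html
    );

    // preg_replace_callback() returns null on a regex error; fall back to input.
    return $result ?? $html;
}

$nwURL = 'http://www.ask.com';
echo replaceIframeSrc('<iframe scrolling="0" src="http://www.yahoo.com" name="test"></iframe>', $nwURL), "\n";
echo replaceIframeSrc("<iframe src='http://www.yahoo.com'></iframe>", $nwURL), "\n";
echo replaceIframeSrc('<iframe width="120" src=http://www.yahoo.com></iframe>', $nwURL), "\n";
```

The callback is used instead of a replacement string so that a new URL containing $ or \ followed by digits is not read as a backreference.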
If you (or anyone else) have any other suggestions/improvements please feel free to post them.
Thanks to both.
c4n
RE: RegExp: Replace SRC attribute from <iframe>
You're right - the quote is missing.
I always test the code before posting it. In this case Firefox was a bit too nice by adding the additional quote in the source code when viewed.
If I had tested it in IE it would have shown. However, IE is my least favorite program to test in.
BTW, have you considered the DOM approach?
RE: RegExp: Replace SRC attribute from <iframe>
As for the DOM approach: I haven't got any experience with DOM, will read the manual at http://php.net/dom and see if I can get it to work.
Regards,
c4n
RE: RegExp: Replace SRC attribute from <iframe>
Thanks for bringing the thread up again ;)
I'm glad you demonstrated the "nesting" in the RE.
Is it safe to assume that a nested group simply gets the next number, as in:
(this)(that(n))(thisandthat)
So n would be "\\3"?
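A quick way to check this is sketched below. In PCRE, capturing groups are numbered by the position of their opening parenthesis, counted from left to right, whether or not they are nested. The subject string is made up to fit the pattern.

```php
<?php
// Groups are numbered by the order of their opening parentheses.
preg_match('/(this)(that(n))(thisandthat)/', 'thisthatnthisandthat', $m);

echo $m[1], "\n"; // this
echo $m[2], "\n"; // thatn
echo $m[3], "\n"; // n
echo $m[4], "\n"; // thisandthat
```

So the nested (n) is indeed "\\3", and the group after it becomes "\\4".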
Thanks,
-Lrnmore
RE: RegExp: Replace SRC attribute from <iframe>
Known is handfull, Unknown is worldfull
RE: RegExp: Replace SRC attribute from <iframe>
ah, i thought that was a Microsoft addition to RegExp, hmm...
Known is handfull, Unknown is worldfull