INTELLIGENT WORK FORUMS
FOR COMPUTER PROFESSIONALS


RegExp: Replace SRC attribute from <iframe>


(OP)
Hello,

I want to parse some HTML code with PHP and REPLACE the SRC attribute of an <iframe>.

Example INPUT:
<iframe marginwidth="0" src="http://www.yahoo.com" marginheight="0" width="120" height="240" scrolling="no" frameborder="0" ></iframe>


The OUTPUT should then be, for example:
<iframe marginwidth="0" src="http://www.google.com" marginheight="0" width="120" height="240" scrolling="no" frameborder="0" ></iframe>

The real problem is that the SRC parameter can have double quotes (src="http://www.google.com"), single quotes (src='http://www.google.com') or even no quotes at all (src=http://www.google.com).

Also, it can appear between other attributes (as in the code above) or last, just before the closing >, for example:
<iframe width="120" height="240" src="http://www.google.com"></iframe>

The output should be exactly the same as the input, with only the SRC value replaced.

I tried VARIOUS regular expressions, but each one works only in some cases. For example, I tried:

$pattern = "/<iframe ([^>]*)src=['\"]*([^'\s\">]*)['\"\s]+([^>]*)>(.*)<\/iframe>/Uis";

$pattern = "/<iframe ([^>]*)src=(.*)\s([^>]*)>(.*)<\/iframe>/Uis";

and others...

Any suggestions?

Many thanks,

c4n

RE: RegExp: Replace SRC attribute from <iframe>

c4n,

I believe this will do the job.

Let me know if it's not just right.
(Any improvements are welcome, too.)

CODE

$test = <<<END
<iframe width="120" height="240" src="http://www.google.com"></iframe>
END;

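// /e: evaluate the replacement string as PHP code (removed in PHP 7.0)
$pattern = "/(<iframe )([^src]*)(src=)([\'\"]?)([^>\s\'\"]+)([\'\"]?)/e";
$tmp = preg_replace($pattern,
  '"\\1\\2\\3\\4".str_replace("\\5", "http://www.yahoo.com", "\\5")."\\6"',
                    $test);
// with /e, quotes inside the backreferences get backslash-escaped, hence stripslashes()
echo stripslashes($tmp);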

Thanks,
-Lrnmore
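Note for current PHP: the /e modifier used above was removed in PHP 7.0, so this code no longer runs there. A minimal sketch of the same replacement with preg_replace_callback(), which passes the raw captures to a function, so no stripslashes() step is needed. It keeps Lrnmore's pattern (minus /e) unchanged, including the [^src]* limitation discussed later in the thread:

```php
<?php
$test = <<<END
<iframe width="120" height="240" src="http://www.google.com"></iframe>
END;

$newUrl = 'http://www.yahoo.com';

// Same pattern as above, without the /e modifier.
$pattern = '/(<iframe )([^src]*)(src=)([\'"]?)([^>\s\'"]+)([\'"]?)/';

$tmp = preg_replace_callback($pattern, function (array $m) use ($newUrl) {
    // $m[1]..$m[4]: "<iframe ", attributes before src, "src=", opening quote.
    // $m[5] is the old URL (dropped); $m[6] is the closing quote.
    return $m[1] . $m[2] . $m[3] . $m[4] . $newUrl . $m[6];
}, $test);

echo $tmp;
```

The callback builds the new tag text directly, so there is no PHP code string to escape and nothing to unescape afterwards.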

RE: RegExp: Replace SRC attribute from <iframe>

(OP)
Hi,

That does the job, many thanks.

I made two small changes:

1. Added the i modifier to make the pattern case-insensitive, so it now also matches <IFRAME, <Iframe, etc.

2. Moved stripslashes() inside the preg_replace() replacement, so that only the replaced part goes through stripslashes() instead of the entire HTML. I believe this will save CPU/memory when many HTML pages are parsed.

This is what I'll be using:

CODE

$pattern = "/(<iframe )([^src]*)(src=)([\'\"]?)([^>\s\'\"]+)([\'\"]?)/ie";
$tmp = preg_replace($pattern,'stripslashes("\\1\\2\\3\\4".str_replace("\\5", "http://www.yahoo.com", "\\5")."\\6")',$test);

Again thanks for your help!

Best regards,

c4n

RE: RegExp: Replace SRC attribute from <iframe>

(OP)
Hi again Lrnmore,

I found a little problem with this part of the RE:

([^src]*)

This doesn't match "src" as a word: [^src] is a character class, so [^src]* matches any run of characters other than s, r and c. If you input code like this:

CODE

$test = <<<END
<iframe scrolling="0" src="http://www.google.com"></iframe>
END;

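it breaks on scrolling="0": [^src]* cannot get past the "s" of "scrolling" (even though "scr" is not "src"), so nothing gets replaced. The same happens with name="s" before the src, or with any attribute name or value before src that contains an s, r or c (marginwidth and frameborder, for example).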
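The difference is easy to see with preg_match(). A small sketch comparing the character class with a lazy [^>]*?, which is a suggestion added here rather than a pattern from the thread; it consumes anything except > until the literal src= follows:

```php
<?php
$tag = '<iframe scrolling="0" src="http://www.google.com"></iframe>';

// [^src]* means "any run of characters except s, r or c",
// so it cannot get past the "s" of "scrolling" and the match fails.
$classMatch = preg_match('/<iframe ([^src]*)src=/', $tag);

// [^>]*? lazily consumes any characters except ">" until the literal
// "src=" comes next, so other attributes containing s, r or c are fine.
$lazyMatch = preg_match('/<iframe ([^>]*?)src=/', $tag, $m);

var_dump($classMatch); // int(0)
var_dump($lazyMatch);  // int(1)
var_dump($m[1]);       // string(14) "scrolling="0" "
```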

This seems to work ok:

CODE

$test = <<<END
<iframe scrolling="0" src="http://www.google.com"></iframe>
END;

$pattern = "/(<iframe )(.*)(src=)([\'\"]?)([^>\s\'\"]+)([\'\"]?)/ie";
$tmp = preg_replace($pattern,'stripslashes("\\1\\2\\3\\4".str_replace("\\5", "http://www.yahoo.com", "\\5")."\\6")',$test);
echo $tmp;

I just changed ([^src]*) to (.*) and it seems to work ok with the tests I did. Any thoughts?
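One caveat with (.*): it is greedy, so when a line (or, with the s modifier, the whole document) contains more than one <iframe>, it runs on to the last src= it can find, and only that one is replaced. A sketch comparing it with [^>]*, which cannot leave the current tag. The URLs are placeholders, and the patterns are simplified versions of the thread's (no /e, closing quote left out of the match):

```php
<?php
$html = '<iframe src="http://a.example"></iframe><iframe src="http://b.example"></iframe>';
$new  = 'http://www.yahoo.com';

// Greedy (.*) runs past the first tag to the LAST "src=" on the line.
$greedy = preg_replace('/(<iframe .*src=)([\'"]?)([^>\s\'"]+)/i', '${1}${2}' . $new, $html);

// [^>]* cannot cross a ">", so each <iframe> is matched on its own.
$bounded = preg_replace('/(<iframe [^>]*src=)([\'"]?)([^>\s\'"]+)/i', '${1}${2}' . $new, $html);

echo $greedy, "\n";  // only the second src is replaced
echo $bounded, "\n"; // both src values are replaced
```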

Thanks again, your code did help a lot!

c4n

RE: RegExp: Replace SRC attribute from <iframe>

c4n,

Good job with the improvements.

The only thing I can see now is that there might be a newline inside the tag.

So you may want to use the "s" modifier, which lets . match newlines as well:

CODE

$pattern = "/(<iframe )(.*)(src=)([\'\"]?)([^>\s\'\"]+)([\'\"]?)/ies";
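A quick check of the newline behaviour. The last pattern shows an alternative not used in the thread: a negated class such as [^>]* matches a newline even without /s, because it only excludes ">":

```php
<?php
$tag = "<iframe width=\"120\"\nheight=\"240\" src=\"http://www.yahoo.com\"></iframe>";

// Without /s, "." does not match "\n", so ".*" cannot reach "src=" on the next line.
$dotNoS = preg_match('/<iframe .*src=/', $tag);

// With /s, "." also matches "\n".
$dotWithS = preg_match('/<iframe .*src=/s', $tag);

// A negated character class matches "\n" even without /s.
$negated = preg_match('/<iframe [^>]*src=/', $tag);

var_dump($dotNoS, $dotWithS, $negated); // int(0), int(1), int(1)
```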

RE: RegExp: Replace SRC attribute from <iframe>

Some thoughts:
1. If you have the literal that you want to replace, use it in the regex. In this case the /e modifier with a str_replace() in the replacement is redundant and can easily be eliminated. I would not recommend looking for something and then, in the replacement, looking for it again before replacing it.
2. Another solution, very clean and elegant, is to load the document into a DOM and manipulate it that way. Find all the iframe tags (getElementsByTagName()), inspect their src attributes and replace the ones you want changed.
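A minimal sketch of the DOM approach using PHP's DOM extension, assuming a complete HTML page as input; replaceIframeSrc() is a hypothetical helper name. It replaces every iframe src unconditionally, so add a check on the old value if only some should change:

```php
<?php
// Hypothetical helper: set the src of every <iframe> in an HTML page.
function replaceIframeSrc(string $html, string $newUrl): string
{
    $doc = new DOMDocument();

    // Real-world HTML often triggers parser warnings; collect them quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Visit every <iframe> element and overwrite its src attribute.
    foreach ($doc->getElementsByTagName('iframe') as $iframe) {
        if ($iframe->hasAttribute('src')) {
            $iframe->setAttribute('src', $newUrl);
        }
    }

    // Serialize the whole document back to HTML.
    return $doc->saveHTML();
}

// Double-quoted, single-quoted and unquoted src values.
$page = '<html><body>'
    . '<iframe marginwidth="0" src="http://www.yahoo.com" width="120"></iframe>'
    . "<iframe width='120' src='http://www.yahoo.com'></iframe>"
    . '<iframe width=120 src=http://www.yahoo.com></iframe>'
    . '</body></html>';

echo replaceIframeSrc($page, 'http://www.google.com');
```

The parser takes care of all three quoting styles and of newlines inside the tag. The trade-off is that the serializer normalizes the markup (for instance, single-quoted and unquoted values come back double-quoted, and a doctype may be added), so if the output has to match the input byte for byte apart from the URL, as in the original question, a regex may still be the more practical route.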

RE: RegExp: Replace SRC attribute from <iframe>

As always, thanks for the advice.

I see what you mean.

You can remove the str_replace(), because the old URL is already captured and only the new one needs to be written out.

CODE

$test = <<<END
<iframe width="120"
height="240" src="http://www.yahoo.com"></iframe>
END;

$nwURL = "http://www.ask.com";

$pattern = "/(<iframe )(.*)(src=)([\'\"]?)([^>\s\'\"]+)([\'\"]?)/ies";
$tmp = preg_replace($pattern,
                       '"\\1\\2\\3\\4".$nwURL."\\6"',
                    $test);
echo stripslashes($tmp);

Thanks,
-Lrnmore

RE: RegExp: Replace SRC attribute from <iframe>

One final thing:
Removing the /e pattern modifier now will probably give a little speed gain (how much really?) since the replacement does not need to be parsed as PHP code.
Some of the subpatterns could also be consolidated and we'd come up with:

CODE

$test = <<<END
<iframe width="120"
height="240" src="http://www.yahoo.com"></iframe>
END;

$nwURL = "http://www.ask.com";

$pattern = "/(<iframe .*src=([\'\"]?))([^>\s\'\"]+)[\'\"]?/is";
$tmp = preg_replace($pattern,
                       "\\1".$nwURL,
                    $test);
echo stripslashes($tmp);

RE: RegExp: Replace SRC attribute from <iframe>

(OP)
Hello both,

Thanks for your valuable posts. There is just one small issue with the last code posted; DRJ478, you probably didn't test it, otherwise I'm sure you would have spotted it.

The above code turns the test <iframe> into:

CODE

<iframe width="120" height="240" src="http://www.ask.com></iframe>

Note that http://www.ask.com is missing its closing quote.

This works ok:

CODE

$test = <<<END
<iframe width="120" scrolling="0" height="240" src="http://www.yahoo.com" name="test"></iframe>
END;

$nwURL = "http://www.ask.com";

$pattern = "/(<iframe .*src=([\'\"]?))([^>\s\'\"]+)([\'\"]?)/is";
$tmp = preg_replace($pattern,"\\1".$nwURL."\\4",$test);
echo stripslashes($tmp);
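A variant that avoids the missing-quote problem in a different way: the closing quote is simply never part of the match, so it stays in place, and [^>]*? replaces .* so the match cannot run across tags. \s after "iframe" also allows a newline there, and \b keeps something like datasrc= from matching. This is a sketch, checked only against the three quoting styles from the original question; replaceIframeSrcRegex() is a hypothetical name:

```php
<?php
// Hypothetical helper: replace the src of every <iframe>, whether the value
// is double-quoted, single-quoted or unquoted.
function replaceIframeSrcRegex(string $html, string $newUrl): string
{
    // Group 1: "<iframe", the attributes before src, "src=" and the opening
    // quote (if any). The URL itself follows; the closing quote is not matched.
    $pattern = '/(<iframe\s[^>]*?\bsrc=[\'"]?)[^>\s\'"]+/i';

    return preg_replace_callback($pattern, function (array $m) use ($newUrl) {
        return $m[1] . $newUrl;
    }, $html);
}

$inputs = [
    '<iframe marginwidth="0" src="http://www.yahoo.com" width="120"></iframe>',
    "<iframe width='120' src='http://www.yahoo.com'></iframe>",
    '<IFRAME width=120 src=http://www.yahoo.com></IFRAME>',
];

$results = array_map(function ($html) {
    return replaceIframeSrcRegex($html, 'http://www.ask.com');
}, $inputs);

echo implode("\n", $results), "\n";
```

Using a callback instead of a plain replacement string also means a "$" or "\" in the new URL is inserted literally rather than being read as a backreference.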

If you (or anyone else) have any other suggestions/improvements please feel free to post them.

Thanks to both.

c4n

RE: RegExp: Replace SRC attribute from <iframe>

c4n
You're right - the quote is missing.
I always test the code before posting it. In this case Firefox was a bit too nice by adding the additional quote in the source code when viewed.
If I had tested it in IE it would have shown. However, IE is my least favorite program to test in.

BTW, have you considered the DOM approach?

RE: RegExp: Replace SRC attribute from <iframe>

(OP)
Yes, I also don't use IE much. I found the problem by testing the code in my PHP editor (from www.dzsoft.com), which has a built-in browser.

As for the DOM approach: I don't have any experience with DOM yet, so I'll read the manual at http://php.net/dom and see if I can get it to work.

Regards,

c4n

RE: RegExp: Replace SRC attribute from <iframe>

DRJ478,

Thanks for bringing the thread up again ;)

I'm glad you demonstrated the "nesting" in the RE.

Is it safe to assume that a nested group simply gets the next number, as in:
(this)(that(n))(thisandthat)

So n would be "\\3"?

Thanks,
-Lrnmore
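One way to check the numbering is to run Lrnmore's example pattern through preg_match() and look at the captures. The subject string is just the three literals joined together:

```php
<?php
// Groups are numbered by the position of their opening parenthesis:
// 1 = (this), 2 = (that(n)), 3 = (n), 4 = (thisandthat).
preg_match('/(this)(that(n))(thisandthat)/', 'thisthatnthisandthat', $m);

print_r($m); // [1] => this, [2] => thatn, [3] => n, [4] => thisandthat
```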

RE: RegExp: Replace SRC attribute from <iframe>

yes it should...

Known is handfull, Unknown is worldfull

RE: RegExp: Replace SRC attribute from <iframe>

Nested subpatterns are simply counted in order, as vbkris says: opening parentheses are counted from left to right to determine the backreference number. The only exception is a subpattern made non-capturing with (?:...). The ?: after the opening parenthesis means that the subexpression is not captured and therefore does not count toward the backreferences.
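To illustrate, here is Lrnmore's example pattern (this)(that(n))(thisandthat) with the (that(n)) group made non-capturing. The later groups shift down by one:

```php
<?php
// The group around "that(n)" is now non-capturing, so it gets no number.
preg_match('/(this)(?:that(n))(thisandthat)/', 'thisthatnthisandthat', $m);

// Now: 1 = "this", 2 = "n", 3 = "thisandthat"; "thatn" is not captured.
print_r($m);
```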

RE: RegExp: Replace SRC attribute from <iframe>

>>(?:)

Ah, I thought that was a Microsoft addition to RegExp, hmm...

