Converting Tags to Lower Case

Status
Not open for further replies.

DonP

IS-IT--Management
Jul 20, 2000
684
US
Does anyone know of a function that will convert HTML tags into lower case? I am doing it now with a series of str_replace() functions but it's awkward at best.

Don
contact@pc-homepage.com
Experienced in HTML, Perl, PHP, VBScript, PWS, IIS and Apache and MS-Access, MS-SQL, MySQL databases
 
I think he's looking for something that will turn this:

<TD>Text Goes Here</TD>

To this:

<td>Text Goes Here</td>

strtolower() will change it to:

<td>text goes here</td>

He'll likely need a regular expression for this, of which I am little help :-/

*cLFlaVA
----------------------------
Ham and Eggs walks into a bar and asks, "Can I have a beer please?"
The bartender replies, "I'm sorry, we don't serve breakfast."
 
Yes, strtolower() will convert everything, not just the tags. I know virtually nothing about regular expressions myself.

Don
 
OK.. not sure if this will help. It's slightly converted from the PHP manual.

Code:
<?php
preg_replace("/(<\/?)(\w+)([^>]*>)/e",
             "'\\1'.strtolower('\\2').'\\3'",
             $html_body);
?>

Not sure if it will work as my brain is too fuzzy to fully understand it.
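
Note for anyone reading this on a newer PHP: the /e modifier used above was deprecated in PHP 5.5 and removed in PHP 7, so the snippet will not run there as written. A minimal sketch of the same idea with preg_replace_callback() follows. The function name lowercaseTagNames is made up for this example and is not from the manual; the pattern is the one posted above.

```php
<?php
// Hypothetical helper: lowercases only the tag name, leaving attributes,
// attribute values, and text content untouched.
function lowercaseTagNames(string $html): string
{
    return preg_replace_callback(
        '/(<\/?)(\w+)([^>]*>)/',
        function (array $m): string {
            // $m[1] is "<" or "</", $m[2] the tag name, $m[3] the rest of the tag.
            return $m[1] . strtolower($m[2]) . $m[3];
        },
        $html
    );
}

echo lowercaseTagNames('<TD>Text Goes Here</TD><IMG SRC="../images/photo.jpg">');
// prints: <td>Text Goes Here</td><img SRC="../images/photo.jpg">
```

Because the callback receives the matched text directly instead of having it pasted into evaluated code, no quote escaping is involved. Like the original pattern, it still stops at the first > it sees, even one inside a quoted value.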

 
Thanks! It LOOKS OK but, as I said, I know little about regular expressions. I just tested it too but it seems to mess up image tags:

Code:
<img src=\"../images/photo.jpg\">

I suppose I can use str_replace() to straighten it out again, though.

Don
 
preg_replace("/(<.*>)/Ue",
"strtolower('\\1')",
$html_body);


This will convert all the text between < and > to lowercase...

Known is handfull, Unknown is worldfull
 
In this case, wouldn't it change things like

Code:
<A HREF="filename.php?ID=1">

to all lower, including passed values?

Code:
<a href="filename.php?id=1">

If so, this won't work because "ID=1" isn't the same as "id=1". Is there any way to change only the HTML and not anything else? Foamcow's tip seems to ALMOST do the trick and leaves values alone, changing only the HTML but it backslashes any quotes. Any thoughts on how to correct this?

Don
 
The example does more than leave all the values alone; it also leaves the attribute names alone.

So
Code:
<A HREF="something.html">
becomes
<a HREF="something.html">

It looks specifically for the text attached directly to a less-than sign, with no spaces in between (this text is lowercased), followed by anything and everything up to the closing greater-than sign.

As far as it automatically escaping the quotes, I'm a bit confused by that. It really shouldn't be doing that. A stripslashes would remove them, but may remove others you don't want to.
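
To make the description above concrete, here is a small sketch that runs the same pattern through preg_match() and shows what each group captures. It is an illustration only; the variable names are arbitrary. On the backslashes: the PHP manual notes that with the /e modifier, preg_replace() escapes single quotes, double quotes, backslashes and NUL characters in the backreferences before evaluating the replacement. Since the posted replacement wraps \\3 in single quotes, that would likely account for the \" showing up in the img tag.

```php
<?php
// Illustration only: inspect what each group of the pattern captures.
$pattern = '/(<\/?)(\w+)([^>]*>)/';
$tag     = '<A HREF="something.html">';

preg_match($pattern, $tag, $m);

// $m[1] = '<'                        opening bracket (plus "/" on closing tags)
// $m[2] = 'A'                        the tag name, the only part that is lowercased
// $m[3] = ' HREF="something.html">'  everything else, copied through unchanged
echo $m[1] . strtolower($m[2]) . $m[3], "\n";
// prints: <a HREF="something.html">
```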

You need a much more complex regular expression to handle this whole thing properly... I'm at work now, but maybe by morning I'll try to chug something up.
 
The real trick here is ignoring everything in quotes.
Code:
<a href=">">

causes a potential pain in the rear, for example. I don't know that regexes are the best solution, at least not for a complete one. A tokenizing approach may be much better.
 
The more I think about it, the more convinced I am that regexes aren't the right tool here, unless you're going to layer them. And I think the code below is better than that... you guys tell me.

It handles the following problems that were both discussed and not discussed above...

1) HTML Tags can span several lines
2) attributes should be lowercased, their values should not
3) Everything in double quotes should be completely ignored (you can add in single quotes easily if you're going to be handling non-perfectly entered HTML... could also mutate to handle non-quoted values if you want, but I'm too lazy right now)

And has the following shortcomings (and probably more)
1) Will lowercase your HTML comments as well
2) Absolutely no ability to handle poorly formed HTML, i.e. a < with no corresponding >. Again, an easy fix if it's needed.

Code:
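function convertHtmlTags($input) {
  $output = '';      // initialise so the .= below never hits an undefined variable
  $in_tag = false;   // true while between < and >
  $in_var = false;   // true while inside a double-quoted attribute value

  $length = strlen($input);

  for ($i = 0; $i < $length; $i++) {
    $char = $input[$i];
    if ($in_tag) {
      if ($in_var) {
        // Inside quotes: copy verbatim until the closing quote
        if ($char == '"') {
          $in_var = false;
        }
        $output .= $char;
      }
      else {
        if ($char == '"') {
          $in_var = true;
        }
        else if ($char == '>') {
          $in_tag = false;
        }
        // Tag and attribute names are lowercased
        $output .= strtolower($char);
      }
    }
    else {
      if ($char == '<') {
        $in_tag = true;
      }
      // Text outside tags is copied unchanged
      $output .= $char;
    }
  }

  return $output;
}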
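
The single-quote handling and the comment shortcoming mentioned above could be covered along the following lines. This is only a sketch of one way to extend the same character-by-character approach, not the original poster's code; the function name convertHtmlTagsExtended and the choice to copy comments through verbatim are assumptions made for the example. Unquoted attribute values are still lowercased, as before.

```php
<?php
// Sketch only: same state machine as above, extended (my assumptions) to
// honour single-quoted values and to copy HTML comments through untouched.
function convertHtmlTagsExtended(string $input): string
{
    $output = '';
    $inTag  = false;
    $quote  = '';            // the quote character we are inside, or '' if none
    $length = strlen($input);

    for ($i = 0; $i < $length; $i++) {
        $char = $input[$i];

        if (!$inTag) {
            // Copy a whole comment verbatim so its case is preserved.
            if (substr($input, $i, 4) === '<!--') {
                $end = strpos($input, '-->', $i + 4);
                $end = ($end === false) ? $length : $end + 3;
                $output .= substr($input, $i, $end - $i);
                $i = $end - 1;   // the loop's $i++ moves past the comment
                continue;
            }
            if ($char === '<') {
                $inTag = true;
            }
            $output .= $char;
        } elseif ($quote !== '') {
            // Inside a quoted value: copy as-is until the matching quote.
            if ($char === $quote) {
                $quote = '';
            }
            $output .= $char;
        } else {
            if ($char === '"' || $char === "'") {
                $quote = $char;
            } elseif ($char === '>') {
                $inTag = false;
            }
            $output .= strtolower($char);
        }
    }

    return $output;
}

echo convertHtmlTagsExtended("<A HREF='X.php?ID=1' TITLE=\">\">Text</A><!-- Keep ME -->");
// prints: <a href='X.php?ID=1' title=">">Text</a><!-- Keep ME -->
```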
 
I have done it in JavaScript; I forgot the functions for PHP:
Code:
<script>
str='<A HREF="ASD.html" ASD=\'Hello\'>asdasd</A><Img src="ASD">'
str1=str.match(/(\s.*?=['"].+?['"])/gi);
str=str.toLowerCase()
for(i=0;i<str1.length;i++)
{
	TheInsidePath=str1[i].replace(/(.*)(=.*)/,"$2")
	TheStr="str=str.replace(/"+TheInsidePath+"/i,'"+TheInsidePath.replace(/'/g,"\\'")+"')"
	eval(TheStr)
	
}
alert(str)
</script>

I guess this is what you want. Give me some time and I shall give you the PHP code (after doing some research for myself ;) )

Known is handfull, Unknown is worldfull
 
I'm not sure I follow that javascript... looks to me like it finds all the attributes and stores them off in str1 (which is actually an array?).

Then it converts the whole string to lower case.

Then it goes through the array and swaps everything back in in its original case?

If so, I'd say effective, and probably good for the real world, but it has the potential for problems if someone's being dumb.

i.e. I'm pretty sure the following would "break" it

Code:
str='<A HREF="ASD.html">blah</a><A HREF="AsD.HTML">Don\'t you love contrived examples?</a>';
 
Here's the PHP:
<?
$str='<A HREF="AsD.html" ASD=\'Hel"lo\' BSD="asd">asdasd</A><Img src="ASD"><Img src="asd">';
echo($str."<br>\n");
preg_match_all("/(\s.*=(['\"]).+\\2)/Ui", $str,$out, PREG_PATTERN_ORDER);
$str=strtolower($str);
for($i=0;$i<count($out[0]);$i++)
{
$TheStr=$out[0][$i];
$TheAttribPath=eregi_replace("^(.*)=.*$","\\1",$TheStr);
$TheInsidePath=eregi_replace("^.*(=.*)$","\\1",$TheStr);
$str=preg_replace("/".$TheAttribPath.$TheInsidePath."/i",strtolower($TheAttribPath).'#$%'.$i.$TheInsidePath,$str,1) or die($TheInsidePath);
}
$str=preg_replace("/#\\$%\d+=/i","=",$str) or die($TheInsidePath);
echo($str);
?>

skiflyer, give me some time to test your string...

Known is handfull, Unknown is worldfull
 
Hi,
here is the code for your contrived example:
Code:
<?
$str='<A HREF="ASD.html">blah</a><A HREF="AsD.HTML">Don\'t you love contrived examples?</a>';
preg_match_all("/(\s.*=(['\"]).+\\2)/Ui", $str,$out, PREG_PATTERN_ORDER);
$str=preg_replace("/<(.*)>/Ue","'<'.strtolower(\"$1\").'>'",$str);
for($i=0;$i<count($out[0]);$i++)
{
	$TheStr=$out[0][$i];
	$TheAttribPath=eregi_replace("^(.*)=.*$","\\1",$TheStr);
	$TheInsidePath=eregi_replace("^.*(=.*)$","\\1",$TheStr);
	$str=preg_replace("/".$TheAttribPath.$TheInsidePath."/i",strtolower($TheAttribPath).'#$%'.$i.$TheInsidePath,$str,1) or die($TheInsidePath);
}
$str=preg_replace("/#\\$%\d+=/i","=",$str) or die($TheInsidePath);
echo($str);
?>

It will also allow > in values...
Try it...

Known is handfull, Unknown is worldfull
 
Hold it, I found a bug with using > in an attribute; the other things will work...

Known is handfull, Unknown is worldfull
 
Finally:
Code:
<?
$str='<A HREF="ASD.html>" ASDASDASDASD asdasdasdsd>Today is as good day to die</a><img src="ASD"><img src="asd">';
preg_match_all("/(\s.*=(['\"]).+\\2)/Ui", $str,$out, PREG_PATTERN_ORDER);
$str=preg_replace("/(=(['\"]).+\\2)/Ui","ToBeReplaced",$str);
$str=preg_replace("/<(.*)>/Uei","'<'.strtolower(\"$1\").'>'",$str);

for($i=0;$i<count($out[0]);$i++)
{
	$TheStr=$out[0][$i];
	$TheAttribPath=eregi_replace("^(.*)=.*$","\\1",$TheStr);
	$TheInsidePath=eregi_replace("^.*(=.*)$","\\1",$TheStr);
	$str=preg_replace("/tobereplaced/i",'#$%'.$i.$TheInsidePath,$str,1) or die($TheInsidePath);
}
$str=preg_replace("/#\\$%\d+=/i","=",$str) or die($TheInsidePath);
echo($str);
?>
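
A note for current PHP versions: eregi_replace() and the /e modifier used above were both removed in PHP 7. The sketch below is not a port of that code; it is a different, single-pass take on the same goal using preg_replace_callback(), under the assumption that attribute values are properly quoted. The function name lowercaseTagsKeepValues is made up for the example.

```php
<?php
// Sketch only (names are hypothetical): lowercase tag and attribute names,
// keep quoted attribute values exactly as written.
function lowercaseTagsKeepValues(string $html): string
{
    // One tag: "<", optional "/", a letter, then any mix of non-quote
    // characters and complete quoted strings, up to the closing ">".
    $tagPattern = '/<\/?[A-Za-z](?:[^>"\']|"[^"]*"|\'[^\']*\')*>/';

    return preg_replace_callback(
        $tagPattern,
        function (array $tag): string {
            // Within the tag, match either a quoted string or a run of
            // unquoted text; only the unquoted runs are lowercased.
            return preg_replace_callback(
                '/"[^"]*"|\'[^\']*\'|[^"\']+/',
                function (array $part): string {
                    $first = $part[0][0];
                    return ($first === '"' || $first === "'")
                        ? $part[0]
                        : strtolower($part[0]);
                },
                $tag[0]
            );
        },
        $html
    );
}

echo lowercaseTagsKeepValues('<A HREF="ASD.html>" ASDASDASDASD>Today</A><IMG SRC="ASD">');
// prints: <a href="ASD.html>" asdasdasdasd>Today</a><img src="ASD">
```

Since the tag pattern requires a letter right after the < (or </), comments and doctype declarations are left alone. A tag with an unbalanced quote simply fails to match and is passed through unchanged.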


I will be really glad if someone can come to me with bugs...

Known is handfull, Unknown is worldfull
 
Ok, now for the fun part... some non-scientific benchmarks.

Non-Regex x 10,000 = 4 seconds
Non-Regex x 100,000 = 34 seconds

Regex x 10,000 = 7 seconds
Regex x 100,000 = 66 seconds

Test String
Code:
$str = '
<SPAN CLASS=">" NAME="BOO">la de da</SPAN><A HREF="AsD.HTML">1</A><a HREF="aSD.html"></A>
';
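
For anyone who wants to repeat this kind of rough comparison, a timing loop along these lines would do. This is a sketch and not necessarily how the numbers above were measured; timeIt() is a hypothetical helper, and strtolower() is passed in only as a stand-in so the example runs on its own. Swap in whichever converter function is being measured.

```php
<?php
// Sketch of a timing loop; timeIt() is a hypothetical helper, and the
// callable passed in stands for whichever converter is being measured.
function timeIt(callable $fn, string $input, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn($input);
    }
    return microtime(true) - $start;   // elapsed wall-clock seconds
}

$str = '
<SPAN CLASS=">" NAME="BOO">la de da</SPAN><A HREF="AsD.HTML">1</A><a HREF="aSD.html"></A>
';

// strtolower() is only a stand-in so the sketch runs on its own.
foreach ([10000, 100000] as $n) {
    printf("%d iterations: %.3f seconds\n", $n, timeIt('strtolower', $str, $n));
}
```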
 
Thanks all. I am out of town now and, for some reason, nothing between the tek-tips code tags is readable on this system so I'll check it out when I return late on Sunday.

Don
 