Converting Tags to Lower Case

DonP · Sep 8, 2004

Does anyone know of a function that will convert HTML tags to lower case? I am doing it now with a series of str_replace() calls, but it's awkward at best.

Don
contact@pc-homepage.com

http://www.pc-homepage.com/

Experienced in HTML, Perl, PHP, VBScript, PWS, IIS and Apache and MS-Access, MS-SQL, MySQL databases

Foamcow · Sep 8, 2004

strtolower();

http://uk.php.net/strtolower

Will that do it?

http://www.foamcow.com

cLFlaVA · Sep 8, 2004

I think he's looking for something that will turn this:

<TD>Text Goes Here</TD>

To this:

<td>Text Goes Here</td>

strtolower() will change it to:

<td>text goes here</td>

He'll likely need a regular expression for this, of which I am little help :-/
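For reference, a regular-expression version might look like the minimal sketch below. It assumes PHP 7 or later and well-formed tags, uses preg_replace_callback() rather than the /e modifier (which was removed in PHP 7.0), and lowercases only the tag name right after < or </. The function name lowercase_tag_names is hypothetical.

```php
<?php
// Hypothetical helper: lowercase only the tag name that follows "<" or "</",
// leaving attributes, attribute values and text content untouched.
// Assumes no "<" is immediately followed by a letter in ordinary text.
function lowercase_tag_names(string $html): string
{
    return preg_replace_callback(
        '/(<\/?)(\w+)/',
        function (array $m): string {
            // $m[1] is "<" or "</", $m[2] is the tag name
            return $m[1] . strtolower($m[2]);
        },
        $html
    );
}
```

Attribute names keep their original case with this sketch, so <A HREF="x.php?ID=1"> becomes <a HREF="x.php?ID=1">.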

*cLFlaVA
----------------------------
Ham and Eggs walks into a bar and asks, "Can I have a beer please?"
The bartender replies, "I'm sorry, we don't serve breakfast."

DonP · Sep 8, 2004

Yes, strtolower() will convert everything, not just the tags. I know virtually nothing about regular expressions myself.

Foamcow · Sep 8, 2004

OK, not sure if this will help. It's adapted slightly from an example in the PHP manual.

Code:

<?php
// /e only works in PHP < 7.0 (deprecated in 5.5, removed in 7.0)
$html_body = preg_replace("/(<\/?)(\w+)([^>]*>)/e",
                          "'\\1'.strtolower('\\2').'\\3'",
                          $html_body);
?>

Not sure if it will work as my brain is too fuzzy to fully understand it.

Foamcow · Sep 8, 2004

Just tested it and it seems to work ok.
It might need to be adjusted slightly to deal with quote marks and slashes.

DonP · Sep 8, 2004

Thanks! It LOOKS OK but, as I said, I know little about regular expressions. I just tested it too, and it seems to mess up image tags; they come out like this:

Code:

<img src=\"../images/photo.jpg\">

I suppose I can use str_replace() to straighten it out again, though.

vbkris · Sep 8, 2004

Code:

// /e only works in PHP < 7.0 (deprecated in 5.5, removed in 7.0)
$html_body = preg_replace("/(<.*>)/Ue",
                          "strtolower('\\1')",
                          $html_body);

This will convert all the text between < and > to lower case.
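On PHP 7 and later the same pattern has to go through preg_replace_callback(), since /e no longer exists. A minimal sketch (the helper name lowercase_inside_brackets is hypothetical) makes the side effect easy to see: everything between < and >, attribute values included, is lowercased.

```php
<?php
// Same pattern as the /Ue version, rewritten for PHP 7+ without /e.
// /U makes ".*" ungreedy, so each match stops at the first ">".
function lowercase_inside_brackets(string $html): string
{
    return preg_replace_callback(
        '/(<.*>)/U',
        function (array $m): string {
            // The whole bracketed chunk, values included, is lowercased
            return strtolower($m[1]);
        },
        $html
    );
}
```

Because . does not match a newline without the s modifier, a tag that spans several lines is not matched at all by this pattern.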

Known is handfull, Unknown is worldfull

DonP · Sep 9, 2004

In this case, wouldn't it change things like

Code:

<A HREF="filename.php?ID=1">

to all lower, including passed values?

Code:

<a href="filename.php?id=1">

If so, this won't work, because "ID=1" isn't the same as "id=1". Is there any way to change only the HTML and nothing else? Foamcow's tip ALMOST does the trick: it leaves values alone and changes only the HTML, but it puts a backslash in front of every quote. Any thoughts on how to correct this?
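One way to get closer to this is to layer two regexes: one finds whole tags while treating quoted strings as opaque, and a second lowercases everything inside a tag except the quoted values. This is only a sketch under stated assumptions (PHP 7+, attribute values quoted with " or '); the helper name lowercase_tags_keep_values is hypothetical, and an unquoted value such as HREF=x.php?ID=1 would still be lowercased.

```php
<?php
// Hypothetical helper: lowercase tag and attribute names, but leave quoted
// attribute values and the text between tags exactly as they are.
function lowercase_tags_keep_values(string $html): string
{
    // A tag is "<", then any mix of non-quote/non-">" characters and
    // complete quoted strings, then ">". A ">" inside quotes does not end it.
    $tagPattern = '/<(?:[^"\'>]|"[^"]*"|\'[^\']*\')*>/';

    return preg_replace_callback($tagPattern, function (array $tag): string {
        // Split one tag into quoted strings and everything else.
        return preg_replace_callback(
            '/"[^"]*"|\'[^\']*\'|[^"\']+/',
            function (array $part): string {
                $first = $part[0][0];
                // Quoted value: keep as-is; anything else: lowercase it
                return ($first === '"' || $first === "'")
                    ? $part[0]
                    : strtolower($part[0]);
            },
            $tag[0]
        );
    }, $html);
}
```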

skiflyer · Sep 9, 2004

Foamcow's example does more than leave all values alone; it also leaves attribute names alone.

So

Code:

<A HREF="something.html">
becomes
<a HREF="something.html">

It specifically looks for a less-than sign (plus an optional slash) followed immediately by a word, with no spaces in between; only that word is lowercased. Everything after it, up to and including the next greater-than sign, is left as is.
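The description above can be checked with preg_match(), which returns the pattern's three capture groups separately. A short sketch, using the pattern without /e:

```php
<?php
// Foamcow's pattern without the /e modifier, so preg_match() can show its groups.
$pattern = '/(<\/?)(\w+)([^>]*>)/';

preg_match($pattern, '<A HREF="something.html">', $open);
// $open[1]: '<'                        the opening bracket (or "</")
// $open[2]: 'A'                        the tag name, the only part lowercased
// $open[3]: ' HREF="something.html">'  everything else up to the first ">"

preg_match($pattern, '</A>', $close);
// $close[1]: '</', $close[2]: 'A', $close[3]: '>'
```

Note that [^>]* stops at the first >, so a > inside a quoted value ends the match early.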

As for it automatically escaping the quotes, I'm a bit confused by that. It really shouldn't be doing that. A stripslashes() call would remove the backslashes, but it may also remove others you want to keep.
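The backslashes come from the /e modifier itself: the PHP manual notes that with /e, single quotes, double quotes, backslashes and NUL characters in substituted backreferences are escaped with backslashes, which is what addslashes() does. Because Foamcow's replacement wraps \\3 in single quotes, and \" is not an escape sequence inside single quotes, the backslashes survive into the output. The sketch below rebuilds that step by hand with addslashes() and eval() purely to illustrate it on current PHP; it is not a recommended way to process HTML.

```php
<?php
// Captured groups for <IMG SRC="../images/photo.jpg"> under Foamcow's pattern
$group1 = '<';
$group2 = 'IMG';
$group3 = ' SRC="../images/photo.jpg">';

// Step 1: /e escaped each backreference (like addslashes()) before pasting
// it into the replacement code '\1'.strtolower('\2').'\3'
$code = "return '" . addslashes($group1) . "'.strtolower('" . addslashes($group2)
      . "').'" . addslashes($group3) . "';";

// Step 2: the code was evaluated. Inside single quotes \" is NOT an escape
// sequence, so the backslash stays in the result.
$result = eval($code);

echo $result, "\n"; // <img SRC=\"../images/photo.jpg\">
```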

You need a much more complex regular expression to handle this whole thing properly... I'm at work now, but maybe by morning I'll try to chug something up.

skiflyer · Sep 9, 2004

The real trick here is ignoring everything in quotes.

Code:

<a href=">">

causes a potential pain in the rear, for example. I'm not sure regexes are the best solution here, at least not for a complete solution. A tokenizing approach may be much better.
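To see the problem concretely, the sketch below compares a naive tag pattern with one possible quote-aware pattern on that exact input. The quote-aware regex is one possible choice, not something from this thread, and it still knows nothing about comments, scripts or unquoted values.

```php
<?php
// Naive pattern: a tag runs from "<" to the first ">" it meets.
$naive = '/<[^>]*>/';

// One possible quote-aware pattern: quoted strings are consumed whole,
// so a ">" inside them does not end the tag.
$quoteAware = '/<(?:[^"\'>]|"[^"]*"|\'[^\']*\')*>/';

$html = '<a href=">">link</a>';

preg_match($naive, $html, $n);      // $n[0] is '<a href=">'
preg_match($quoteAware, $html, $q); // $q[0] is '<a href=">">'
```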

skiflyer · Sep 9, 2004

The more I think about it, the more convinced I am that regexes aren't the way to go here, unless you're going to layer them. And I think the code below is better than that... you guys tell me.

It handles the following problems, some discussed above and some not:

1) HTML tags can span several lines
2) attribute names should be lowercased, their values should not
3) Everything in double quotes should be completely ignored (you can easily add single quotes if you're going to be handling imperfectly entered HTML; it could also be adapted to handle unquoted values, but I'm too lazy right now)

And it has the following shortcomings (and probably more):
1) It will lowercase your HTML comments as well
2) It has absolutely no ability to handle poorly formed HTML, i.e. a < with no corresponding >. Again, an easy fix if it's needed.

Code:

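function convertHtmlTags($input) {
  $output = '';      // initialize before appending with .=
  $in_tag = false;   // true between "<" and the closing ">"
  $in_var = false;   // true inside a double-quoted attribute value

  $length = strlen($input);

  for ($i = 0; $i < $length; $i++) {
    $char = $input[$i];
    if ($in_tag) {
      if ($in_var) {
        // Inside a quoted value: copy unchanged until the closing quote
        if ($char == '"') {
          $in_var = false;
        }
        $output .= $char;
      }
      else {
        // Inside a tag but outside quotes: lowercase tag and attribute names
        if ($char == '"') {
          $in_var = true;
        }
        else if ($char == '>') {
          $in_tag = false;
        }
        $output .= strtolower($char);
      }
    }
    else {
      // Outside any tag: copy text through unchanged
      if ($char == '<') {
        $in_tag = true;
      }
      $output .= $char;
    }
  }

  return $output;
}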
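As point 3 says, single quotes are easy to add. One way to do it, sketched below under the assumption that a value opened with one quote character is closed only by the same character, is to remember which quote opened the value instead of using a true/false flag. The function name convertHtmlTagsQuoteAware is hypothetical.

```php
<?php
// Hypothetical variant of the state machine above that treats both
// double- and single-quoted attribute values as opaque.
function convertHtmlTagsQuoteAware(string $input): string
{
    $output = '';
    $inTag = false;
    $quote = null; // the quote character that opened the current value, if any

    $length = strlen($input);
    for ($i = 0; $i < $length; $i++) {
        $char = $input[$i];

        if (!$inTag) {
            // Outside tags: copy text through unchanged
            if ($char === '<') {
                $inTag = true;
            }
            $output .= $char;
        } elseif ($quote !== null) {
            // Inside a quoted value: copy unchanged until the matching quote
            if ($char === $quote) {
                $quote = null;
            }
            $output .= $char;
        } else {
            // Inside a tag, outside quotes: lowercase names
            if ($char === '"' || $char === "'") {
                $quote = $char;
            } elseif ($char === '>') {
                $inTag = false;
            }
            $output .= strtolower($char);
        }
    }

    return $output;
}
```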

vbkris · Sep 10, 2004

I have done it in JavaScript; I've forgotten the equivalent PHP functions:

Code:

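<script>
var str = '<A HREF="ASD.html" ASD=\'Hello\'>asdasd</A><Img src="ASD">';
// Save every attribute=value pair, in its original case
var str1 = str.match(/(\s.*?=['"].+?['"])/gi) || [];
str = str.toLowerCase();
for (var i = 0; i < str1.length; i++) {
	// Keep only the ="value" part (from the first =, so an = inside the value survives)
	var TheInsidePath = str1[i].replace(/^[^=]*/, "");
	// Escape regex metacharacters such as . and ? rather than building code for eval()
	var escaped = TheInsidePath.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
	// Replace the first lowercased copy with the original-case value
	str = str.replace(new RegExp(escaped, "i"), function () { return TheInsidePath; });
}
alert(str);
</script>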

I guess this is what you want. Give me some time and I'll post the PHP code (after doing some research myself).

skiflyer · Sep 10, 2004

I'm not sure I follow that JavaScript. It looks to me like it finds all the attributes and stores them in str1 (which is actually an array?).

Then it converts the whole string to lower case.

Then it goes through the array and swaps everything back in in its original case?

If so, I'd say effective, and probably good for the real world, but it has the potential for problems if someone's being dumb.

E.g., I'm pretty sure the following would "break" it:

Code:

str='<A HREF="ASD.html">blah</a><A HREF="AsD.HTML">Don\'t you love contrived examples?</a>';
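This example targets the restore step: searching case-insensitively for a lowercased value can land on the wrong occurrence when two values differ only in case. A sketch that sidesteps this (an alternative, not vbkris's method) records each quoted value's byte offset with PREG_OFFSET_CAPTURE and writes it back at that same offset; this works because strtolower() maps each byte to exactly one byte, so offsets do not shift. Like the JavaScript above, it lowercases the text between tags too, and it assumes values are quoted. The helper name lowercase_then_restore is hypothetical.

```php
<?php
// Hypothetical helper: lowercase the whole string, then put every quoted
// attribute value back at the byte offset where it was found.
function lowercase_then_restore(string $html): string
{
    // Group 1 captures each quoted value; PREG_OFFSET_CAPTURE adds its offset
    preg_match_all('/=\s*("[^"]*"|\'[^\']*\')/', $html, $matches, PREG_OFFSET_CAPTURE);

    // strtolower() maps each byte to one byte, so the offsets stay valid
    $lower = strtolower($html);

    foreach ($matches[1] as [$value, $offset]) {
        // Overwrite the lowercased copy with the original-case value
        $lower = substr_replace($lower, $value, $offset, strlen($value));
    }

    return $lower;
}
```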

vbkris · Sep 10, 2004

Here's the PHP:

Code:

<?php
// Needs PHP < 7.0: eregi_replace() was removed in PHP 7.0
$str='<A HREF="AsD.html" ASD=\'Hel"lo\' BSD="asd">asdasd</A><Img src="ASD"><Img src="asd">';
echo($str."<br>\n");
// Save every attribute="value" pair in its original case
preg_match_all("/(\s.*=(['\"]).+\\2)/Ui", $str,$out, PREG_PATTERN_ORDER);
$str=strtolower($str);
for($i=0;$i<count($out[0]);$i++)
{
	$TheStr=$out[0][$i];
	$TheAttribPath=eregi_replace("^(.*)=.*$","\\1",$TheStr);
	$TheInsidePath=eregi_replace("^.*(=.*)$","\\1",$TheStr);
	// Restore the value; the #$%N marker stops later case-insensitive searches
	// from matching a pair that has already been restored
	$str=preg_replace("/".$TheAttribPath.$TheInsidePath."/i",strtolower($TheAttribPath).'#$%'.$i.$TheInsidePath,$str,1) or die($TheInsidePath);
}
// Strip the temporary #$%N markers
$str=preg_replace("/#\\$%\d+=/i","=",$str) or die($TheInsidePath);
echo($str);
?>

skiflyer, give me some time to test your string.

vbkris · Sep 10, 2004

Hi,
here is the code for your contrived example:

Code:

<?php
// Needs PHP < 7.0: the /e modifier and eregi_replace() were both removed in 7.0
$str='<A HREF="ASD.html">blah</a><A HREF="AsD.HTML">Don\'t you love contrived examples?</a>';
// Save every attribute="value" pair in its original case
preg_match_all("/(\s.*=(['\"]).+\\2)/Ui", $str,$out, PREG_PATTERN_ORDER);
// Lowercase only what sits between < and >, leaving the text alone
$str=preg_replace("/<(.*)>/Ue","'<'.strtolower(\"$1\").'>'",$str);
for($i=0;$i<count($out[0]);$i++)
{
	$TheStr=$out[0][$i];
	$TheAttribPath=eregi_replace("^(.*)=.*$","\\1",$TheStr);
	$TheInsidePath=eregi_replace("^.*(=.*)$","\\1",$TheStr);
	$str=preg_replace("/".$TheAttribPath.$TheInsidePath."/i",strtolower($TheAttribPath).'#$%'.$i.$TheInsidePath,$str,1) or die($TheInsidePath);
}
// Strip the temporary #$%N markers
$str=preg_replace("/#\\$%\d+=/i","=",$str) or die($TheInsidePath);
echo($str);
?>

It will also allow > in values.
Try it.

vbkris · Sep 10, 2004

Hold on, I found a bug when an attribute value contains >; everything else works.

vbkris · Sep 10, 2004

Finally:

Code:

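<?php
// Needs PHP < 7.0: the /e modifier and eregi_replace() were both removed in 7.0
$str='<A HREF="ASD.html>" ASDASDASDASD asdasdasdsd>Today is a good day to die</a><img src="ASD"><img src="asd">';
// Save every attribute="value" pair in its original case
preg_match_all("/(\s.*=(['\"]).+\\2)/Ui", $str,$out, PREG_PATTERN_ORDER);
// Swap each quoted value for a placeholder so a > inside it can't end the tag early
$str=preg_replace("/(=(['\"]).+\\2)/Ui","ToBeReplaced",$str);
// Lowercase everything between < and >
$str=preg_replace("/<(.*)>/Uei","'<'.strtolower(\"$1\").'>'",$str);

// Put the saved values back, in order
for($i=0;$i<count($out[0]);$i++)
{
	$TheStr=$out[0][$i];
	$TheAttribPath=eregi_replace("^(.*)=.*$","\\1",$TheStr);
	$TheInsidePath=eregi_replace("^.*(=.*)$","\\1",$TheStr);
	$str=preg_replace("/tobereplaced/i",'#$%'.$i.$TheInsidePath,$str,1) or die($TheInsidePath);
}
// Strip the temporary #$%N markers
$str=preg_replace("/#\\$%\d+=/i","=",$str) or die($TheInsidePath);
echo($str);
?>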

I'd be really glad if anyone can report bugs to me.

skiflyer · Sep 10, 2004

Ok, now for the fun part... some non-scientific benchmarks.

Non-Regex x 10,000 = 4 seconds
Non-Regex x 100,000= 34 seconds

Regex x 10,000 = 7 seconds
Regex x 100,000 = 66 seconds

Test String

Code:

$str = '
<SPAN CLASS=">" NAME="BOO">la de da</SPAN><A HREF="AsD.HTML">1</A><a HREF="aSD.html"></A>
';
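Timings like these depend heavily on the machine, the PHP version and the input, so they are best compared only within one setup. A minimal harness for repeating such a test might look like the sketch below; the helper name time_conversion is hypothetical, and strtolower is only a placeholder converter.

```php
<?php
// Hypothetical timing helper: runs $fn on $input $n times and returns
// the elapsed wall-clock time in seconds.
function time_conversion(callable $fn, string $input, int $n): float
{
    $start = microtime(true);
    for ($i = 0; $i < $n; $i++) {
        $fn($input);
    }
    return microtime(true) - $start;
}

$str = '<SPAN CLASS=">" NAME="BOO">la de da</SPAN><A HREF="AsD.HTML">1</A><a HREF="aSD.html"></A>';

// Placeholder converter; pass 'convertHtmlTags' (or another converter) instead
$seconds = time_conversion('strtolower', $str, 10000);
printf("%.3f seconds\n", $seconds);
```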

DonP · Sep 10, 2004

Thanks all. I am out of town now and, for some reason, nothing between the tek-tips code tags is readable on this system so I'll check it out when I return late on Sunday.
