Strip HTML

jstreich · Dec 3, 2004

Is there a premade Java function that can strip HTML from a string? Or do I have to use regular expressions?

sedj · Dec 4, 2004

There is no core API class that I know of that will do this (but I could be wrong).
If your HTML can be parsed by an XML parser (i.e. it is XHTML), then the task would be easy. If not, then I guess you need to get busy with regex, or lots of String.indexOf() and substring()!
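
For the String.indexOf() / substring() route, a minimal sketch might look like the following. The class and method names (IndexOfTagStripper, strip) are made up for illustration, and the approach is an assumption about what "lots of indexOf()" would amount to: it simply drops everything between a '<' and the next '>'. It does not decode entities such as &amp;amp; and knows nothing about comments or script blocks.

```java
// Hypothetical helper: removes anything between '<' and the next '>'
// using only String.indexOf() and substring(), with no regex involved.
class IndexOfTagStripper {

    static String strip(String html) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos < html.length()) {
            int open = html.indexOf('<', pos);
            if (open < 0) {
                // No more tags: keep the rest of the text as-is.
                out.append(html.substring(pos));
                break;
            }
            int close = html.indexOf('>', open + 1);
            if (close < 0) {
                // A '<' with no closing '>' is treated as literal text.
                out.append(html.substring(pos));
                break;
            }
            // Keep the text before the tag, then skip past the tag.
            out.append(html.substring(pos, open));
            pos = close + 1;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "hello<p>1</p><logic:messagesPresent></logic:messagesPresent>";
        System.out.println(strip(input)); // prints "hello1"
    }
}
```

One known weakness of this sketch: text such as "a < b > c" loses the " b " part, because a stray '<' followed later by a '>' is indistinguishable from a tag at this level.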

prosper · Dec 4, 2004

// Use a regular expression
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Test2
{
    public static void main(String[] args)
    {
        String input = "hello<p>1</p><logic:messagesPresent></logic:messagesPresent>";
        // Reluctant match of anything between '<' and '>'; same as "\u003C.+?\u003E"
        Pattern p = Pattern.compile("<.+?>");
        Matcher matcher = p.matcher(input);
        // find() reports whether there is at least one tag to replace
        boolean found = matcher.find();
        // replaceAll() resets the matcher first, so the earlier find() skips nothing
        String result = found ? matcher.replaceAll("") : input;
        System.out.println(result);
    }
}

ishnid · Dec 6, 2004

Isn't that just the same as:

Code:

String input = "hello<p>1</p><logic:messagesPresent></logic:messagesPresent>";
String result = input.replaceAll( "<.+?>", "" );
System.out.println( result );

?

prosper · Dec 6, 2004

Yes, your version is much shorter, but it does not tell you (as a boolean) whether any replacement was performed.
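
If both the short form and the boolean are wanted, one possible way to combine them is sketched below. The class and method names (ReportingTagStripper, containsTag, strip) are hypothetical; only Pattern.compile(), Matcher.find() and Matcher.replaceAll() come from the standard library, and the pattern is the same one used earlier in the thread.

```java
import java.util.regex.Pattern;

// Hypothetical helper: strips tags and can also report whether any were present.
class ReportingTagStripper {

    // Same pattern as in the thread, compiled once so it can be reused.
    private static final Pattern TAG = Pattern.compile("<.+?>");

    // True if the input contains at least one match of the tag pattern.
    static boolean containsTag(String input) {
        return TAG.matcher(input).find();
    }

    // Returns the input with every match removed; unchanged if there is none.
    static String strip(String input) {
        return TAG.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "hello<p>1</p><logic:messagesPresent></logic:messagesPresent>";
        boolean replaced = containsTag(input);
        String result = strip(input);
        System.out.println(result + " (replaced: " + replaced + ")");
    }
}
```

With the one-liner, comparing the result against the input with equals() afterwards would give the same yes/no answer without a separate Matcher.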
