
Strip HTML

Status
Not open for further replies.

jstreich

Programmer
Apr 20, 2002
1,067
US
Is there a premade Java function that can strip HTML from a string? Or do I have to use regular expressions?
 
There is no core API class that I know of that will do this (but I could be wrong).
If your HTML can be parsed by an XML parser (i.e. it is well-formed XHTML), then the task is easy: parse it and read back only the text nodes. If not, then I guess you need to get busy with regex, or lots of String.indexOf() and substring()!
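For the XML-parser route, here is a minimal sketch using the JDK's built-in DOM parser. It assumes the input is well-formed and has a single root element; real-world HTML often is not, in which case parse() throws an exception. The class name XhtmlText and the method textOf are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: the class and method names are illustrative only.
class XhtmlText {
    // Parses a well-formed XHTML fragment and returns the concatenated
    // text of all its text nodes, i.e. the markup with the tags removed.
    static String textOf(String xhtml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the fragment is wrapped in a single root element.
        System.out.println(textOf("<div>hello<p>1</p></div>")); // prints "hello1"
    }
}
```

Unlike a regex, this also decodes entities such as &amp;amp; in the text, but it rejects anything that is not well-formed XML.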

 
// Use a regular expression
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class test2
{
    public static void main(String[] args)
    {
        String input = "hello<p>1</p><logic:messagesPresent></logic:messagesPresent>";
        String result = "";
        // Reluctant match from '<' to the next '>'; same as "\u003C.+?\u003E"
        Pattern p = Pattern.compile("<.+?>");
        Matcher matcher = p.matcher(input);
        // find() tells us whether the input contains at least one tag
        if (matcher.find())
        {
            // replaceAll() resets the matcher first, so every tag is removed
            result = matcher.replaceAll("");
            System.out.println(result); // prints "hello1"
        }
    }
}
 
Isn't that just the same as:
Code:
String input = "hello<p>1</p><logic:messagesPresent></logic:messagesPresent>";
String result = input.replaceAll( "<.+?>", "" );
System.out.println( result );
?
 
Yes, your version is much shorter, but it does not tell you (as a boolean) whether any replacement was actually performed.
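If the boolean matters, one way to keep the short form and still get it is sketched below. The class StripTags and its method names are hypothetical. Because every match of "<.+?>" is at least three characters long and is replaced by an empty string, a shorter result means at least one tag was removed.

```java
import java.util.regex.Pattern;

// Hypothetical helper: names are illustrative, not from the thread.
class StripTags {
    // Same pattern as in the posts above, compiled once for reuse.
    private static final Pattern TAG = Pattern.compile("<.+?>");

    // Removes every match of the tag pattern from the input.
    static String strip(String input) {
        return TAG.matcher(input).replaceAll("");
    }

    // Reports whether strip() would change the input at all.
    static boolean containsTag(String input) {
        return TAG.matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "hello<p>1</p><logic:messagesPresent></logic:messagesPresent>";
        String result = strip(input);
        // Each match is non-empty, so a shorter result means a replacement happened.
        boolean replaced = result.length() != input.length();
        System.out.println(result + " (replaced: " + replaced + ")"); // prints "hello1 (replaced: true)"
    }
}
```

Note that '.' does not match line terminators by default, so a tag that spans a line break is left in place unless the pattern is compiled with Pattern.DOTALL.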
 