
stripping or taking out html code string from a file 1

Status
Not open for further replies.

MAWarGod

Programmer
Feb 15, 2002
352
US
I have a directory in which I need to open every file with the extension .who. These files contain the username, email, web URL, and the time and date logged on, as shown below.





Myfile.who
Code:
<img src= http://www.freespaces.com/realitygonewild/wg1.jpg align=left><center></img><br><center><FONT FACE=garamond><FONT COLOR =silver><FONT SIZE=4>WarGod of  Venna<br>*Administrator*</font><br>*Never say Never*<br><FONT COLOR =silver><FONT SIZE=4><a href=http://www.chatvenues.com target=_window>Chat Venues</a><br><a href=http://www.geocities.com/resortofvenna/index.htmltarget=_window>Venna</a<FONT COLOR =blue><FONT SIZE=3>|someemail.com|somesite.com|12/15/2006 6:24 AM

My question is: how do I strip the HTML tags to get the first 32 characters of plain text (WarGod of Venna*Administrator**Ne in this case),
and then have that result saved to a file, like names.txt?







MA WarGod

I believe if someone can think it, it can be programmed
 
Regular expressions:

Code:
<img src= http://www.freespaces.com/realitygonewild/wg1.jpg align=left><center></img><br><center><FONT FACE=garamond><FONT COLOR =silver><FONT SIZE=4>WarGod of  Venna<br>*Administrator*</font><br>*Never say Never*<br><FONT COLOR =silver><FONT SIZE=4><a href=http://www.chatvenues.com target=_window>Chat Venues</a><br><a href=http://www.geocities.com/resortofvenna/index.htmltarget=_window>Venna</a<FONT COLOR =blue><FONT SIZE=3>|someemail.com|somesite.com|12/15/2006 6:24 AM

Code:
($user,$quote) = $code =~ /<FONT FACE=garamond><FONT COLOR =silver><FONT SIZE=4>(.+?)<br>(.+?)<br>/;
print "$user: $quote\n";

And so on, if you want to extract more information.

or, if all the plain, non-html text is everything you need, remove all HTML and convert <br> to \n

Code:
$code =~ s/<br>/\n/ig;
$code =~ s/<(.|\n)+?>//mg;
print $code;
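
To get from the stripped text to the 32-character name asked about, a minimal sketch might look like the following. The helper name plain_name and the shortened sample record are made up for illustration, and the sketch assumes the record layout is html|email|site|date with no "|" inside the HTML part.

```perl
use strict;
use warnings;

# Hypothetical helper: strip the tags from one .who record and keep the
# first $max characters of the visible text.
sub plain_name {
    my ($record, $max) = @_;
    $max = 32 unless defined $max;

    # Assumed layout: html|email|site|date. Keep only the HTML part.
    my ($html) = split /\|/, $record, 2;
    $html = '' unless defined $html;

    $html =~ s/<[^>]*>//g;      # drop anything that looks like a tag
    $html =~ s/\s+/ /g;         # collapse runs of whitespace
    $html =~ s/^\s+|\s+$//g;    # trim both ends

    return substr($html, 0, $max);
}

# Shortened sample record, not the full one from the question.
my $record = '<center><FONT SIZE=4>WarGod of  Venna<br>*Administrator*</font>'
           . '<br>*Never say Never*<br>|someemail.com|somesite.com|12/15/2006 6:24 AM';
print plain_name($record), "\n";
```

Using [^>]* instead of (.|\n)+? avoids backtracking and still matches tags that span lines. Note that neither pattern copes with a ">" inside an attribute value; a real HTML parser would be needed for that.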

-------------
Cuvou.com | The NEW Kirsle.net
 
Code:
sub strphtml
opendir(WHOCHATDIR, "$chat_room_dir/who");
@files = grep(/who$/,readdir(WHOCHATDIR));
closedir(WHOCHATDIR);
($user,$quote) = $code =~ s/<br>/\n/ig;
$code =~ s/<(.|\n)+?>//mg;
;chomp

#open a file and print code??
#open a file and print code??
open(whois,">$htmlpath/whofile.cgi.tmp");

print $code;

open(who,"<$htmlpath/whofile.cgi");
while(<who>){
last if $. > 50;
print whois;
}
close who;
close whois;


rename("$htmlpath/whofile.cgi.tmp","$htmlpath/whofile.cgi");

Is this sort of what I need to form?



MA WarGod

I believe if someone can think it, it can be programmed
 
I gave two solutions, hence the *OR* in my last message. Use a regular expression, **OR** just remove all the HTML. Your code is very redundant. Once you have the user/quote, do what you want with it.
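
For the directory loop in the attempt above, a minimal sketch might look like this. The sub name write_names is made up, and the sketch assumes each .who file holds one record on its first line in the layout html|email|site|date.

```perl
use strict;
use warnings;

# Hypothetical helper: read every *.who file in $dir, strip the HTML from
# its first line, and write one name (at most 32 characters) per line to $out.
sub write_names {
    my ($dir, $out) = @_;

    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @files = sort grep { /\.who$/ } readdir($dh);
    closedir($dh);

    open(my $out_fh, '>', $out) or die "Cannot write $out: $!";
    for my $file (@files) {
        open(my $in_fh, '<', "$dir/$file") or die "Cannot read $file: $!";
        my $code = <$in_fh>;
        close($in_fh);
        next unless defined $code;   # skip empty files

        chomp $code;
        $code =~ s/\|.*$//;          # assumed layout: html|email|site|date
        $code =~ s/<[^>]*>//g;       # remove the tags
        print {$out_fh} substr($code, 0, 32), "\n";
    }
    close($out_fh);
    return scalar @files;            # number of .who files found
}

# Example call; both paths are placeholders taken from the attempt above.
# write_names("$chat_room_dir/who", "$htmlpath/names.txt");
```

The point of the sketch is that $code has to be read from each file before any substitution runs; in the attempt above, $code is never assigned.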

-------------
Cuvou.com | The NEW Kirsle.net
 
Furthermore, your overall program seems very inefficient. Is your code writing those .who HTML files out in the first place? If so, shouldn't the writing code already know the usernames so it could write them into the HTML? Why then would you need to search the HTML that your own program wrote for your program to recover variables from it?

This is the kind of programming style that makes MySpace suck.

-------------
Cuvou.com | The NEW Kirsle.net
 
As an extra question.... does your chat script store data as entire HTML files, and then when the CGI comes to send something to the browser, it loads and sends an entire HTML file all at once? This is very inefficient. Consider this:

Code:
# open messages
open (MSG, "messages.txt");
my @lines = <MSG>;
close (MSG);
chomp @lines;

print "Content-Type: text/html\n\n";
foreach my $line (@lines) {
   my ($sender,$timestamp,$msg) = split(/===/, $line, 3);
   print "[$timestamp] <b>$sender says:</b> $msg<p>\n\n";
}

# add a form to add a message to the file
print qq~<form name="add" action="Add.cgi">
<input type="text" name="name"><br>
<input type="text" name="msg">
<input type="submit">
</form>~;

##############################
# add.cgi:                   #
##############################

use CGI;
my $q = new CGI;

open (MSG, ">>messages.txt");
print MSG "\n"
   . $q->param('name') . "===" . time . "===" . $q->param('msg');
close (MSG);

So basically, you have a plain text document that reads in like this:

Code:
user one===timestamp===hello everybody!
user two===timestamp===hello user one!
user three===timestamp===welcome to our chat room, user one!

And the Perl opens that text file up, and splits users, timestamps, and messages outta it, and then renders it dynamically as HTML code for the browser to display.

This is much easier to manage, read, and manipulate, than it is for the program to save its data AS html, and then have to worry about parsing it and modifying it.
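
One thing a flat file like this needs is a guard so that user input cannot break the record format. A rough sketch, assuming the name===time===msg layout above (the helper name make_record is made up):

```perl
use strict;
use warnings;

# Hypothetical helper: build one "name===time===msg" line for messages.txt.
sub make_record {
    my ($name, $time, $msg) = @_;
    for ($name, $msg) {
        $_ = '' unless defined $_;
        s/[\r\n]+/ /g;        # a record must stay on one line
    }
    $name =~ s/={3,}/==/g;    # the name may not contain the separator
    $name =~ s/=+$//;         # ...or end in '=' and blur into it
    return join('===', $name, $time, $msg);
}

my $line = make_record("user one", time, "2 + 2 === 4?");
my ($sender, $timestamp, $msg) = split(/===/, $line, 3);
print "$sender said: $msg\n";
```

The message needs no separator check because split is called with a limit of 3, so everything after the second "===" stays in $msg. If the text is printed back as HTML, escaping it is a separate concern that this sketch does not cover.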

-------------
Cuvou.com | The NEW Kirsle.net
 
Yes, it knows the usernames, but they are entered by the user as a string of HTML tags:
the user submits the code as their name.

So I am trying (or have) to come up with a way to strip the HTML code,
both to place the name in a dropdown menu and for the who-chat, which shows the users in the room before you enter it. The dropdown menu would return the stripped name, limited to 32 characters.

I took your code and tried formatting it; this is what I have come up with so far.
Your code, formatted:

As you can see, the user has to copy and paste their tags each time to post or submit a message. (I would like to find a way for it to remember the input name.)
I took your code and broke it into parts to form a header, body, and footer.

Code:
#!/usr/bin/perl
##############################
# banner or the header       #
# writes to body.cgi:        #
##############################

print "Content-Type: text/html\n\n";

# add a form to add a message to the file
print qq~<center><form name="add" action="header.cgi">
Name:<input type="text" name="name"><br>
Message:<input type="text" name="msg">
<input type="submit" value="*Post*">
</form>~;


use CGI;
my $q = new CGI;

open (MSG, ">>/mypath/to/mychat/messages.txt");
print MSG "\n"
   . $q->param('name') . "===" . time . "===" . $q->param('msg') . "===" . $q->param('REMOTE_USER_IP');

I added the param 'REMOTE_USER_IP' to try to get the poster's IP so that it is printed at the end of each line in the file, but it didn't work LOL. I feel this will be needed in the future: for PMs (private messages), to send only to a given IP in the room, since the IPs would be hidden from the chatters, and also for a kick or ban I may add later.
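
A likely reason it didn't work: param() only returns fields submitted by the form, while the client address is supplied by the web server in the CGI environment variable REMOTE_ADDR. A minimal sketch, assuming a standard CGI environment; the helper names and the four-field layout are made up for illustration:

```perl
use strict;
use warnings;

# The client address comes from the CGI environment, not from a form
# field, so param() never sees it.
sub client_ip {
    my $ip = $ENV{REMOTE_ADDR};
    return $ip if defined $ip && length $ip;
    return 'unknown';   # e.g. when run from the command line
}

# Hypothetical record layout with the address as the third field and the
# message kept last: name===time===ip===msg
sub make_record {
    my ($name, $msg) = @_;
    return join('===', $name, time, client_ip(), $msg);
}

print make_record('user one', 'hello everybody!'), "\n";
```

If the address is appended as a fourth field after the message instead, a reader that calls split(/===/, $line, 3) leaves it attached to $msg, so the reading script has to change along with the writer. Keep in mind that several users behind one proxy can share an address.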

Code:
#!/usr/bin/perl

# open messages
open (MSG, "/MYpath/messages.txt");
my @lines = <MSG>;
@lines = reverse @lines;
close (MSG);
chomp @lines;

print "Content-Type: text/html\n\n";
foreach my $line (@lines) {
   my ($sender,$timestamp,$msg) = split(/===/, $line, 3);
   print "<TABLE WIDTH='100' border='0' cellpadding='0' cellspacing='0'><TR><TD> [$timestamp] <b>$sender says:</b> $msg</TD></TR></TABLE><P>\n\n";
}

Here I reverse your @lines so that the chatters do not have to scroll down for every post.
Btw, Grant and justice41 showed me this a long time ago, in Sept. '02; thank god it was still posted or I would have forgotten. I also added a table for the output.

Now I guess my next step would be to form a lib.cgi
and require it in the banner?
And try to form what you have written above:
Code:
($sender) = $code =~ s/<br>/\n/ig;
$code =~ s/<(.|\n)+?>//mg;
print $code;
print "$user:$ip\n";
Or can I just add it into the action of the header?
Which would be the proper way?
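
Either placement can work. If the helper goes into a shared file, a sketch of it might look like this; the file name lib.cgi comes from the question, the sub name strip_html is made up. One detail matters for the snippet above: s/// changes its variable in place and returns the number of substitutions, so assigning its result to ($sender) yields a count, not the name.

```perl
use strict;
use warnings;

# Hypothetical lib.cgi: one shared helper that any of the frame scripts
# could load with  require "./lib.cgi";
sub strip_html {
    my ($code) = @_;
    # s/// edits $code in place and returns how many substitutions it
    # made, so its result is deliberately not assigned to anything.
    $code =~ s/<br>/ /ig;     # line breaks become spaces
    $code =~ s/<[^>]*>//g;    # every other tag is dropped
    return $code;
}

1;   # a file loaded with require must end with a true value
```

A calling script would then do something like my $sender = strip_html($q->param('name')); after the require line.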

The footer will be nothing more than HTML output: links and whatnot.

I need 3 frames for .net users




MA WarGod

I believe if someone can think it, it can be programmed
 