Regex

sulfericacid · Jul 30, 2003

This is similar to my last post, but I've spent the night fiddling and now have different questions and a better way to explain the question I had earlier.

Test print:
aaron: hi there everyone!! (1 second ago) (br in HTML)
someone else: I am writing this as a test (23 seconds ago)(br)
screw up user: I will type () and mess you up! (40 minutes ago) (br)
perhaps me: will I screw you up?

(1 hour ago)

question 1) I need to change /([^:]+): (.+) \((\d+) minutes ago\)/; because 'minutes ago' can also say 'seconds ago', 'minute ago', 'day ago' or 'days ago'. I read yesterday in a regex tutorial that to do something like that you'd put the alternatives in brackets like [seconds ago|minute ago|minutes ago|..], but I can't figure out how to implement it.

question 2) That same regex above WILL fail if the message (after the username and the colon) contains one or two parentheses. How can I make it ignore those and only worry about the ones that contain (## $interval ago)?

question 3) Last but not least, I can't rip apart the string as I expected to using:
/([^:]+): (.+) \((\d+) minutes ago\)/;
my( $name, $text, $delay ) = ( $1, $2, $3 );
print "NAME:$name\nText:$text\nDelay:$delay\n\n";

I am trying to parse each line so I can get their username (anything before the first colon) into $name, anything after the colon but before the (## $interval ago) into $text and delay to be (## $interval ago) itself.

I know this is probably asking for a lot, but maybe not. If any of you are good at regexes, do you think you could give me a hand?

Thank you so much!

My script is below:
use strict;
use CGI qw/:standard/;

use HTML::Tree;
use LWP::Simple;

print header, start_html('test printing');

my $funky = "http://www.allpoetry.com/chat//page=1";

my $content = get($funky);

my $tree = HTML::Tree->new();

$tree->parse($content);

# retrieve the text and split into lines
my @lines = split " ", $tree->as_text;

local $/;
my @good_lines;
my $good_lines;

for my $lines (@lines) {
$lines =~ s/\)/\) /g;

while($lines =~ m/Next Chatter \>(.*?)\< Previous Chatter/gs){
$good_lines = $1;
push @good_lines,$good_lines;
}
foreach (@good_lines){

my @lines = split / /;
foreach (@lines){
next unless $_;
/([^:]+): (.+) \((\d+) minutes ago\)/;
my( $name, $text, $delay ) = ( $1, $2, $3 );
print "NAME:$name\nText:$text\nDelay:$delay\n\n";
}
}
}

"Age is nothing more than an inaccurate number bestowed upon each of us at birth as just another means for others to judge and classify us- sulfericacid

ctbperlmonger · Jul 30, 2003

#1:

It's easiest to implement with () grouping and | alternation, not a [] character class (a character class only ever matches one character from a set):

Code:

$foo =~ /(second|minute|hour|day)s? ago/;
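To see why the alternatives need to sit inside one group, here is a small sketch. The sub name interval_unit and the sample strings are made up for illustration. Without the group, | would split the whole pattern, so the trailing "s? ago" would only attach to the last alternative.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the unit (second, minute, hour or day)
# from a string like "(23 seconds ago)", or undef if nothing matches.
sub interval_unit {
    my ($str) = @_;
    # The group keeps "s? ago" attached to every alternative.
    return $str =~ /\d+ (second|minute|hour|day)s? ago/ ? $1 : undef;
}

for my $sample ('(1 second ago)', '(40 minutes ago)', '(2 days ago)', 'a minute') {
    my $unit = interval_unit($sample);
    print "$sample => ", defined $unit ? $unit : 'no match', "\n";
}
```

If you don't need the unit captured, (?:second|minute|hour|day) groups the alternatives without creating a capture.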

#2:

You can match the message using a character class [ ], because ( and ) aren't interpreted as anything but literal characters within a character class. For example:

Code:

[\w()\s]+

...would allow A-Z a-z 0-9 _ whitespace and ( and ) in your message (punctuation would frag that particular piece of the regex, but you should be able to modify it from there - it's just an example).
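A quick way to convince yourself of this is a sketch like the following. The sub name only_simple_chars is hypothetical and the test strings are borrowed from the sample lines above. It anchors the class at both ends so the whole message has to be made of allowed characters.

```perl
use strict;
use warnings;

# Hypothetical check: returns 1 if $msg consists only of word characters,
# whitespace and literal parentheses, otherwise 0.
sub only_simple_chars {
    my ($msg) = @_;
    return $msg =~ /^[\w()\s]+$/ ? 1 : 0;
}

# Parentheses are accepted because they are literal inside [ ].
print only_simple_chars('I will type () and mess you up'), "\n";

# The "!" is not in the class, so this one is rejected.
print only_simple_chars('hi there everyone!!'), "\n";
```

Adding the punctuation you expect (for example ! ? , .) to the class is one way to widen it.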

#3:

Study this:

Code:

#!c:/perl/bin/perl -w

use strict;

my $foo = "aaron: hi there everyone!! (1 second ago) ";

# Everything before the first colon is the username.
my $name = substr($foo, 0, index($foo, ':'));
# Everything after the first colon is the message plus the interval.
my $text = substr($foo, index($foo, ':') + 1);

print "Username is: $name\n";

# Cut the "(N unit ago)" part out of $text and keep it in $interval.
my $interval = '';
$interval = $1 if $text =~ s|(\(\d+\s[A-Za-z]+\sago\))||i;

print "text is: $text\n";
print "interval is: $interval\n";

.. it works for me on all of your samples.
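If you would rather do the whole split in one regex, something along these lines might work. It is only a sketch: the sub name parse_chat_line is made up, and it assumes each chat line ends with the (## unit ago) part. Anchoring that part to the end of the line is what lets stray parentheses inside the message pass through untouched.

```perl
use strict;
use warnings;

# Hypothetical helper: splits one chat line into name, text, count and unit.
# Returns an empty list if the line does not look like a chat line.
sub parse_chat_line {
    my ($line) = @_;
    if ($line =~ /^([^:]+):\s*(.*?)\s*\((\d+)\s+(second|minute|hour|day)s?\s+ago\)\s*$/) {
        return ($1, $2, $3, $4);
    }
    return;
}

my @samples = (
    'aaron: hi there everyone!! (1 second ago) ',
    'someone else: I am writing this as a test (23 seconds ago)',
    'screw up user: I will type () and mess you up! (40 minutes ago) ',
);

for my $line (@samples) {
    my @fields = parse_chat_line($line);
    next unless @fields;    # skip lines that do not match
    my ($name, $text, $count, $unit) = @fields;
    print "NAME:$name\nText:$text\nDelay:$count $unit(s) ago\n\n";
}
```

The non-greedy (.*?) takes as little of the message as it can, while the trailing \s*$ forces the interval to be the last thing on the line.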

sulfericacid · Jul 30, 2003

Only problem is, I'm parsing a website so I can't define the variables like you did. I have to parse EVERYTHING through regex and I need to merge the $interval with the current regex I'm using which is:

m/([^:]+): (.+)\((\d+) (?:seconds|minute|minutes|hour|hours|day|days) ago\)/;
my( $name, $text, $delay ) = ( $1, $2, $3 );

sulfericacid · Jul 30, 2003

I'll try that out on my script to see what it does (I just tested it as-is).

sulfericacid · Jul 30, 2003

Ok, I can't get /anything/ to print while trying to use:

#!/usr/bin/perl -w

use strict;
use CGI qw/:standard/;

use HTML::Tree;
use LWP::Simple;

print header, start_html('test printing');

my $cnt;
until ($cnt eq "2")

{
$cnt++;
print "Current page count: $cnt ";
my $funky = "http://www.allpoetry.com/chat/page=$cnt;";

my $content = get($funky);

my $tree = HTML::Tree->new();

$tree->parse($content);

# retrieve the text and split into lines
my @lines = split " ", $tree->as_text;

for my $lines (@lines) {
$lines =~ s/\)/\) /g;
}
my $foo = @lines;

for my $lines (@lines) {
my $name = substr($foo, 0, index($foo, ':'));
my $text = substr($foo, index($foo, ':'), length($foo));
my $interval = $1 if $text =~ s|(\(\d+\s[A-Za-z]+\sago\))||i;

print "$name: $text $interval";
}
}

Have any idea what I'm doing wrong?