parsing string problem 2

j8by7 · Mar 13, 2008

Hey guys,
I'm a newbie to Perl. I've been looking up how to parse strings, but I can't seem to find what I'm looking for. If I have a string ($apple = "I like to eat green apples"), how do I pull out a section of that string and put it into another string? For example, if I want everything between "like" and "green", how do I put " to eat " into another variable, $whyMe? What would the assignment statement look like, and the regular expression if I need one?
Thanks
Harvey

ishnid · Mar 13, 2008

The first thing you need to do is define exactly what it is you're doing, without expressing it in terms of an example (ironically enough).

For instance, what you've described in your example may be technically described in a number of different ways, each of which is quite distinct, e.g.:
- extract the third and fourth words from the string
- extract everything between the 7th and 14th characters of the string, inclusive
- given two substrings, extract everything between them

What you've described probably hints at the third variation, but you still have to define what happens if, for example, the word "green" turns up twice in the string - which would you use?
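
To make the difference concrete, here is a minimal sketch of the three readings applied to the string from the original post. Only $apple comes from the thread; every other variable name, and the second sample string with "green" in it twice, is made up for illustration.

```perl
use strict;
use warnings;

my $apple = "I like to eat green apples";

# 1. By word position: the third and fourth whitespace-separated words.
my @words    = split ' ', $apple;
my $by_words = join ' ', @words[2, 3];        # "to eat" (no surrounding spaces)

# 2. By character position: 8 characters starting at offset 6,
#    which is where " to eat " sits in this particular string.
my $by_chars = substr $apple, 6, 8;           # " to eat "

# 3. Between two substrings: capture whatever sits between the markers.
#    \Q...\E makes the markers match literally, even if they contain
#    regex metacharacters.
my ($start, $end) = ('like', 'green');
my ($between) = $apple =~ /\Q$start\E(.*?)\Q$end\E/;   # " to eat "

# If the end marker turns up twice, the quantifier decides which one is used.
my $twice = "I like to eat green apples and green pears";
my ($to_first) = $twice =~ /like(.*?)green/;  # stops at the first "green"
my ($to_last)  = $twice =~ /like(.*)green/;   # runs on to the last "green"

print "[$_]\n" for $by_words, $by_chars, $between, $to_first, $to_last;
```

Note that the word-based reading drops the surrounding spaces, while the other two keep them, so the three are not interchangeable even for this one string.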

j8by7 · Mar 13, 2008

I'm trying to read a file and analyze each line for specific data. I'm using a for loop to do the reading, and each line gets stored in $line on each pass. There'll be a lot of junk in $line, but I'm specifically looking for a web address and a date, for example:

$line = akgwjkrgagkjgawkhwh ahk <a href="/w/index.php?title=Star_Wars&oldid=197291913" title="Star Wars">18:57, 10 March 2008</a> akhjwekhakhkahkh

I can't figure out how to extract just the address and the date and then store them in variables. Hope I didn't confuse you. If you know what I'm talking about and can give me a pointer in the right direction (or the answer), I'd appreciate it!

spookie · Mar 13, 2008

To start with, try the code below (not tested):

Code:

$line = 'akgwjkrgagkjgawkhwh ahk <a href="/w/index.php?title=Star_Wars&amp;oldid=197291913" title="Star Wars">18:57, 10 March 2008</a> akhjwekhakhkahkh' ;

$line =~ /\<a href=\"(.+)?\"\s+.+?\>(.+)?\</ ;

print "$1\n\2" ;

$1 should give you /w/index.php?title=Star_Wars&amp;oldid=197291913
$2 should give you 18:57, 10 March 2008

--------------------------------------------------------------------------
I never set a goal because u never know whats going to happen tommorow.

spookie · Mar 13, 2008

Typo: the print statement should use $2, not \2.

Code:

$line = 'akgwjkrgagkjgawkhwh ahk <a href="/w/index.php?title=Star_Wars&amp;oldid=197291913" title="Star Wars">18:57, 10 March 2008</a> akhjwekhakhkahkh' ;

$line =~ /\<a href=\"(.+)?\"\s+.+?\>(.+)?\</ ;

print "$1\n$2" ;


ishnid · Mar 14, 2008

While that would work for this specific string, what happens if there's a line break between "<a" and "href="? Or if the HTML developer has put the "class" attribute of their <a> before the "href"?

A proper tag-aware HTML parser is the way to go for this one. HTML::LinkExtor is specifically designed for this kind of thing, or if you need something more flexible, HTML::TokeParser::Simple might be the thing to use.
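
As a rough sketch of the parser-based approach, the following uses HTML::TokeParser::Simple (a CPAN module, so it has to be installed separately) on a variant of the sample line. The line break inside the tag and the class attribute, including its value "history", are invented here to mimic the two problem cases mentioned above; the variable names are illustrative only.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;   # CPAN module, not part of core Perl

# Sample anchor from the thread, reshaped so that a line break follows "<a"
# and a (made-up) class attribute comes before href.
my $line = <<'HTML';
junk <a
  class="history" href="/w/index.php?title=Star_Wars&amp;oldid=197291913"
  title="Star Wars">18:57, 10 March 2008</a> more junk
HTML

my $parser = HTML::TokeParser::Simple->new(string => $line);

my @found;
while (my $token = $parser->get_token) {
    # Only opening <a> tags that actually carry an href are of interest.
    next unless $token->is_start_tag('a');
    my $href = $token->get_attr('href');
    next unless defined $href;

    # Read the text that follows, up to the closing tag of the anchor.
    my $text = $parser->get_trimmed_text('/a');
    push @found, [$href, $text];
}

print "$_->[0]\n$_->[1]\n" for @found;
```

Because the parser works on tags and attributes rather than on raw characters, neither the attribute order nor the whitespace inside the tag should matter here.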

j8by7 · Mar 14, 2008

Thanks for the good input, guys. Something I'm not understanding about the expression is how it assigns what I need to the variables $1 and $2. I understand some parts of the expression, but not enough to figure out what's going on. Could you please break it down for me, or give me a good link that explains this area well?
Again, thanks.

brigmar · Mar 14, 2008

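The parts of the regex in parentheses (in this case each "(.+)") are capture groups. Whatever they match is assigned to $1, $2 and so on, numbered by the order of their opening parentheses.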

http://perldoc.perl.org/perlretut.html#Extracting-matches
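
For a piece-by-piece view, here is a sketch of the same kind of pattern written with the /x flag so each part can carry a comment. It is a variation on the posted regex, not a copy: it uses non-greedy (.+?) groups, and the names $address and $date are made up for illustration.

```perl
use strict;
use warnings;

my $line = 'akgwjkrgagkjgawkhwh ahk <a href="/w/index.php?title=Star_Wars&amp;oldid=197291913" title="Star Wars">18:57, 10 March 2008</a> akhjwekhakhkahkh';

my ($address, $date);

# With /x, whitespace in the pattern is ignored and "#" starts a comment.
if ( $line =~ m{
        <a \s+ href="    # literal start of the tag, up to the opening quote
        (.+?)            # first group: as little as possible before the next part
        " \s+ .+? >      # closing quote, then the other attributes, then the end of the tag
        (.+?)            # second group: the link text
        <                # stop where the closing tag begins
    }x )
{
    # Copy the captures straight away; the next successful match replaces them.
    ($address, $date) = ($1, $2);
}

print defined $address ? "$address\n$date\n" : "no link found\n";
```

The first pair of parentheses fills $1 and the second fills $2, which is why the address and the date come out in that order.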
