Tek-Tips is the largest IT community on the Internet today!


parsing string problem 2

Status: Not open for further replies.

j8by7 (Programmer)
Joined: Mar 13, 2008 | Messages: 3 | Location: US
Hey guys,
I'm a newbie to Perl. I've been looking up how to parse strings, but I can't seem to find what I'm looking for. If I have a string ($apple = "I like to eat green apples"), how do I pull out a section of that string and put it into another string? For example, if I want everything between "like" and "green", how do I put " to eat " into another variable, $whyMe? What would the assignment statement look like, and what would the regular expression be, if I need one?
Thanks
Harvey
 
The first thing you need to do is define exactly what it is you're doing, without expressing it in terms of an example (ironically enough).

For instance, what you've described in your example may be technically described in a number of different ways, each of which is quite distinct, e.g.:
- extract the third and fourth words from the string
- extract the 7th through 14th characters of the string, inclusive
- given two substrings, extract everything between them
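To make the difference concrete, here is a minimal Perl sketch of each reading applied to Harvey's sample string. The character positions assume 1-based counting, where characters 7 through 14 of "I like to eat green apples" are exactly " to eat " (substr itself takes a 0-based offset and a length). The third version assumes the first "green" after "like" is the one wanted.

```perl
use strict;
use warnings;

my $apple = "I like to eat green apples";

# 1. Third and fourth words: split ' ' splits on runs of whitespace;
#    array indices are 0-based, so words 3 and 4 are elements 2 and 3.
my @words    = split ' ', $apple;
my $by_words = join ' ', @words[2, 3];      # "to eat"

# 2. Characters 7 through 14 (1-based) = offset 6, length 8 (0-based).
my $by_offset = substr($apple, 6, 8);       # " to eat "

# 3. Everything between two marker substrings. A match in list context
#    returns the captured group; $whyMe is undef if there is no match.
my ($whyMe) = $apple =~ /like(.*?)green/;   # " to eat "

print "[$by_words]\n[$by_offset]\n[$whyMe]\n";
```

Note that only the last two keep the surrounding spaces; the word-based version rebuilds the text from the words themselves.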

What you've described probably hints at the third variation, but you still have to define what happens if, for example, the word "green" turns up twice in the string - which would you use?
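A quick illustration of why that choice matters, using a made-up sentence with two "green"s: a non-greedy quantifier (.*?) stops at the first marker, while a greedy one (.*) runs to the last. This is only a sketch of the two regex behaviours, not a recommendation for either.

```perl
use strict;
use warnings;

my $text = "I like to eat green apples and green pears";

# Non-greedy: .*? matches as little as possible, so it stops at the FIRST "green"
my ($first) = $text =~ /like(.*?)green/;   # " to eat "

# Greedy: .* matches as much as possible, then backtracks to the LAST "green"
my ($last)  = $text =~ /like(.*)green/;    # " to eat green apples and "

# Always check that the match succeeded before using the captures
if (defined $first) {
    print "first: [$first]\nlast:  [$last]\n";
}
```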
 
I'm trying to read a file and analyze each line for specific data. I'm using a for loop to do the reading; on each pass, the current line is stored in $line. There'll be a lot of junk in $line, but I'm specifically looking for a web address and a date, for example:

$line = akgwjkrgagkjgawkhwh ahk <a href="/w/index.php?title=Star_Wars&amp;oldid=197291913" title="Star Wars">18:57, 10 March 2008</a> akhjwekhakhkahkh

I can't figure out how to extract just the address and the date and then store them in variables. Hope I didn't confuse you. If you know what I'm talking about and can point me in the right direction (or give me the answer ;) ), I'd appreciate it!
 
To start with, try the code below (not tested):
Code:
$line = 'akgwjkrgagkjgawkhwh ahk <a href="/w/index.php?title=Star_Wars&amp;oldid=197291913" title="Star Wars">18:57, 10 March 2008</a> akhjwekhakhkahkh' ;

$line =~ /\<a href=\"(.+)?\"\s+.+?\>(.+)?\</ ;

print "$1\n\2" ;
$1 should give you /w/index.php?title=Star_Wars&amp;oldid=197291913
$2 should give you 18:57, 10 March 2008
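A slightly more defensive variant, offered as a sketch rather than a tested drop-in: it replaces (.+)? with character classes that cannot run past a closing quote or the next tag, uses /g so several links on one line are all found, and reads the input with a while loop. extract_links is a hypothetical helper name, and an in-memory filehandle stands in for the real file. The href still contains the raw HTML entity &amp;, exactly as it appears in the page source.

```perl
use strict;
use warnings;

# Hypothetical helper: read lines from a filehandle and collect
# [href, link text] pairs. [^"]* stops at the closing quote and [^<]*
# stops at the next tag, so multiple links per line match separately.
sub extract_links {
    my ($fh) = @_;
    my @found;
    while (my $line = <$fh>) {
        while ($line =~ /<a\s+href="([^"]*)"[^>]*>([^<]*)</g) {
            push @found, [ $1, $2 ];
        }
    }
    return @found;
}

# Demo: an in-memory filehandle stands in for the real file
my $data = qq{akgwjkrgagkjgawkhwh ahk <a href="/w/index.php?title=Star_Wars&amp;oldid=197291913" title="Star Wars">18:57, 10 March 2008</a> akhjwekhakhkahkh\n};
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @links = extract_links($fh);
close $fh;

for my $pair (@links) {
    my ($url, $date) = @$pair;
    print "$url\n$date\n";
}
```

For a real file, open it with open my $fh, '<', $filename or die "Cannot open $filename: $!"; ($filename is a placeholder) and pass $fh in. A while loop reads one line at a time, whereas for my $line (<$fh>) reads the whole file into memory before the loop starts.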



--------------------------------------------------------------------------
I never set a goal because you never know what's going to happen tomorrow.
 
Typo: the second variable in the print statement should be $2, not \2.
Code:
$line = 'akgwjkrgagkjgawkhwh ahk <a href="/w/index.php?title=Star_Wars&amp;oldid=197291913" title="Star Wars">18:57, 10 March 2008</a> akhjwekhakhkahkh' ;

$line =~ /\<a href=\"(.+)?\"\s+.+?\>(.+)?\</ ;

print "$1\n$2" ;


 
While that would work for this specific string, what happens if there's a line break between "<a" and "href="? Or if the HTML developer has put the "class" attribute of their <a> before the "href"?

A proper tag-aware HTML parser is the way to go for this one. HTML::LinkExtor is specifically designed for this kind of thing, or if you need something more flexible, HTML::TokeParser::Simple might be the thing to use.
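As a sketch of the parser approach: the example below uses HTML::TokeParser, which ships in the same HTML-Parser distribution as HTML::LinkExtor, instead of the two modules named above, because it returns both the href and the link text (the date) in one pass. It assumes HTML-Parser is installed from CPAN; links_with_text is a hypothetical helper name. The sample input deliberately puts class before href and breaks the tag across two lines, the two cases raised above.

```perl
use strict;
use warnings;
use HTML::TokeParser;   # from the HTML-Parser distribution on CPAN, not core Perl

# Hypothetical helper: return [href, trimmed link text] for every <a> with an href.
# The parser tokenizes the tag, so attribute order and line breaks don't matter.
sub links_with_text {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "Cannot create parser";
    my @found;
    while (my $tag = $p->get_tag('a')) {       # [tagname, \%attr, \@attrseq, text]
        my $href = $tag->[1]{href};
        next unless defined $href;
        my $text = $p->get_trimmed_text('/a'); # text up to </a>, whitespace collapsed
        push @found, [ $href, $text ];
    }
    return @found;
}

# The link from the thread, with "class" first and a line break inside the tag
my $html = qq{junk <a class="x"\n   href="/w/index.php?title=Star_Wars&amp;oldid=197291913" title="Star Wars">18:57, 10 March 2008</a> junk};

for my $pair (links_with_text($html)) {
    print "$pair->[0]\n$pair->[1]\n";
}
```

Unlike the regex versions, the parser decodes entities in attribute values, so the href comes back with a plain & instead of &amp;.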
 
Thanks for the good input, guys. What I don't understand about the expression is how it assigns what I need to the variables $1 and $2. I understand some parts of the expression, but not enough to figure out what's going on. Could you please break it down for me or give me a good link that explains this area well?
Again, thanks!
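On the $1/$2 question: each pair of capturing parentheses in a pattern is a numbered group, counted by the position of its opening parenthesis, and after a successful match Perl sets $1 to the text matched by group 1, $2 to group 2, and so on. The sketch below writes a pattern in the spirit of the thread's one with the /x modifier so each piece can carry a comment. It uses the tighter classes [^"]* and [^<]* instead of the original (.+)?, which is a greedy group made optional by the trailing ?. Perl's own perlretut and perlre documentation cover capture groups in depth.

```perl
use strict;
use warnings;

my $line = 'akgwjkrgagkjgawkhwh ahk <a href="/w/index.php?title=Star_Wars&amp;oldid=197291913" title="Star Wars">18:57, 10 March 2008</a> akhjwekhakhkahkh';

# Under /x, whitespace in the pattern is ignored and # starts a comment,
# so \s+ is used where a real space must be matched.
if ($line =~ m{
        <a \s+ href="    # literal start of the anchor tag
        ( [^"]* )        # capture group 1: everything up to the closing quote
        "                # the closing quote itself (not captured)
        [^>]*            # any remaining attributes, e.g. title="Star Wars"
        >                # end of the opening tag
        ( [^<]* )        # capture group 2: the link text, up to the next tag
        <                # start of the closing a tag
    }x) {
    print "URL:  $1\n";
    print "Date: $2\n";
}

# In list context a match returns the groups directly. This avoids relying
# on $1/$2, which keep their old values when a later match fails.
my ($url, $date) = $line =~ /<a\s+href="([^"]*)"[^>]*>([^<]*)</;
print "Again: $url | $date\n" if defined $url;
```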
 