Extract Regular Expression?

Status
Not open for further replies.

PerlElvir

Technical User
Aug 23, 2005
68
I have a question. Given:

$string1 = "Hello World\n";

my ($var1) = $string1 =~ /H\s*(.*)\s*l/i;

When I print $var1 I get "ello Wor". The letter "l" appears three times in the string, but the regex only stops at the last one. How can I make it stop at the first or the second "l" instead?
 
I don't quite know what you are trying to do, so I have broken the regex down:

Code:
/

H      literal 'H'
\s*    zero or more whitespace characters (matches nothing in this example)
(.*)   zero or more of any character except newline (greedy - captures 'ello Wor') STORE AS $1
\s*    zero or more whitespace characters (matches nothing in this example)
l      literal 'l'

/i;    (ignore case switch)
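
To see where the greedy match actually ends, here is a minimal sketch that runs the same regex and reports the offset of the 'l' that closed the match. The variable names $greedy and $closing_l are my own, chosen for illustration; @+ is Perl's built-in array of match end offsets.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string1 = "Hello World\n";

# Greedy: .* first grabs everything up to the newline, then backs off
# one character at a time until the trailing 'l' can match.
my ($greedy) = $string1 =~ /H\s*(.*)\s*l/i;

# $+[0] is the offset just past the end of the last successful match,
# so the 'l' that closed the match sits at $+[0] - 1.
my $closing_l = $+[0] - 1;

print "captured: '$greedy'\n";
print "closing 'l' at offset $closing_l\n";
```

Offset 9 is the 'l' in "World", which is why the capture runs all the way to "ello Wor".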

Kind Regards
Duncan
 

OK. From 'Hello World' I want to extract the characters between 'H' and 'l'.

my ($var1) = $string1 =~ /H\s*(.*)\s*l/i;

I read this as: from the literal 'H' to the literal 'l'.

I get 'ello Wor', but why didn't the regex capture 'e' or 'el'? Those also run from 'H' to the first or the second 'l'.

So my question is: why does the regex stop at the last 'l', and how can I tell it to stop at the first or second one?

 
Not a regex expert, but here's what worked for me.

$string1 = "Hello World\n";
chomp($string1);

print ($1) if ($string1 =~ (/^(\w+).l/i));
"He" is printed.

print ($1) if ($string1 =~ (/^(\w+.l)?l/i));
"Hel" is printed.

print ($1) if ($string1 =~ (/^H(\w+\s+\w+)+l\w+$/i));
"ello Wor" is printed.
 
The question mark after the quantifier stops it being greedy (i.e. capturing as much as possible), so the match ends at the first 'l':

Code:
#!/usr/bin/perl

$string1 = "Hello World\n";

my ($var1) = $string1 =~ /H(.*?)l/i;

print $var1;
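
The same idea can probably be stretched to reach the second 'l' as well. This is only a sketch: the variable names $to_first, $to_second and $to_last are made up for the example, and the second pattern simply repeats the lazy piece once more.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string1 = "Hello World\n";

# Lazy: stop at the first 'l' after the 'H'.
my ($to_first) = $string1 =~ /H(.*?)l/i;

# Lazy twice: step over one 'l' inside the capture, then stop at the next.
my ($to_second) = $string1 =~ /H(.*?l.*?)l/i;

# Greedy: run on to the last 'l' (the original behaviour).
my ($to_last) = $string1 =~ /H(.*)l/i;

print "first:  '$to_first'\n";
print "second: '$to_second'\n";
print "last:   '$to_last'\n";
```

For "Hello World" this should give 'e', 'el' and 'ello Wor' respectively.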

Kind Regards
Duncan
 
thanks for the vote... whoever it was! ;-)

Kind Regards
Duncan
 
Quantifiers such as .* and .+ are "greedy", as duncdude said: they match as much as possible, so the regexp runs to the last "l" instead of the first. As duncdude showed, adding ? after the quantifier makes it "non-greedy", so it matches as little as possible. Note that ? has other meanings elsewhere in a regexp (on its own it means "zero or one"), so it's best to study the operators, as they can be confusing at first.
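
If you need an arbitrary occurrence rather than just the first, one alternative that avoids relying on greediness at all is a negated character class. The helper below is a hypothetical sketch (capture_up_to_nth_l is my own name, not a built-in): it steps over n-1 copies of "non-l characters followed by l", then captures up to the next 'l'.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the text between 'H' and the n-th 'l'
# that follows it, or undef if there is no such 'l'.
sub capture_up_to_nth_l {
    my ($text, $n) = @_;
    return undef if $n < 1;

    my $skip = $n - 1;    # how many 'l's to step over inside the capture

    # [^l]* cannot cross an 'l' (or 'L', because of /i), so each
    # repetition of (?:[^l]*l) consumes exactly one 'l'.
    my ($captured) = $text =~ /H((?:[^l]*l){$skip}[^l]*)l/i;
    return $captured;
}

my $string1 = "Hello World\n";
for my $n (1 .. 3) {
    my $result = capture_up_to_nth_l($string1, $n);
    print "$n: '$result'\n";
}
```

With "Hello World" the three calls should return 'e', 'el' and 'ello Wor'; asking for a fourth 'l' returns undef because the string only has three.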
 