
Need help in construct an regexp

(OP)
I need to find matching strings in a page on the Microsoft site. The page may contain one or more strings that look like this:

<a class="download" onclick="return false;" href="confirmation.aspx?id=36888" bi:fileurl="http://download.microsoft.com/download/8/D/5/8D5F90F3-AC24-4A15-9716-BAE10533977A/Windows6.0-KB2809289-x86.msu"

var downloadFileUrl = "http://download.microsoft.com/download/8/D/5/8D5F90F3-AC24-4A15-9716-BAE10533977A/Windows6.0-KB2809289-x86.msu" ;

$("#ctl00_ctl21_ColumnRepeater_ctl00_RowRepeater_ctl01_CellRepeater_ctl00_ctl01").details({ "downloadUrl": "http://download.microsoft.com/download/8/D/5/8D5F90F3-AC24-4A15-9716-BAE10533977A/Windows6.0-KB2809289-x86.msu", "enableAtlasActionTag": true, "atlasActionTag": ""

The regexp I came up with looks something like this:

CODE

my $kb = 'KB2809289';
my $pattern = qq/http:\/\/download\.microsoft\.com\/download([\\d+\\D+\\w+\W+]+)+$kb\.msu/; 

I then have two implementations:

Implementation I:

CODE

my $pattern = qq/http:\/\/download\.microsoft\.com\/download([\\d+\\D+\\w+\W+]+)+$kbName\.msu/;
my $contents = `cat $srce`; # The Microsoft page has been saved as a local file
my $i = 1;
while($contents =~ /($pattern)/g) {
  my $match = $&;
  print ("\$i = $i, $match\n");
  $i++;
}

This is the error I got:

CODE

Complex regular subexpression recursion limit (32766) exceeded at <file name> line 3348. 

Implementation II:

CODE

if(open(FH, $srce)) {
  my $i = 1;
  while(my $line = <FH>) {
    if($line =~ /($pattern)/g) {
      my $match = $1;
      print ("\$i = $i, $match\n");
    }
    $i++;
  }
  close(FH);
}

No match is found, so I know the regexp is incorrect.

Please help me with two things:
1) fixing my regexp;
2) with a correct regexp, would I still get the "Complex regular subexpression recursion limit (32766) exceeded" error?

Many thanks!!
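A minimal sketch, assuming the sample URL from the excerpt above, of why Implementation II finds nothing. The original pattern expects .msu right after the KB number, but the file names carry a -x86 suffix in between. The variable names here are illustrative only.

```perl
use strict;
use warnings;

# Sample URL copied from the page excerpt above.
my $url = 'http://download.microsoft.com/download/8/D/5/8D5F90F3-AC24-4A15-9716-BAE10533977A/Windows6.0-KB2809289-x86.msu';
my $kb  = 'KB2809289';

# Original idea: the KB number must be followed directly by ".msu".
my $strict_tail = qr{\Q$kb\E\.msu};

# Variant allowing a "-<word>" suffix (such as "-x86") between the KB number and ".msu".
my $suffix_tail = qr{\Q$kb\E-\w+\.msu};

printf "KB directly before .msu: %s\n", ($url =~ $strict_tail) ? 'match' : 'no match';
printf "KB-suffix before .msu:   %s\n", ($url =~ $suffix_tail) ? 'match' : 'no match';
```

On question 2: perldiag describes that message as a warning raised when the engine has to recurse while backtracking through a quantified complex subexpression, such as the nested ([...]+)+ above, whose character class already matches any character. Simple cases are handled without recursion and are not limited, so a pattern with a single lazy .+? should not trigger it. That is worth confirming on your own Perl version.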

RE: Need help in construct an regexp

Hi

What is the expected output for that sample input?

Feherke.
http://feherke.github.com/

RE: Need help in construct an regexp

(OP)
Sorry for not making myself clear.

I need to extract "http://download.microsoft.com/download/8/D/5/8D5F90F3-AC24-4A15-9716-BAE10533977A/Windows6.0-KB2809289-x86.msu" from an HTML file. The parts I know in advance are the http://download.microsoft.com/download prefix, the KB number (KB2809289) and the .msu extension. A few sample lines from that HTML file are listed in my original post.


Thanks.

RE: Need help in construct an regexp

Hi

Oops. I had the impression you wanted only certain parts of the URLs.

Then I would do it like this:

CODE --> Perl

my $kb = 'KB2809289';
my $pattern1 = qq{http://download.microsoft.com/download.+?-$kb-\\w+.msu};    # string
my $pattern2 = qr{http://download\.microsoft\.com/download/.+?-$kb-\w+\.msu}; # regular expression

my $contents = do { local $/; <DATA> };

my $i = 1;
while ($contents =~ /$pattern1/g) {
  my $match = $&;
  print "\$i = $i, $match\n";
  $i++;
}

__DATA__
<a class="download" onclick="return false;" href="confirmation.aspx?id=36888" bi:fileurl="
http://download.microsoft.com/download/8/D/5/8D5F90F3-AC24-4A15-9716-BAE10533977A/Windows6.0-KB2809289-x86.msu"

var downloadFileUrl = "http://download.microsoft.com/download/8/D/5/8D5F90F3-AC24-4A15-9716-BAE10533977A/Windows6.0-KB2809289-x86.msu" ;

$("#ctl00_ctl21_ColumnRepeater_ctl00_RowRepeater_ctl01_CellRepeater_ctl00_ctl01").details({ "downloadUrl": "http://download.microsoft.com/download/8/D/5/8D5F90F3-AC24-4A15-9716-BAE10533977A/Windows6.0-KB2809289-x86.msu", "enableAtlasActionTag": true, "atlasActionTag": "" 

Note that using the $pattern1 string or the $pattern2 regular expression gives the same result. I included both because I saw you mixed their syntax a bit.
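A small sketch to check that both forms behave the same. It collects every match from each pattern instead of printing them one by one. The sample input is an abbreviated copy of the three lines above (some HTML attributes trimmed), and the array names are illustrative.

```perl
use strict;
use warnings;

my $kb = 'KB2809289';

# Interpolated string: compiled as a regex only when used inside m//.
my $pattern1 = qq{http://download.microsoft.com/download.+?-$kb-\\w+.msu};
# Precompiled regex object built with qr//.
my $pattern2 = qr{http://download\.microsoft\.com/download/.+?-$kb-\w+\.msu};

# Abbreviated copy of the three sample lines, one URL per line.
my $contents = join "\n",
  '<a class="download" bi:fileurl="http://download.microsoft.com/download/8/D/5/8D5F90F3-AC24-4A15-9716-BAE10533977A/Windows6.0-KB2809289-x86.msu"',
  'var downloadFileUrl = "http://download.microsoft.com/download/8/D/5/8D5F90F3-AC24-4A15-9716-BAE10533977A/Windows6.0-KB2809289-x86.msu" ;',
  '"downloadUrl": "http://download.microsoft.com/download/8/D/5/8D5F90F3-AC24-4A15-9716-BAE10533977A/Windows6.0-KB2809289-x86.msu",';

# m//g in list context with no capture groups returns every whole match.
my @from_string = $contents =~ /$pattern1/g;
my @from_qr     = $contents =~ /$pattern2/g;

print scalar(@from_string), " matches from the string pattern\n";
print scalar(@from_qr), " matches from the qr// pattern\n";
print "identical results\n" if "@from_string" eq "@from_qr";
```

Collecting the matches in a list also avoids $&, which older Perls penalise globally once it appears anywhere in a program.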

Feherke.

RE: Need help in construct an regexp

(OP)
Thank you so much, Feherke! You are the man!!

RE: Need help in construct an regexp

(OP)
Hi Feherke,

I have a follow-up question.

I modified your code a bit, and I also modified the input data: the URL in the downloadFileUrl line now has an upper-case X86.

I noticed that when a qr// regular expression is used, the case-insensitive match does not work. Is that how it is supposed to behave?

Again, thank you so much for your help.

CODE

my $kb = 'KB2809289';
my $cpuType = 'x86'; # passed in, could be in upper case, too
my $cpuTypeInL = lc($cpuType);
my $cpuTypeInU = uc($cpuType);

# String
my $pattern1 = qq{http://download.microsoft.com/download.+?-$kb-$cpuTypeInL.*?.msu};
my $pattern2 = qq{http://download.microsoft.com/download.+?-$kb-$cpuTypeInU.*?.msu};

# regular expression
my $pattern3 = qr{http://download\.microsoft\.com/download/.+?-$kb-$cpuTypeInL.*?.msu};
my $pattern4 = qr{http://download\.microsoft\.com/download/.+?-$kb-$cpuTypeInU.*?.msu};

my $p;
#$p = $pattern1; # string match - match all 3
#$p = $pattern2; # string match - match all 3
#$p = $pattern3; # regexp match - only match 2
$p = $pattern4;  # regexp match - only match 1

my $contents = do { local $/; <DATA> };
if($p =~ /x86/) {
  print "Lower Case Pattern: $p\n";
}
else {
  print "Upper Case Pattern: $p\n";
}

my $i = 1;
while ($contents =~ /$p/gi) { # make it case-insensitive match
  my $match = $&;
  print "\$i = $i, $match\n";
  $i++;
}
__DATA__
<a class="download" onclick="return false;" href="confirmation.aspx?id=36888" bi:fileurl="
http://download.microsoft.com/download/8/D/5/8D5F90F3-AC24-4A15-9716-BAE10533977A/Windows6.0-KB2809289-x86.msu"

var downloadFileUrl = "http://download.microsoft.com/download/8/D/5/8D5F90F3-AC24-4A15-9716-BAE10533977A/Windows6.0-KB2809289-X86.msu" ; // It's upper case!!

$("#ctl00_ctl21_ColumnRepeater_ctl00_RowRepeater_ctl01_CellRepeater_ctl00_ctl01").details({ "downloadUrl": "http://download.microsoft.com/download/8/D/5/8D5F90F3-AC24-4A15-9716-BAE10533977A/Windows6.0-KB2809289-x86.msu", "enableAtlasActionTag": true, "atlasActionTag": "" 

RE: Need help in construct an regexp

Hi

That is because a regular expression stored in a variable with qr// also carries its own flags:

CODE --> perl -de 42

  DB<1> print qr{foo}
(?^:foo)
  DB<2> print qr{foo}i
(?^i:foo) 

There the ?^ resets the flags to their defaults inside the group, so the /i on the outer match does not reach the embedded pattern. You have to specify the case-insensitive flag on the qr itself:

CODE --> (fragment)

# regular expression
my $pattern3 = qr{http://download\.microsoft\.com/download/.+?-$kb-$cpuTypeInL.*?.msu}i;
my $pattern4 = qr{http://download\.microsoft\.com/download/.+?-$kb-$cpuTypeInU.*?.msu}i; 
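A small sketch of the flag behaviour described above, using a hypothetical sample string with the upper-case X86. It assumes Perl 5.14 or later for the (?^...) stringification. Older versions print a form like (?-xism:...) instead, with the same effect.

```perl
use strict;
use warnings;

# Hypothetical sample: the file name with the upper-case X86 from the modified input.
my $text = 'Windows6.0-KB2809289-X86.msu';

my $plain  = qr{KB2809289-x86};    # compiled without /i
my $nocase = qr{KB2809289-x86}i;   # /i is stored inside the regex object

# Stringifying shows the flags each qr// object carries.
print "plain:  $plain\n";
print "nocase: $nocase\n";

# The outer /i does not override the flags already embedded in $plain.
printf "plain with outer /i:     %s\n", ($text =~ /$plain/i) ? 'match' : 'no match';

# The /i stored in $nocase applies even without an outer flag.
printf "nocase without outer /i: %s\n", ($text =~ /$nocase/) ? 'match' : 'no match';
```

With the flag on the qr, a single pattern built from $cpuType should match both x86 and X86, so the separate $cpuTypeInL and $cpuTypeInU variants would no longer be needed.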

Feherke.

RE: Need help in construct an regexp

(OP)
Excellent!! Thank you, Feherke.
