Need help in construct an regexp
(OP)
I need to find matching strings on a Microsoft site. Its web page may contain one or more strings that look like this:
<a class="download" onclick="return false;" href="confirmation.aspx?id=36888" bi:fileurl="http://download.microsoft.com/download/8/D/5/8D5F90F3-AC24-4A15-9716-BAE10533977A/Windows6.0-KB2809289-x86.msu"
var downloadFileUrl = "http://download.microsoft.com/download/8/D/5/8D5F90F3-AC24-4A15-9716-BAE10533977A/Windows6.0-KB2809289-x86.msu" ;
$("#ctl00_ctl21_ColumnRepeater_ctl00_RowRepeater_ctl01_CellRepeater_ctl00_ctl01").details({ "downloadUrl": "http://download.microsoft.com/download/8/D/5/8D5F90F3-AC24-4A15-9716-BAE10533977A/Windows6.0-KB2809289-x86.msu", "enableAtlasActionTag": true, "atlasActionTag": ""
The regexp I came up with is something like this:
CODE
my $kb = 'KB2809289';
my $pattern = qq/http:\/\/download\.microsoft\.com\/download([\\d+\\D+\\w+\W+]+)+$kb\.msu/;
I then have two implementations:
Implementation I:
CODE
my $pattern = qq/http:\/\/download\.microsoft\.com\/download([\\d+\\D+\\w+\W+]+)+$kbName\.msu/;
my $contents = `cat $srce`;    # The Microsoft page has been saved as a local file
my $i = 1;
while($contents =~ /($pattern)/g) {
    my $match = $&;
    print ("\$i = $i, $match\n");
    $i++;
}
The error I got is this:
CODE
Complex regular subexpression recursion limit (32766) exceeded at <file name> line 3348.
Implementation II:
CODE
if(open(FH, $srce)) {
    my $i = 1;
    while(my $line = <FH>) {
        if($line =~ /($pattern)/g) {
            my $match = $1;
            print ("\$i = $i, $match\n");
        }
        $i++;
    }
    close(FH);
}
No match is found, so I know the regexp is incorrect.
Please help me in two areas:
1) fix my regexp;
2) with a correct regexp, would I still get this error: Complex regular subexpression recursion limit (32766) exceeded?
Many thanks!!
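A minimal sketch of both questions, using only the sample URL above (the 100,000-character padding is an assumption standing in for a large saved page). First, the pattern ends in $kb\.msu, but in the file name the KB number is followed by -x86, so that literal text never occurs. Second, a pattern built from one bounded character class, with no quantified group such as ([...]+)+ wrapped around it, should not hit the nested-group recursion that, as far as I understand it, triggers the 32766 warning.

```perl
use strict;
use warnings;

# Assumption: the sample URL from the post; the real page is much larger.
my $kb  = 'KB2809289';
my $url = 'http://download.microsoft.com/download/8/D/5/8D5F90F3-AC24-4A15-9716-BAE10533977A/Windows6.0-KB2809289-x86.msu';

# The original pattern needs "KB2809289" followed by ".msu"; the file
# name has "-x86" in between, so this literal text is absent.
my $literal = index($url, "$kb.msu") >= 0 ? 'yes' : 'no';
print "literal '$kb.msu' present: $literal\n";

# Sketch: no quantified group, just one class that stops at quotes and
# whitespace, so a match cannot run past the end of one URL.
my $fixed = qr{http://download\.microsoft\.com/download/[^"\s]*?\Q$kb\E[^"\s]*\.msu};

# Simulate a large input around a single quoted URL.
my $big   = ('x' x 100_000) . qq{"$url"} . ('y' x 100_000);
my @found = $big =~ /($fixed)/g;
print scalar(@found), " match(es): $found[0]\n";
```

Stopping the class at quotes and whitespace also keeps each match inside one attribute value instead of letting it stretch across the whole file.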
RE: Need help in construct an regexp
What is the expected output for that sample input?
Feherke.
http://feherke.github.com/
RE: Need help in construct an regexp
I need to extract "http://download.microsoft.com/download/8/D/5/8D5F90F3-AC24-4A15-9716-BAE10533977A/Windows6.0-KB2809289-x86.msu" from an HTML file. The parts that were shown in bold red are known in advance. A few sample lines from that HTML file are listed in my original post.
Thanks.
RE: Need help in construct an regexp
Oops. I had the impression you wanted only certain parts of the URLs.
In that case I would do it like this:
CODE --> Perl
Note that using the $pattern1 string or the $pattern2 regular expression gives the same result. I included both because I see you mixed up their syntax a bit.
Feherke.
http://feherke.github.com/
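A minimal sketch of the two forms mentioned above, a plain string in $pattern1 and a qr// object in $pattern2, run over the sample lines from the original post. The exact pattern is an assumption, not the code from the reply. One pitfall with the string form: inside qq// a sequence such as \. loses its backslash before the regex engine sees it, which is why the string here uses single quotes.

```perl
use strict;
use warnings;

# Sample lines from the original post (normally read from the saved page).
my $contents = <<'HTML';
<a class="download" onclick="return false;" href="confirmation.aspx?id=36888" bi:fileurl="http://download.microsoft.com/download/8/D/5/8D5F90F3-AC24-4A15-9716-BAE10533977A/Windows6.0-KB2809289-x86.msu"
var downloadFileUrl = "http://download.microsoft.com/download/8/D/5/8D5F90F3-AC24-4A15-9716-BAE10533977A/Windows6.0-KB2809289-x86.msu" ;
$("#ctl00_ctl21_ColumnRepeater_ctl00_RowRepeater_ctl01_CellRepeater_ctl00_ctl01").details({ "downloadUrl": "http://download.microsoft.com/download/8/D/5/8D5F90F3-AC24-4A15-9716-BAE10533977A/Windows6.0-KB2809289-x86.msu", "enableAtlasActionTag": true, "atlasActionTag": ""
HTML

my $kb = 'KB2809289';

# String form: single quotes keep every backslash; quotemeta protects the KB number.
my $pattern1 = 'http://download\.microsoft\.com/download/[^"\s]*?'
             . quotemeta($kb) . '[^"\s]*\.msu';

# Compiled form: qr// needs no extra escaping; \Q...\E quotes the KB number.
my $pattern2 = qr{http://download\.microsoft\.com/download/[^"\s]*?\Q$kb\E[^"\s]*\.msu};

my @results;
for my $pattern ($pattern1, $pattern2) {
    my %seen;
    # Keep each distinct URL once, in order of first appearance.
    my @urls = grep { !$seen{$_}++ } $contents =~ /($pattern)/g;
    push @results, \@urls;
    print "$_\n" for @urls;
}
```

The %seen hash drops the repeats, since the same URL appears three times in the sample.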
RE: Need help in construct an regexp
I have a follow-up question.
I modified your code a bit, and I also modified the input data. All my changes are in bold blue font.
I noticed that when the regular expression is used, the case-insensitive match does not work. Is this how it is supposed to behave?
Again, thank you so much for your help.
CODE
RE: Need help in construct an regexp
That is because putting a regular expression into a variable also stores its flags with it:
CODE --> perl -de 42
There the ?^ at the start of the group resets the flags to their defaults inside that group, so you have to specify the case-insensitive flag on the qr itself:
CODE --> (fragment)
Feherke.
http://feherke.github.com/
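A small demonstration of the flag behaviour described above, assuming Perl 5.14 or later (where qr// stringifies with (?^...)); the file name comes from the sample input:

```perl
use strict;
use warnings;

my $re_plain = qr/windows6\.0-kb2809289/;     # compiled without /i
my $re_icase = qr/windows6\.0-kb2809289/i;    # /i stored inside the qr

# Stringifying shows the embedded flag group: (?^:...) versus (?^i:...).
print "$re_plain\n$re_icase\n";

my $name = 'Windows6.0-KB2809289-x86.msu';

# The /i on the outer match does not reach inside the (?^:...) group.
my $outer_i = ($name =~ /$re_plain/i) ? 1 : 0;

# The flag given to qr itself is honoured.
my $inner_i = ($name =~ $re_icase) ? 1 : 0;

print "outer /i: $outer_i, qr//i: $inner_i\n";
```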