
greedy regex 1

Status
Not open for further replies.

MoshiachNow

IS-IT--Management
Feb 6, 2002
1,851
IL
Hi,

I have a string like:

<?xml version="1.0" encoding="utf-16"?><DATADOC><GROUP NAME="Chunk"><PARAM NAME="StartJobCommand"></PARAM><GROUP
NAME="Job"><PARAM NAME="SplitLevel" IT="I64">0</PARAM><PARAM NAME="StartDocCommand"></PARAM><GROUP NAME="Doc"><PARAM NAM
E="DocIndex" IT="I64">1</PARAM><PARAM NAME="NumOfPagesInDoc" IT="I64">12512</PARAM><PARAM NAME="FileLocation" SRSID="1">
D:\Output\NJobs\binder1_10443\binder1.pdf</PARAM><PARAM NAME="StartPageCommand"></PARAM><GROUP NAME="Page"><PARAM NAME="
CjfPageIndex" PdlPageIndex="1" IT="I64

I need to extract the file type, in this case "pdf" from binder1.pdf.

However, the regex below is not good enough, since it eats up the whole string until the last occurrence of "</PARAM>":

$fileType =~ m!"FileLocation"\s*SRSID=".+?">.*?\\\.(.*?)\</PARAM\>!;
$fileType = $1;

I need some advice here.
Thanks

Long live king Moshiach !
 
The problem is not greediness in any part of the expression (in fact nothing there seems to be greedy). The problem is that [tt]\\\.[/tt] requires a backslash immediately before the dot (.), whereas in your string the dot is preceded by binder1 (or other characters permitted in a filename), not by a backslash.

>[tt]$fileType =~ m!"FileLocation"\s*SRSID=".+?">.*?\\\.(.*?)\</PARAM\>!;[/tt]
[tt]$fileType =~ m!"FileLocation"\s[blue]+[/blue]SRSID=".+?">[highlight].*?\.[/highlight](.*?)</PARAM\>!;[/tt]

PS: [1] [tt]\s*[/tt] would be better as [tt]\s+[/tt], and [2] the [tt].*?[/tt] that matches the base part of the filename could be restricted to a narrower character class if so desired.
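
To make the difference concrete, here is a minimal sketch that runs both patterns against a shortened copy of the sample string. The variable names ($xml, $file_type, $old_matched) are only illustrative, and the shortened string is an assumption standing in for the full document. It also tests whether the match succeeded before reading $1, since $1 keeps its previous value after a failed match.

```perl
use strict;
use warnings;

# Shortened stand-in for the sample string (single quotes keep the backslashes literal).
my $xml = '<PARAM NAME="NumOfPagesInDoc" IT="I64">12512</PARAM>'
        . '<PARAM NAME="FileLocation" SRSID="1">'
        . 'D:\Output\NJobs\binder1_10443\binder1.pdf</PARAM>'
        . '<PARAM NAME="StartPageCommand"></PARAM>';

# Original pattern: \\\. demands a backslash right before the dot,
# which never occurs in this string, so the match fails here.
my $old_matched =
    $xml =~ m!"FileLocation"\s*SRSID=".+?">.*?\\\.(.*?)\</PARAM\>! ? 1 : 0;

# Corrected pattern: a plain \. after the lazy .*? stops at the first dot.
my $file_type;
if ($xml =~ m!"FileLocation"\s+SRSID=".+?">.*?\.(.*?)</PARAM>!) {
    $file_type = $1;    # only read $1 after a successful match
}

print "original pattern matched: $old_matched\n";
print "file type: ", (defined $file_type ? $file_type : '(no match)'), "\n";
```

Note that the lazy [tt].*?[/tt] stops at the first dot after the tag, so a directory name containing a dot would end up in the capture.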
 
Using an XML parser might make your life easier. XML::Simple will probably do the trick.

But, if you want to use a regex, this should also work for you:
Code:
m/FileLocation[^>]+>[^<]+\.([^<]+)</;
 
Thanks to both.
rharsh, I would appreciate some explanation of your suggested regex syntax, which I have not managed to work out so far.
Thanks

Long live king Moshiach !
 
See if this helps:

Code:
m/  FileLocation        # FileLocation
    [^>]+               # 1 or more non > characters
    >                   # >
    [^<]+               # 1 or more non < characters
    \.                  # .
    ([^<]+)             # Capture non < characters in $1
    <                   # <
    /x;                 # Allow White Space in Regex
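
If you want to try it, here is a small runnable sketch of the same pattern written with /x. The variable names ($line, $dotted, $ext, $ext2) and the second path with a dot in a directory name are made up for illustration. Because [tt][^<]+[/tt] is greedy, it first runs to the end of the path and then backtracks to the last dot, so only the final extension is captured.

```perl
use strict;
use warnings;

# The FileLocation element from the question (single quotes keep backslashes literal).
my $line = '<PARAM NAME="FileLocation" SRSID="1">'
         . 'D:\Output\NJobs\binder1_10443\binder1.pdf</PARAM>';

# Hypothetical path with a dot inside a directory name.
my $dotted = '<PARAM NAME="FileLocation" SRSID="1">D:\Out.put\binder1.pdf</PARAM>';

# In list context a match returns its captures, or an empty list on failure.
my ($ext)  = $line   =~ m/ FileLocation [^>]+ > [^<]+ \. ([^<]+) < /x;
my ($ext2) = $dotted =~ m/ FileLocation [^>]+ > [^<]+ \. ([^<]+) < /x;

print "ext:  ", (defined $ext  ? $ext  : '(no match)'), "\n";
print "ext2: ", (defined $ext2 ? $ext2 : '(no match)'), "\n";
```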
 