libpcre : pcre_callout?

Status
Not open for further replies.

ppc386

Programmer
Feb 4, 2003
124
US
Excerpt from the PCRE man page:

Perl has a feature whereby using the sequence (?{...}) causes arbitrary Perl code to be
obeyed in the middle of matching a regular expression. This makes it possible, amongst
other things, to extract different substrings that match the same pair of parentheses
when there is a repetition.

PCRE provides a similar feature, but of course it cannot obey arbitrary Perl code.
The feature is called "callout". The caller of PCRE provides an external function
by putting its entry point in the global variable pcre_callout.

Within a regular expression, (?C) indicates the points at which the external function is to be called.

The offset_vector field of a pcre_callout_block is a pointer to the vector of offsets
that was passed by the caller to pcre_exec(). The contents can be inspected in order
to extract substrings that have been matched so far, in the same way as for extracting
substrings after a match has completed.



But I can't figure out how to get valid offset_vector values from inside the callback.
Here is my example program:
[tt]
#include <pcre.h>
#include <stdio.h>
#include <stdlib.h>   /* exit */
#include <string.h>   /* strlen */

#define pattern "(?C)ain"
#define subject "the rain in spain falls mainly on the plain"
#define VEC_SIZE 3


/* Called by PCRE at each (?C) point; prints the offsets visible so far. */
static int my_callout(pcre_callout_block *block) {
    int i;
    for (i = 0; i < VEC_SIZE-1; i++)
        printf("vector[%d] = %3d @callout\n", i, block->offset_vector[ i ]);
    printf("\n");
    return 0;   /* 0 tells PCRE to carry on matching */
}



int main( int argc, char *argv[] ) {
    pcre *re;
    int rc;
    const char *err_msg;
    int err_loc;
    int ovector[VEC_SIZE];
    int options;
    int offset;
    int i;
    int done;

    /* Register the callout function before matching. */
    pcre_callout = my_callout;

    re = pcre_compile(pattern, 0, &err_msg, &err_loc, NULL);
    if (re == NULL) {
        printf("regex failed: %s at %d\n", err_msg, err_loc);
        exit(1);
    }
    rc = pcre_exec(re, NULL, subject, strlen(subject), 0, 0, ovector, VEC_SIZE);
    done = 0;
    do {
        for (i = 0; i < VEC_SIZE-1; i++)
            printf("vector[%d] = %2d @main\n", i, ovector[ i ]);
        printf("\n");
        options = 0;
        offset = ovector[1];   /* resume where the previous match ended */
        if (ovector[0] == ovector[1]) {
            /* Empty match: stop at end of subject, else retry non-empty here. */
            if (ovector[0] == (int)strlen(subject)) done = 1;
            else { options = PCRE_NOTEMPTY | PCRE_ANCHORED; }
        }
        rc = pcre_exec(re, NULL, subject, strlen(subject), offset, options, ovector, VEC_SIZE);
        if (!done && rc == PCRE_ERROR_NOMATCH) {
            if (options == 0) done = 1; else ovector[1] = offset + 1;
        }
    } while (!done);
    printf("rc=%d\n", rc);
    exit(0);
}
[/tt]


When I run the program, this is the output I get:

[tt]

vector[0] = -1 @callout
vector[1] = -1 @callout

vector[0] = 5 @main
vector[1] = 8 @main

vector[0] = -1 @callout
vector[1] = -1 @callout

vector[0] = 14 @main
vector[1] = 17 @main

vector[0] = -1 @callout
vector[1] = -1 @callout

vector[0] = -1 @callout
vector[1] = -1 @callout

vector[0] = 25 @main
vector[1] = 28 @main

vector[0] = -1 @callout
vector[1] = -1 @callout

vector[0] = 40 @main
vector[1] = 43 @main
[/tt]


As you can see, the vectors correctly point to the matched expression
when I test them from main(), but when I check them from inside the
callback, they are always unset (-1).

Does anybody know what I am doing wrong?
 
Have you tried putting (?C) at the end of the pattern? When it is at the beginning, nothing has been matched yet by the time the callout runs.
 
Hello, pfournier -

Thanks for responding.

I tried your suggestion, but it didn't seem to help.
I will try to do some more investigating ( a.k.a. RTFM )
and post whatever I find here...

- Jeff
 
Because the match is not yet complete when the callout runs, the offsets for the whole match (elements 0 and 1 of the vector) are not yet set. You can either change your code to use block->start_match and block->current_position in the callout (and put (?C) at the end of the pattern), or make the following changes:
Code:
#define pattern  "(ain)(?C)"
#define VEC_SIZE 6
and replace your for loops with:

Code:
    for (i = 2; i < 4; i++)

Elements 0-1 are for the whole match, 2-3 are for the first captured substring, and so on.
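
To make the pair layout concrete, here is a minimal sketch in plain C that needs no libpcre. The helper name copy_group is made up for this illustration and is not part of PCRE. It reads elements 2*group and 2*group+1 of an offset vector and treats -1 as "unset", which is the state the callout sees for the whole match.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a PCRE function): copy capture group `group`
 * out of `subject`, using a PCRE-style offset vector in which elements
 * 2*group and 2*group+1 hold the start and end offsets of that group
 * (group 0 is the whole match).
 * Returns the substring length, or -1 if the group is unset (offset -1)
 * or does not fit in `buf` together with its terminating NUL. */
static int copy_group(const char *subject, const int *ovector, int group,
                      char *buf, size_t buflen)
{
    int start = ovector[2 * group];
    int end   = ovector[2 * group + 1];
    size_t len;

    if (start < 0 || end < start)
        return -1;                      /* group not set (yet) */
    len = (size_t)(end - start);
    if (len + 1 > buflen)
        return -1;                      /* buffer too small */
    memcpy(buf, subject + start, len);
    buf[len] = '\0';
    return (int)len;
}
```

With an illustrative vector of { -1, -1, 5, 8, 0, 0 } (whole match still unset, group 1 covering offsets 5 to 8 of the subject above), copy_group(subject, vec, 0, buf, sizeof buf) returns -1, while group 1 yields "ain". Inside a real callout you would pass block->offset_vector as the vector. Once a match has completed, PCRE's own pcre_copy_substring() does the same job.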
 
Okay, it's working now.

( Although I still haven't quite got my brain around it )

Thanks!
 