Excerpt from the PCRE man page:
Perl has a feature whereby using the sequence (?{...}) causes arbitrary Perl code to be
obeyed in the middle of matching a regular expression. This makes it possible, amongst
other things, to extract different substrings that match the same pair of parentheses
when there is a repetition.
PCRE provides a similar feature, but of course it cannot obey arbitrary Perl code.
The feature is called "callout". The caller of PCRE provides an external function
by putting its entry point in the global variable pcre_callout.
Within a regular expression, (?C) indicates the points at which the external function is to be called.
The offset_vector field of a pcre_callout_block is a pointer to the vector of offsets
that was passed by the caller to pcre_exec(). The contents can be inspected in order
to extract substrings that have been matched so far, in the same way as for extracting
substrings after a match has completed.
But I can't figure out how to get valid offset_vector values from inside the callback.
Here is my example program:
[tt]
#include <pcre.h>
#include <stdio.h>
#include <stdlib.h>   /* exit() */
#include <string.h>   /* strlen() */

#define pattern "(?C)ain"
#define subject "the rain in spain falls mainly on the plain"
#define VEC_SIZE 3

/* Called by PCRE each time matching reaches the (?C) point. */
static int my_callout(pcre_callout_block *block) {
    int i;
    for (i = 0; i < VEC_SIZE-1; i++)
        printf("vector[%d] = %3d @callout\n", i, block->offset_vector[ i ]);
    printf("\n");
    return 0;
}

int main( int argc, char *argv[] ) {
    pcre *re;
    int rc;
    const char *err_msg;
    int err_loc;
    int ovector[VEC_SIZE];
    int options;
    int offset;
    int i;
    int done;

    pcre_callout = my_callout;
    re = pcre_compile(pattern, 0, &err_msg, &err_loc, NULL);
    if (re == NULL) {
        printf("regex failed: %s at %d\n", err_msg, err_loc);
        exit(1);
    }
    rc = pcre_exec(re, NULL, subject, strlen(subject), 0, 0, ovector, VEC_SIZE);
    done = 0;
    do {
        for (i = 0; i < VEC_SIZE-1; i++)
            printf("vector[%d] = %2d @main\n", i, ovector[ i ]);
        printf("\n");
        options = 0;
        offset = ovector[1];
        /* After an empty match, retry at the same offset, anchored and non-empty. */
        if (ovector[0] == ovector[1]) {
            if (ovector[0] == strlen(subject)) done = 1;
            else { options = PCRE_NOTEMPTY | PCRE_ANCHORED; }
        }
        rc = pcre_exec(re, NULL, subject, strlen(subject), offset, options, ovector, VEC_SIZE);
        if (!done && rc == PCRE_ERROR_NOMATCH) {
            if (options == 0) done = 1; else ovector[1] = offset + 1;
        }
    } while (!done);
    printf("rc=%d\n", rc);
    exit(0);
}
[/tt]
When I run the program, this is the output I get:
[tt]
vector[0] = -1 @callout
vector[1] = -1 @callout
vector[0] = 5 @main
vector[1] = 8 @main
vector[0] = -1 @callout
vector[1] = -1 @callout
vector[0] = 14 @main
vector[1] = 17 @main
vector[0] = -1 @callout
vector[1] = -1 @callout
vector[0] = -1 @callout
vector[1] = -1 @callout
vector[0] = 25 @main
vector[1] = 28 @main
vector[0] = -1 @callout
vector[1] = -1 @callout
vector[0] = 40 @main
vector[1] = 43 @main
[/tt]
As you can see, the vectors correctly point to the matched expression
when I test them from main(), but when I check them from inside the
callback, they are always unset (-1).
Does anybody know what I am doing wrong?