thank you both for replying.
@MillerH
I think I see how your recursion works, although from my limited knowledge of perl it appears that the recursion is programmatic whereas I was assuming I would need some form of lookahead or lookaround within the pattern. Unfortunately I am not actually using perl as the programming language (PHP), I was simply asking the regex question here as other conversations with TT users led me to believe that you guys are the most advanced on regex topics.
Apologies for not being clear on this. as I said, I thought the solution could be pure regex (but on reading further I see that only the .Net library has a balancing engine which, I think, would have been needed to balance brackets etc).
@KevinADC
your method would not work for me as it stands because I cannot guarantee that the sql function is the last item in a select phrase. a typical query could, for example, look like
Code:
select date_format(date_add(dateField, INTERVAL '3' days), '%Y-%m-%d) as fDate, lastname, firstname from sometable order by dateField ASC LIMIT 3, 4;
I'm sorry that I did not make this clear.
Also, I don't think it would work if there were a comma in the first parameter of the function, e.g. the second example I posted above. I may have misunderstood though.
By the way, this exercise is part of a sql rewriting engine I am building to allow Wordpress to run on different database backends, aiming primarily at SQLite
But, as it happens, a good night's sleep and your posts led me to reanalyse the logical approach and I've realised that there is one constant in the pattern, and that is that the end of the date_format clause must always be " comma quote formatstring quote close-bracket " with varying degrees of interspersed white space. this pattern is unlikely (within the class of application I am looking at) to be repeated elsewhere in the sql clause so I have focused on that to provide a (currently) working solution.
for posterity (and peer review), the pattern I am using is as follows (for PHP preg_replace use). I have split the pattern out completely (and inefficiently) to aid interpretation and review. in practice, i would compress to a single line and avoid the x modifier.
Code:
$pattern = '/date_format'; //match the function call
$pattern .= '\s*'; //allow white space after the function call
$pattern .= '\('; //match the opening of the function bracket
$pattern .= '(?P<param1>.*?)'; //match everything in the first parameter of the function
$pattern .= ',\s*'; //match the comma that starts the second parameter of the function with an arbitrary amount of whitespace after
$pattern .= '(?P<formatString>'; //start a backreference
$pattern .= '('; //start a second backreference to match quotes
$pattern .= '[\'"]'; //match the quotes that start the format string
$pattern .= ')'; //end the second backreference
$pattern .= '\s*'; //more arbitrary whitespace (but there should not be any)
$pattern .= '%'; //match the format control character
$pattern .= '.*?'; //match any characters following it
$pattern .= '[^\\\\]\\3'; //match the same quote as started the format string (might need to allow for escaped backslashes)
$pattern .= ')'; //close the primary backreference (captures the whole format string
$pattern .= '\s*\)'; //match an arbitrary amount of white space followed by a closing bracket
$pattern .= '/imx'; //pattern modifiers to allow case insensitivity, multiline and arbitrary white space in the pattern.