General programming in Awk

Reusing regular expression groups with awk
Posted: 12 Feb 05 (Edited 23 Feb 05)

Regular expression groups are the parts of a pattern matched inside parentheses, e.g. /some([^:]+):here:(.*)/

awk in general cannot capture or reuse them; the parentheses only group, and the pattern is processed as one match.

The only direct way to accomplish this (besides alternative techniques) is to use gawk's gensub() and match() functions.
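
For comparison, here is a rough sketch of one such alternative technique in plain awk: match() sets RSTART and RLENGTH for the whole match, and substr() cuts out the wanted part by hand. The function name extract_name, the pattern and the sample input are made up for illustration.

```shell
# Portable sketch: emulate a single capture group without gawk extensions.
# extract_name, the pattern and the input line are hypothetical.
extract_name() {
    awk '{
        # match() only reports the whole match via RSTART and RLENGTH
        if (match($0, /name=[^;]+/)) {
            # skip the literal "name=" prefix (5 characters) by hand
            print substr($0, RSTART + 5, RLENGTH - 5)
        }
    }'
}

echo 'id=7;name=xmb;shell=bash' | extract_name
# prints: xmb
```

This only works while the text around the group has a known, fixed length, which is why gensub() and match() with an array are more convenient.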

match() is extended by an optional third argument, a destination array in which the groups get stored.
gensub() is kind of an advanced sub()/gsub() that returns the new string instead of modifying its target. A group is referenced as \\<number> in its second argument; \\0 represents the whole match, as does &. Its third argument indicates which match to actually replace, or "g" for all. Pay attention to where .* is wanted or needed: only the matched text is replaced, so anything the pattern does not cover is left in the output.

Note that awk does not support look-ahead or look-behind, nor any (<modifier><regex>) constructs as PCRE or other advanced (partly standardized) libraries do. Even {<number>} counted matches are missing from many awk implementations; gawk supports them with --re-interval (and by default since version 4.0).

# btw, my prompt looks like this: xmb (gp:4:3)~/awk $
# PS1='\u (\[\e[1;31m\]\h\[\e[m\]:\[\e[33m\]\l\[\e[m\]:\[\e[32m\]\j\[\e[m\])\[\e[1m\]\w\[\e[m\] \$ '
Examples:

CODE

$ echo 'From: "Sumone @home" <home@me.com>
From: malformed <doh>' |
    gawk '{ print gensub(/.*:[ "]+([^"]+| *<).*<([^>]+).*/, "\\1 -+- \\2", 1) }'
Sumone @home -+- home@me.com
malformed  -+- doh

CODE

$ echo XabF XcdF XdeF | gawk '{ print gensub(/X([^X]+)F/, "\\1", 2) }'
XabF cd XdeF

$ echo XabF XcdF XdeF | gawk '{ print gensub(/X([^X]+)F/, "\\1", "g") }'
ab cd de

CODE

$ echo "<html><head><blah foo=bar>yeah<..>" | gawk '{
    match($0, /head><([^ ]+) ([^=]+)=([^>]+)>([^<]+)/, Arr)
    printf "%s (%s->%s) == %s\n", Arr[1], Arr[2], Arr[3], Arr[4]
    }'
blah (foo->bar) == yeah
