General programming in Awk

Reusing regular expression groups with awk by xmb
Posted: 12 Feb 05 (Edited 23 Feb 05)

Regular expression groups are the parts of a pattern enclosed in parentheses, e.g. the two groups in /some([^:]+):here:(.*)/

Standard awk cannot capture or reuse these groups. The parentheses only group parts of the regex, and the pattern is simply tested as a whole match.

Apart from workarounds, the only way to get at the groups is with gawk's gensub() function and its extended match() function.
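
One such workaround is a minimal sketch that assumes the text around the group is a fixed literal. It uses only POSIX match(), RSTART, RLENGTH and substr(): match the whole pattern, then trim the known prefix and suffix. The helper name first_group is hypothetical.

```shell
# Portable sketch (POSIX awk, not only gawk): emulate group 1 of
# /some([^:]+):here:/ by matching the whole pattern, then trimming the
# literal prefix "some" (4 chars) and suffix ":here:" (6 chars).
# first_group is a hypothetical helper name.
first_group() {
    awk '{
        if (match($0, /some[^:]+:here:/))
            print substr($0, RSTART + 4, RLENGTH - 4 - 6)
    }'
}

echo 'somewhere:here:the rest' | first_group   # prints: where
```

This only works when the text surrounding the group has a known length; for anything more general, the gawk functions below are the easier route.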

match() takes an optional third argument, an array into which the groups are stored: arr[0] holds the whole match and arr[1], arr[2], ... hold the groups.
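gensub() is like an extended sub()/gsub() that returns the new string instead of modifying its target. In the replacement (second argument) a group is referenced as \<number>, written "\\1", "\\2", ... inside an awk string because the string literal itself consumes one backslash; \\0 represents the whole match, as does &. The third argument says which match to replace (a number such as 1 or 2), or "g" for all. Because only the matched text is replaced, the pattern needs a leading and trailing .* whenever you want the whole line reduced to a group, as in the first example below.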

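Note that awk regexes do not support lookahead/lookbehind or the (?...) modifier constructs offered by PCRE and other extended (partly standardized) regex libraries. Interval expressions such as {3} were also missing from many awks; gawk accepted them only with --re-interval before version 4.0, which enables them by default.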

# btw, my prompt looks like this: xmb (gp:4:3)~/awk $
# PS1='\u (\[\e[1;31m\]\h\[\e[m\]:\[\e[33m\]\l\[\e[m\]:\[\e[32m\]\j\[\e[m\])\[\e[1m\]\w\[\e[m\] \$ '


$ echo 'From: "Sumone @home" <home@me.com>
From: malformed <doh>' |
    gawk '{ print gensub(/.*:[ "]+([^"]+| *<).*<([^>]+).*/, "\\1 -+- \\2", 1) }'
Sumone @home -+- home@me.com
malformed  -+- doh


$ echo XabF XcdF XdeF | gawk '{ print gensub(/X([^X]+)F/, "\\1", 2) }'
XabF cd XdeF

$ echo XabF XcdF XdeF | gawk '{ print gensub(/X([^X]+)F/, "\\1", "g") }'
ab cd de


$ echo "<html><head><blah foo=bar>yeah<..>" | gawk '{
    match($0, /head><([^ ]+) ([^=]+)=([^>]+)>([^<]+)/, Arr)
    printf "%s (%s->%s) == %s\n", Arr[1], Arr[2], Arr[3], Arr[4]
}'
blah (foo->bar) == yeah

