Smart questions
Smart answers
Smart people
INTELLIGENT WORK FORUMS
FOR COMPUTER PROFESSIONALS

Member Login




Remember Me
Forgot Password?
Join Us!

Come Join Us!

Are you a
Computer / IT professional?
Join Tek-Tips now!
  • Talk With Other Members
  • Be Notified Of Responses
    To Your Posts
  • Keyword Search
  • One-Click Access To Your
    Favorite Forums
  • Automated Signatures
    On Your Posts
  • Best Of All, It's Free!

Join Tek-Tips
*Tek-Tips's functionality depends on members receiving e-mail. By joining you are opting in to receive e-mail.

Donate Today!

Do you enjoy these
technical forums?
Donate Today! Click Here

Posting Guidelines

Promoting, selling, recruiting, coursework and thesis posting is forbidden.
Jobs from Indeed

Link To This Forum!

Partner Button
Add Stickiness To Your Site By Linking To This Professionally Managed Technical Forum.
Just copy and paste the
code below into your site.

General programming in Awk

Reusing regular expression groups with awk
Posted: 12 Feb 05 (Edited 23 Feb 05)

Regular expression groups is stuff matched in brackets, eg /some([^:]+):here:(.*)/

awk, in general, cannot use or reuse those. They are normally processed as a match.

The only way to accomplish this (besides alternative techniques), is using gawk's gensub() and match() function

match() is extended by an optional third argument, a destination array in which the groups will get stored
gensub() is kindof an advanced g/sub() returning the new string. The group is specified by \\<number> in its second argument, \\0 represents the whole match, as does &. Its third argument indicates which match to actually replace, or "g" for all. Note on where .* is wanted/needed.

Note, awk does not support look-(ahead|behind), or any (<modifier><regex>) classes as PCRE or other advanced (standarized partly) libraries do, not even {<number>} counted matches, excepts gawk with --re-interval

# btw, my prompt looks like this: xmb (gp:4:3)~/awk $
# PS1='\u (\[\e[1;31m\]\h\[\e[m\]:\[\e[33m\]\l\[\e[m\]:\[\e[32m\]\j\[\e[m\])\[\e[1m\]\w\[\e[m\] \$ '
Examples:

CODE

$ echo 'From: "Sumone @home" <home@me.com>
From: malformed <doh>' |
    gawk '{ print gensub(/.*:[ "]+([^"]+| *<).*<([^>]+).*/, "\\1 -+- \\2", 1) }'
Sumone @home -+- home@me.com
malformed  -+- doh

CODE

$ echo XabF XcdF XdeF | gawk '{ print gensub(/X([^X]+)F/, "\\1", 2) }'
XabF cd XdeF

$ echo XabF XcdF XdeF | gawk '{ print gensub(/X([^X]+)F/, "\\1", "g") }'
ab cd de

CODE

$ echo "<html><head><blah foo=bar>yeah<..>" | gawk '{
    match($0, /head><([^ ]+) ([^=]+)=([^>]+)>([^<]+)/, Arr)
    printf "%s (%s->%s) == %s\n", Arr[1], Arr[2], Arr[3], Arr[4]
    }'
blah (foo->bar) == yeah

; [ng tag]

Back to AWK FAQ Index
Back to AWK Forum

My Archive

Close Box

Join Tek-Tips® Today!

Join your peers on the Internet's largest technical computer professional community.
It's easy to join and it's free.

Here's Why Members Love Tek-Tips Forums:

Register now while it's still free!

Already a member? Close this window and log in.

Join Us             Close