
Reusing regular expression groups with awk

General programming in Awk

by xmb Posted Feb 12, 2005 (Edited Feb 22, 2005)

Regular expression groups are the parts of a pattern matched inside parentheses, e.g. /some([^:]+):here(.*)/

awk, in general, cannot capture or reuse those. The parentheses are normally processed as plain grouping within the match.
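For an awk without the gawk extensions, one workaround is to let match() set RSTART and RLENGTH and then cut the wanted piece out with substr(). This is only a sketch: the helper name first_group is made up for illustration, and the offsets are hard-coded for the literal "some" and ":here" around the group.

```shell
# Hypothetical helper: print what ([^:]+) would have captured in /some([^:]+):here/.
# Uses only POSIX awk features: match(), RSTART, RLENGTH, substr().
first_group() {
    awk '{
        if (match($0, /some[^:]+:here/)) {
            # The whole match is substr($0, RSTART, RLENGTH).
            # Skip the literal "some" (4 chars) in front and
            # drop ":here" (5 chars) at the end.
            print substr($0, RSTART + 4, RLENGTH - 9)
        }
    }'
}

echo 'xx someTHING:here yy' | first_group
```

This prints THING. It gets clumsy quickly once there is more than one group or the surrounding text has variable length, which is where the gawk functions come in.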

The only direct way to accomplish this (besides alternative techniques) is to use gawk's gensub() and match() functions:

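match() is extended by an optional third argument, a destination array in which the groups get stored.
gensub() is a kind of advanced sub()/gsub() that returns the new string instead of modifying the target. A group is referenced as \\<number> in the replacement string (the second argument); \\0 represents the whole match, as does &. The third argument indicates which match to actually replace, or "g" for all. Only the matched text is replaced, so watch where .* is wanted or needed: put it around the pattern when the rest of the line should be dropped, and leave it out when the rest should be kept.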

Note: awk does not support look-ahead or look-behind, or any of the (<modifier><regex>) classes that PCRE and other advanced (partly standardized) libraries offer. It does not even support {<number>} counted matches, except gawk with --re-interval (gawk 4.0 and later enable them by default).

# btw, my prompt looks like this: xmb (gp:4:3)~/awk $
# PS1='\u (\[\e[1;31m\]\h\[\e[m\]:\[\e[33m\]\l\[\e[m\]:\[\e[32m\]\j\[\e[m\])\[\e[1m\]\w\[\e[m\] \$ '
Examples:

Code:

$ echo 'From: "Sumone @home" <home@me.com>
From: malformed <doh>' |
    gawk '{ print gensub(/.*:[ "]+([^"]+| *<).*<([^>]+).*/, "\\1 -+- \\2", 1) }'
Sumone @home -+- home@me.com
malformed  -+- doh

Code:

$ echo XabF XcdF XdeF | gawk '{ print gensub(/X([^X]+)F/, "\\1", 2) }'
XabF cd XdeF

$ echo XabF XcdF XdeF | gawk '{ print gensub(/X([^X]+)F/, "\\1", "g") }'
ab cd de

Code:

$ echo "<html><head><blah foo=bar>yeah<..>" | gawk '{
    match($0, /head><([^ ]+) ([^=]+)=([^>]+)>([^<]+)/, Arr)
    printf "%s (%s->%s) == %s\n", Arr[1], Arr[2], Arr[3], Arr[4]
    }'
blah (foo->bar) == yeah
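The array filled by match() holds a bit more than the groups themselves: in gawk, element 0 is the whole match, and the Arr[n, "start"] and Arr[n, "length"] elements give the position and length of group n. A sketch, with a made-up function name and sample input:

```shell
# Hypothetical helper: show each group with its position in the line.
show_group_positions() {
    gawk '{
        if (match($0, /([a-z]+)=([0-9]+)/, Arr)) {
            # Arr[0] is the text matched by the whole regex.
            print "whole:", Arr[0]
            for (i = 1; i <= 2; i++)
                printf "group %d: %s (start %d, length %d)\n",
                       i, Arr[i], Arr[i, "start"], Arr[i, "length"]
        }
    }'
}

echo 'set foo=42 now' | show_group_positions
```

This prints "whole: foo=42", "group 1: foo (start 5, length 3)" and "group 2: 42 (start 9, length 2)". The start values are 1-based, so they can be fed straight to substr().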
