Finding the right RE for sed substitution 2

Status
Not open for further replies.

cmeyers

Programmer
Jul 6, 2001
24
US
Hi all,

I'm having writer's block (must be Fri. 13th)

Trying to write a basic sed substitution:

Code:
sed "s/HTR4B/HTR2/g" file

A relevant chunk of file looks like this:

Code:
<PRM_ASSN>HTR4B.L</PRM_ASSN>
<GMBE_NAME>LC_HTR4B.SDCS = PROC</GMBE_NAME>

The problem is I do not want substrings replaced. I only want HTR4B.L replaced with HTR2.L and LC_HTR4B.SDCS would remain unchanged.

This is the result I want:

Code:
<PRM_ASSN>HTR2.L</PRM_ASSN>
<GMBE_NAME>LC_HTR4B.SDCS = PROC</GMBE_NAME>

These are C++ objects in an XML file. So that means they must start with a letter or underscore and can contain letters, digits and underscores. Our system forces uppercase.

So I thought this might work but it doesn't:

Code:
sed "s/(![_A-Z0-9])HTR4B(![_A-Z0-9.])/HTR2/g" file

This is probably an easy one for you regular expression wizards!

Thanks,
CraigMan
 
vgersh99,

Your solution may be too generic for my problem.

Here's what I get when I grep on "HTR4B":

Code:
<OBJ_NAME>VHTR4B_BP</OBJ_NAME>
<OBJ_NAME>VHTR4B_IS</OBJ_NAME>
<PRM_ASSN>HTR4B</PRM_ASSN>
<OBJ_NAME>HTR4B</OBJ_NAME>
<PRM_ASSN>VHTR4B_BP</PRM_ASSN>
<PRM_ASSN>VHTR4B_IS</PRM_ASSN>
<GMBE_PORT_CONNECTOR_NAME>HTR4B.OFI[2] = FDRN3B</GMBE_PORT_CONNECTOR_NAME>
<GMBE_NAME>HTR4B.OFI[2] = FDRN3B</GMBE_NAME>
<GMBE_NAME>HTR4B.OFI[1] = FLP4A1</GMBE_NAME>
<GMBE_NAME>HTR4B.OFI[0] = FLP4B1</GMBE_NAME>
<GMBE_NAME>HTR4B.OVBP = VHTR4B_BP</GMBE_NAME>
<GMBE_STR_VAL>VHTR4B_BP</GMBE_STR_VAL>
<GMBE_NAME>HTR4B.OVISI = VHTR4B_IS</GMBE_NAME>
<GMBE_STR_VAL>VHTR4B_IS</GMBE_STR_VAL>
<GMBE_NAME>HTR4B</GMBE_NAME>
<GMBE_PORT_CONNECTOR_NAME>HTR4B.OVISI = VHTR4B_IS</GMBE_PORT_CONNECTOR_NAME>
<GMBE_PORT_CONNECTOR_NAME>HTR4B.OFX[1] = FDRN4BD</GMBE_PORT_CONNECTOR_NAME>
<GMBE_PORT_CONNECTOR_NAME>HTR4B.OFX[0] = FDRN4B</GMBE_PORT_CONNECTOR_NAME>
<GMBE_PORT_CONNECTOR_NAME>HTR4B.OFI[2] = FDRN3B</GMBE_PORT_CONNECTOR_NAME>
<GMBE_PORT_CONNECTOR_NAME>HTR4B.OFI[0] = FLP4B1</GMBE_PORT_CONNECTOR_NAME>
<GMBE_PORT_CONNECTOR_NAME>HTR4B.OVBP = VHTR4B_BP</GMBE_PORT_CONNECTOR_NAME>
<GMBE_PORT_CONNECTOR_NAME>HTR4B.OFI[1] = FLP4A1</GMBE_PORT_CONNECTOR_NAME>
<GMBE_PORT_CONNECTOR_NAME>FHTR_B.OCOMP[2] = HTR4B</GMBE_PORT_CONNECTOR_NAME>
<GMBE_PORT_CONNECTOR_NAME>HTR4B.OFX[0] = FDRN4B</GMBE_PORT_CONNECTOR_NAME>
<GMBE_NAME>HTR4B.OFX[0] = FDRN4B</GMBE_NAME>
<GMBE_NAME>HTR4B.P</GMBE_NAME>
<GMBE_NAME>HTR4B.TTI</GMBE_NAME>
<GMBE_NAME>HTR4B.TTX</GMBE_NAME>
<GMBE_PORT_CONNECTOR_NAME>HTR4B.OFX[1] = FDRN4BD</GMBE_PORT_CONNECTOR_NAME>
<GMBE_NAME>HTR4B.OFX[1] = FDRN4BD</GMBE_NAME>
<GMBE_NAME>VHTR4B_IS</GMBE_NAME>
<GMBE_NAME>VHTR4B_BP</GMBE_NAME>
<GMBE_STR_VAL>VHTR4B_BP</GMBE_STR_VAL>
<GMBE_STR_VAL>VHTR4B_IS</GMBE_STR_VAL>
<PRM_EQN>MIN(1.0,LC_HTR4B.OUT/0.75)</PRM_EQN>
<PRM_EQN>MAX(0.0,(LC_HTR4B.OUT-0.75)/0.25)</PRM_EQN>
<OBJ_NAME>LC_HTR4B</OBJ_NAME>
<PRM_ASSN>HTR4B.L</PRM_ASSN>
<GMBE_NAME>LC_HTR4B.SDCS = FOXBORO</GMBE_NAME>
<GMBE_NAME>LC_HTR4B.PV = HTR4B.L</GMBE_NAME>
<GMBE_STR_VAL>HTR4B.L</GMBE_STR_VAL>
<GMBE_NAME>LC_HTR4B</GMBE_NAME>
<GMBE_PORT_CONNECTOR_NAME>LC_HTR4B.SDCS = FOXBORO</GMBE_PORT_CONNECTOR_NAME>
<GMBE_PORT_CONNECTOR_NAME>LC_HTR4B.PV = HTR4B.L</GMBE_PORT_CONNECTOR_NAME>

What I want to match is:

Code:
<PRM_ASSN>HTR4B</PRM_ASSN>
<OBJ_NAME>HTR4B</OBJ_NAME>
<GMBE_PORT_CONNECTOR_NAME>HTR4B.OFI[2] = FDRN3B</GMBE_PORT_CONNECTOR_NAME>
<GMBE_NAME>HTR4B.OFI[2] = FDRN3B</GMBE_NAME>
<GMBE_NAME>HTR4B.OFI[1] = FLP4A1</GMBE_NAME>
<GMBE_NAME>HTR4B.OFI[0] = FLP4B1</GMBE_NAME>
<GMBE_NAME>HTR4B.OVBP = VHTR4B_BP</GMBE_NAME>
<GMBE_NAME>HTR4B.OVISI = VHTR4B_IS</GMBE_NAME>
<GMBE_NAME>HTR4B</GMBE_NAME>
<GMBE_PORT_CONNECTOR_NAME>HTR4B.OVISI = VHTR4B_IS</GMBE_PORT_CONNECTOR_NAME>
<GMBE_PORT_CONNECTOR_NAME>HTR4B.OFX[1] = FDRN4BD</GMBE_PORT_CONNECTOR_NAME>
<GMBE_PORT_CONNECTOR_NAME>HTR4B.OFX[0] = FDRN4B</GMBE_PORT_CONNECTOR_NAME>
<GMBE_PORT_CONNECTOR_NAME>HTR4B.OFI[2] = FDRN3B</GMBE_PORT_CONNECTOR_NAME>
<GMBE_PORT_CONNECTOR_NAME>HTR4B.OFI[0] = FLP4B1</GMBE_PORT_CONNECTOR_NAME>
<GMBE_PORT_CONNECTOR_NAME>HTR4B.OVBP = VHTR4B_BP</GMBE_PORT_CONNECTOR_NAME>
<GMBE_PORT_CONNECTOR_NAME>HTR4B.OFI[1] = FLP4A1</GMBE_PORT_CONNECTOR_NAME>
<GMBE_PORT_CONNECTOR_NAME>FHTR_B.OCOMP[2] = HTR4B</GMBE_PORT_CONNECTOR_NAME>
<GMBE_PORT_CONNECTOR_NAME>HTR4B.OFX[0] = FDRN4B</GMBE_PORT_CONNECTOR_NAME>
<GMBE_NAME>HTR4B.OFX[0] = FDRN4B</GMBE_NAME>
<GMBE_NAME>HTR4B.P</GMBE_NAME>
<GMBE_NAME>HTR4B.TTI</GMBE_NAME>
<GMBE_NAME>HTR4B.TTX</GMBE_NAME>
<GMBE_PORT_CONNECTOR_NAME>HTR4B.OFX[1] = FDRN4BD</GMBE_PORT_CONNECTOR_NAME>
<GMBE_NAME>HTR4B.OFX[1] = FDRN4BD</GMBE_NAME>
<PRM_ASSN>HTR4B.L</PRM_ASSN>
<GMBE_NAME>LC_HTR4B.PV = HTR4B.L</GMBE_NAME>
<GMBE_STR_VAL>HTR4B.L</GMBE_STR_VAL>
<GMBE_PORT_CONNECTOR_NAME>LC_HTR4B.PV = HTR4B.L</GMBE_PORT_CONNECTOR_NAME>

Notice these lines (from what I want to match above):

Code:
<GMBE_PORT_CONNECTOR_NAME>HTR4B.OVISI = VHTR4B_IS</GMBE_PORT_CONNECTOR_NAME>
<GMBE_PORT_CONNECTOR_NAME>HTR4B.OVBP = VHTR4B_BP</GMBE_PORT_CONNECTOR_NAME>
<GMBE_NAME>LC_HTR4B.PV = HTR4B.L</GMBE_NAME>
<GMBE_PORT_CONNECTOR_NAME>LC_HTR4B.PV = HTR4B.L</GMBE_PORT_CONNECTOR_NAME>

My substitution (HTR2 for HTR4B) would look like this for the lines directly above:

Code:
<GMBE_PORT_CONNECTOR_NAME>HTR2.OVISI = VHTR4B_IS</GMBE_PORT_CONNECTOR_NAME>
<GMBE_PORT_CONNECTOR_NAME>HTR2.OVBP = VHTR4B_BP</GMBE_PORT_CONNECTOR_NAME>
<GMBE_NAME>LC_HTR4B.PV = HTR2.L</GMBE_NAME>
<GMBE_PORT_CONNECTOR_NAME>LC_HTR4B.PV = HTR2.L</GMBE_PORT_CONNECTOR_NAME>

I WAS using pattern space operators:

sed "s/\(pattern 1\)HTR4B\(pattern 2\)/\1HTR2\2/g" file

But I was having problems which were probably due to filling the pattern space buffers. I could not figure out how to flush the buffers.

So I was hoping for a REGEX magic bullet!

Thanks,
CraigMan
 
How 'bout these two - my "sed" doesn't have "+", so I have
to "simulate" it.

s/>[a-zA-Z0-9.][a-zA-Z0-9.]* />HTR2.L /g
s/ [a-zA-Z0-9.][a-zA-Z0-9.]*</ HTR2.L</g
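
For readers unfamiliar with the trick, here is a minimal sketch of "simulating" + in a basic regular expression: an item followed by the same item with * means "one or more". The sample text and variable names are made up for illustration, and the \{1,\} form assumes a sed that supports POSIX interval expressions.

```shell
line='abc 123 x'

# X followed by X* means "one or more X" in a basic regular expression
out_star=$(printf '%s\n' "$line" | sed 's/[0-9][0-9]*/N/')

# POSIX interval expression, equivalent where the sed supports it
out_interval=$(printf '%s\n' "$line" | sed 's/[0-9]\{1,\}/N/')

printf '%s\n%s\n' "$out_star" "$out_interval"
```

Both commands should replace the whole run of digits with a single N.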
 
Thanks for the help.
You have given me some ideas I can work through.
 
How about

s/\([^_]HTR\)4B/\12/g

This would work with ex/vi but I'm not sure about sed.

You could do it with ex by writing a script, say t.ex:

%s/\([^_]HTR\)4B/\12/g
w

and executing with

ex - file < t.ex
 
CraigMan-

Try it this way:

sed '/HTR4B/ {
s/^HTR4B</HTR2</
s/^HTR4B\./HTR2\./
s/>HTR4B</>HTR2</
s/>HTR4B\./>HTR2\./
s/= HTR4B\./= HTR2\./
}' inputfile > outputfile

The reason ranges don't work in sed is because the
range only affects one character at a time! This
applies all these substitutions to each line before
moving on. The braces are what allow this, and if
one of the substitutions does not apply, it just falls
through.
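
A cut-down sketch of the same idea, run against three lines taken from the posts above: the address /HTR4B/ selects the lines, and every command inside the braces is tried on each selected line in turn. The variable names are only for illustration, and only two of the five substitutions are shown.

```shell
input='<PRM_ASSN>HTR4B.L</PRM_ASSN>
<GMBE_NAME>LC_HTR4B.SDCS = PROC</GMBE_NAME>
<GMBE_NAME>LC_HTR4B.PV = HTR4B.L</GMBE_NAME>'

# Commands inside { } run only on lines matching the address /HTR4B/.
# A substitution that does not match simply leaves the line alone.
result=$(printf '%s\n' "$input" | sed '/HTR4B/ {
s/>HTR4B\./>HTR2./
s/= HTR4B\./= HTR2./
}')

printf '%s\n' "$result"
```

The LC_HTR4B names should come through untouched, because neither pattern allows an underscore in front of HTR4B.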

HTH


flogrr
flogr@yahoo.com

 
flogrr, I don't know what you mean by "ranges don't work in sed". I finally read what CraigMan was asking, and the following solution seems to work:

sed "s/\([^_A-Z0-9]\)HTR4B\([^_A-Z0-9]\)/\1HTR2\2/" file

CaKiwi
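
For anyone following along, here is a small sketch of that substitution run against four lines from the grep output earlier in the thread. The variable names are just for illustration. Two things to keep in mind: the pattern needs one non-identifier character on each side of the name, so a name at the very start or end of a line would not match (in this file the tags, spaces and dots always provide one), and without the g flag only the first standalone occurrence on a line is changed.

```shell
sample='<OBJ_NAME>VHTR4B_BP</OBJ_NAME>
<OBJ_NAME>HTR4B</OBJ_NAME>
<GMBE_NAME>HTR4B.OVBP = VHTR4B_BP</GMBE_NAME>
<GMBE_NAME>LC_HTR4B.PV = HTR4B.L</GMBE_NAME>'

# \( \) capture the character on each side so it can be put back with \1 and \2;
# [^_A-Z0-9] rejects anything that could be part of a longer object name.
renamed=$(printf '%s\n' "$sample" |
  sed 's/\([^_A-Z0-9]\)HTR4B\([^_A-Z0-9]\)/\1HTR2\2/')

printf '%s\n' "$renamed"
```

VHTR4B_BP and LC_HTR4B should be left alone, since the character in front of HTR4B is a letter or an underscore in both cases.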
 
CaKiwi,

You are right on the mark. I came up with EXACTLY the same pattern after many moments of iteration. I was able to use this pattern exclusively (which helps with execution speed). I had something similar to flogrr but the execution speed was unacceptable. My script runs even faster since I incorporated an awk that determines the address range for each substitution:

Code:
XMLFILE=$(ls -1t *.xml | gawk 'NR==1 {print}')
CSVFILE=searchreplace.csv
dos2unix $CSVFILE $CSVFILE

set -A oldname $(gawk 'BEGIN {FS=","} {print $1}' $CSVFILE)
set -A newname $(gawk 'BEGIN {FS=","} {print $3}' $CSVFILE)

cp $XMLFILE file0

NUMOBJECTS=$(gawk 'END {print NR}' $CSVFILE)

i=0
while (( $i<${#oldname[*]} ))
do
    RANGE=$(gawk "/${oldname[$i]}/ {if (x==0) printf(\"%d,\",NR);x=1}" file$i;gawk "/${oldname[$i]}/ {print NR}" file$i | gawk 'END {printf("%d\n",$0)}')
    clear
    print "Processing $NUMOBJECTS Object Names..."
    print "$((${i}+1))) Replacing ${oldname[$i]} with ${newname[$i]}"
    sed "${RANGE}s/\(.*[^A-Z0-9_]\)${oldname[$i]}\([^A-Z0-9_].*\)/\1${newname[$i]}\2/g" file$i > file$((${i}+1))
    rm -f file$i
    i=$((${i}+1))
done
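
The address-range idea on its own can be sketched like this. It is a simplified, hypothetical variant rather than the script above: it uses a single awk pass (plain awk instead of gawk) to find the first and last line that mention the name, then hands that range to sed. The sample data and variable names are made up for illustration.

```shell
data='<A>X</A>
<OBJ_NAME>HTR4B</OBJ_NAME>
<B>Y</B>
<GMBE_NAME>HTR4B.P</GMBE_NAME>
<C>Z</C>'

# One awk pass: remember the first and last line numbers that mention the name
range=$(printf '%s\n' "$data" | awk '/HTR4B/ { if (!first) first = NR; last = NR }
  END { if (first) printf "%d,%d\n", first, last }')

# Restrict the substitution to that address range
edited=$(printf '%s\n' "$data" |
  sed "${range}s/\([^_A-Z0-9]\)HTR4B\([^_A-Z0-9]\)/\1HTR2\2/g")

printf 'range=%s\n%s\n' "$range" "$edited"
```

If the name does not occur at all, range is empty and sed just scans every line without changing anything.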
 
Sorry, poor choice of words. Ranges work just fine in
sed. I now see that by excluding the letters, numerals,
and underscores a more efficient method is obtained.

I took the opposite tack and tried to match exactly
what was required. That meant that each line had
to be parsed several times rather than just once.

I stand corrected!


flogrr
flogr@yahoo.com

 

Regarding the sed command:
sed "s/\([^_A-Z0-9]\)HTR4B\([^_A-Z0-9]\)/\1HTR2\2/" file
Please explain what the \1HTR2\2 part is doing in this sed command. I am lost on how this works.
 
Teser,

The text matched between the first \( \) pair is saved in buffer 1, the second in buffer 2, and so on up to buffer 9. You can then replay these in the replacement string by specifying \1 for the first buffer, \2 for the second, etc.

CaKiwi
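
A tiny sketch of the buffers in action, using made-up input: the first command swaps two captured pieces, and the second keeps the characters on either side of HTR4B while replacing the name itself.

```shell
# \1 is the text before "=", \2 the text after it; the replacement swaps them
swapped=$(printf '%s\n' 'key=value' | sed 's/\(.*\)=\(.*\)/\2=\1/')

# \1 holds ">" and \2 holds "<", so only the name between them changes
kept=$(printf '%s\n' '>HTR4B<' | sed 's/\(>\)HTR4B\(<\)/\1HTR2\2/')

printf '%s\n%s\n' "$swapped" "$kept"
```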
 
Right on CaKiwi.

And you can specify up to 9 buffers.

Although I've only ever needed 2 or 3 at most.
 