
Use sed to remove duplicate lines


jasperx (Technical User), Jan 9, 2002
Hopefully some of you awk jockeys also dabble a bit in sed... I am trying to use a bit of both on a project, and this task seemed more suited to sed, but if awk is better I am game.

I have been trying to write a little sed script that would remove the header record that gets repeated when I concatenate a group of similar data files. All the header lines begin with "first name", so I gave this a try:

# line 1 is the header I want to keep:
# stash it in the hold space and delete it
1{
h
d
}

# append every non-header line to the hold
# space and delete it from the pattern space
/^first name/!{
H
d
}

# only header repeats reach this point: g copies
# the accumulated hold space into the pattern
# space, which then autoprints
g

This saved off the first line (the header I want to keep) into the hold space and then looped through the file, appending records to the hold space until it hit the first repeated header, at which point it spit out the results. So I added /^first name/d just above the g to kill off the next header, and it killed off everything... I get no output at all. What I want it to do is send the repeats of the header to the bit bucket, keep appending the records to the hold space until the end of the file is reached, and at that point output the contents of the hold space. This utility would probably be better if I blew off using the hold space and just sent the first line and all subsequent non-header lines to stdout... it would tie up less memory, but I have not figured out how to do that.
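For what it's worth, the reason adding /^first name/d above the g kills all output is that the g is then never reached: line 1 is deleted, non-headers are deleted, and with the new rule the repeated headers are deleted too, so every cycle ends in a d and nothing ever falls through to print. The stream-it-straight-to-stdout version can skip the hold space entirely. A minimal sketch, assuming every header begins exactly with "first name" (concatenated-file standing in for the merged data):

Code:
sed '1!{/^first name/d;}' concatenated-file

Line 1 autoprints untouched, later header lines are deleted, and everything else falls through to sed's default print, so no hold space and no extra memory.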
 
I am answering my own question... at least partially. I did figure out how to remove the extra headers without using the hold space, but my handling of the print is clunky... I don't know how to suppress printing when calling sed this way.
#! /usr/bin/sed -f
# delete blank lines
/^$/d
# print the first line (the header) explicitly
1p
# then delete every header line: this suppresses the
# autoprint duplicate of line 1 and kills the repeats
/^first name/d

Adding a -n on the first line won't work, and I could not figure out how to use #n... so maybe this is the way to go?
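Incidentally, #n only takes effect as the very first two characters of the script file, where sed treats it as if -n had been given and suppresses autoprint. A sketch of that variant, with the same assumption about the headers; since #n displaces the #! line, it has to be run as sed -f dedupe.sed inputfile (dedupe.sed being whatever the script file is called):

Code:
#n
# autoprint is off, so only explicit p commands produce output
/^$/d
1{p;d;}
/^first name/d
p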
 
Try something like this:
Code:
awk '/^first name/{if(NR>1)next}{print}' path/to/inputfile
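One thing to note: the /^first name/ test, like the sed versions above, only matches lines that begin exactly with lowercase "first name"; a leading space, tab, or different capitalization in some files' headers would let those lines slip through.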

Hope This Helps
PH.
 
Or maybe you could use awk to do the concatenation, omitting the first line of the second and each subsequent file.

awk 'NR == 1 || FNR > 1' input-files.* > output-file
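(FNR restarts at 1 for each input file while NR keeps counting across all of them, so the condition keeps every line of the first file but drops the first line, i.e. the header, of each subsequent file. A pattern with no action defaults to print in awk.)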

CaKiwi

"I love mankind, it's people I can't stand" - Linus Van Pelt
 
I could not get
awk '/^first name/{if(NR>1)next}{print}' path/to/inputfile
to work... all the headers remained. And CaKiwi's suggestion to strip off the extra lines as part of the concatenation would have set me back a ways while I figured out how to redo it so it could automatically handle all the files in a directory. I handle that step like so:

find . -type f -print0 | xargs -0 cat >> ../"$fileName"

which does a nice job... and
#! /usr/bin/sed -f
/^$/d
1p
/^first name/d

is doing a nice job of getting rid of the headers.
So now I have cleaned off the extra spaces and tabs, merged the files, and zapped the extra headers, and I am getting ready to merge fields 15 through NF, but that calls for another posting. Thanks, guys.
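For anyone following along later, the concatenation and the header cleanup could probably be collapsed into a single pass using CaKiwi's NR/FNR idiom. A sketch, with ../combined standing in for whatever the output file is called:

Code:
# concatenate all data files, keeping only the first header;
# FNR resets per file, NR does not
find . -type f -print0 | xargs -0 awk 'NR == 1 || FNR > 1' > ../combined

One caveat: with enough files, xargs may split them across more than one awk invocation, and each fresh invocation would let one extra header through; in that case filtering on /^first name/ as above is the safer bet.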
 