
Data Formatting

Status
Not open for further replies.

hill007

Technical User
Mar 9, 2004
60
US
I have some time series data in one format and I want to convert it to another. Could anyone help me with how to do this?

Here is an example of a small input data that I want to format:

"DBKEY","STATION","AGENCY","COUNTY","TYPE","UNITS","STAT","FQ","START","END","LAT","LONG","SECTION","TOWN","RANGE","ALTERNATE ID"
"081306-2","PT181","01-AUG-1997","0","","12-NOV-2003"
"081306-2","PT181","02-AUG-1997",".83","","12-NOV-2003"
"081306-2","PT181","03-AUG-1997","0","","12-NOV-2003"
"081306-2","PT181","04-AUG-1997","0","","12-NOV-2003"
"081306-2","PT181","05-AUG-1997","0","","12-NOV-2003"
"081306-2","PT181","06-AUG-1997",".25","","12-NOV-2003"
"081306-2","PT181","07-AUG-1997","0","T","12-NOV-2003"
"081306-2","PT181","08-AUG-1997","0","","12-NOV-2003"
"081306-2","PT181","09-AUG-1997",".11","","12-NOV-2003"


My final format should be like this:

081306-2 PT181 01-AUG-1997 0 12-NOV-2003
081306-2 PT181 02-AUG-1997 .83 12-NOV-2003
081306-2 PT181 03-AUG-1997 0 12-NOV-2003
081306-2 PT181 04-AUG-1997 0 12-NOV-2003
081306-2 PT181 05-AUG-1997 0 12-NOV-2003
081306-2 PT181 06-AUG-1997 .25 12-NOV-2003
081306-2 PT181 07-AUG-1997 0T 12-NOV-2003
081306-2 PT181 08-AUG-1997 0 12-NOV-2003
081306-2 PT181 09-AUG-1997 .11 12-NOV-2003

Basically, that means removing everything in the first three lines and getting rid of the double quotes and the commas.

Any help is appreciated.

 
hill007,

This sed one-liner should work:

sed -e '1,3d' -e 's/"//g' -e 's/,/ /g' infile
 
You can't really tell from my last post, but the last expression replaces each , with a space.

sed -e '1,3d' -e 's/"//g' -e 's/,/ /g' infile
                                  ^ space

John
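As a quick check, the one-liner can be tried on a cut-down copy of the data from the question. This is only a sketch: the file names sample.csv and clean.txt are made up for illustration, the sample has a single header line (so '1d' is used in place of '1,3d'), and the extra 's/  */ /g' expression is an addition that squeezes the double space left behind by the empty fifth field.

```shell
# Hypothetical sample: the header plus three of the data rows from the question.
cat > sample.csv <<'EOF'
"DBKEY","STATION","AGENCY","COUNTY","TYPE","UNITS","STAT","FQ","START","END","LAT","LONG","SECTION","TOWN","RANGE","ALTERNATE ID"
"081306-2","PT181","01-AUG-1997","0","","12-NOV-2003"
"081306-2","PT181","02-AUG-1997",".83","","12-NOV-2003"
"081306-2","PT181","07-AUG-1997","0","T","12-NOV-2003"
EOF

# Delete the header line, strip the quotes, turn commas into spaces,
# then collapse any run of spaces into a single space.
sed -e '1d' -e 's/"//g' -e 's/,/ /g' -e 's/  */ /g' sample.csv > clean.txt
cat clean.txt
# 081306-2 PT181 01-AUG-1997 0 12-NOV-2003
# 081306-2 PT181 02-AUG-1997 .83 12-NOV-2003
# 081306-2 PT181 07-AUG-1997 0 T 12-NOV-2003
```

Note that the fourth and fifth fields stay separated by a space here ("0 T"), whereas the requested output shows them joined ("0T").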
 
Something like this ?
sed -n '2,${;s!","! !g;s!"!!g;p;}' /path/to/inputfile
If you want to remove the first three lines, replace the 2 with 4 in the above sed script.

Hope This Help, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ222-2244
 
.... or in awk:

Code:
nawk -F, -v OFS=' ' '{gsub("\"","");$1=$1}1' file

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
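If the header should be dropped and the 4th and 5th fields joined as in the requested output ("0T"), a variation along these lines might do it. It is a sketch under a few assumptions: plain awk is used instead of nawk, the sample has one header line (hence NR > 1), and the file names sample.csv and merged.txt are hypothetical.

```shell
# Hypothetical sample: the header plus three of the data rows from the question.
cat > sample.csv <<'EOF'
"DBKEY","STATION","AGENCY","COUNTY","TYPE","UNITS","STAT","FQ","START","END","LAT","LONG","SECTION","TOWN","RANGE","ALTERNATE ID"
"081306-2","PT181","01-AUG-1997","0","","12-NOV-2003"
"081306-2","PT181","02-AUG-1997",".83","","12-NOV-2003"
"081306-2","PT181","07-AUG-1997","0","T","12-NOV-2003"
EOF

# Skip the header, strip the quotes (gsub on $0 re-splits the fields),
# then print the fields space-separated with $4 and $5 concatenated.
awk -F, 'NR > 1 { gsub(/"/, ""); print $1, $2, $3, $4 $5, $6 }' sample.csv > merged.txt
cat merged.txt
# 081306-2 PT181 01-AUG-1997 0 12-NOV-2003
# 081306-2 PT181 02-AUG-1997 .83 12-NOV-2003
# 081306-2 PT181 07-AUG-1997 0T 12-NOV-2003
```

Writing $4 $5 with no comma between them concatenates the two fields, so an empty 5th field leaves no stray space.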
 
... or if spacing is important, try this awk program

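BEGIN { FS = "," }
NR > 1 {
    gsub("\"", "")                        # strip the double quotes
    printf "%-10s%-7s%-12s", $1, $2, $3   # fixed-width key, station and date
    # values that start with "." are printed one column further left
    if (substr($4, 1, 1) == ".") fmt = "%-5s%s\n"
    else fmt = " %-4s%s\n"
    printf fmt, $4 $5, $6                 # 4th and 5th fields joined
}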

Use NR>3 instead of NR>1 if you want to delete the first 3 lines.

CaKiwi
 
Thanks everyone.
CaKiwi,

Your awk program was helpful. However, the 4th and 5th fields are merging together. How do I go about separating them?

Thanks.
 
From your sample output, it looks like you want the 4th and 5th fields merged together. To separate them, add another %s to the fmt variable and change the last printf statement to

printf fmt,$4,$5,$6


CaKiwi
 
For example

if (substr($4,1,1) == ".") fmt = "%-5s%-3s%s\n"
else fmt = " %-4s%-3s%s\n"
printf fmt,$4,$5,$6

gives

081306-2  PT181  01-AUG-1997  0      12-NOV-2003
081306-2  PT181  02-AUG-1997 .83     12-NOV-2003
081306-2  PT181  03-AUG-1997  0      12-NOV-2003
081306-2  PT181  04-AUG-1997  0      12-NOV-2003
081306-2  PT181  05-AUG-1997  0      12-NOV-2003
081306-2  PT181  06-AUG-1997 .25     12-NOV-2003
081306-2  PT181  07-AUG-1997  0   T  12-NOV-2003
081306-2  PT181  08-AUG-1997  0      12-NOV-2003
081306-2  PT181  09-AUG-1997 .11     12-NOV-2003

CaKiwi
 