Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations MikeeOK on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Fixing inconsistent .csv data... 4

Status
Not open for further replies.

admoore

IS-IT--Management
May 17, 2002
224
US
I have a comma delimited file that looks kind of like this:

Code:
"SERIAL","RO","CLOSED","MILEAGE","SWR","LABOR$","LBR-COST$","LBR-TYPE","PARTS$","PTS-COST$","MISC$","MISC-COST$","TECH","OP-CODES","SOLD-HOURS","STOCK-NO","OPENED","CLOSED"
"5TBBT4410YS090746",440480,"02APR01",15811,"733   733","30.00   0.00","25.50   0.00","CEMP   IS","29.46   0.00",26.51,,,"136   136","YELLOW   10B","1.50   0.00","43082A","02APR01",""
"5TBBT4813YS097071",440218,"02APR01",7818,184,13.80,2.50,"W93",0.49,0.34,"","",416,74399,0.20,17690,"30MAR01",""

Upon reviewing the sample data you will see that sometimes a field will contain a single value, while other records contain fields with multiple values per field, separated by tabs within a single record field...

What I hope to do is parse the data into a file where those records with multiple values per field get split into multiple records, that is, the two sample records above would now look like this:

Code:
"SERIAL","RO","CLOSED","MILEAGE","SWR","LABOR$","LBR-COST$","LBR-TYPE","PARTS$","PTS-COST$","MISC$","MISC-COST$","TECH","OP-CODES","SOLD-HOURS","STOCK-NO","OPENED","CLOSED"
"5TBBT4410YS090746",440480,"02APR01",15811,"733","30.00","25.50","CEMP","29.46",26.51,,,"136","YELLOW","1.50","43082A","02APR01",""
"5TBBT4410YS090746",440480,"02APR01",15811,"733","0.00","0.00","IS","0.00",26.51,,,"136","10B","0.00","43082A","02APR01",""
"5TBBT4813YS097071",440218,"02APR01",7818,184,13.80,2.50,"W93",0.49,0.34,"","",416,74399,0.20,17690,"30MAR01",""

While the above sample only shows one record includinag a single set of extra values, many records contain numerous extra values within the fields.

As always, any help in overcoming this issue is greatly appreciated...

-Allen
 

How about this


BEGIN { FS=OFS="," }
{
max = 0
for (j=1;j<=NF;j++) {
a[j,0] = split($j,b,&quot;\t&quot;)
if (max<a[j,0]) max = a[j,0]
for (k=1;k<=a[j,0];k++) {
a[j,k] = b[k]
}
if ($j ~ /^&quot;.*&quot;$/)
qt[j] = substr($j,1,1)
else
qt[j] = 0
}
for (i=1;i<=max;i++) {
for (j=1;j<=NF;j++) {
k = i
if (a[j,0] < i) k = a[j,0]
if (j>1) printf OFS
if (qt[j] && a[j,k] !~ /^&quot;/) printf qt[j]
printf a[j,k]
if (qt[j] && a[j,k] !~ /&quot;$/) printf qt[j]
}
print &quot;&quot;
}
} CaKiwi

&quot;I love mankind, it's people I can't stand&quot; - Linus Van Pelt
 
Not sure why, but using suggested code just seems to duplicate the original file...

Any more help would still be greatly appreciated...

-A
 
Seems to work ok for me. I put the awk script into a file call ad1.awk and ran

awk -f ad1.awk inptu_file > output file

output file contained 4 lines of data that looked the same as the output data you posted. How did you run it, what version of awk, what OS? CaKiwi

&quot;I love mankind, it's people I can't stand&quot; - Linus Van Pelt
 
Dunno what's wrong...

I've tried using an AWK distribution for Windows as well as the standard distribution within SUSE Linux with the same result...

I also put the awk script into a file similar to described...

Still get same # of lines out as in...

Still very appreciative of help...
 
Put some debug print statements in it. e.g. put

for (k=1;k<a[j,0];k++) printf b[k]
print a[j,0],max

after the if (max... statement. CaKiwi

&quot;I love mankind, it's people I can't stand&quot; - Linus Van Pelt
 
Did this...
Code:
BEGIN { FS=OFS=&quot;,&quot; }
{
max = 0
for (j=1;j<=NF;j++) {
a[j,0] = split($j,b,&quot;\t&quot;)
if (max<a[j,0]) max = a[j,0]

for (k=1;k<a[j,0];k++) printf b[k]
print a[j,0],max


for (k=1;k<=a[j,0];k++) {
a[j,k] = b[k]
}
if ($j ~ /^&quot;.*&quot;$/)
qt[j] = substr($j,1,1)
else
qt[j] = 0
}
for (i=1;i<=max;i++) {
for (j=1;j<=NF;j++) {
k = i
if (a[j,0] < i) k = a[j,0]
if (j>1) printf OFS
if (qt[j] && a[j,k] !~ /^&quot;/) printf qt[j]
printf a[j,k]
if (qt[j] && a[j,k] !~ /&quot;$/) printf qt[j]
}
print &quot;&quot;
}
}

and got this:

Code:
1,1
1,1
1,1
1,1
1,1
1,1
1,1
1,1
1,1
1,1
1,1
1,1
1,1
1,1
1,1
1,1
1,1
1,1
&quot;SERIAL&quot;,&quot;RO&quot;,&quot;CLOSED&quot;,&quot;MILEAGE&quot;,&quot;SWR&quot;,&quot;LABOR$&quot;,&quot;LBR-COST$&quot;,&quot;LBR-TYPE&quot;,&quot;PARTS$&quot;,&quot;PTS-COST$&quot;,&quot;MISC$&quot;,&quot;MISC-COST$&quot;,&quot;TECH&quot;,&quot;OP-CODES&quot;,&quot;SOLD-HOURS&quot;,&quot;STOCK-NO&quot;,&quot;OPENED&quot;,&quot;CLOSED&quot;
1,1
1,1
1,1
1,1
1,1
1,1
1,1
1,1
1,1
1,1
0,1
0,1
1,1
1,1
1,1
1,1
1,1
1,1
&quot;5TBBT4410YS090746&quot;,440480,&quot;02APR01&quot;,15811,&quot;733 733&quot;,&quot;30.00 0.00&quot;,&quot;25.50 0.00&quot;,&quot;CEMP IS&quot;,&quot;29.46 0.00&quot;,26.51,0,0,&quot;136 136&quot;,&quot;YELLOW 10B&quot;,&quot;1.50 0.00&quot;,&quot;43082A&quot;,&quot;02APR01&quot;,&quot;&quot;
1,1
1,1
1,1
1,1
1,1
1,1
1,1
1,1
1,1
1,1
1,1
1,1
1,1
1,1
1,1
1,1
1,1
1,1
&quot;5TBBT4813YS097071&quot;,440218,&quot;02APR01&quot;,7818,184,13.80,2.50,&quot;W93&quot;,0.49,0.34,&quot;&quot;,&quot;&quot;,416,74399,0.20,17690,&quot;30MAR01&quot;,&quot;&quot;

Not sure if this is as expected or how to interpret results. Again, huge thanks.
 
Are you sure the sub fields are seperated by tabs? It looks like they are not because the split is not doing anything.

run

od -c input-file

to check. Change the &quot;\t&quot; in the split to whatever the seperator is CaKiwi

&quot;I love mankind, it's people I can't stand&quot; - Linus Van Pelt
 
Will be out of office for a while; but, will verify upon return- suspect that's exactly the prob- may be multiple spaces and not a tab-

All help Greatly Appreciated!
 

If the sub fields are separated by spaces change the &quot;\t&quot; to &quot; &quot;.

I also noticed that my script prints out a zero for a null field. Add an if statement as follows:

if (a[j,0] > 0) {
if (qt[j] && a[j,k] !~ /^&quot;/) printf qt[j]
printf a[j,k]
if (qt[j] && a[j,k] !~ /&quot;$/) printf qt[j]
} CaKiwi

&quot;I love mankind, it's people I can't stand&quot; - Linus Van Pelt
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top