Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations TouchToneTommy on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

awk script to average values from multiple files 1

Status
Not open for further replies.

pdtak1

Technical User
Mar 29, 2016
6
US
Hello awk experts, I need help with writing an awk script for the following task:

I have a 10 files full of values (8 fixed size columns), and I want to average out the values from each field from all of the input files.

Here are the values from 1 of 10 files (other 9 files have same fields with different values:

Input file1:
1 99.7% 0.0% 90.9% 8.0GB 7.7GB 9.000GB 5.812GB
2 94.8% 23.0% 94.6% 8.0GB 7.6GB 2.000GB 10.781GB
3 93.9% 36.0% 110.3% 5.0GB 6.1GB 4.000GB 3.812GB
4 99.2% 8.0% 116.3% 2.0GB 2.3GB 5.000GB 5.312GB
5 97.5% 12.0% 86.9% 3.0GB 2.6GB 3.000GB 9.562GB
6 95.4% 4.0% 49.4% 12.0GB 5.9GB 8.000GB 10.625GB
7 99.6% 44.0% 142.6% 8.0GB 11.4GB 7.875GB 5.562GB
8 95.4% 35.0% 131.1% 5.0GB 7.2GB 6.188GB 9.375GB
9 98.2% 14.0% 104.6% 6.0GB 6.3GB 6.000GB 12.375GB
10 93.3% 3.0% 79.8% 8.0GB 6.4GB 1.000GB 220.500GB

Input file2:
1 98.5% 14.0% 104.6% 6.0GB 6.3GB 6.000GB 12.500GB
2 93.5% 56.0% 121.0% 10.0GB 12.1GB 5.500GB 9.500GB
3 98.8% 52.0% 126.8% 12.0GB 15.2GB 7.312GB 11.375GB
4 98.7% 42.0% 129.4% 8.0GB 10.4GB 6.250GB 10.500GB
5 95.3% 12.0% 89.7% 10.0GB 9.0GB 5.500GB 11.250GB
6 97.5% 19.0% 107.8% 2.0GB 2.7GB 3.000GB 4.438GB
7 99.4% 17.0% 109.4% 13.0GB 14.2GB 10.000GB 5.500GB
8 94.5% 27.0% 96.7% 5.0GB 4.8GB 3.000GB 11.375GB
9 99.7% 26.0% 123.0% 12.0GB 14.8GB 12.000GB 5.875GB
10 99.0% 58.0% 162.2% 8.0GB 13.0GB 8.375GB 10.938GB

Input file3:
...
...

Input file10:
1 98.4% 14.0% 104.5% 6.0GB 6.3GB 6.000GB 12.500GB
2 93.2% 46.0% 109.8% 10.0GB 11.0GB 5.500GB 9.500GB
3 99.3% 49.0% 126.1% 12.0GB 15.1GB 7.312GB 11.375GB
4 98.8% 43.0% 128.9% 8.0GB 10.3GB 6.250GB 10.500GB
5 95.3% 12.0% 89.7% 10.0GB 9.0GB 5.500GB 11.250GB
6 97.4% 18.0% 106.2% 2.0GB 2.7GB 3.000GB 4.438GB
7 98.6% 16.0% 108.5% 13.0GB 14.1GB 10.000GB 5.500GB
8 94.7% 27.0% 93.0% 5.0GB 4.6GB 3.000GB 11.375GB
9 99.7% 24.0% 120.1% 12.0GB 14.4GB 12.000GB 5.875GB
10 99.6% 60.0% 166.3% 8.0GB 13.3GB 8.375GB 10.938GB


---------------------------------------------------
output:
1 average(99.7% + 98.5% +... + ... +98.4%) average(0.0% + 14% +... + ... 14.0%) average(90.9% + 104.6% +... + ...+ 104.5%) average(8.0GB + 6.0GB +... + ...+ 6.0GB) average 7.7GB + 6.3GB +... + ...+ 6.3GB) average(9.000GB + 6.000GB +... + ... +6.000GB) average(5.812GB + 12.5000GB +... + ... 12.5000GB)
...
...
10 average(99.7% + 98.5% +... + ... + 99.6%) average(0.0% + 14% +... + ... + 60.0%) average(90.9% + 104.6% +... + ...+ 166.3%) average(8.0GB + 6.0GB +... + ...8.0GB) average 7.7GB + 6.3GB +... + ...+ 13.3GB) average(9.000GB + 6.000GB +... + ...+ 8.375GB) average(5.812GB + 12.5000GB +... + ...10.938GB)

-------
I first stripped the non-numeric characters from the files: $ sed s/%//g file1 | sed s/GB//g > file1 (did same for file2 file3 file4 file5 file6 file7 file8 file9 file10)

I then need to average out the same field from each file (ie. [file1, field1], [file2, field1], [file3, field1]... [file10, field1])

Any help is greatly appreciated. Thanks!
 
Compute the column averages using something like this:
Code:
[COLOR=#0000ff]# run:[/color]
[COLOR=#0000ff]# awk -f pdtak1.awk pdtak1_01.txt pdtak1_02.txt .. pdtak1_10.txt[/color]
[COLOR=#6a5acd]BEGIN[/color] {
   [COLOR=#0000ff]# initialize sums and averages[/color]
   [COLOR=#a52a2a][b]for[/b][/color] (i=[COLOR=#ff00ff]1[/color][COLOR=#6a5acd];[/color] i<=[COLOR=#ff00ff]8[/color][COLOR=#6a5acd];[/color] i++) { 
     sum[[COLOR=#6a5acd]i[/color]] = [COLOR=#ff00ff]0[/color]
     avg[[COLOR=#6a5acd]i[/color]] = [COLOR=#ff00ff]0[/color]
   }
}
{
   [COLOR=#0000ff]# remove unwanted characters[/color]
   [COLOR=#008b8b]gsub[/color]([COLOR=#ff00ff]/%/[/color][COLOR=#6a5acd],[/color] [COLOR=#ff00ff]""[/color][COLOR=#6a5acd],[/color] [COLOR=#6a5acd]$0[/color])
   [COLOR=#008b8b]gsub[/color]([COLOR=#ff00ff]/GB/[/color][COLOR=#6a5acd],[/color] [COLOR=#ff00ff]""[/color][COLOR=#6a5acd],[/color] [COLOR=#6a5acd]$0[/color])
   [COLOR=#0000ff]# sum colum values[/color]
   [COLOR=#a52a2a][b]for[/b][/color] (i=[COLOR=#ff00ff]1[/color][COLOR=#6a5acd];[/color] i<=[COLOR=#ff00ff]8[/color][COLOR=#6a5acd];[/color] i++) { 
     sum[[COLOR=#6a5acd]i[/color]] = sum[[COLOR=#6a5acd]i[/color]] + $i
   }   
}
[COLOR=#6a5acd]END[/color] {
   total_records = [COLOR=#6a5acd]NR[/color]
   [COLOR=#a52a2a][b]print[/b][/color] [COLOR=#ff00ff]"Record processed: "[/color] total_records
   [COLOR=#0000ff]# compute and print averages[/color]
   [COLOR=#a52a2a][b]print[/b][/color] [COLOR=#ff00ff]"Column Averages :"[/color]
   [COLOR=#a52a2a][b]for[/b][/color] (i=[COLOR=#ff00ff]1[/color][COLOR=#6a5acd];[/color] i<=[COLOR=#ff00ff]8[/color][COLOR=#6a5acd];[/color] i++) { 
     avg[[COLOR=#6a5acd]i[/color]] = sum[[COLOR=#6a5acd]i[/color]] / total_records
     [COLOR=#a52a2a][b]printf[/b][/color] [COLOR=#ff00ff]"[/color][COLOR=#6a5acd]%8.4f[/color][COLOR=#ff00ff] "[/color][COLOR=#6a5acd],[/color] avg[[COLOR=#6a5acd]i[/color]]
   }
   [COLOR=#a52a2a][b]printf[/b][/color] [COLOR=#ff00ff]"[/color][COLOR=#6a5acd]\n[/color][COLOR=#ff00ff]"[/color]
}

Applying it on your 3 files we get
Code:
$ awk -f pdtak1.awk pdtak1_01.txt pdtak1_02.txt pdtak1_10.txt
Record processed: 30
Column Averages :
  5.5000  97.2300  27.0333 111.0067   7.9000   8.8933   6.1979  16.0073
 
I then need to average out the same field from each file (ie. [file1, field1], [file2, field1], [file3, field1]... [file10, field1])
What do you mean with field ? The column ?

Then you can run the script extra on every file any you get the averages form this file - e.g.:
Code:
$ awk -f pdtak1.awk pdtak1_01.txt
Record processed: 10
Column Averages :
  5.5000  96.7000  17.9000 100.6500   6.5000   6.3500   5.2063  29.3716 

$ awk -f pdtak1.awk pdtak1_02.txt
Record processed: 10
Column Averages :
  5.5000  97.4900  32.3000 117.0600   8.6000  10.2500   6.6937   9.3251 

$ awk -f pdtak1.awk pdtak1_10.txt
Record processed: 10
Column Averages :
  5.5000  97.5000  30.9000 115.3100   8.6000  10.0800   6.6937   9.3251
 
Thank you Mikrom!
You are awsome!
Yes, I meant columns.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top