Sub-totalling an array

(OP)

CODE -->

gawk ' BEGIN { sitea=0 ; siteb=0 }
{
if ($1 ~ "192.168.6") { sitea++ } else { siteb++ }

x[sitea","siteb","$7","$9]
}

END {
for (i in x)
print i

}' $FILE


Needless to say, the above code does not give me the required result.
It reads an Apache access log, and I want the total HTTP hits for "sitea" and the total HTTP hits for "siteb", per URL and response code.

Example report:
SITEA,SITEB,URL,HTTP Response Code,
29,53,/apps/tube/icon_Tube.png,200
60,102,/apps/mix/icon_mix.png,200
389,536,/publish/images/vpl.png,404
etc....

As always thanks in advance,

Madasafish

RE: Sub-totalling an array

(OP)
Hi Feherke,

I am using gawk V4. I believe it was you who introduced me to "patsplit" on another thread.

Madasafish

RE: Sub-totalling an array

(OP)
gawk --version
GNU Awk 4.0.0
Copyright (C) 1989, 1991-2011 Free Software Foundation.

RE: Sub-totalling an array

Hi

Quote (Madasafish)

I am using gawk V4.
Great. Then gawk 4's true multidimensional arrays will do the job. I would do it this way:

CODE

awk -vOFS=, '{x[$1][$7][$9]++}END{for(h in x)for(u in x[h])for(s in x[h][u])print x[h][u][s],h,u,s}' /var/log/httpd/access_log
Which produces count,host,path,status output like this :

CODE

1,192.168.0.1,/mustache/syntax.htm,200
1,192.168.6.1,/mustache/syntax.htm,200
1,192.168.0.1,/mustache/style.css,200
5,192.168.6.1,/mustache/style.css,304
7,192.168.6.1,/mustache/style.css,200

To exactly reproduce your sitea,siteb,path,status output format :

CODE

awk -vOFS=, '{x[$7][$9][$1~/192\.168\.6/?"a":"b"]++}END{for(u in x)for(s in x[u])print x[u][s]["a"]+0,x[u][s]["b"]+0,u,s}' /var/log/httpd/access_log
Which produces this output from the same input data :

CODE

1,1,/mustache/syntax.htm,200
5,0,/mustache/style.css,304
7,1,/mustache/style.css,200

Feherke.
http://feherke.github.com/

RE: Sub-totalling an array

(OP)
Absolutely brilliant!

Thank you very much, Feherke. A payment to the club is long overdue.

As always, now that the hard work has been done I think I can embellish it, only to find I get stuck again.

As it's a 3852-line CSV file, it lends itself to spreadsheet filters. I am trying to add a couple of filters and cannot understand why it will not work. In your print statement you print "u". If I try to split "u" or reference a string in "u", it stops working, and I do not understand why.

Here is the code, which works only if my "embellishments" are commented out.

CODE -->

gawk -vOFS=, '{
x[$7][$9][$1~/192\.168\.2/?"a":"b"]++
}
END {
for(u in x)
for(s in x[u])

#split(u,z,"/")
#if (z[2] ~ /debug/) ft1="Debug"
#if (z[3] ~ /vod/) ft1="Vod"
#if (z[4] ~ /flashapp.xml/) ft1="FlashApp"
#if (u ~ /SSI|Sky|sky/) ft2="Sky"
#if (u ~ /bbc/) ft2="BBC"

print x[u][s]["b"]+x[u][s]["a"]+0,x[u][s]["b"]+0,x[u][s]["a"]+0,u,s,ft1,ft2

}' $FILE

As always, Thanks in advance
Madasafish




RE: Sub-totalling an array

Hi

Quote (Madasafish)

In your print statement you print "u". If I try to split "u" or reference a string in "u", it stops working, and I do not understand why.
Just as you wrote, I print u.

But you are calling split(), doing 5 conditional assignments, and then printing. A for statement executes only the single statement that immediately follows it. To make the for execute all of them, enclose them in braces ( {} ).
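
A minimal sketch of that difference, using made-up variable names (items, seen, count) rather than anything from the log script. In the first program the indentation is misleading: only the concatenation belongs to the loop.

```shell
# Without braces, only the first statement belongs to the loop.
without=$(awk 'BEGIN {
    n = split("a b c", items, " ")
    for (i = 1; i <= n; i++)
        seen = seen items[i]
        count++            # runs once, after the loop has finished
    print seen, count
}')

# With braces, both statements run on every iteration.
with=$(awk 'BEGIN {
    n = split("a b c", items, " ")
    for (i = 1; i <= n; i++) {
        seen = seen items[i]
        count++
    }
    print seen, count
}')

echo "without braces: $without"   # abc 1
echo "with braces:    $with"      # abc 3
```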

Feherke.
http://feherke.github.com/

RE: Sub-totalling an array

What about this ?

CODE

gawk -vOFS=, '{
x[$7][$9][$1~/192\.168\.2/?"a":"b"]++
}
END {
for(u in x) {
ft1=ft2="" # reset the filter tags so they do not carry over from the previous URL
split(u,z,"/")
if (z[2] ~ /debug/) ft1="Debug"
if (z[3] ~ /vod/) ft1="Vod"
if (z[4] ~ /flashapp\.xml/) ft1="FlashApp"
if (u ~ /SSI|Sky|sky/) ft2="Sky"
if (u ~ /bbc/) ft2="BBC"
for(s in x[u])
print x[u][s]["b"]+x[u][s]["a"]+0,x[u][s]["b"]+0,x[u][s]["a"]+0,u,s,ft1,ft2
}
}' $FILE

Hope This Helps, PH.
FAQ219-2884: How Do I Get Great Answers To my Tek-Tips Questions?
FAQ181-2886: How can I maximize my chances of getting an answer?

RE: Sub-totalling an array

(OP)
Another embellishment, sorry.

CODE -->

gawk -v OFS="," -v hoururl=$HOURURL -v hourdir=$HOURDIR '
{
split($4,b,/:/)
hour=b[2]
min=b[3]
url=$7
gsub(/%20/,"_",url)
split(url,f,"/")
appname=f[3]


if (f[4] ~ /flashapp.xml/) {
url="\"=HYPERLINK(\"\""hoururl"/"appname".csv\"\",\"\""$7"\"\")\""
t[appname][hour][$1~/10\.185\.116/?"c":"d"]++
}

x[url][$9][$1~/192\.168\.2/?"a":"b"]++
}
END {
for(m in t) {
for(n in t[m]) {
n=sprintf("%02d",n) #<---Does not work
print n,t[m][n]["d"]+0,t[m][n]["c"]+0 > hourdir"/"m".csv"
}
}

for(u in x) {
for(s in x[u]) {
print x[u][s]["b"]+x[u][s]["a"]+0,x[u][s]["b"]+0,x[u][s]["a"]+0,u,s,ft1,ft2
}
}

}' ${INFILE} | sort -n -r >> ${OUTFILE}

Feherke's code is perfect. I have removed the filters mentioned earlier for clarity.
As you can see, I introduced another loop which writes hourly hits to separate files named after "m". The hours "n" run from 00 to 23.

I have three gotchas.
For the hours 00 to 09 it prints 0 to 9 (single digits). I would ideally like double digits, 00 to 09.
The files it creates are not sorted by hour, 00 through 23. Can this be done within the gawk program?
I need a header of Hour,Site A,Site B for each file created.

As always, thanks in advance

Madasafish

RE: Sub-totalling an array

(OP)

CODE -->

for(m in t) {
for(n in t[m]) {
print "=\""n"\"",t[m][n]["d"]+0,t[m][n]["c"]+0 > hourdir"/"m
}
}

For the benefit of other readers,

Quote:

For the hours 00 to 09 it prints 0 to 9. (single digits). I would ideally like double digits 00 to 09.

It was Excel that was truncating the leading zero. I managed to fix this by writing "n" with the above syntax.

Quote:

The file/s it creates are not sorted for the hours 00 through to 23. Can this be done within the gawk prog?
Unfortunately this is beyond me, so I resolved it with an external bash "for loop" at the end which uses the excellent sort command. I would welcome a gawk solution.

Quote:

I need a header of Hour,Site A,Site B for each file created.

Easily accommodated with the external bash "for loop". Again, I would be very interested in seeing a gawk solution with the above code.

Cheers,

Madasafish



RE: Sub-totalling an array

1) "hour" is extracted from a string, so it is a string. To treat it as a number (in order to format it), I would use a function such as int().

[edit] Reading your code again, I noticed that you use n for two things: as the loop variable and to store the result of sprintf. That is not good, because the reformatted value may no longer match the original index in t[m][n]. [edit]

CODE --> awk

printf("%02d",int(n))
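
As a rough sketch of keeping the two apart: the formatted hour can go into its own variable while n stays the array index. The names hits and label, and the sample count, are made up for illustration.

```shell
out=$(awk 'BEGIN {
    hits["7"] = 12          # hypothetical sample count keyed by hour
    for (n in hits) {
        label = sprintf("%02d", int(n))   # formatted copy; n itself is untouched
        print label, hits[n]              # n still matches the original index
    }
}')
echo "$out"   # 07 12
```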

2) To sort from within awk the same way you would outside it, call an external command:

CODE --> awk

system("MyPersonalExternalCommand")
