awk parsing is getting slower and stalled :(
(OP)
Hello, I just wrote my second awk script, and it gets slower and slower as it runs :(
It parses requests from a Tomcat server log that contains 448478 of them (a 246 MB file):
CODE --> shell
...10000 / 448478 (83 secs)
...20000 / 448478 (91 secs)
...30000 / 448478 (90 secs)
...40000 / 448478 (87 secs)
...50000 / 448478 (86 secs)
...60000 / 448478 (87 secs)
...70000 / 448478 (88 secs)
...80000 / 448478 (90 secs)
...90000 / 448478 (94 secs)
...100000 / 448478 (98 secs)
...110000 / 448478 (94 secs)
...120000 / 448478 (119 secs)
...130000 / 448478 (134 secs)
...140000 / 448478 (153 secs)
...150000 / 448478 (188 secs)
...160000 / 448478 (211 secs)
...170000 / 448478 (226 secs)
...180000 / 448478 (240 secs)
...190000 / 448478 (260 secs)
...200000 / 448478 (253 secs)
...210000 / 448478 (259 secs)
Here is the awk script:
CODE --> awk
awk -F'[][]' -v serv="$host" '
BEGIN { cur="dummy" ; c=0 ; num="%06d" }
{
  # new thread: increment the counter
  if ( $0 ~ / startstring /) {
    cur=$4 ; f[cur]++ ; c++;
    fn=serv"/"cur"-"sprintf(num,f[cur]) ;
  # other lines
  } else {
    if (length($4) > 4 ) { cur=$4 ; fn=serv"/"cur"-"sprintf(num,f[cur]) }
    # last line
    if ( $0 ~ / endstring /) { print fn }
  }
  print $0 > fn
}
END { print "#TotalRequests="c > "/dev/stderr" }' $hlog
The script collects the log lines between the start and end strings, then prints the filename, which is piped to a bash script that runs a few small tests and removes the file.
$4 is something like "http-thread-89".
Sometimes it just stalls... I don't understand why.
The size of f is about 200.
And I don't think the bash script is the cause, since it is faster to do all of this with bash alone!
Since I am a beginner, any help would be appreciated.
Cheers!
PS (edit):
The same awk script with the bash script removed:
CODE -->
...10000 / 448478 (9 secs)
...20000 / 448478 (8 secs)
...30000 / 448478 (14 secs)
...40000 / 448478 (22 secs)
...50000 / 448478 (35 secs)
...60000 / 448478 (53 secs)
...70000 / 448478 (59 secs)
RE: awk parsing is getting slower and stalled :(
- close the output files when they are done
- free variables
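To illustrate the first point, here is a minimal sketch of the same splitting logic with a close(fn) added once the end marker has been seen. awk keeps every file opened with `print > fn` open until close() is called or the program exits, so a likely cause of the slowdown (a guess, not something measured here) is that the original script accumulates one open file per request. The sample log, the "out" directory, and the START/END markers are made up for the demonstration; they stand in for the real log, $host, and the start/end strings.

```shell
#!/bin/sh
# Hypothetical demo data: a tiny log whose 4th '[]'-delimited field is the thread name.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir out

cat > sample.log <<'EOF'
2020-01-01 [INFO] [http-thread-1] START request A
2020-01-01 [INFO] [http-thread-1] working
2020-01-01 [INFO] [http-thread-1] END request A
2020-01-01 [INFO] [http-thread-1] START request B
2020-01-01 [INFO] [http-thread-1] END request B
EOF

names=$(awk -F'[][]' -v serv="out" '
# start marker: new request for this thread, bump its counter
/ START / { cur = $4; f[cur]++; c++ }
{
    # every line: work out the target file and append the line to it
    if (length($4) > 4) cur = $4
    fn = serv "/" cur "-" sprintf("%06d", f[cur])
    print $0 > fn
}
# end marker: release the file before announcing its name
/ END / {
    close(fn)
    print fn
}
END { print "#TotalRequests=" c > "/dev/stderr" }
' sample.log)

printf '%s\n' "$names"
```

One caveat with this sketch: after close(fn), a later `print > fn` to the same name would reopen and truncate the file, so it assumes no more lines for a request arrive after its end marker.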