awk parsing is getting slower and stalled :(

(OP)
Hello, I just wrote my second awk script, and it gets slower and slower as it runs :(

It parses requests from a Tomcat server log that contains 448478 of them (in a 246 MB file). Each line below shows the time taken for a batch of 10000 requests:

CODE --> shell

...10000 / 448478 (83 secs)
...20000 / 448478 (91 secs)
...30000 / 448478 (90 secs)
...40000 / 448478 (87 secs)
...50000 / 448478 (86 secs)
...60000 / 448478 (87 secs)
...70000 / 448478 (88 secs)
...80000 / 448478 (90 secs)
...90000 / 448478 (94 secs)
...100000 / 448478 (98 secs)
...110000 / 448478 (94 secs)
...120000 / 448478 (119 secs)
...130000 / 448478 (134 secs)
...140000 / 448478 (153 secs)
...150000 / 448478 (188 secs)
...160000 / 448478 (211 secs)
...170000 / 448478 (226 secs)
...180000 / 448478 (240 secs)
...190000 / 448478 (260 secs)
...200000 / 448478 (253 secs)
...210000 / 448478 (259 secs) 


Here is the awk script:


CODE --> awk

awk -F'[][]' -v serv="$host" '
			BEGIN { cur="dummy" ; c=0 ; num="%06d" } 
			{  
				# new thread: increment the counters
				if ( $0 ~ / startstring /) { 
					cur=$4 ; 
					f[cur]++ ;
					c++;
					fn=serv"/"cur"-"sprintf(num,f[cur]) ; 
				# other lines
				} else { 
					if (length($4) > 4 ) { 
						cur=$4 ; 
						fn=serv"/"cur"-"sprintf(num,f[cur]) 
					}
					# last line of the request
					if ( $0 ~ / endstring /) {
						print fn 
					}
				} 
				print $0 > fn  
			}
			END { print "#TotalRequests="c > "/dev/stderr" }' "$hlog" 

The script collects the log lines between the start and end strings, then prints the filename, which is piped to a bash script that runs a few small tests and removes the file.
$4 is something like "http-thread-89".
Sometimes it just stalls, and I don't understand why.
The array f holds about 200 entries.
I don't think the bash script is to blame, since it is faster to do the whole job with bash alone!


Since I am a beginner, any help would be appreciated.

Cheers !


PS (edit):
Timings for the same awk script with the bash script removed:

CODE -->

...10000 / 448478 (9 secs)
...20000 / 448478 (8 secs)
...30000 / 448478 (14 secs)
...40000 / 448478 (22 secs)
...50000 / 448478 (35 secs)
...60000 / 448478 (53 secs)
...70000 / 448478 (59 secs) 


RE: awk parsing is getting slower and stalled :(

(OP)
I think I found the cause, and awk now uses 3% CPU instead of 100%. I needed to:

- close the output files when they are done
- free variables
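
The corrected script was not posted, so here is only a minimal sketch of the first point, calling close() once a request's end line has been written. The sample log lines, the $work directory and the done.txt file are made up for the demonstration; the real log format is not shown in this thread. A plausible reading of the slowdown is that every request opens a new output file that is never closed, so open files pile up until awk has to juggle them. Which variables were freed is not stated, so that part is left out.

```shell
#!/bin/sh
# Hypothetical sample data: only the position of $4 (the thread name)
# and the start/end markers follow the original post.
work=$(mktemp -d)
mkdir "$work/out"
cat > "$work/sample.log" <<'EOF'
t1 [INFO] [http-thread-89] startstring GET /a
t2 [INFO] [http-thread-90] startstring GET /b
t3 [INFO] [http-thread-89] endstring done
t4 [INFO] [http-thread-90] endstring done
t5 [INFO] [http-thread-89] startstring GET /c
t6 [INFO] [http-thread-89] endstring done
EOF

awk -F'[][]' -v serv="$work/out" '
    BEGIN { cur = "dummy"; c = 0; num = "%06d" }
    {
        if ($0 ~ / startstring /) {
            # new request on this thread: bump its counter
            cur = $4; f[cur]++; c++
        } else if (length($4) > 4) {
            # line that carries a thread name: switch to that thread
            cur = $4
        }
        fn = serv "/" cur "-" sprintf(num, f[cur])

        # ">>" is a choice made for this sketch: a file reopened
        # after close() is appended to instead of being truncated
        print $0 >> fn

        if ($0 ~ / endstring /) {
            close(fn)   # release the file as soon as the request is complete
            print fn    # only now hand the filename to the consumer
        }
    }
    END { print "#TotalRequests=" c > "/dev/stderr" }
' "$work/sample.log" > "$work/done.txt"
```

Unlike the original script, this one prints the filename after the last line has been written and the file closed, so whatever reads the filename sees the complete request.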
