Need help with parse efficiency

Status
Not open for further replies.

mlibeson

Programmer
Mar 6, 2002
311
US
Can anyone help me speed up this code snippet? I use it to parse web logs formatted in any variation of the Common Log Format (CLF). It breaks each record into fields that I will use for analysis later.

Code:
               $main::tlp = $main::readline;
               $main::currentField = 0;
               @main::line = ();
               $main::spaceTest   = 0;
               $main::quoteTest   = 0;
               $main::squoteTest  = 0;
               $main::bracketTest = 0;
               $main::parenTest   = 0;

               while (length($main::tlp) > 0) {
                    $main::c = substr($main::tlp,0,1);
                    $main::tlp = substr($main::tlp,1);
                    if ($main::spaceTest > 0) {
                         if ($main::c =~ /^\s/) {
                              ### End of field.
                              $main::spaceTest = 0;
                              $main::tlp =~ s/^\s+//;
                              $main::currentField++;
                         } elsif ($main::c eq "\\") {
                              ### Next character does not terminate field if ' '.
                              if ($main::tlp =~ /^\s/) {
                                   $main::line[$main::currentField] .= $main::c . ' ';
                              } else {
                                   $main::line[$main::currentField] .= $main::c;
                              }
                         } else {
                              ### Still in current field.
                              $main::line[$main::currentField] .= $main::c;
                         }
                    } elsif ($main::quoteTest > 0) {
                         if (($main::c eq '"') && ($main::tlp =~ /^(\s+|$)/)) {
                              ### End of field.
                              $main::line[$main::currentField] .= '"';
                              $main::quoteTest = 0;
                              $main::tlp =~ s/^\s+//;
                              $main::currentField++;
                         } elsif ($main::c eq "\\") {
                              ### Next character does not terminate field if '"'.
                              if ($main::tlp =~ /^\"/) {
                                   $main::line[$main::currentField] .= $main::c . '"';
                                   $main::tlp =~ s/^\"//;
                              } else {
                                   $main::line[$main::currentField] .= $main::c;
                              }
                         } else {
                              ### Still in current field.
                              $main::line[$main::currentField] .= $main::c;
                         }
                    } elsif ($main::squoteTest > 0) {
                         if (($main::c eq '\'') && ($main::tlp =~ /^(\s+|$)/)) {
                              ### End of field.
                              $main::line[$main::currentField] .= '\'';
                              $main::squoteTest = 0;
                              $main::tlp =~ s/^\s+//;
                              $main::currentField++;
                         } elsif ($main::c eq "\\") {
                              ### Next character does not terminate field if '\''.
                              if ($main::tlp =~ /^\'/) {
                                   $main::line[$main::currentField] .= $main::c . '\'';
                                   $main::tlp =~ s/^\'//;
                              } else {
                                   $main::line[$main::currentField] .= $main::c;
                              }
                         } else {
                              ### Still in current field.
                              $main::line[$main::currentField] .= $main::c;
                         }
                    } elsif ($main::bracketTest > 0 ) {
                         if ($main::c eq ']') {
                              ### End of field.
                              $main::line[$main::currentField] .= ']';
                              $main::bracketTest = 0;
                              $main::tlp =~ s/^\s+//;
                              $main::currentField++;
                         } elsif ($main::c eq "\\") {
                              ### Next character does not terminate field if ']'.
                              if ($main::tlp =~ /^\]/) {
                                   $main::line[$main::currentField] .= $main::c . ']';
                                   $main::tlp =~ s/^\]//;
                              } else {
                                   $main::line[$main::currentField] .= $main::c;
                              }
                         } else {
                              ### Still in current field.
                              $main::line[$main::currentField] .= $main::c;
                         }
                    } elsif ($main::parenTest > 0 ) {
                         if ($main::c eq '(') {
                              ### Next ')' character does not terminate field if '('.
                              $main::line[$main::currentField] .= '(';
                              $main::parenTest++;
                         } elsif ($main::c eq ')') {
                              ### Check End of field.
                              if ($main::parenTest > 1) {
                                   $main::line[$main::currentField] .= ')';
                                   $main::parenTest--;
                              } else {
                                   $main::line[$main::currentField] .= ')';
                                   $main::parenTest = 0;
                                   $main::tlp =~ s/^\s+//;
                                   $main::currentField++;
                              }
                         } elsif ($main::c eq "\\") {
                              ### Next character does not terminate field if ')'.
                              if ($main::tlp =~ /^\)/) {
                                   $main::line[$main::currentField] .= $main::c . ')';
                                   $main::tlp =~ s/^\)//;
                              } else {
                                   $main::line[$main::currentField] .= $main::c;
                              }
                         } else {
                              ### Still in current field.
                              $main::line[$main::currentField] .= $main::c;
                         }
                    } else {
                         if ($main::c =~ /^\s$/) {
                              if ($main::tlp !~ /^\s*[\"\'\[\(]/) { $main::spaceTest++; }
                              $main::tlp =~ s/^\s+//;
                              $main::currentField++;
                         } elsif (($main::c eq '"') && ($main::spaceTest == 0)) {
                              $main::quoteTest++;
                              $main::line[$main::currentField] = '"';
                              $main::tlp =~ s/^\s+//;
                         } elsif (($main::c eq '\'') && ($main::spaceTest == 0)) {
                              $main::squoteTest++;
                              $main::line[$main::currentField] = '\'';
                              $main::tlp =~ s/^\s+//;
                         } elsif (($main::c eq '[') && ($main::spaceTest == 0)) {
                              $main::bracketTest++;
                              $main::line[$main::currentField] = '[';
                              $main::tlp =~ s/^\s+//;
                         } elsif (($main::c eq '(') && ($main::spaceTest == 0)) {
                              $main::parenTest++;
                              $main::line[$main::currentField] = '(';
                              $main::tlp =~ s/^\s+//;
                         } else {
                              $main::spaceTest++;
                              $main::line[$main::currentField] .= $main::c;
                         }
                    }
               }

Your help is appreciated.

Thank you.


Michael Libeson
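
One common way to speed up a character-at-a-time loop like the one above is to let the regex engine consume a whole field per match. The following is only a minimal sketch, not a drop-in replacement: it assumes a field is either double-quoted, bracketed, or a bare run of non-whitespace, and it ignores single quotes, nested parentheses, and the rule that a closing quote must be followed by whitespace. The name `split_fields` and the three pattern variables are hypothetical.

```perl
use strict;
use warnings;

# Assumed field shapes (illustrative only, not the poster's full rule set).
# A backslash escapes the next character inside quotes or brackets.
my $QUOTED    = qr/"(?:[^"\\]|\\.)*"/;
my $BRACKETED = qr/\[(?:[^\]\\]|\\.)*\]/;
my $BARE      = qr/\S+/;

# Hypothetical helper: return the fields of one log record.
sub split_fields {
    my ($record) = @_;
    my @fields;
    # \G anchors each match where the previous one ended, so the
    # record is scanned once from left to right.
    while ($record =~ /\G\s*($QUOTED|$BRACKETED|$BARE)/gc) {
        push @fields, $1;
    }
    return @fields;
}

# Example use on a typical CLF record.
my @fields = split_fields(
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
);
print "$_\n" for @fields;
```

Each pass through the loop consumes at least one character, so it always terminates, and an unterminated quote simply falls back to the bare-word pattern.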
 
Here is the latest version:

Code:
               $main::currentField = 0;
               @main::line = ();
               $main::spaceTest   = 0;
               $main::quoteTest   = 0;
               $main::squoteTest  = 0;
               $main::bracketTest = 0;
               $main::parenTest   = 0;

               @main::linec = grep(! /^$/, (split(/([\s\'\"\[\]\(\)\\])/, $main::readline)));

               while (@main::linec) {
                    $main::c = shift(@main::linec);
                    if ($main::spaceTest > 0) {
                         if ($main::c =~ /^\s/) {
                              ### End of field.
                              $main::spaceTest = 0;
                              while ($main::linec[0] =~ /^\s$/) { shift(@main::linec); }
                              $main::currentField++;
                         } elsif ($main::c eq "\\") {
                              ### Next character does not terminate field.
                              $main::line[$main::currentField] .= $main::c . shift(@main::linec);
                         } else {
                              ### Still in current field.
                              $main::line[$main::currentField] .= $main::c;
                         }
                    } elsif ($main::quoteTest > 0) {
                         if (($main::c eq '"') && ((! @main::linec) || ($main::linec[0] =~ /^(\s|$)/))) {
                              ### End of field.
                              $main::line[$main::currentField] .= '"';
                              $main::quoteTest = 0;
                              while ($main::linec[0] =~ /^\s$/) { shift(@main::linec); }
                              $main::currentField++;
                         } elsif ($main::c eq "\\") {
                              ### Next character does not terminate field.
                              $main::line[$main::currentField] .= $main::c . shift(@main::linec);
                         } else {
                              ### Still in current field.
                              $main::line[$main::currentField] .= $main::c;
                         }
                    } elsif ($main::squoteTest > 0) {
                         if (($main::c eq '\'') && ((! @main::linec) || ($main::linec[0] =~ /^(\s|$)/))) {
                              ### End of field.
                              $main::line[$main::currentField] .= '\'';
                              $main::squoteTest = 0;
                              while ($main::linec[0] =~ /^\s$/) { shift(@main::linec); }
                              $main::currentField++;
                         } elsif ($main::c eq "\\") {
                              ### Next character does not terminate field.
                              $main::line[$main::currentField] .= $main::c . shift(@main::linec);
                         } else {
                              ### Still in current field.
                              $main::line[$main::currentField] .= $main::c;
                         }
                    } elsif ($main::bracketTest > 0 ) {
                         if ($main::c eq ']') {
                              ### End of field.
                              $main::line[$main::currentField] .= ']';
                              $main::bracketTest = 0;
                              while ($main::linec[0] =~ /^\s$/) { shift(@main::linec); }
                              $main::currentField++;
                         } elsif ($main::c eq "\\") {
                              ### Next character does not terminate field.
                              $main::line[$main::currentField] .= $main::c . shift(@main::linec);
                         } else {
                              ### Still in current field.
                              $main::line[$main::currentField] .= $main::c;
                         }
                    } elsif ($main::parenTest > 0 ) {
                         if ($main::c eq '(') {
                              ### Next ')' character does not terminate field if '('.
                              $main::line[$main::currentField] .= '(';
                              $main::parenTest++;
                         } elsif ($main::c eq ')') {
                              ### Check End of field.
                              if ($main::parenTest > 1) {
                                   $main::line[$main::currentField] .= ')';
                                   $main::parenTest--;
                              } else {
                                   $main::line[$main::currentField] .= ')';
                                   $main::parenTest = 0;
                                   while ($main::linec[0] =~ /^\s$/) { shift(@main::linec); }
                                   $main::currentField++;
                              }
                         } elsif ($main::c eq "\\") {
                              ### Next character does not terminate field if ')'.
                              $main::line[$main::currentField] .= $main::c . shift(@main::linec);
                         } else {
                              ### Still in current field.
                              $main::line[$main::currentField] .= $main::c;
                         }
                    } else {
                         if ($main::c =~ /^\s$/) {
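                              ### Skip the rest of the whitespace run, then peek at the next token.
                              while (@main::linec && $main::linec[0] =~ /^\s$/) { shift(@main::linec); }
                              if (@main::linec && $main::linec[0] !~ /^[\"\'\[\(]/) { $main::spaceTest++; }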
                              $main::currentField++;
                         } elsif (($main::c eq '"') && ($main::spaceTest == 0)) {
                              $main::quoteTest++;
                              $main::line[$main::currentField] = '"';
                              while ($main::linec[0] =~ /^\s$/) { shift(@main::linec); }
                         } elsif (($main::c eq '\'') && ($main::spaceTest == 0)) {
                              $main::squoteTest++;
                              $main::line[$main::currentField] = '\'';
                              while ($main::linec[0] =~ /^\s$/) { shift(@main::linec); }
                         } elsif (($main::c eq '[') && ($main::spaceTest == 0)) {
                              $main::bracketTest++;
                              $main::line[$main::currentField] = '[';
                              while ($main::linec[0] =~ /^\s$/) { shift(@main::linec); }
                         } elsif (($main::c eq '(') && ($main::spaceTest == 0)) {
                              $main::parenTest++;
                              $main::line[$main::currentField] = '(';
                              while ($main::linec[0] =~ /^\s$/) { shift(@main::linec); }
                         } else {
                              $main::spaceTest++;
                              $main::line[$main::currentField] .= $main::c;
                         }
                    }
               }


Michael Libeson
 
Hi Michael,

Looks good. Have you gotten much speed improvement?

Mike

You cannot really appreciate Dilbert unless you've read it in the
original Klingon.

Want great answers to your Tek-Tips questions? Have a look at faq219-2884

 
I have not benchmarked it, but it does seem to run faster. At this point I am going to look into other parts of my code to see where I can improve run time. I already improved the loading into a database by getting the prepare/execute to work properly for a stored procedure. I just wish I could get the fork to work properly. I get a restartop error when I try to fork the process that handles and runs the prepare/execute. I have been able to get the fork to work for other code, just not this one. I am not sure if it is an issue with TK or DBI.

Thanks all for your help. It has been greatly appreciated.

Michael Libeson
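
Since neither version was benchmarked, a rough comparison could be made with the core Benchmark module. This is a sketch under assumptions: the two subs below only imitate how each version walks the record (peeling characters with substr versus pre-splitting and shifting tokens) and count whitespace instead of building fields, and the sample record and the names `walk_substr` and `walk_split` are made up for illustration.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Sample record; a representative line from the real log would be better.
my $record = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
           . '"GET /apache_pb.gif HTTP/1.0" 200 2326';

# Approach 1: peel one character off the front with substr,
# as the first version of the parser does.
sub walk_substr {
    my ($text) = @_;
    my $count = 0;
    while (length($text) > 0) {
        my $c = substr($text, 0, 1);
        $text = substr($text, 1);
        $count++ if $c =~ /^\s/;
    }
    return $count;
}

# Approach 2: pre-split on the delimiter characters and shift tokens,
# as the later version does.
sub walk_split {
    my ($text) = @_;
    my @tokens = grep { $_ ne '' } split(/([\s'"\[\]()\\])/, $text);
    my $count = 0;
    while (@tokens) {
        my $c = shift @tokens;
        $count++ if $c =~ /^\s/;
    }
    return $count;
}

# Run each approach 5_000 times and print a comparison table.
# Raise the count for more stable numbers.
cmpthese(5_000, {
    substr_peel => sub { walk_substr($record) },
    split_shift => sub { walk_split($record) },
});
```

The table printed by `cmpthese` shows the rate of each approach and the percentage difference between them; the real parser bodies would need to be dropped into the two subs to get meaningful numbers.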
 
It's TK. I've tried to use fork() with TK and it's a lot like pulling your own teeth out with pliers whilst stabbing your eyes out with rusty nails - but without that warm comfy feeling you get when you ram nails in your eyes. I wouldn't bother, in other words.

I solved my problem by creating another script and passing it the data it needed on the end of a pipe.


Mike
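
The pipe approach described above might look roughly like the sketch below. Mike does not show his script, so everything here is an assumption: the worker is a stand-in Perl one-liner that counts the records it receives and writes the count to a file, and `send_to_worker` is a hypothetical name. In a real program the worker would be its own script file doing the database load.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for the separate worker script (an assumption for this sketch):
# it counts the lines arriving on STDIN and writes the count to the file
# named by its first argument.
my $worker_code = 'my $file = shift; my $n = 0; $n++ while <STDIN>; '
                . 'open(my $out, ">", $file) or die $!; '
                . 'print {$out} "$n\n"; close($out);';

# Hypothetical helper: start the worker and feed it records over a pipe.
sub send_to_worker {
    my ($result_file, @records) = @_;
    # List-form piped open: the child starts with its STDIN connected
    # to $pipe, so the parent never has to call fork() itself.
    open(my $pipe, '|-', $^X, '-e', $worker_code, $result_file)
        or die "Cannot start worker: $!";
    print {$pipe} "$_\n" for @records;
    # close() on a piped handle waits for the child to exit.
    close($pipe) or die "Worker failed: status $?";
    return;
}

# Example use: hand two records to the worker.
my ($fh, $result_file) = tempfile(UNLINK => 1);
close($fh);
send_to_worker($result_file, 'record one', 'record two');
```

Because the child is a separate process started with a fresh interpreter, it never inherits the GUI state of the parent, which is the part that tends to cause trouble with fork().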


 
MikeLacey,

Thanks for pinpointing the problem. I will look into your solution, but I am not sure it will fit my scenario. Thanks for the help. Your descriptive comment was spot on.


Michael Libeson
 
:)

fork() with DBI works ok by the way.

Mike


 