Efficient File Parsing

(OP)
I have a Perl script that needs to do the following:

For each file in a known directory
  Do some validation
     If validation succeeds Then
        Concatenate lines 2 through the last line of
        this file (and of every other such file) onto a string
     End If
End For each file

Then print that string to a file.

What I did to grab lines 2 through n is something like the following:
$theFiles = "";   # the string variable storing the concatenated files

foreach $fileName (@allFileNames)      # @allFileNames holds the files in the directory
{
  open(ONEOFTHEFILE, $fileName) or next;
  # if validation succeeds
  {
    @aFile = <ONEOFTHEFILE>;           # read the whole file, one line per element
    splice @aFile, 0, 1;               # drop line 1
    foreach $eachLine (@aFile)
    {
      $theFiles .= $eachLine;          # lines already end in "\n"
    }
  }
  close(ONEOFTHEFILE);
}
# open another file and print $theFiles to a new file

The important thing is that I don't want to alter the original files, but it seems like the splicing is not very efficient. Processing about 20 files totalling 7 MB takes 19 minutes!

Are there any more efficient methods of doing what I need to do?
It would be nice if I could, for instance, transfer lines 2..N of @aFile to a string without using the foreach loop, etc. Possible or not?
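
For context, here is a rough sketch of the full job as I see it; the directory, the output file name, and the passes_validation() check are only placeholders standing in for my real code:

use strict;
use warnings;

my $dir      = "/some/known/directory";   # placeholder path
my $outFile  = "combined.txt";            # placeholder output name
my $theFiles = "";

opendir(my $dh, $dir) or die "Can't open $dir: $!";
foreach my $fileName (sort readdir($dh)) {
    my $path = "$dir/$fileName";
    next unless -f $path;                     # skips . , .. and subdirectories
    next unless passes_validation($path);     # stand-in for my validation step
    open(my $fh, '<', $path) or die "Can't open $path: $!";
    my @aFile = <$fh>;
    close $fh;
    splice @aFile, 0, 1;                      # drop line 1
    $theFiles .= $_ for @aFile;               # concatenate lines 2..last
}
closedir $dh;

open(my $out, '>', $outFile) or die "Can't open $outFile: $!";
print $out $theFiles;
close $out;

sub passes_validation { return 1 }            # always "succeeds" in this sketch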

RE: Efficient File Parsing

NOTICE: in your post, the text "&#64;" is really an "@". The TGML parser seems to mess up when "@"s are inside "code" brackets, and seemingly tries to escape them one too many times before the text gets to HTML ("&#64;" turns into an "@" when left raw in HTML). I use the "tt" bracket delimiters and don't seem to get this problem.

Now, as to the question: the issue you want addressed is speed.

$theFiles .= join("", <ONEOFTHEFILES>);

is listed in the documentation for Perl 5.6 (somewhere in the porting notes) as the best way to put a file into a string. It reads the whole file in at once, but doesn't store it in a named array first; it goes straight into the string.
The only thing you have to do first is read the first line of the file and do nothing with it, so:

scalar <ONEOFTHEFILES>;   # in scalar context, reads (and discards) only one line
$theFiles .= join("", <ONEOFTHEFILES>);

One last thing: although this may seem faster logically, I'm not certain it actually will be. Speed problems can usually only be addressed with side-by-side benchmarks or other speed-testing tools. If you can isolate the potential bottlenecks in your code and put each of them in its own separate benchmark to find out how efficient it is, you can then be certain of where the hang-up is. It may be that the function is slow due to something else entirely.
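
To make that concrete, a quick comparison could be set up with the core Benchmark module along these lines; this is only a sketch, and "testdata.txt" is a made-up sample file, not anything from your setup:

use strict;
use warnings;
use Benchmark qw(cmpthese);

my $file = "testdata.txt";   # hypothetical sample file

cmpthese(-3, {               # run each variant for about 3 CPU seconds
    splice_loop => sub {
        open(my $fh, '<', $file) or die "Can't open $file: $!";
        my @lines = <$fh>;
        splice @lines, 0, 1;                 # drop line 1
        my $out = "";
        $out .= $_ for @lines;               # concatenate line by line
        close $fh;
    },
    skip_and_join => sub {
        open(my $fh, '<', $file) or die "Can't open $file: $!";
        scalar <$fh>;                        # read and discard line 1
        my $out = join("", <$fh>);           # slurp lines 2..last in one go
        close $fh;
    },
});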

"If you think you're too small to make a difference, try spending a night in a closed tent with a mosquito."

RE: Efficient File Parsing

From the Perl documentation. I've never used it, but it looks good; I may well do:


File::Slurp -- single call read & write file routines; read directories


--------------------------------------------------------------------------------

SUPPORTED PLATFORMS
Linux
Solaris
Windows
This module is not included with the standard ActivePerl distribution. It is available as a separate download using PPM.
--------------------------------------------------------------------------------

SYNOPSIS
        use File::Slurp;
        $all_of_it = read_file($filename);
        @all_lines = read_file($filename);
        write_file($filename, @contents)
        overwrite_file($filename, @new_contents);
        append_file($filename, @additional_contents);
        @files = read_dir($directory);


--------------------------------------------------------------------------------

DESCRIPTION
These are quickie routines that are meant to save a couple of lines of code over and over again. They do not do anything fancy.


read_file() does what you would expect. If you call it in list context,
it returns an array of lines. If you call it in scalar context, it
returns the entire file in a single string.
It croaks() if it can't open the file.

write_file() creates or overwrites files.

append_file() appends to a file.

overwrite_file() does an in-place update of an existing file, or creates a new file if it didn't already exist. write_file() will also replace a file. The difference is that the first thing write_file() does is truncate the file, whereas the last thing overwrite_file() does is truncate the file. overwrite_file() should be used in situations where you have a file that always needs to have contents, even in the middle of an update.

read_dir() returns all of the entries in a directory except for "." and "..". It croaks if it cannot open the directory.



--------------------------------------------------------------------------------

AUTHOR
David Muir Sharnoff <muir@idiom.com>
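
I haven't tried it myself, but applied to the job above it might look roughly like this; the directory and output file names are placeholders, not anything from your setup:

use strict;
use warnings;
use File::Slurp;

my $dir      = "/some/known/directory";   # placeholder
my $theFiles = "";

foreach my $fileName (read_dir($dir)) {
    my $path  = "$dir/$fileName";
    next unless -f $path;
    my @lines = read_file($path);   # list context: one element per line
    shift @lines;                   # drop line 1
    $theFiles .= join("", @lines);
}

write_file("combined.txt", $theFiles);   # placeholder output name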


Mike
michael.j.lacey@ntlworld.com
Email welcome if you're in a hurry or something -- but post in tek-tips as well please, and I will post my reply here as well.

RE: Efficient File Parsing

(OP)
Cheers for suggesting the join() operation.
I should have looked at the Efficiency section of Programming Perl!

Anyway, I still have to do the following:

    splice @lines, 0, 1;

    $allIFiles .= join("", @lines);

as I couldn't get the scalar version working. It turns out that, after all, the splicing isn't that slow... it is now taking only 45 seconds instead of 18 minutes!!
hurray
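
In case anyone tries the scalar route later, the version that was suggested would presumably look something like this; the filehandle and file name here are just examples, not my actual code:

open(my $fh, '<', $fileName) or die "Can't open $fileName: $!";
scalar <$fh>;                        # read and throw away line 1
$allIFiles .= join("", <$fh>);       # slurp lines 2..last straight into the string
close $fh;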
