Memory efficient way to manage archive files

Kirsle · Nov 13, 2006

Hey:

I've come up with this algorithm for archiving binary (or ASCII) files together into one giant file. Here's how it works: it first builds a scalar that holds, on each line, a file's name followed by that file's binary data:

Code:

index.html::some html code
README.txt::some text information
logo.gif::binary gif data
button.gif::binary gif data
demo.avi::a large avi file in binary

Once it has generated this scalar full of file data, it RC4-encrypts it with a user-provided password, and the same password is required to decrypt it later. So what gets saved to the file is a big mess of binary data where nothing is readable (not even the plain text files), because the combined data of every file is encrypted as one unit.
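The scheme described above (combine every file, then RC4-encrypt the whole thing with a password) can be sketched in Perl roughly as follows. One assumption differs from the post: a name::data-per-line layout breaks as soon as binary data contains a newline byte (GIF and AVI data almost always will), so this sketch stores each name and each file's data with a 32-bit length prefix instead. `rc4`, `pack_archive` and `unpack_archive` are hypothetical names, and RC4 is written out in pure Perl only to keep the example self-contained.

Code:

```perl
use strict;
use warnings;

# Pure-Perl RC4. RC4 is symmetric, so the same call encrypts and decrypts.
sub rc4 {
    my ($key, $data) = @_;
    my @k = unpack 'C*', $key;
    die "RC4 key must not be empty\n" unless @k;
    my @s = (0 .. 255);
    my $j = 0;
    for my $i (0 .. 255) {                    # key-scheduling algorithm
        $j = ($j + $s[$i] + $k[$i % @k]) % 256;
        @s[$i, $j] = @s[$j, $i];
    }
    my ($i, $out) = (0, '');
    $j = 0;
    for my $byte (unpack 'C*', $data) {       # keystream generation + XOR
        $i = ($i + 1) % 256;
        $j = ($j + $s[$i]) % 256;
        @s[$i, $j] = @s[$j, $i];
        $out .= chr($byte ^ $s[($s[$i] + $s[$j]) % 256]);
    }
    return $out;
}

# Hypothetical record layout (an assumption, not the post's format):
# each name and each file's data is stored as a 32-bit length + bytes,
# so the data may contain newlines or "::" without breaking parsing.
sub pack_archive {
    my (%files) = @_;
    return pack '(N/a* N/a*)*', map { ($_, $files{$_}) } sort keys %files;
}

# Inverse of pack_archive: returns the name => data pairs as a flat list.
sub unpack_archive {
    my ($blob) = @_;
    return unpack '(N/a* N/a*)*', $blob;
}
```

Like the original design, this still needs the whole archive in memory; it only makes the record format safe for binary data.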

This system works fine as long as all the files in the archive fit in memory together. When it loads an archive, it opens the file, reads all of the encrypted data into a scalar, decrypts that scalar with the password, and loops through the lines, splitting out each file's data and storing it in a hashref in memory.

Then when the program wants to read from a file, it's readily accessible.

But, as I said, this system fails miserably when you archive one or more very large files, because all of the file data has to be held in memory at once.

I was thinking of revising the algorithm slightly so that the first line of the (unencrypted) file would be an index listing the file names and the line each one can be found on, e.g.:

Code:

0=index.html;1=README.txt;2=logo.gif;3=button.gif;4=demo.avi
some html code
some text information
binary gif data
binary gif data
a large avi file in binary
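A line-number index has two practical problems: binary members such as logo.gif or demo.avi can contain newline bytes, so "line 4" is not a stable unit, and reaching line 4 still means reading through every byte before it. A common alternative is to store each member's byte size in the index; the reader then computes byte offsets and can `seek` straight to a member. The sketch below assumes a made-up layout (a 32-bit member count, then for each member a length-prefixed name and a 32-bit size, then the raw data back to back, which caps each member at 4 GiB) and leaves encryption out for clarity. `write_archive`, `read_index` and `_read_exact` are hypothetical names.

Code:

```perl
use strict;
use warnings;
use File::Basename qw(basename);

my $CHUNK = 64 * 1024;    # copy buffer size (an arbitrary choice)

# Hypothetical layout: N count, then per member (N/a* name, N size), then data.
sub write_archive {
    my ($out_path, @members) = @_;
    open my $out, '>', $out_path or die "Can't write $out_path: $!";
    binmode $out;
    print {$out} pack('N', scalar @members);
    for my $path (@members) {
        my $size = (stat $path)[7];
        die "Can't stat $path: $!" unless defined $size;
        print {$out} pack('N/a* N', basename($path), $size);
    }
    for my $path (@members) {             # copy data in chunks, never slurped
        open my $in, '<', $path or die "Can't read $path: $!";
        binmode $in;
        while (1) {
            my $got = read($in, my $buf, $CHUNK);
            die "Read error on $path: $!" unless defined $got;
            last if $got == 0;
            print {$out} $buf;
        }
        close $in;
    }
    close $out or die "Can't close $out_path: $!";
}

# Read only the header; return { name => [byte_offset, byte_size] }.
sub read_index {
    my ($fh) = @_;
    my $count = unpack 'N', _read_exact($fh, 4);
    my @entries;
    for (1 .. $count) {
        my $name_len = unpack 'N', _read_exact($fh, 4);
        my $name     = _read_exact($fh, $name_len);
        my $size     = unpack 'N', _read_exact($fh, 4);
        push @entries, [$name, $size];
    }
    my $offset = tell $fh;                # member data starts after the header
    my %index;
    for my $e (@entries) {
        $index{ $e->[0] } = [$offset, $e->[1]];
        $offset += $e->[1];
    }
    return \%index;
}

sub _read_exact {
    my ($fh, $n) = @_;
    my $got = read($fh, my $buf, $n);
    die "Truncated archive header\n" unless defined $got && $got == $n;
    return $buf;
}
```

Only the header is read into memory; the member data stays on disk until something asks for it.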

Then, when it reads the file:

Code:

open (ARCHIVE, '<', "archive.bin") or die "Can't read archive.bin: $!";
binmode (ARCHIVE);

my $index = <ARCHIVE>; # read off the first line

# manipulate $index to find out which line everything is on

...

sub extract {
   my $file = shift;

   # Find out which line this file was on...
   return unless exists $files->{$file};
   my $line = $files->{$file};

   # say we wanted demo.avi (on line 4)
   my $avi_filehandle = <ARCHIVE>[4]; # ??? no such syntax; this is the part I'm asking about

   # save its filehandle to a file
   open (SAVE, '>', $file) or die "Can't write $file: $!";
   binmode (SAVE);
   while (<$avi_filehandle>) {
      print SAVE;
   }
   close (SAVE);
}
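`<ARCHIVE>[4]` is not valid Perl: `<ARCHIVE>` only ever returns the next line, and there is no syntax for indexing into a filehandle. If the index records byte offsets and sizes rather than line numbers (an assumption; the format above records line numbers), `extract` reduces to a `seek` followed by a loop of fixed-size `read` calls. A minimal sketch, with `extract_range` as a hypothetical helper name and a 64 KiB buffer chosen arbitrarily:

Code:

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Hypothetical helper: copy $length bytes starting at byte $offset of
# $in_fh into $out_path, 64 KiB at a time, so memory use stays constant
# no matter how large the member is.
sub extract_range {
    my ($in_fh, $offset, $length, $out_path) = @_;
    binmode $in_fh;
    seek($in_fh, $offset, SEEK_SET) or die "Can't seek to $offset: $!";
    open my $out, '>', $out_path or die "Can't write $out_path: $!";
    binmode $out;
    my $left = $length;
    while ($left > 0) {
        my $want = $left < 65536 ? $left : 65536;
        my $got  = read($in_fh, my $buf, $want);
        die "Read error: $!" unless defined $got;
        die "Archive truncated\n" if $got == 0;
        print {$out} $buf or die "Write error: $!";
        $left -= $got;
    }
    close $out or die "Can't close $out_path: $!";
    return $length;
}
```

With an index like `{ 'demo.avi' => [$offset, $size] }`, `extract` would look up the pair and call `extract_range(\*ARCHIVE, $offset, $size, $file)`.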

So if I have a folder with this in it:

Code:

extract.pl
archive.bin

And extract.pl has this:

Code:

use My::Archive;

my $obj = My::Archive->new;
$obj->load ("archive.bin");

# extract the AVI
$obj->extract ("demo.avi");

Then the folder will have these contents:

Code:

extract.pl
archive.bin
demo.avi

Do you see what I mean? How would I open a filehandle on a very large file (one that would take up lots of memory if slurped in) and then make new filehandles from parts of it? In this case, line 4 of the archive should become its own virtual filehandle that can be read as if it had been opened from a real file on the hard drive.
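Perl has no built-in way to carve a new filehandle out of part of an existing one, but the core `tie` mechanism (see perltie) lets an object stand in for a filehandle. The sketch below, using a hypothetical class `My::Archive::Member` and a hypothetical helper `open_member`, exposes a byte window of an unencrypted archive as its own read-only handle. It assumes the member's byte offset and length are already known, implements only `read`, `eof`, `binmode` and `close` (line-oriented `<$fh>` would need a READLINE method and is not provided), and gives each member its own underlying handle so several members can be read independently.

Code:

```perl
use strict;
use warnings;
use Symbol qw(gensym);

# Hypothetical class: a tied handle limited to one member's byte window.
package My::Archive::Member {
    use Fcntl qw(SEEK_SET);

    sub TIEHANDLE {
        my ($class, $path, $offset, $length) = @_;
        open my $fh, '<', $path or die "Can't open $path: $!";
        binmode $fh;
        return bless { fh => $fh, start => $offset, len => $length, pos => 0 }, $class;
    }

    # Called for read($fh, $buf, $want[, $bufoff]); never reads past the window.
    sub READ {
        my $self   = shift;
        my $bufref = \$_[0];
        my (undef, $want, $bufoff) = @_;
        $bufoff //= 0;
        my $left = $self->{len} - $self->{pos};
        $want = $left if $want > $left;
        my $chunk = '';
        if ($want > 0) {
            seek($self->{fh}, $self->{start} + $self->{pos}, SEEK_SET) or return undef;
            defined read($self->{fh}, $chunk, $want) or return undef;
        }
        $$bufref = '' unless defined $$bufref;
        $$bufref .= "\0" x ($bufoff - length $$bufref) if $bufoff > length $$bufref;
        substr($$bufref, $bufoff) = $chunk;    # same placement rule as read()
        $self->{pos} += length $chunk;
        return length $chunk;
    }

    sub EOF     { my $self = shift; return $self->{pos} >= $self->{len} }
    sub BINMODE { 1 }
    sub CLOSE   { my $self = shift; return close $self->{fh} }
}

# Hypothetical helper: returns a glob reference usable with read/eof/close.
sub open_member {
    my ($path, $offset, $length) = @_;
    my $fh = gensym;
    tie *$fh, 'My::Archive::Member', $path, $offset, $length;
    return $fh;
}
```

Each `read` returns at most the bytes left in the member, so a loop like `while (read($m, my $buf, 65536)) { ... }` stops at the end of the member rather than at the end of the archive.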

-------------
Kirsle.net | Kirsle's Programs and Projects

MillerH · Nov 13, 2006

I understand the draw of this project from a curiosity point of view. But aren't you just reinventing the wheel here? What was wrong with using CPAN modules for this problem?

http://search.cpan.org/search?query=Archive::Zip

Kirsle · Nov 13, 2006

The idea is to have a kind of proprietary archive format, for example to create a series of games that store all their resources in these archives (think Blizzard's MPQ or Maxis's FAR), except that these would be much harder to decipher and much harder to write an "MPQ Viewer" or "FAR Edit" type of program for, since, as part of the algorithm, each package has a password that is the entire basis of the encryption, and each software program would have its own password that only I would know.

Tek-Tips is the largest IT community on the Internet today!
