Hey:
I've come up with an algorithm for archiving binary (or ASCII) files together into one giant file. Here's how it works: it first builds a scalar that has, on each line, a file's name followed by its binary data:
Code:
index.html::some html code
README.txt::some text information
logo.gif::binary gif data
button.gif::binary gif data
demo.avi::a large avi file in binary
Once it's generated this scalar full of file data, it RC4-encodes it with a user-provided password, and the same password is required to decode it later. So what ends up saved to the file is a big nasty mess of binary data in which nothing is readable (not even the plain text files), because the combination of every file is encoded as a whole.
This system works fine as long as all the files in the archive combined fit in memory. When it reads an archive in, it opens the filehandle, slurps all the encrypted garbage into a scalar, decodes the scalar with the password, and cycles through the lines, splitting out each file's data and storing it in a hashref in memory.
Then when the program wants to read from a file, it's readily accessible.
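For reference, a minimal sketch of that load step, assuming the scalar has already been RC4-decoded and that each file's data really does sit on a single line, as in the layout above. The name `parse_archive` is made up for illustration and is not part of any module:

```perl
use strict;
use warnings;

# Hypothetical helper: split an already-decoded archive string into
# a hashref of file name => file data. The RC4 decoding is assumed
# to have happened before this is called.
sub parse_archive {
    my ($decoded) = @_;
    my %files;
    for my $line (split /\n/, $decoded) {
        # Split on the first "::" only, so data containing "::" survives.
        my ($name, $data) = split /::/, $line, 2;
        next unless defined $data;
        $files{$name} = $data;
    }
    return \%files;
}

my $files = parse_archive(
    "index.html::some html code\nREADME.txt::some text information\n"
);
print $files->{'README.txt'}, "\n";
```

Everything lives in `%files` at once, which is exactly why memory use grows with the total size of the archive.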
But, like I said, this system fails miserably when you try to archive one or more very large files, because all of the file data has to be kept in memory at once.
I was thinking of revising the algorithm a little, so that the first line of the (unencoded) file is an index listing the file names and the lines they can be found on, i.e.
Code:
0=index.html;1=README.txt;2=logo.gif;3=button.gif;4=demo.avi
some html code
some text information
binary gif data
binary gif data
a large avi file in binary
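Parsing an index line like that could look roughly like the sketch below. It assumes the entries are separated by semicolons and each entry is `line=name`; `parse_index` is a hypothetical name used only for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: turn the index line into a hashref of
# file name => line number.
sub parse_index {
    my ($index) = @_;
    chomp $index;
    my %line_for;
    for my $entry (split /;/, $index) {
        # Split on the first "=" only, in case a file name contains "=".
        my ($line, $name) = split /=/, $entry, 2;
        next unless defined $name;
        $line_for{$name} = $line;
    }
    return \%line_for;
}

my $files = parse_index(
    "0=index.html;1=README.txt;2=logo.gif;3=button.gif;4=demo.avi\n"
);
print "demo.avi is on line $files->{'demo.avi'}\n";
```

Only the index ends up in memory here; the file data itself stays on disk until something asks for it.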
So that when it reads in the file:
Code:
open (ARCHIVE, "archive.bin");
my $index = <ARCHIVE>; # read off the first line
# manipulate $index to find out which line everything is on
...
sub extract {
    my $file = shift;
    # Find out which line this file was on...
    return unless exists $files->{$file};
    my $line = $files->{$file};
    # say we wanted demo.avi (on line 4)
    my $avi_filehandle = <ARCHIVE>[4]; # ???
    # save its filehandle to a file
    open (SAVE, ">$file");
    while (<$avi_filehandle>) {
        print SAVE;
    }
    close (SAVE);
}
So that, if I have a folder with this in it:
Code:
extract.pl
archive.bin
And extract.pl has this:
Code:
use My::Archive;
my $obj = My::Archive->new;
$obj->load ("archive.bin");
# extract the AVI
$obj->extract ("demo.avi");
And then the folder will have these contents:
Code:
extract.pl
archive.bin
demo.avi
Do you get what I'm talking about? How would I open a filehandle on a very large file (one that would take up lots of memory if slurped in), and then make new filehandles from parts of it? In this case, line 4 of the archive should become its own virtual filehandle that can be read as if it had been opened from a real file on the hard drive.
-------------
Kirsle.net | Kirsle's Programs and Projects