
Process huge file in chunks 2

Status
Not open for further replies.

MoshiachNow

Hi,

I need to process a huge 500 MB PostScript file that is effectively one long line (no "\n" breaks at all).
Is there a way to break the analysis of the file into chunks, so that it does not consume all of the computer's resources? Possibly by closing and reopening the file?
Actually, the processing consists of analyzing the file and searching for some regex matches (Creator, number of pages, etc.).

Thanks

Long live king Moshiach !
 
Why shouldn't this be possible?
If you know the structure of your file, and you know that it is divided into chunks (or pages or sections), just read one of those at a time (don't personally know the structure of postscript files, but I would expect them to be composed of pages, and possibly a header?).
Otherwise this becomes a little trickier.
If the length of the string matched by your regex is fixed throughout, just read a number of characters at least equal to two times that length; then, at each subsequent read, discard the first half of the string already read and append to it a number of new characters at least equal to that length.
If the length is not fixed (you could be looking for two matching strings that may be separated by any number of characters), then I think there is no other way than to assume a maximum length for the match.
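
The windowing idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: find_in_chunks is a hypothetical helper name, and $max_len is the assumed upper bound on the length of any single match. Each pass appends $max_len new bytes and then drops the older half, so the window never holds more than twice that length.

```perl
use strict;
use warnings;

# Hypothetical helper: scan $path for $regex, assuming that no match
# is longer than $max_len bytes. Returns a list of [offset, text] pairs.
sub find_in_chunks {
    my ($path, $regex, $max_len) = @_;
    open(my $fh, '<', $path) or die "Can't open '$path': $!";
    binmode $fh;

    my $window = '';
    my $offset = 0;    # file offset of the first byte held in $window
    my (%seen, @hits);

    # Append up to $max_len new bytes to the end of the window on each pass.
    while (read($fh, $window, $max_len, length $window)) {
        while ($window =~ /$regex/g) {
            my $pos = $offset + $-[0];
            # The overlap is scanned twice, so report each start offset once.
            next if $seen{$pos}++;
            push @hits, [$pos, substr($window, $-[0], $+[0] - $-[0])];
        }
        # Keep only the last $max_len bytes for the next pass.
        if (length($window) > $max_len) {
            my $drop = length($window) - $max_len;
            substr($window, 0, $drop, '');
            $offset += $drop;
        }
    }
    close $fh;
    return @hits;
}
```

Something like find_in_chunks($file, qr/some pattern/, 4096) would then walk the file while holding at most 8192 bytes. Note that a greedy pattern can still be cut short where the window ends, so the bound should be chosen generously.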


prex1
: Online tools for structural design
: Magnetic brakes for fun rides
: Air bearing pads
 
Thanks,

The file is NOT built in chunks, and I can get a full 500 MB with just one long line of text plus special characters.
I normally read the "line" and get rid of the special characters (which can be a very long process on a 500 MB file...).

I need a method to open the file, read some fixed number of characters, close it and process that chunk.
Then reopen the file and continue from the character I stopped on.
Thanks

Long live king Moshiach !
 
To read a fixed number of characters, just see the documentation for
read FILEHANDLE,SCALAR,LENGTH
You don't need to close and reopen the file between reads. If you are obliged to do so for some reason, you need to save the position in the file with tell and go back there after reopening with seek.
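
If closing and reopening really is required, the tell/seek bookkeeping might look something like this. It is a minimal sketch, not a definitive implementation: read_chunk_at is a hypothetical helper name, and the chunk size in the usage comment is an arbitrary choice.

```perl
use strict;
use warnings;

# Hypothetical helper: open the file, jump to byte offset $pos, read up to
# $size bytes, and close the file again. Returns the chunk together with
# the offset at which the next call should resume.
sub read_chunk_at {
    my ($path, $pos, $size) = @_;
    open(my $fh, '<', $path) or die "Can't open '$path': $!";
    binmode $fh;
    # Whence 0 means the offset is counted from the start of the file.
    seek($fh, $pos, 0) or die "Can't seek to $pos in '$path': $!";
    my $got = read($fh, my $chunk, $size);
    die "Read error on '$path': $!" unless defined $got;
    my $next = tell($fh);    # position where this read stopped
    close $fh;
    return ($chunk, $next);
}

# Possible usage (the path in $file and the 1 MB chunk size are assumptions):
#   my ($chunk, $pos) = ('', 0);
#   while (1) {
#       ($chunk, $pos) = read_chunk_at($file, $pos, 1024 * 1024);
#       last unless length $chunk;
#       # process $chunk here
#   }
```

An empty chunk signals the end of the file, at which point the returned offset no longer advances.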

prex1
 
Look into the sysread function. Here is an example of it in use from perlfaq5 Files and Formats.


To cover the case where a regex match straddles the boundary between two buffer chunks, you should save each chunk and join it to the next one before matching. Something like the following.

Code:
my $lastchunk = '';
my $buffer    = '';
# Three-argument open with a lexical filehandle; binmode because PostScript may hold binary data.
open(my $fh, '<', $filename) or die "Can't open `$filename': $!";
binmode $fh;
# sysread returns 0 at end of file (and undef on error), which ends the loop.
while (sysread $fh, $buffer, 4096) {
	# Join the previous chunk so a match that straddles a chunk boundary is still seen.
	my $matching = $lastchunk . $buffer;
	if ($matching =~ /regex/) {
		# Note: a match lying wholly inside one chunk is seen again on the next pass.
		print "Yippy aye kye yeah\n";
	}
	$lastchunk = $buffer;
}
close $fh;
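
For the Creator and page count mentioned in the original question, a variant of the same loop might carry over only a short tail of the previous data rather than the whole chunk, and capture the values as it goes. This is a sketch under assumptions: scan_dsc is a hypothetical helper name, the file is assumed to carry the usual %%Creator: and %%Pages: comments, each value is assumed to end at a CR, LF, or the next '%', and $overlap is an assumed upper bound on the length of one such comment.

```perl
use strict;
use warnings;

# Hypothetical helper: collect %%Creator: and %%Pages: values from a
# PostScript file without loading it whole. Returns a hash of the values.
sub scan_dsc {
    my ($path, $chunk_size, $overlap) = @_;
    open(my $fh, '<', $path) or die "Can't open '$path': $!";
    binmode $fh;

    my ($tail, $buffer, %found) = ('', '');
    while (sysread $fh, $buffer, $chunk_size) {
        my $data = $tail . $buffer;
        while ($data =~ /%%(Creator|Pages):\s*([^\r\n%]+)/g) {
            # Last match wins: a value cut short at the end of $data is
            # replaced by the full value on the next pass, and a trailer
            # %%Pages: overrides an earlier "(atend)" placeholder.
            $found{$1} = $2;
        }
        # Carry over only the last $overlap bytes for the next pass.
        $tail = length($data) > $overlap ? substr($data, -$overlap) : $data;
    }
    close $fh;
    return %found;
}
```

A call such as scan_dsc($filename, 65536, 256) keeps memory use near the chunk size regardless of how large the file is.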

- Miller
 