Process huge file in chunks 2

MoshiachNow · Jun 19, 2007

Hi,

I need to process a huge 500MB PostScript file which is effectively one long line (no "\n" breaks at all).
Is there a way to break the file analysis into chunks, so that it does not consume all the computer's resources? Possibly by closing and reopening it?
The processing consists of analyzing the file and searching for some regex matches (Creator, number of pages, etc.).

Thanks

Long live king Moshiach !

http://www.noahide.com

prex1 · Jun 19, 2007

Why shouldn't this be possible?
If you know the structure of your file, and you know that it is divided into chunks (or pages or sections), just read one of those at a time (I don't personally know the structure of PostScript files, but I would expect them to be composed of pages, and possibly a header).
Otherwise this becomes a little trickier.
If the length of the string matched by your regex is fixed throughout, just read a number of characters at least equal to twice that length; then at each subsequent read discard the first half of the string already read and append to it a number of new characters at least equal to that length.
If the length is not fixed (you could be looking for two matching strings that may be separated by any number of characters), then I think there is no other way than assuming a maximum length for the match.
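
The overlapping-read idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name find_in_chunks and its parameters are hypothetical, and $max_len is the assumed maximum match length mentioned above. A match that starts too close to the end of the current window is deferred until more data has been read, so it is neither cut short nor reported twice.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of [file_offset, matched_text] pairs.
# Assumes no match is ever longer than $max_len characters.
sub find_in_chunks {
    my ($path, $regex, $chunk_size, $max_len) = @_;
    open(my $fh, '<', $path) or die "Can't open '$path': $!";
    binmode($fh);

    my @hits;
    my $window = '';   # unprocessed text carried over plus the new chunk
    my $base   = 0;    # file offset of the first character of $window

    while (1) {
        my $chunk = '';
        my $n = read($fh, $chunk, $chunk_size);
        die "Read error on '$path': $!" unless defined $n;
        my $at_eof = ($n == 0);

        $window .= $chunk;
        my $len = length $window;

        # A match starting at or after $keep_from might continue into data
        # not read yet, so it is left for the next round (unless at EOF).
        my $keep_from = $at_eof ? $len : $len - $max_len + 1;
        $keep_from = 0 if $keep_from < 0;

        pos($window) = 0;
        while ($window =~ /$regex/g) {
            last if $-[0] >= $keep_from;
            push @hits, [ $base + $-[0], substr($window, $-[0], $+[0] - $-[0]) ];
            # Never rescan text that already belongs to an accepted match.
            $keep_from = $+[0] if $+[0] > $keep_from;
        }

        last if $at_eof;
        $base += $keep_from;
        substr($window, 0, $keep_from, '');   # drop the processed part
    }
    close $fh;
    return @hits;
}
```

A call might look like find_in_chunks($file, qr/Pages: \d+/, 65536, 256), where the pattern and the two sizes are placeholders to adapt. Memory use stays near $chunk_size + $max_len regardless of the file size.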

prex1

http://www.xcalcs.com

: Online tools for structural design

http://www.megamag.it

: Magnetic brakes for fun rides

http://www.levitans.com

: Air bearing pads

MoshiachNow · Jun 19, 2007

Thanks,

The file is NOT built in chunks, and I can get a full 500MB with just one long line of text plus special characters.
I normally read the "line" and get rid of the special characters (which can be a very long process on a 500MB file...).

I need a method to open the file, read some fixed number of characters, close it and process that chunk.
Then reopen the file and continue from the character I stopped on.
Thanks


prex1 · Jun 19, 2007

To read a fixed number of characters just see the documentation for
read FILEHANDLE,SCALAR,LENGTH
You don't need to close and reopen the file between reads; if you are obliged to do so for some reason, you need to save the position in the file with tell and go there after reopening with seek.
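
If closing and reopening really is required, the tell/seek approach could look something like this minimal sketch. The helper name read_chunk_at is hypothetical, and the file is read in binary mode so that offsets are byte positions.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Hypothetical helper: open the file, jump to $offset, read up to $length
# bytes, close the file, and return the data plus the offset to resume from.
sub read_chunk_at {
    my ($path, $offset, $length) = @_;
    open(my $fh, '<', $path) or die "Can't open '$path': $!";
    binmode($fh);
    seek($fh, $offset, SEEK_SET) or die "Can't seek in '$path': $!";

    my $chunk = '';
    my $n = read($fh, $chunk, $length);
    die "Read error on '$path': $!" unless defined $n;

    my $next = tell($fh);   # position right after the data just read
    close $fh;
    return ($chunk, $next);
}

# Possible usage (file name and chunk size are placeholders):
#   my $pos = 0;
#   while (1) {
#       my ($chunk, $next) = read_chunk_at('big.ps', $pos, 65536);
#       last if $chunk eq '';
#       # ... process $chunk ...
#       $pos = $next;
#   }
```

Keeping the file open and calling read repeatedly does the same job with less overhead; the reopen variant is only worth it when something forces the file to be closed between chunks.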


MillerH · Jun 19, 2007

Look into the sysread function. Here is an example of it in use from perlfaq5 Files and Formats.

http://tinyurl.com/g6l3q

To cover the case where a regex match spans the boundary between two buffer chunks, you should save each chunk and join it to the next one. Something like the following.

Code:

my $lastchunk = '';
my $buffer    = '';
# $filename is assumed to be set earlier in the script.
open(my $fh, '<', $filename) or die "Can't open `$filename': $!";
binmode($fh);
while (sysread $fh, $buffer, 4096) {
	# Join the previous chunk to the current one so that a match
	# straddling the chunk boundary is still found.
	my $matching = $lastchunk . $buffer;
	# Note: a match lying wholly inside one chunk is seen twice,
	# once while it is $buffer and once while it is $lastchunk.
	if ($matching =~ /regex/) {
		print "Yippy aye kye yeah\n";
	}
	$lastchunk = $buffer;
}
close $fh;
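
One detail worth noting about the loop above: sysread returns undef on error and 0 at end of file, so a bare while (sysread ...) stops silently in both cases. A minimal sketch that tells the two apart, assuming a hypothetical helper for_each_chunk that hands every chunk to a callback:

```perl
use strict;
use warnings;

# Hypothetical helper: calls $callback->($chunk) for each chunk read and
# returns the total number of bytes read. Dies on a real read error.
sub for_each_chunk {
    my ($path, $size, $callback) = @_;
    open(my $fh, '<', $path) or die "Can't open '$path': $!";
    binmode($fh);

    my $total = 0;
    while (1) {
        my $n = sysread($fh, my $buffer, $size);
        die "Read error on '$path': $!" unless defined $n;   # undef = error
        last if $n == 0;                                      # 0 = end of file
        $total += $n;          # $n may be smaller than $size
        $callback->($buffer);
    }
    close $fh;
    return $total;
}
```

The callback is where the chunk-joining and regex matching would go. Because sysread bypasses Perl's buffered I/O, it should not be mixed with read, seek or tell on the same filehandle.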

- Miller

MoshiachNow · Jun 21, 2007

Thanks a lot !
What a relief...

