
Find Pattern in Binary File


menkes

Programmer
Nov 18, 2002
47
US
I have a binary file (PCL5) where I need to insert some information at specific points. The file consists of groups of pages (from 1 to n) separated by XML comments. The largest files are only about 50MB.

This is the first binary file I have worked with, so I am having difficulty...and after hours of searching, do not have a good answer.

Here is the first part - separate the page groups:
Code:
my $pattern = chr(27) . chr(37) . chr(49) . chr(66) . chr(67) . chr(79);
my $cnt;

# Read each group into an array element
local($/) = $pattern;
open(FH, "< some.pcl") or die "Cannot open some.pcl: $!";
binmode(FH);
@slurp = <FH>;
close(FH);

open(OUTFILE, "> summary.dat") or die "Cannot open summary.dat: $!";

foreach $recip (@slurp)
{
    # Ignore empty elements (1st is always empty)
    if($recip ne '')
    {
        $end_tags = chr(47) . chr(62) . chr(34);
        $position = index($recip, $end_tags);
        $tags = substr($recip,1,$position + 2);
        print OUTFILE $tags, "\n";
        ## This is where I need help
    }
}
close OUTFILE;

The next part is finding form feeds (Hex=0C, Dec=12) in each array element. However, because the file uses raster images there are "false" form feeds that I need to ignore. In this file, raster images begin with Esc*r? (Hex:1B 2A 72 ??, Dec: 27 42 114 ??) where ? is anything BUT 'B'. That may clue you in to the end of a raster image, which is Esc*rB (Hex:1B 2A 72 42, Dec:27 42 114 66).
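
For reference, the byte sequences described above can also be written as Perl string escapes instead of chains of chr(). This is only a minimal sketch: the variable names and the sample data are made up for illustration and are not from the original script. One thing it tries to show is that "*" is a regex metacharacter, so the sequences need \Q...\E (or quotemeta) before they are used in a pattern.

```perl
use strict;
use warnings;

# Byte sequences from the post, written with hex escapes.
my $ff         = "\x0C";       # form feed (dec 12)
my $raster_pfx = "\x1B*r";     # Esc*r (dec 27 42 114), followed by any byte except 'B'
my $raster_end = "\x1B*rB";    # Esc*rB (dec 27 42 114 66)

# '*' is a regex metacharacter, so quote the literal bytes before matching.
my $start_re = qr/\Q$raster_pfx\E[^B]/;
my $end_re   = qr/\Q$raster_end\E/;

# Made-up sample: one raster block that contains a "false" form feed.
my $sample = "page" . $ff . $raster_pfx . "1A" . "img" . $ff . "data" . $raster_end . "tail";

my $has_start = ($sample =~ $start_re) ? 1 : 0;
my $has_end   = ($sample =~ $end_re)   ? 1 : 0;
print "start found: $has_start, end found: $has_end\n";
```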

So, my question is: How do I find each chr(12) that is not inside of a raster image, and then store the count of chr(12)'s for that array element as well as the byte position of each one?

Thanks in advance for the brain power.
 
Look for your end-of-raster-image sequence and swap it for a placeholder, e.g. |XxXxX|. Replace your form feeds, and then replace |XxXxX| with your end-of-raster sequence again.

HTH
--Paul

Nancy Griffith - songstress extraordinaire,
and composer of the snipers anthem "From a distance ...
 
Unfortunately, that will not tell me if I am inside of a raster image. Consider the following pattern (assuming much more data between each code):

A = Start Raster Image
B = End Raster Image
F = Form Feed
x = Some data

xAxFxFxFxFxFxBxAxFxFxBxAxFxFxFxFxFxBxFxAxFxFxFxBxAxFxBxAxFxFxFxFxBxFxAxBx

Now, how do I locate any F that is not between an A and a B? Any single array element (scalar variable) can be as large as 200K bytes.

I did come up with an idea like this:
Find the first A. Find the first B. Now store the position as a hash: $raster{Apos} = Bpos. Repeat for each subsequent A/B pair.

Now find the position of each F and push it into an array: push (@ff,Fpos).

Do a foreach on the keys of the A/B hash. Then on each iteration, do a foreach on each element of the ff array. See if the value of the ff array element is between the hash key and the hash value...if it is, splice the ff array to remove the form feed position.

What I should end up with is an array that has only the "true" form feed positions.
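
The idea above could be sketched roughly as follows. This is an illustration only, under a few assumptions: the letters A, B and F stand in for the real byte sequences, the sample string and variable names are made up, and the raster blocks are kept in an array of [start, end] pairs instead of a hash so they stay in order. Because both lists come out sorted by position, the form feeds and the blocks can be walked together in one pass, with no nested foreach and no splice.

```perl
use strict;
use warnings;

# Letters stand in for the real byte sequences:
# A = start raster, B = end raster, F = form feed.
my $data = "xAxFxFxBxFxAxFxBxFx";

# Pass 1: record each raster block as a [start, end] pair of offsets.
my @raster;
my $pos = 0;
while ((my $start = index($data, 'A', $pos)) > -1) {
    my $end = index($data, 'B', $start);
    last if $end < 0;                 # unterminated block: stop scanning
    push @raster, [$start, $end];
    $pos = $end + 1;
}

# Pass 2: visit each F in order and keep it only if it is outside every block.
my @ff;
my $i = 0;                            # index of the current raster block
$pos = 0;
while ((my $f = index($data, 'F', $pos)) > -1) {
    # Skip raster blocks that end before this form feed.
    $i++ while $i < @raster && $raster[$i][1] < $f;
    my $inside = $i < @raster && $raster[$i][0] < $f;
    push @ff, $f unless $inside;
    $pos = $f + 1;
}

print scalar(@ff), " true form feeds at offsets: @ff\n";
```

For the sample string this leaves offsets 9 and 17 in @ff. With the real data the start marker is "Esc*r followed by anything but B", so the first pass would need a regex match rather than a plain index() call.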

I believe this will work (I will test it today), but at face value it seems very inefficient.

I will post my test code later, but if anyone has another idea, I'd love to hear it.
 
$str="xAxFxFxFxFxFxBxAxFxFxBxAxFxFxFxFxFxBxFxAxFxFxFxBxAxFxBxAxFxFxFxFxBxFxAxBx";
@data=split /B/, $str;
foreach(@data) {
    $_.="|BB|BB|"; #just using literal string B here,and A below
    $_=~s/A/|AA|AA|/;
    # parentheses are needed so "<" compares the two index() results
    if (index($_, "F") < index($_, "|AA|AA|")) {
        # exists between B and A
    }
}

Not ideal, but it does answer the question in a vague sort of way. The split should give you xAxFxFxFxFxFx as your first element (split drops the B itself, which is why the |BB|BB| marker is appended).

--Paul

 
Ah...great suggestion. Needed a little tweaking, but here is the result:

Code:
# Raster image start sequence
$pat_start = chr(27) . chr(42) . chr(114) . chr(49);
# Raster image end sequence
$pat_end = chr(27) . chr(42) . chr(114) . chr(66);

# \Q...\E quotes the "*" so it is matched literally, not as a regex quantifier
@data = split (/\Q$pat_end\E/, $recip);
undef @ff;

foreach(@data)
{
    # Need to account for the possibility a form feed is not in this array element
    if((index($_, chr(12)) > -1) && (index($_, chr(12)) < index($_, $pat_start)))
    {
        # Now we have a form feed that is not in a raster image
        push(@ff, index($_, chr(12)));
    }

    # The last array element can have a form feed, but will not have a raster image
    if ((index($_, chr(12)) > -1) && (index($_, $pat_start) < 0))
    {
        # Now we have a form feed that is not in a raster image
        push(@ff, index($_, chr(12)));
    }
}

This gives me the valid form feed locations and count.
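
As far as I can tell, the split approach records at most the first form feed in each chunk, and the offsets are relative to that chunk and not to $recip. If absolute byte positions of every form feed are needed, one possible alternative is a single regex scan. This is a minimal sketch and not the method used in the thread: the sample data is made up, and it assumes every raster block is properly terminated by Esc*rB.

```perl
use strict;
use warnings;

my $esc = "\x1B";
my $ff  = "\x0C";

# Made-up sample standing in for one page group ($recip in the thread).
my $recip = "aa" . $ff                            # true form feed at offset 2
          . $esc . "*r1A" . "x" . $ff . "y" . $ff # raster block with two false form feeds
          . $esc . "*rB"
          . "bb" . $ff . $ff;                     # true form feeds at offsets 18 and 19

my @ff;
# Each match is either a whole raster block (skipped) or one form feed
# outside any block. $-[1] is the start offset of capture group 1.
while ($recip =~ /\Q$esc\E\*r[^B].*?\Q$esc\E\*rB|(\x0C)/gs) {
    push @ff, $-[1] if defined $1;
}

print scalar(@ff), " form feeds at byte offsets: @ff\n";
```

The /s modifier lets "." match newline bytes inside the raster data, and the lazy ".*?" stops at the first Esc*rB, so each block is consumed whole before the scan looks for the next form feed.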

Thanks so much Paul for the help on this one!

-Scott
 