How to insert a string in an ASCII file at a specific place?

Status
Not open for further replies.

lcs01

Programmer
Joined
Aug 2, 2006
Messages
182
Location
US
Here is what I need to do:

I have a flat file. I need to highlight certain parts of the file (more than one occurrence) in red. I can get the file name, the offset, and the length of each string to be highlighted from a DB, which I know how to do. But I don't know how to insert HTML tags into this file at the RIGHT positions.

I guess the short answer is using seek(). But I don't exactly know how to use it. Could someone kindly show me an example?

BTW, I want the new file with the correct highlights stored in memory instead of on the hard drive.

Thank you for your help.
 
A sample line or two of the flat file, showing where the tags go, would be helpful in answering your question.
 
A sample file could be any ASCII file. For instance, I just copied a couple of paragraphs from WSJ as a sample:

Sample file:
===========
Stocks turned lower Friday as traders searched for clues about the direction of the economy amid rising oil prices and lingering worries about the housing market.

A speech from Federal Reserve Chairman Ben Bernanke in Jackson Hole, Wyo., at the central bank's annual confab, had little impact on stocks.

Mr. Bernanke didn't comment on monetary policy, instead addressing issues concerning the "unprecedented" pace of global economic integration, giving traders who delayed their end-of-summer vacations for the speech -- just in case the Fed chief dropped any bombshells -- the all clear to abandon ship.

"The market didn't get any sense of direction from Bernanke," said Hugh Johnson, chief investment officer at Johnson Illington Advisors. A lack of major economic or corporate news has left "the markets very directionless," he said.
==================
End of sample file

I want to highlight the above file at the following specific spots:

1) offset = 50, length = 40;
2) offset = 130, length = 58;
......

The offset is in bytes and always counts from the beginning of the file. The length is the number of bytes, starting at 'offset', that should be highlighted.

Thank you for your help.
 
one way:

Code:
my $sample = q~Stocks turned lower Friday as traders searched for clues about the direction of the economy amid rising oil prices and lingering worries about the housing market.

A speech from Federal Reserve Chairman Ben Bernanke in Jackson Hole, Wyo., at the central bank's annual confab, had little impact on stocks.

Mr. Bernanke didn't comment on monetary policy, instead addressing issues concerning the "unprecedented" pace of global economic integration, giving traders who delayed their end-of-summer vacations for the speech -- just in case the Fed chief dropped any bombshells -- the all clear to abandon ship.

"The market didn't get any sense of direction from Bernanke," said Hugh Johnson, chief investment officer at Johnson Illington Advisors. A lack of major economic or corporate news has left "the markets very directionless," he said.
~;

$sample =~ s/^(.{50})(.{40})/$1<span style="color:red">$2<\/span>/;
$sample =~ s/^(.{130})(.{58})/$1<span style="color:red">$2<\/span>/;

print $sample;
 
Thank you so much, Kevin.

I tested your code. There is a small bug in it. When you do the substitution the second time, it cannot start from 130 any more.

I modified your code a bit:

Code:
my $str1 = qq{<span style="color:red">};
my $str2 = qq{<\/span>};
my $len1 = length($str1);
my $len2 = length($str2);
my $next = $len1 + $len2 + 130;
print "\$len1 = $len1, \$len2 = $len2, \$next = $next\n\n\n";

$sample =~ s/^(.{50})(.{40})/$1<span style="color:red">$2<\/span>/;
#$sample =~ s/^(.{130})(.{58})/$1<span style="color:red">$2<\/span>/;
[b]$sample =~ s/^(.{$next})(.{58})/$1<span style="color:red">$2<\/span>/;[/b]

The line in BOLD does not work. It does not recognize $next. How can I solve this?

In addition, I was trying to use:

Code:
[b]
seek(HANDLE, OFFSET, POSITION)
read(HANDLE, SCALAR, LENGTH, OFFSET)
[/b]

But I cannot figure out how to correctly set LENGTH, OFFSET, and POSITION.

Again, thank you for your help.
 
I did one more test. The modified code is this:

Code:
$sample =~ s/^(.{50})(.{40})/$1<span style="color:red">$2<\/span>/;
[b]$sample =~ s/^(.{161})(.{58})/$1<span style="color:red">$2<\/span>/;
[/b]

And the line in bold stops working. When I changed it back to '130', it works again. I am soooo confused. You must have a magic touch, Kevin.
 
I think if you do the substitutions in reverse order, the insertion of the html code will not throw off the "second" substitution:

Code:
my $str1 = qq{<span style="color:red">};
my $str2 = qq{</span>};
#my $len1 = length($str1);
#my $len2 = length($str2);
#my $next = $len1 + $len2 + 130;
#print "\$len1 = $len1, \$len2 = $len2, \$next = $next\n\n\n";

my $sample = q~Stocks turned lower Friday as traders searched for clues about the direction of the economy amid rising oil prices and lingering worries about the housing market.

A speech from Federal Reserve Chairman Ben Bernanke in Jackson Hole, Wyo., at the central bank's annual confab, had little impact on stocks.

Mr. Bernanke didn't comment on monetary policy, instead addressing issues concerning the "unprecedented" pace of global economic integration, giving traders who delayed their end-of-summer vacations for the speech -- just in case the Fed chief dropped any bombshells -- the all clear to abandon ship.

"The market didn't get any sense of direction from Bernanke," said Hugh Johnson, chief investment officer at Johnson Illington Advisors. A lack of major economic or corporate news has left "the markets very directionless," he said.
~;

$sample =~ s/^(.{130})(.{58})/$1$str1$2$str2/s;
$sample =~ s/^(.{50})(.{40})/$1$str1$2$str2/s;

print $sample;

I added the 's' option to the end of the regexps since the example you posted has multiple newline characters. With 's', the '.' also matches newlines, so the string is treated as one long line instead of multiple lines.
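To see the effect of 's' in isolation, here is a minimal sketch with a made-up two-line string (the string and variable names are illustrative only, not from the posts above). Without 's', a count such as `.{12}` cannot run past the end of the first line:

```perl
use strict;
use warnings;

# Hypothetical two-line string: 10 characters, a newline, then 10 more.
my $text = "0123456789\nabcdefghij";

# Without /s, '.' does not match the newline, so 12 characters in a row
# cannot be found starting at the beginning of the string.
my $without_s = ($text =~ /^(.{12})/)  ? 1 : 0;

# With /s, '.' also matches the newline, so the count spans both lines.
my $with_s    = ($text =~ /^(.{12})/s) ? 1 : 0;

print "without /s: $without_s\n";
print "with /s:    $with_s\n";
```

This would also explain why a substitution can work for a small offset and then fail for a larger one: it only fails once offset plus length reaches past the first newline.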

As far as I know, seek and read can't do what you want.
 
Kevin,

Thank you so much! The 's' option is the KEY!!

However, I am still having problems. When an offset is too big (>32,766 bytes), the substitution does not work. The error is:

Code:
Quantifier in {,} bigger than 32766 before HERE mark in regex m/^(.{ << HERE 61394})(.{398})/\n

One way I can think of is to chop the file into pieces and adjust the offsets accordingly.

Is there an alternative way to fix this problem?

Thanks.
 
32766 is the upper limit on a {,} quantifier for most platforms if not all of them. I am not sure what to suggest because now I am not sure what you are doing. How do you determine where the html has to be inserted in the text?
 
Kevin,

After my modification of your code, here comes my version:

Code:
#! /usr/bin/perl

my $sample = q~Stocks turned lower Friday as traders searched for clues about the direction of the economy amid rising oil prices and lingering worries about the housing market.

A speech from Federal Reserve Chairman Ben Bernanke in Jackson Hole, Wyo., at the central bank's annual confab, had little impact on stocks.

Mr. Bernanke didn't comment on monetary policy, instead addressing issues concerning the "unprecedented" pace of global economic integration, giving traders who delayed their end-of-summer vacations for the speech -- just in case the Fed chief dropped any bombshells -- the all clear to abandon ship. 

"The market didn't get any sense of direction from Bernanke," said Hugh Johnson, chief investment officer at Johnson Illington Advisors. A lack of major economic or corporate news has left "the markets very directionless," he said. 
~;

my $len = length($sample);
print "\$len = $len\n";

my $str1 = qq{<span style="color:red">};
my $str2 = qq{<\/span>};

my %offsetLength =
(
  '20', '20',
  '140', '10',
  '270', '20',
);

my @keys = sort numerically (keys(%offsetLength));
for(my $i = $#keys; $i >= 0; $i--) {
  my $offset = $keys[$i];
  my $length = $offsetLength{$keys[$i]};
  print "$i, \$offset = $offset, \$length = $length\n";
  $sample =~ s/^(.{$offset})(.{$length})/$1$str1$2$str2/s;
}

print "\n\n";
print $sample;

sub numerically {$a <=> $b;}

I hope the sample code above shows what I am trying to do. As you can see, if $offset > 32,766, the code does not work.

Any suggestions?

Many thanks!!
 
I have no idea how efficient this is but it works OK. See if it works with the bigger files you must be using:

Code:
my $sample = q~Stocks turned lower Friday as traders searched for clues about the direction of the economy amid rising oil prices and lingering worries about the housing market.

A speech from Federal Reserve Chairman Ben Bernanke in Jackson Hole, Wyo., at the central bank's annual confab, had little impact on stocks.

Mr. Bernanke didn't comment on monetary policy, instead addressing issues concerning the "unprecedented" pace of global economic integration, giving traders who delayed their end-of-summer vacations for the speech -- just in case the Fed chief dropped any bombshells -- the all clear to abandon ship.

"The market didn't get any sense of direction from Bernanke," said Hugh Johnson, chief investment officer at Johnson Illington Advisors. A lack of major economic or corporate news has left "the markets very directionless," he said.
~;

my $str1 = q{<span style="color:red">};
my $str2 = q{</span>};

my %offsetLength =
(
  '20', '20',
  '140', '10',
  '270', '20',
);

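# Work from the highest offset down, so each insertion leaves the
# lower offsets that are still to be processed unchanged.
my @keys = sort {$b <=> $a} keys %offsetLength;
for my $offset (@keys) {
   my $length = $offsetLength{$offset};
   my $red_parts = substr($sample, $offset, $length);
   # 4-argument substr replaces the range in place; no regex quantifier involved.
   substr($sample, $offset, $length, "$str1$red_parts$str2");
}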
print $sample;
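If efficiency on big files is a concern, the same idea can also be written as a single forward pass that builds a new string instead of editing the original in place. This is only a sketch: `highlight_ranges` is a hypothetical helper name, and it assumes the ranges do not overlap and all lie inside the string.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread: wrap each [offset, length]
# range of $text in $open/$close during one forward pass.
# Assumes the ranges do not overlap and lie inside the string.
sub highlight_ranges {
    my ($text, $open, $close, @ranges) = @_;
    my ($out, $pos) = ('', 0);
    for my $r (sort { $a->[0] <=> $b->[0] } @ranges) {
        my ($offset, $length) = @$r;
        $out .= substr($text, $pos, $offset - $pos);    # untouched text before the range
        $out .= $open . substr($text, $offset, $length) . $close;
        $pos  = $offset + $length;                      # continue after the range
    }
    return $out . substr($text, $pos);                  # remaining tail
}

my $demo = highlight_ranges('abcdefghij', '<b>', '</b>', [6, 2], [1, 3]);
print "$demo\n";   # a<b>bcd</b>ef<b>gh</b>ij
```

Because the output is assembled left to right, the original offsets never need adjusting and the ranges can be supplied in any order.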
 
Thank you Kevin for your wonderful help! The new implementation works very well. And sorry for getting back to you so late, 'cause I was out of town for the last couple of days.

However, I have one more question:

Do you think this can also be done by using seek/tell/read? I tried it and could not make it work.

Again, thank you so much, Kevin.
 
I don't think it can be done using seek/tell/read. Maybe someone else knows different.
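For anyone who wants to experiment, here is one possible sketch of a seek/read version. It rests on several assumptions: the ranges are sorted by ascending offset and do not overlap, an in-memory filehandle stands in for the real flat file so the example runs on its own, and the sample text and offsets are made up. The highlighted copy is built in memory in `$out`.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Stand-in for the flat file: an in-memory filehandle (an assumption
# made so the sketch is self-contained; open a real file by name instead).
my $data = "Stocks turned lower Friday as traders searched for clues.\n";
open my $fh, '<', \$data or die "Cannot open: $!";
binmode $fh;                                  # treat offsets as bytes

my ($open, $close) = ('<span style="color:red">', '</span>');
my @ranges = ([7, 6], [30, 7]);               # [offset, length], ascending

my ($out, $pos) = ('', 0);
for my $r (@ranges) {
    my ($offset, $length) = @$r;
    my $chunk = '';
    # SEEK_SET means the position is counted from the start of the file.
    # After sequential reads this seek is redundant, but it shows the call.
    seek $fh, $pos, SEEK_SET or die "seek failed: $!";
    read $fh, $chunk, $offset - $pos;         # plain text before the range
    $out .= $chunk;
    read $fh, $chunk, $length;                # the bytes to highlight
    $out .= $open . $chunk . $close;
    $pos = $offset + $length;
}

# Append whatever follows the last range.
seek $fh, $pos, SEEK_SET or die "seek failed: $!";
{
    local $/;                                 # slurp mode
    my $rest = <$fh>;
    $out .= $rest if defined $rest;
}
close $fh;

print $out;
```

Here the third argument of seek is the starting point (SEEK_SET for the beginning of the file) and the third argument of read is the number of bytes to read; the optional fourth argument of read, an offset into the target scalar, is not needed.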
 