How to read only certain lines of a file

chep80 · Jul 21, 2006

My input file looks like this:

stk.v.4.0

BEGIN Ephemeris

NumberOfEphemerisPoints 40321

ScenarioEpoch 15 Oct 2008 12:00:00.000

EphemerisEciTimePosVel

0.00000000000000e+000 -5.10704934943446e+006 -4.30468742388535e+006 2.34134338660962e+006 1.14622110187997e+003 2.44791644659563e+003 7.00081996659693e+003
6.00000000000000e+001 -5.02804956947231e+006 -4.14946215282220e+006 2.75587684053564e+006 1.48771794293052e+003 2.73281630226811e+003 6.82905617631048e+003
1.20000000000000e+002 -4.92876004041422e+006 -3.97748713421454e+006 3.15928732170990e+006 1.82320352040023e+003 3.00669437297306e+003 6.62972968657319e+003
1.80000000000000e+002 -4.80958189711036e+006 -3.78945599354728e+006 3.54994662828607e+006 2.15132406142333e+003 3.26844510161458e+003 6.40364499730543e+003
2.40000000000000e+002 -4.67099652926446e+006 -3.58612717085058e+006 3.92627802332154e+006 2.47075554499157e+003 3.51701184603798e+003 6.15171460690441e+003
3.00000000000000e+002 -4.51356363698536e+006 -3.36832086146875e+006 4.28676259862815e+006 2.78020904459190e+003 3.75139114450449e+003 5.87495532932304e+003
3.60000000000000e+002 -4.33791897034733e+006 -3.13691570752920e+006 4.62994540522125e+006 3.07843592910052e+003 3.97063676665218e+003 5.57448419012317e+003
4.20000000000000e+002 -4.14477176208832e+006 -2.89284525345699e+006 4.95444132562417e+006 3.36423290094763e+003 4.17386353358942e+003 5.25151391806686e+003
4.80000000000000e+002 -3.93490186381359e+006 -2.63709417983416e+006 5.25894066432676e+006 3.63644685122379e+003 4.36025089169865e+003 4.90734805044189e+003
5.40000000000000e+002 -3.70915659726814e+006 -2.37069433079677e+006 5.54221443383469e+006 3.89397951213889e+003 4.52904622572164e+003 4.54337567187682e+003
6.00000000000000e+002 -3.46844733339136e+006 -2.09472055099714e+006 5.80311931497438e+006 4.13579188806219e+003 4.67956789774935e+003 4.16106580787981e+003
6.60000000000000e+002 -3.21374581296524e+006 -1.81028634892688e+006 6.04060227143353e+006 4.36090844726627e+003 4.81120799984520e+003 3.76196149572940e+003
7.20000000000000e+002 -2.94608022371052e+006 -1.51853940409808e+006 6.25370479991249e+006 4.56842105746374e+003 4.92343480918622e+003 3.34767355664728e+003
2.41914000000000e+006 2.70364071932224e+005 -2.16131771510656e+006 -6.73422486544053e+006 -2.66180561274872e+003 -6.71131669171029e+003 2.04709973126991e+003
2.41920000000000e+006 1.10441469725533e+005 -2.55887565119471e+006 -6.59804014382690e+006 -2.67455826403108e+003 -6.55228978814786e+003 2.49636428731320e+003

END Ephemeris

I just want to read the array of values. How do I tell the script to start reading at the 11th line and stop at the (# of rows in array) - 11?

KevinADC · Jul 21, 2006

you could read the whole file into an array and then just start reading the array from the desired position:

Code:

my @array = <INPUT>;
for (10..$#array) {
   print $array[$_];
}

you could use $., the input line number variable, to skip the first ten lines:

Code:

while(<INPUT>){
   next if $. <= 10;
   print;
}

there are a few more ways to accomplish the same thing.

ishnid · Jul 21, 2006

Think you've slightly misread the question, Kevin. The OP wishes to stop at (@array-11). In that case, the first solution posted can easily be adapted. The second won't work as you won't know how far from the end you are at any point.
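A minimal sketch of that adaptation, assuming the whole file fits in memory. The sub name middle_lines and the stand-in data (the numbers 1 to 300) are made up for illustration; the counts 10 and 11 are taken from the question.

```perl
use strict;
use warnings;

# Return the lines left after dropping the first $skip_head and the
# last $skip_tail entries of an already-slurped list of lines.
sub middle_lines {
    my ($lines, $skip_head, $skip_tail) = @_;
    my $first = $skip_head;
    my $last  = $#{$lines} - $skip_tail;
    return () if $first > $last;          # nothing left in the middle
    return @{$lines}[$first .. $last];    # array slice of the middle part
}

# Stand-in for "my @array = <INPUT>;": the numbers 1..300, one per line.
my @array  = map { "$_\n" } 1 .. 300;
my @wanted = middle_lines(\@array, 10, 11);

print "first kept line: $wanted[0]";
print "last kept line:  $wanted[-1]";
```

With this stand-in data the slice runs from index 10 to index 288, so the first kept line is 11 and the last is 289.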

Whether the first is acceptable or not depends on whether the file is small enough to be held in memory all together. If not, you might consider something like this (a buffer of sorts):

Code:

my @array;
while(<INPUT>) {
   push @array, $_ if ( $. > 10 );
   last if ( eof );
   print shift @array if ( @array > 10 );
}

raklet · Jul 21, 2006

Ishnid,

How would you modify your "buffer of sorts" to capture all of the numerical data? That particular snippet only captures the first ten lines of numbers, but leaves off substantially more.

Chep80,
I know your file is some 40,000+ lines, so you may not want to read the whole thing into an array due to possible memory issues. Reading and processing the file one line at a time would be ideal. Here is another way of doing it.

Code:

my @array;
my $capture;
while(<DATA>) {
    chomp;
    if (/EphemerisEciTimePosVel/) { # Find this line in the file
        $capture = 1; # Turn data capture on when line is found
        next;
    }
    if (/END Ephemeris/) { # Find this line in the file
        $capture = 0; # Turn data capture off when line is found
        next;
    }
    if ($capture) { # Capture the data
        push @array, $_ if length; # Skip blank lines (but keep a bare "0")
    }
}
print "$_\n" foreach @array;
# Print out each line.  The numbers aren't split into
# individual numbers; each element of the array is just
# treated as one big string.  You would have to do
# another loop here with split to divide up the numbers if
# needed.
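The extra split loop mentioned in that last comment might look something like the sketch below. It is only an illustration: the two strings are the first two data rows from the original post, and the name @rows is made up here.

```perl
use strict;
use warnings;

# Two captured lines, copied from the sample data in the question.
my @array = (
    '0.00000000000000e+000 -5.10704934943446e+006 -4.30468742388535e+006 2.34134338660962e+006 1.14622110187997e+003 2.44791644659563e+003 7.00081996659693e+003',
    '6.00000000000000e+001 -5.02804956947231e+006 -4.14946215282220e+006 2.75587684053564e+006 1.48771794293052e+003 2.73281630226811e+003 6.82905617631048e+003',
);

# split ' ' breaks each line on runs of whitespace; adding 0 makes Perl
# treat each field as a number instead of a string.
my @rows = map { [ map { $_ + 0 } split ' ', $_ ] } @array;

for my $row (@rows) {
    printf "first field %g, %d fields in total\n", $row->[0], scalar @$row;
}
```

Each element of @rows is then a reference to an array of seven numbers, so $rows[1][0] is the first value of the second row.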

ishnid · Jul 22, 2006

For my test, I have a file with the numbers 1 .. 300 in it, one on each line. That snippet prints out lines 11 ... 289, which is what I interpreted the OP's question as.

Actually, that code is a line too long: don't know what I was thinking (in my defence, it was late ;-))

Code:

my @array;
while(<INPUT>) {
   push @array, $_ if ( $. > 10 );
   print shift @array if ( @array > 11 );
}
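The 1 .. 300 test described above can be reproduced without a file on disk. This is a sketch under a few assumptions: the data is fed through an in-memory filehandle, and the names $skip_head, $skip_tail, @buffer and @kept are made up for illustration.

```perl
use strict;
use warnings;

# Stand-in for the real file: the numbers 1..300, one per line, read
# through an in-memory filehandle (open on a scalar reference).
my $text = join '', map { "$_\n" } 1 .. 300;
open my $in, '<', \$text or die "Cannot open in-memory file: $!";

my ($skip_head, $skip_tail) = (10, 11);   # counts taken from the question
my (@buffer, @kept);

while (my $line = <$in>) {
    next if $. <= $skip_head;             # ignore the header lines
    push @buffer, $line;
    # Once the buffer holds more than $skip_tail lines, its oldest line
    # cannot be one of the last $skip_tail lines of the file, so keep it.
    push @kept, shift @buffer if @buffer > $skip_tail;
}
close $in;

chomp(my ($first, $last) = @kept[0, -1]);
print "kept ", scalar(@kept), " lines, from $first to $last\n";
```

At most $skip_tail + 1 lines are held in @buffer at any time, so memory use stays flat however long the input is (here @kept collects the result only so that it can be inspected; a real script could print each line instead).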

TrojanWarBlade · Jul 22, 2006

I'm assuming that you want to start at line 11 because you're only interested in the lines with the numbers (and end early for the same reason).
If that is the case, you could simply read all lines and skip the ones that do not contain 7 space-separated numbers in scientific notation.
I have never been a fan of reading the whole thing into an array, and I would guess that you have lots of data to process, so the process could easily run out of memory.
So here goes my two penneth:

Code:

#!/usr/bin/perl -w
use strict;
while(<>) {
    # The following line only tests for 6 numbers but that should be enough
    # to differentiate.
    next unless /(?:[-+]?\d+\.\d+e[-+]?\d\d\d?\s+){6}/;
    # No point chomping before rejecting a record.
    chomp;
    # Split the record up into fields to process.
    my @fields = split;
    # Process @fields here.  If you really need all records in memory at once
    # you could push them onto an array like this:
    # push @records, [@fields];
}

Trojan.
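As one possible way to fill in the "Process @fields here" step, the sketch below names the seven fields. The column names are a guess based on the "EphemerisEciTimePosVel" header (time, then position, then velocity) and are not confirmed anywhere in the thread; parse_line is likewise a made-up helper name.

```perl
use strict;
use warnings;

# Hypothetical column names, guessed from the "EphemerisEciTimePosVel" header.
my @names = qw(time x y z vx vy vz);

# Same test as in the post above: six number-plus-whitespace groups.
my $data_line = qr/(?:[-+]?\d+\.\d+e[-+]?\d\d\d?\s+){6}/;

# Return a hash reference for a data line, or undef for anything else.
sub parse_line {
    my ($line) = @_;
    return undef unless $line =~ $data_line;   # header, blank or END line
    my @fields = split ' ', $line;
    my %record;
    @record{@names} = @fields;                 # hash slice: name => value
    return \%record;
}

# A few lines in the shape of the posted file.
my @sample = (
    "EphemerisEciTimePosVel\n",
    "\n",
    "6.00000000000000e+001 -5.02804956947231e+006 -4.14946215282220e+006 2.75587684053564e+006 1.48771794293052e+003 2.73281630226811e+003 6.82905617631048e+003\n",
    "END Ephemeris\n",
);

for my $line (@sample) {
    my $rec = parse_line($line) or next;
    print "time $rec->{time}, vz $rec->{vz}\n";
}
```

Only the numeric line produces a record; the header, blank and END lines are rejected by the regex before any splitting is done.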

PaulTEG · Jul 22, 2006

Another tuppence in the ring.

That's a lot of comparisons to evaluate if one were to fire up the regex for each line of a 40K file.

Code:

#!/usr/bin/perl
use strict;
use warnings;

my @values;
open(FH, '<', 'temp.txt') or die "Cannot open temp.txt: $!"; #60K lines used about 15MB of RAM
while (<FH>) {
  if ($. > 10) {
     chomp;
     push (@values, split (/ /, $_));
  }
}
pop (@values); #remove Ephemeris
pop (@values); #remove End
close FH;
print $_."\n" foreach(@values);

I monitored the memory usage using MemTurbo, so it's not an official reading but an average after observing 5 runs.

60K lines ~= 9MB, so in this case the size of the file would be manageable, even with Kevin's code

Paul
------------------------------------
Spend an hour a week on CPAN, helps cure all known programming ailments ;-)

TrojanWarBlade · Jul 22, 2006

Back to reading it all into memory are we?
How do we know it's gonna be 60K records?
What if it's 600K records?
Or 6 million records?
or 60 million records?
Still manageable?
Granted, the regex is a little heavy, but at least when processing record by record the code would still be working at the end of a feed of 6 million records.

Trojan.

PaulTEG · Jul 22, 2006

when they get to 6 million, they'll be able to afford us

Paul

TrojanWarBlade · Jul 23, 2006

Hahahaha
That made me chuckle!

Trojan.
