print from a Regular Expression, repeating lines

Status
Not open for further replies.

mrlambdin

IS-IT--Management
Apr 9, 2008
I have a file that lists 300+ archives, each record in this format:

____________________________________________________
Archive[0]: d:\archives\piarch.012 (500MB, Used: 9.0%)
PIarcfilehead[$Workfile: piarfile.cxx $ $Revision: 114 $]::
Version: 5 Path: d:\archives\piarch.012
State: 4 Type: 0 (fixed) Write Flag: 1 Shift Flag: 1
Record Size: 1024 Count: 512000 Add Rate/Hour: 4118.3
Offsets: Primary: 25853/128000 Overflow: 491596/512000
Start Time: 1-Apr-08 22:02:38
End Time: Current Time
Backup Time: 2-Apr-08 02:01:07
______________________________________________________

and I need to extract the "Archive[x]...", "Start Time:..." and "End Time:..." lines from each record and print them on ONE single line.

Expected output: "Archive[x]... Start Time:... End Time:..."


Here's my code:

Code:
my $m1;
my $m2;

$filename = "PIDATA";

open( FILE, "< $filename" ) or die "Can't open $filename: $!";

print "PI Archive Record Extraction.\n";
print "Matthew R. Lambdin\n";
print "______________________________________________________\n";

while(<FILE>){

chomp;

if(/^\s*Archive\[\d+\].*/){
        $m1 =~ $_;
}

if(/^\s*Start\s+Time.*/){
        $m2 =~ $_;
}

if(/^\s*End\s+Time.*/){
        print "$m1 ...$m2 ...$_\n";
}
}


When I run this, it prints " ... ... End Time: ...", meaning only the line matched by the third regular expression gets printed. Why won't the lines matched by the first and second regular expressions print? All I get for them is a space.

Thank you in advance.
 
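Change $m1 =~ $_; to $m1 = $_; and do the same for $m2.

"=~" is the pattern-binding operator: $m1 =~ $_ tries to match $m1 against $_ used as a pattern, so nothing is ever stored in $m1 or $m2 and they print as empty strings. Use the assignment operator "=" instead of "=~".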
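
To try the "=" fix without needing the PIDATA file, here is a minimal sketch that runs the same three tests over an in-memory sample. The sample text and the helper name extract_records are made up for illustration; they are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for the real PIDATA file.
my $sample = <<'END';
Archive[0]: d:\archives\piarch.012 (500MB, Used: 9.0%)
Version: 5 Path: d:\archives\piarch.012
Start Time: 1-Apr-08 22:02:38
End Time: Current Time
Backup Time: 2-Apr-08 02:01:07
END

# extract_records is an illustrative helper, not part of the original post.
sub extract_records {
    my ($text) = @_;
    my ( $m1, $m2 ) = ( '', '' );
    my @out;

    # Read the string line by line through an in-memory filehandle.
    open( my $fh, '<', \$text ) or die "Can't read sample: $!";
    while (<$fh>) {
        chomp;
        if (/^\s*Archive\[\d+\]/) {
            $m1 = $_;    # plain assignment, not =~
        }
        if (/^\s*Start\s+Time/) {
            $m2 = $_;
        }
        if (/^\s*End\s+Time/) {
            push @out, "$m1 ...$m2 ...$_";
        }
    }
    close $fh;
    return @out;
}

print "$_\n" for extract_records($sample);
```

With "=" in place, $m1 and $m2 hold the remembered lines by the time the End Time line is reached, so each record comes out as one joined line.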

 
If all you want to do is print the records:

Code:
my $filename = "PIDATA";

open( FILE, "< $filename" ) or die "Can't open $filename: $!";

print "PI Archive Record Extraction.\n";
print "Matthew R. Lambdin\n";
print "______________________________________________________\n";

while(<FILE>){

chomp;

if(/^\s*Archive\[\d+\].*/){
        print "$_ ";
}

if(/^\s*Start\s+Time.*/){
        print "$_ ";
}

if(/^\s*End\s+Time.*/){
        print "$_\n";
}
}


------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
Actually, the 'if' conditions are better written as if/elsif conditions:

Code:
if(/^\s*Archive\[\d+\].*/){
        print "$_ ";
}

elsif(/^\s*Start\s+Time.*/){
        print "$_ ";
}

elsif(/^\s*End\s+Time.*/){
        print "$_\n";
}

Otherwise Perl will evaluate all of the regexps on every line instead of stopping at the first one that matches.

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
I changed the code to match the last reply, and my output is:

Archive[0]: d:\archives\piarch.012 (500MB, Used: 9.0%)
Start Time: 1-Apr-08 22:02:38
End Time: Current Time


I thought "chomp" would remove the extra whitespace and the result would be printed all on ONE line?
 
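chomp removes the newline at the end of the data, if there is one. Strictly, it removes a trailing copy of the input record separator $/, which defaults to "\n". What constitutes a line ending depends on the operating system that wrote the file, so things can go awry if the file was created on one system and is read on another. For example, a file written on Windows ends each line with "\r\n"; read on a Unix box, chomp strips only the "\n" and leaves the carriage return on the end of the line.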
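
A small sketch of the difference, assuming the lines in the file end in "\r\n". The helper name strip_eol is hypothetical; it just wraps a substitution that removes either kind of line ending.

```perl
use strict;
use warnings;

# strip_eol is a hypothetical helper used only for this illustration.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;    # drop a trailing "\n" or "\r\n"
    return $line;
}

my $dos_line = "End Time: Current Time\r\n";

my $chomped = $dos_line;
chomp $chomped;              # default $/ is "\n", so the "\r" survives

my $clean = strip_eol($dos_line);

printf "after chomp:     ends in CR? %s\n", ( $chomped =~ /\r\z/ ? 'yes' : 'no' );
printf "after strip_eol: ends in CR? %s\n", ( $clean   =~ /\r\z/ ? 'yes' : 'no' );
```

Replacing the chomp in the loop with a substitution like this is one way to cope with files that came from another system.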

Steve

"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object::perlDesignPatterns)
 
I copied the output into MS Word and found that each line ends with a "newline" character, which explains why the output prints this way. Does Perl have a command to remove formatting from a text file before extracting data?
 
All you need to do is alter Kevin's suggestion to use capturing...

Code:
if(/^\s*(Archive\[\d+\].*?)\s*$/){
        print "$1 ";
}

elsif(/^\s*(Start\s+Time.*?)\s*$/){
        print "$1 ";
}

elsif(/^\s*(End\s+Time.*?)\s*$/){
        print "$1\n";
}

The 'stuff inside' the ()'s is captured into $1. The non-greedy .*? leaves any trailing whitespace (including a stray carriage return) to the \s*, so it stays out of $1.
If there were a second set of parentheses, the stuff inside them would be captured into $2, and so on.
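
As a rough illustration of $1 and $2 together, the sketch below pulls the archive index and the path out of an Archive line with two sets of parentheses. The helper name parse_archive_line and the exact pattern are assumptions made for this example, not something from the posts above.

```perl
use strict;
use warnings;

# parse_archive_line is a hypothetical helper for illustration only.
sub parse_archive_line {
    my ($line) = @_;

    # First ( ) captures the index into $1, second ( ) captures the path into $2.
    if ( $line =~ /^\s*Archive\[(\d+)\]:\s+(\S+)/ ) {
        return ( $1, $2 );
    }
    return;    # empty list when the line is not an Archive header
}

# Sample line with a stray carriage return left on the end.
my $line = "Archive[0]: d:\\archives\\piarch.012 (500MB, Used: 9.0%)\r";

my ( $index, $path ) = parse_archive_line($line);
print "index=$index path=$path\n";
```

Because the captures are taken from the middle of the line, whatever trails after the path, carriage return included, never reaches $1 or $2.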
 