File search and parameter evaluation 1

Status
Not open for further replies.

sedawk

Programmer
Feb 5, 2002
247
US
Hi,

The problem is like this:

I have many files that share a fixed format; only some of the values in them differ. Each file is about 600 lines long on Unix/Linux machines. Some lines are very long, so a text editor wraps them onto two or more lines.

The task is to pull some data out of each long file. I know the data is located between the texts "** start " and "results", for example:

Code:
..... many other texts ...
** start: abc, def
row   1 2   11
col   3 4   22
(more lines follow)

results:
...
(the other lines follow)

My question is how to locate this part of the text so that I can pull out the data, say row=11 and col=22 in the above example.

My other question is one I remember someone asking before, but I could not find it by searching, so please bear with me. In the above example, row=11 needs to be extracted. However, row=1.1e1 is the same as row=11. How should I handle such cases? I planned to use an array, but if these cases can occur the array becomes too large. In more detail, all of these values would have to be in the array:
@one={1.1 11 110 1100 1.1k 11k 11000 110k 110000 2.2 22 ...}

There are 30 such values (11, 22, 33, ...).

Is there a better way to do that?
 
Hi,

The first part is easy:
Code:
#!/usr/bin/perl -w
use strict;
use diagnostics;
my $started = 0;
while(<>) {
  last if /^results:/;
  print if $started;
  $started = 1 if /^\*\* start:/;
}
If you prefer, you could simply push the selected records onto an array instead for later processing.
Also, if there are more blocks like this in the same file then you might like to change the "last if /^results:/;" line to be "$started=0 if /^results:/;"
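
The two suggestions above (keep the selected records in an array, and reset the flag so that several blocks can be handled) could be combined roughly as follows. This is only a sketch: the helper name extract_blocks and the sample lines are made up for illustration, and it assumes the value is always the fourth whitespace-separated field, as in the example from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collect every "** start:" ... "results:" block
# from a list of lines. Returns one array reference per block.
sub extract_blocks {
    my @lines = @_;
    my (@blocks, $current);            # $current is undef outside a block
    for my $line (@lines) {
        if ($line =~ /^results:/) {    # end marker: close the open block
            push @blocks, $current if $current;
            undef $current;
        }
        elsif ($line =~ /^\*\* start:/) {
            $current = [];             # start marker: open a new block
        }
        elsif ($current) {
            push @$current, $line;     # a line inside a block
        }
    }
    return @blocks;
}

# Made-up sample input containing two blocks.
my @sample = (
    "..... many other texts ...",
    "** start: abc, def",
    "row   1 2   11",
    "col   3 4   22",
    "",
    "results:",
    "(the other lines follow)",
    "** start: ghi",
    "row   5 6   33",
    "results:",
);

my @blocks = extract_blocks(@sample);
for my $i (0 .. $#blocks) {
    print "block ", $i + 1, ":\n";
    for my $line (@{ $blocks[$i] }) {
        my ($name, $start, $end, $value) = split ' ', $line;
        next unless defined $value;    # skip blank or short lines
        print "  $name=$value\n";
    }
}
```

To run it against a real file, the lines could be read with something like "chomp(my @lines = <>);" and passed to extract_blocks. Note that a block which is never closed by a "results:" line is silently dropped in this sketch.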

The second part of your question is more difficult only because I just don't understand what you need.

If you try explaining again, with lots of "before" and "after" data examples, I'm sure we can help.


Trojan.

 
While you could do all your processing inside the while loop, Trojan's method makes it easier to see what's going on. If you change it to
Code:
#!/usr/bin/perl -w
use strict;
use diagnostics;
my $started = 0;
[red]my @data;[/red]

# select just the data you need

while(<>) {
  last if /^results:/;
  [red]push ($_, @data)[/red]if $started;
  $started = 1 if /^\*\* start:/;
}

[red]chomp @data;

foreach my $line (@data) {
   # process your data here
}[/red]
you can separate the data extraction from the data processing, which allows you to focus on one problem at a time.
 
stevexff,
Have you tested your mods to my code?
I'm not sure of your interpretation of the syntax for the "push" function.

I would suggest that it should be "push @data, $_ if $started;"

Trojan.


 
Hi Trojan and stevexff,

Thanks for your replies. They are helpful. To come back to Trojan's doubts about the second part, I have tried to express the problem using the modified code plus some pseudo code:

Code:
#!/usr/bin/perl -w
use strict;
use diagnostics;
my $started = 0;
my @data;

# select just the data you need

while(<>) {
  last if /^results:/;
  push @data, $_ if $started;
  $started = 1 if /^\*\* start:/;
}

chomp @data;

# Here is the second part of the problem.
# Each line has a pattern like this:
# name start_index end_index value,
# e.g. row 1 2 11 or
# row 1 2 1100 or row 1 2 11k etc. (the value could be
# anything, say 1.1, 11, 110, etc., but they all belong to the
# same group).
# Let's call this an 11-group.

# pseudo code for the data processing
foreach my $line (@data) {
   # value = the value field of $line;
   # if the value is in the 11-group:            count11++
   # else if it is in another group (say, 22):   count22++
   # ... (check the remaining value groups)
   # else (not in any of the value groups):      notincluded++
}
 
I'm sorry but there are still things I don't understand about this.
Your sample data (correct me if I'm wrong):
row 1 2 11
row 1 2 1100
row 1 2 11k

How do you define an 11-group?
Is it anything that starts with 11?
Is 1.1 part of an 11-group?

You show that you want to count 11-group info and 22-group info.
What is your [red]complete[/red] list of groups you want to collect counts for or would you like to collect for all groups?


If I can understand how you classify your data and what you actually want to count then I'm sure this is an easy fix.



Trojan.


 
Hi Trojan,

1.1 is part of the 11-group. A value does not have to start with 11. In detail, the complete 11-group list is as follows:

@eleven={1.1 11 110 1100 1.1k 11000 11k 110000 110k}

As long as the digits 11 occur, wherever the decimal point is, the value is in the 11-group. The same goes for the 22-group and the other groups:

@twotwo={2.2 22 220 2200 2.2k 22000 22k 220000 220k}

The group list may look like this:

11-group, 22-group, 33-group, 34-group, ...
 
Hopefully this gives you a complete solution.
Code:
#!/usr/bin/perl -w
use strict;
use diagnostics;
my $started = 0;
my @data = ();
my %lookups = ( (map {($_, "11-group")}
                     qw(1.1 11 110 1100 1.1k 11000 11k 110000 110k)),
                (map {($_, "22-group")}
                     qw(2.2 22 220 2200 2.2k 22000 22k 220000 220k)),
              );
my %results = ();
my $notincluded = 0;
while(<>) {
  chomp;
  last if /^results:/;
  if ($started) {
    my ($name, $start, $end, $value) = split;
    next unless defined $value;  # skip blank or short lines
    if(exists $lookups{$value}) {
      $results{$lookups{$value}}++;
    } else {
      $notincluded++;
    }
  }
  $started = 1 if /^\*\* start:/;
}
foreach my $groupname (keys %results) {
  print $groupname, " has ", $results{$groupname}, " value(s)\n";
}
print "$notincluded were not included\n";

We could, of course, consider using regexes, but a hash lookup should be faster and is simple to maintain.

If you need more groups, just add them to the %lookups hash in the same way the others have been done.
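
If listing every spelling of every group gets too long (30 groups were mentioned, and forms like 1.1e1 are not in the lists above), another option might be to reduce each value to its significant digits and look that up instead. The following is only a sketch under some assumptions that the thread does not confirm: the helper name group_key is made up, a trailing "k" is taken to be a x1000 suffix, the exponent and the position of the decimal point are ignored entirely (so 0.011 would also land in the 11-group), and the %wanted list is a placeholder for the real list of groups.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reduce a value such as "1.1", "110", "11k" or
# "1.1e1" to its significant digits ("11"). Returns undef if the text
# does not look like a number.
sub group_key {
    my ($value) = @_;
    # mantissa, optional exponent, optional "k" suffix (assumed x1000)
    return undef
        unless $value =~ /^([0-9]+\.?[0-9]*|\.[0-9]+)(?:e[-+]?[0-9]+)?k?$/i;
    my $digits = $1;
    $digits =~ tr/.//d;       # the decimal point does not affect the group
    $digits =~ s/^0+//;       # drop leading zeros  (0.11 -> 11)
    $digits =~ s/0+$//;       # drop trailing zeros (1100 -> 11)
    return length($digits) ? $digits : undef;   # plain zero has no group
}

# Placeholder list of groups; the real list has to come from the poster.
my %wanted = map { $_ => 1 } qw(11 22 33);

my %count;
my $notincluded = 0;

# Made-up sample values, standing in for the fourth field of each line.
for my $value (qw(11 1.1e1 110k 2.2 220 47 abc)) {
    my $key = group_key($value);
    if (defined $key && $wanted{$key}) {
        $count{"$key-group"}++;
    } else {
        $notincluded++;
    }
}

print "$_ has $count{$_} value(s)\n" for sort keys %count;
print "$notincluded were not included\n";
```

With this approach only one entry per group is needed. The trade-off is that it accepts more spellings than the explicit lists do, so the explicit %lookups hash remains the safer choice if only the listed forms should count.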


Trojan.
 