Regex multiple times per line

Status
Not open for further replies.

stevio

Vendor
Jul 24, 2002
78
AU
Have the following data in a text file:
Code:
 0   0-0     1-0     2-0     3-0     0-1     1-1     2-1     
 1   0-2     1-2     2-2     3-2     0-3     1-3     2-3     
 2   0-4     1-4     2-4     3-4     0-5     1-5     2-5     
 3   0-6     1-6     2-6     3-6     0-7     1-7     2-7     
...
etc

Can anyone think of a more elegant way to write the regex search below, which basically repeats the same pattern for every x-x pair? The current code caters for a fixed number of 7 matches per line, but there could be more.

Code:
open(FILE,"sample.txt") || die "Cannot open sample.txt $!\n";

while(@line = <FILE>) {    
     $line = FILE;    
      foreach my $line(@line){
        if ($line =~ /\s+\d+\s+\d+-\d+\s+\d+-\d+\s+\d+-\d+\s+\d+-\d+\s+\d+-\d+\s+\d+-\d+\s+\d+-\d+/){
         @records = split(/-/,$line);
         #put into csv here; 
        }
      }
}

The eventual output needs to be in CSV format, where each row above is rearranged into the following structure:

Code:
0,0,0
0,1,0
0,2,0
0,3,0
0,0,1
0,1,1
0,2,1
1,0,2
1,1,2
etc
 
Perl:
/\s+\d+(\s+\d+-\d+)+/
(not tested though...)
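
A minimal sketch of the repeated-group idea, assuming the intended pattern is a row number followed by one or more whitespace-separated x-x pairs. The sub names is_data_line and pairs_in are made up for illustration and are not from the original post.

```perl
use strict;
use warnings;

# True when the line holds a row number followed by at least one x-x pair.
sub is_data_line {
    my ($line) = @_;
    return $line =~ /\s+\d+(\s+\d+-\d+)+/ ? 1 : 0;
}

# In list context, a /g match returns every captured x-x pair on the line,
# so the number of pairs no longer has to be fixed at 7.
sub pairs_in {
    my ($line) = @_;
    return $line =~ /(\d+-\d+)/g;
}

for my $line (" 0   0-0     1-0     2-0",
              " 1   0-2     1-2     2-2     3-2     0-3") {
    next unless is_data_line($line);
    my @pairs = pairs_in($line);
    print scalar(@pairs), " pairs: @pairs\n";
}
```

The group with the trailing + replaces the seven hand-written copies of \s+\d+-\d+, so the same check works for three pairs or five.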

Steve

"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object::perlDesignPatterns)
 
Also the $line = FILE statement is superfluous.

Another approach:

Code:
#!/usr/bin/perl -w
use strict;

open(FILE,"sample.txt") || die "Cannot open sample.txt $!\n";

while(my @line = <FILE>) {
  foreach my $line(@line){
    my @fields = split(/[[:space:]]+/,$line);
    for (my $i=2; $i<@fields; $i++) {
      my @records = split(/-/,$fields[$i]);
      if (@records == 2) {
        print "$fields[1],$records[0],$records[1]\n";
        #put into csv here;
      }
    }
  }
}

Annihilannic.
 
... or another way. :)

Code:
while (<DATA>) {
	if (/^\s*\d+(?:\s*\d+-\d+)*\s*$/) {
		my ($prefix, @pairs) = split;
		foreach my $pair (@pairs) {
			print( join(',', $prefix, split('-', $pair)), "\n");
		}
	}
}

__DATA__
 0   0-0     1-0     2-0     3-0     0-1     1-1     2-1     
 1   0-2     1-2     2-2     3-2     0-3     1-3     2-3     
 2   0-4     1-4     2-4     3-4     0-5     1-5     2-5     
 3   0-6     1-6     2-6     3-6     0-7     1-7     2-7
 
Oops.. change the * after the group to a +, i.e. change
Code:
if (/^\s*\d+(?:\s*\d+-\d+)*\s*$/) {
to this
Code:
if (/^\s*\d+(?:\s*\d+-\d+)+\s*$/) {
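
A small sketch of why the quantifier matters, assuming the goal is to skip lines that carry a row number but no x-x pairs. The names $star, $plus and matches are made up for this example.

```perl
use strict;
use warnings;

my $star = qr/^\s*\d+(?:\s*\d+-\d+)*\s*$/;   # zero or more pairs allowed
my $plus = qr/^\s*\d+(?:\s*\d+-\d+)+\s*$/;   # at least one pair required

# Returns 1 if the line matches the given compiled regex, otherwise 0.
sub matches {
    my ($re, $line) = @_;
    return $line =~ $re ? 1 : 0;
}

for my $line (" 0   0-0   1-0", " 42 ") {
    printf "%-18s star=%d plus=%d\n",
        "'$line'", matches($star, $line), matches($plus, $line);
}
```

A line holding only a number passes the * version and would produce no output rows, while the + version rejects it up front.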
 
Both of you deserve a star! Thanks

rharsh, can you please explain in plain english the regex

my stab at it
Code:
if (/^\s*\d+(?:\s*\d+-\d+)+\s*$/) {

^\s* # anchor start of line with zero or more spaces

\d+ # one or more digits

(?: # don't assign matches to $1,$2 etc

\s*\d+-\d+)+ #look for one or more sections which match space digit-digit. I assume the '+' \s*\d+-\d+)+ means look for one or more of these

\s*$ #anchor end of line
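
For reference, the same breakdown can be written into the pattern itself with the /x modifier, which lets a regex carry whitespace and comments. This is only a sketch; $row and is_row are names invented here.

```perl
use strict;
use warnings;

my $row = qr/
    ^ \s*               # start of line, optional leading whitespace
    \d+                 # the row prefix: one or more digits
    (?:                 # group without capturing
        \s* \d+ - \d+   # optional whitespace, then a digit-digit pair
    )+                  # the whole group one or more times
    \s* $               # optional trailing whitespace, end of line
/x;

# Returns 1 if the line looks like a data row, otherwise 0.
sub is_row {
    my ($line) = @_;
    return $line =~ $row ? 1 : 0;
}

print is_row(" 0   0-0     1-0     2-0  ") ? "match\n" : "no match\n";
print is_row("etc")                        ? "match\n" : "no match\n";
```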

With this line
Code:
my ($prefix, @pairs) = split;

Is that splitting on the regex ^\s*\d+(?:\s*\d+-\d+), so that $prefix is the ^\s*\d+ part and the (?:\s*\d+-\d+) section becomes the @pairs array?

I think I understand the last line, this is where you join the whole line together, separated by commas.

Thanks again.
 
Hi stevio, your stab at the regex is good, you hit all the highlights. Do you need further explanation on that? (It looks like you don't.)

As far as the split goes, it doesn't have anything to do with regex (except that the regex has to match before the split is run on the input line.) From the docs:
perldoc -f split said:
split /PATTERN/,EXPR,LIMIT
split /PATTERN/,EXPR
split /PATTERN/
...
If EXPR is omitted, splits the $_ string. If PATTERN is also omitted, splits on whitespace (after skipping any leading whitespace). Anything matching PATTERN is taken to be a delimiter separating the fields. (Note that the delimiter may be longer than one character.)
Now back to the code:
Code:
my ($prefix, @pairs) = split
It splits $_ on whitespace (skipping the leading whitespace if any), assigns the first field to $prefix and everything else (all the \d-\d stuff) to @pairs.
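
A quick sketch of that behaviour on one sample row. It passes the line explicitly and uses split ' ', which is the documented equivalent of a bare split on $_; the sub name prefix_and_pairs is made up for this example.

```perl
use strict;
use warnings;

# Splits a row on whitespace: the first field is the prefix, the rest are pairs.
sub prefix_and_pairs {
    my ($line) = @_;
    # split ' ' skips leading whitespace and treats any run of whitespace
    # as a single delimiter, just like split with no arguments does on the
    # default variable.
    my ($prefix, @pairs) = split ' ', $line;
    return ($prefix, @pairs);
}

my ($prefix, @pairs) = prefix_and_pairs(" 1   0-2     1-2     2-2  \n");
print "prefix=$prefix pairs=@pairs\n";   # prints "prefix=1 pairs=0-2 1-2 2-2"
```

No regex captures are involved: the match in the if only decides whether the line gets split at all.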
 
one last question. So with the split
Code:
my ($prefix, @pairs) = split

Perl knows where to split based on the brackets from the regex?

Code:
^\s*\d+(?:\s*\d+-\d+)
 
No, it's completely separate. split with no parameters defaults to splitting up the "default variable", $_, by white space.

Perl knows that the left-hand side is a list containing a scalar variable (which can only hold one value, so it gets the prefix) and an array (which can hold any number of values, so it gets the rest). If it contained just two scalars, the first would get the prefix, the second would get the first pair (0-0 in the first row, for example), and the rest of the fields would be discarded.
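
To illustrate, a short sketch of the two kinds of list assignment on one sample row (the variable names are only for this example):

```perl
use strict;
use warnings;

my @fields = split ' ', " 0   0-0     1-0     2-0";

my ($prefix, @pairs) = @fields;   # the scalar takes one value, the array takes the rest
my ($first, $second) = @fields;   # two scalars: the remaining fields are dropped

print "prefix=$prefix, number of pairs=", scalar(@pairs), "\n";
print "first=$first, second=$second\n";
```

Here @pairs ends up with all three pairs, while $second only holds 0-0 and the other two pairs are silently thrown away.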

Annihilannic.
 
thanks Annihilannic, that makes perfect sense, but you almost need vulcan mind meld to know the intricacies of what Perl is doing behind the scenes - I guess it just comes with experience.
 