Regex multiple times per line

stevio · Sep 22, 2009

I have the following data in a text file:

Code:

 0   0-0     1-0     2-0     3-0     0-1     1-1     2-1     
 1   0-2     1-2     2-2     3-2     0-3     1-3     2-3     
 2   0-4     1-4     2-4     3-4     0-5     1-5     2-5     
 3   0-6     1-6     2-6     3-6     0-7     1-7     2-7     
...
etc

Can anyone think of a more elegant way to match the repeated x-x pairs than the regex below, which spells out every pair explicitly? The current code handles a fixed count of 7 pairs per line, but there could be more.

Code:

open(FILE,"sample.txt") || die "Cannot open sample.txt $!\n";

while(@line = <FILE>) {    
     $line = FILE;    
      foreach my $line(@line){
        if ($line =~ /\s+\d+\s+\d+-\d+\s+\d+-\d+\s+\d+-\d+\s+\d+-\d+\s+\d+-\d+\s+\d+-\d+\s+\d+-\d+/){
         @records = split(/-/,$line);
         #put into csv here; 
        }
      }
}

The final output needs to be CSV, with each row above rearranged into the following structure:

Code:

0,0,0
0,1,0
0,2,0
0,3,0
0,0,1
0,1,1
0,2,1
1,0,2
1,1,2
etc

stevexff · Sep 22, 2009

Perl:

/\s+\d+(\s+\d+-\d+)+/

(not tested though...)

Steve

"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object::PerlDesignPatterns)
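For reference, a repeated capturing group such as (\s+\d+-\d+)+ only keeps its last repetition in $1, so a pattern like this can validate a line but does not hand back every pair. Below is a minimal sketch of one way to collect them all, using a global match in list context. The sub name pairs_to_rows and the sample line are illustrative assumptions, not something from the posts above.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one input line into "prefix,x,y" rows.
sub pairs_to_rows {
    my ($line) = @_;

    # The leading number is the row prefix; give up if there is none.
    my ($prefix) = $line =~ /^\s*(\d+)/ or return;

    # With /g in list context, the match returns the captures of
    # every match on the line: (x1, y1, x2, y2, ...).
    my @xy = $line =~ /(\d+)-(\d+)/g;

    my @rows;
    while (my ($x, $y) = splice(@xy, 0, 2)) {
        push @rows, join(',', $prefix, $x, $y);
    }
    return @rows;
}

print "$_\n" for pairs_to_rows(" 0   0-0     1-0     2-0 ");
```

For the sample line this prints 0,0,0 then 0,1,0 then 0,2,0, one row per line, regardless of how many pairs the line contains.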

Annihilannic · Sep 22, 2009

Also the $line = FILE statement is superfluous.

Another approach:

Code:

#!/usr/bin/perl -w
use strict;

open(FILE,"sample.txt") || die "Cannot open sample.txt $!\n";

while(my @line = <FILE>) {
  foreach my $line(@line){
    my @fields = split(/[[:space:]]+/,$line);
    for (my $i=2; $i<@fields; $i++) {
      my @records = split(/-/,$fields[$i]);
      if (@records == 2) {
        print "$fields[1],$records[0],$records[1]\n";
        #put into csv here;
      }
    }
  }
}

Annihilannic.

rharsh · Sep 22, 2009

... or another way.

Code:

while (<DATA>) {
	if (/^\s*\d+(?:\s*\d+-\d+)*\s*$/) {
		my ($prefix, @pairs) = split;
		foreach my $pair (@pairs) {
			print( join(',', $prefix, split('-', $pair)), "\n");
		}
	}
}

__DATA__
 0   0-0     1-0     2-0     3-0     0-1     1-1     2-1     
 1   0-2     1-2     2-2     3-2     0-3     1-3     2-3     
 2   0-4     1-4     2-4     3-4     0-5     1-5     2-5     
 3   0-6     1-6     2-6     3-6     0-7     1-7     2-7

rharsh · Sep 22, 2009

Oops.. change

Code:

if (/^\s*\d+(?:\s*\d+-\d+)*\s*$/) {

to this

Code:

if (/^\s*\d+(?:\s*\d+-\d+)+\s*$/) {
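The only change between the two versions is the quantifier after the group, and the difference shows up on a line that has a prefix but no pairs. A small sketch to illustrate (the sample strings and the variable names $star and $plus are made up for this example):

```perl
use strict;
use warnings;

my $star = qr/^\s*\d+(?:\s*\d+-\d+)*\s*$/;   # zero or more pairs
my $plus = qr/^\s*\d+(?:\s*\d+-\d+)+\s*$/;   # at least one pair

for my $line (" 7 ", " 7   0-1   1-1 ") {
    printf "%-18s star=%s plus=%s\n",
        "'$line'",
        ($line =~ $star ? 'match' : 'no match'),
        ($line =~ $plus ? 'match' : 'no match');
}
```

The bare-prefix line is accepted only by the * version, so with + such a line is skipped instead of reaching the split with an empty list of pairs.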

stevio · Sep 22, 2009

Both of you deserve a star! Thanks

rharsh, can you please explain the regex in plain English?

My stab at it:

Code:

if (/^\s*\d+(?:\s*\d+-\d+)+\s*$/) {

^\s* # anchor start of line with zero or more spaces

\d+ # one or more digits

(?: # don't assign matches to $1,$2 etc

\s*\d+-\d+)+ # look for one or more sections that match space digit-digit; I assume the trailing '+' means "one or more of these"

\s*$ #anchor end of line

With this line

Code:

my ($prefix, @pairs) = split;

Is that splitting the text matched by ^\s*\d+(?:\s*\d+-\d+)+, so that the ^\s*\d+ part becomes $prefix and the (?:\s*\d+-\d+) part becomes the @pairs array?

I think I understand the last line; this is where you join the whole line together, separated by commas.

Thanks again.

rharsh · Sep 23, 2009

Hi stevio, your stab at the regex is good; you hit all the highlights. Do you need further explanation on that? (It looks like you don't.)
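As an aside, the same breakdown can be kept next to the pattern itself by using the /x modifier, which lets a regex contain whitespace and comments. This is only a sketch of that idea; the name $row_re and the two test strings are assumptions made for illustration.

```perl
use strict;
use warnings;

my $row_re = qr/
    ^ \s*            # start of line, optional leading whitespace
    \d+              # the row prefix: one or more digits
    (?:              # group without creating a capture
        \s* \d+-\d+  # optional whitespace, then a digit-digit pair
    )+               # one or more such pairs
    \s* $            # optional trailing whitespace, end of line
/x;

for my $line (" 0   0-0     1-0 ", "etc") {
    print "'$line' ", ($line =~ $row_re ? "matches" : "does not match"), "\n";
}
```

One thing to watch with /x: variables are still interpolated inside the comments, so it is safer to keep $ and @ names out of them.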

As far as the split goes, it doesn't have anything to do with the regex in the if condition (except that the regex has to match before the split is run on the input line). From the docs:

perldoc -f split said:
split /PATTERN/,EXPR,LIMIT
split /PATTERN/,EXPR
split /PATTERN/
...
If EXPR is omitted, splits the $_ string. If PATTERN is also omitted, splits on whitespace (after skipping any leading whitespace). Anything matching PATTERN is taken to be a delimiter separating the fields. (Note that the delimiter may be longer than one character.)

Now back to the code:

Code:

my ($prefix, @pairs) = split

It splits $_ on whitespace (skipping the leading whitespace if any), assigns the first field to $prefix and everything else (all the \d-\d stuff) to @pairs.
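To see that behaviour in isolation, here is a short sketch that runs the same split on one sample line without any regex check in front of it. The helper name prefix_and_pairs is hypothetical and exists only so the example is self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: split one line the way the loop body does.
sub prefix_and_pairs {
    local $_ = shift;               # put the line in $_, as while (<DATA>) would
    my ($prefix, @pairs) = split;   # no arguments: same as split(' ', $_)
    return ($prefix, @pairs);
}

my ($prefix, @pairs) = prefix_and_pairs(" 2   0-4     1-4     2-4     ");
print "prefix: $prefix\n";    # prefix: 2
print "pairs:  @pairs\n";     # pairs:  0-4 1-4 2-4
```

The leading and trailing whitespace on the line produce no empty fields, which is why the first field is the prefix itself.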

stevio · Sep 23, 2009

One last question. With the split

Code:

my ($prefix, @pairs) = split

does Perl know where to split based on the brackets in the regex?

Code:

^\s*\d+(?:\s*\d+-\d+)

Annihilannic · Sep 23, 2009

No, it's completely separate. split with no parameters defaults to splitting up the "default variable", $_, by white space.

Perl knows that the left-hand side is a list containing a scalar variable (which can hold only one value, so it gets the prefix) and an array (which can hold any number of values, so it gets the rest). If it contained just two scalars instead, the first would get the prefix, the second would get the first x-x pair (0-0 on the first data line), and the rest of the fields would be discarded.
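A small sketch of the two cases side by side may help; the variable names and the sample line are chosen for illustration only.

```perl
use strict;
use warnings;

my $line = " 0   0-0     1-0     2-0 ";

# A scalar followed by an array: the array takes everything left over.
my ($prefix, @pairs) = split ' ', $line;
print "prefix=$prefix pairs=@pairs\n";      # prefix=0 pairs=0-0 1-0 2-0

# Two scalars: only the first two fields are kept, the rest are dropped.
my ($first, $second) = split ' ', $line;
print "first=$first second=$second\n";      # first=0 second=0-0
```

Order matters here: an array placed before a scalar on the left-hand side would take every field and leave the scalar undefined.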

Annihilannic.

stevio · Sep 23, 2009

Thanks Annihilannic, that makes perfect sense, but you almost need a Vulcan mind meld to know the intricacies of what Perl is doing behind the scenes. I guess it just comes with experience.
