Program using pattern matching

rafiu · Dec 10, 2007

I'm working on a data-mining script that will save some patterns in a hash table. Below is a sample of the file I'm working on:

.raf_group(first line);
.licoln_group(second line);
.start_raf_group(fitth line); //templated
wired .raf_group_firm ({help me please}); //templated
.muf_group (raf, lee you)

I would like to "pattern match" each line that starts with a "." and stop matching at the closing parenthesis ")". Then I would like to split each match into two parts and store the first part as a key and the second part as a value in a hash table. Thanks in advance for your help.

KevinADC · Dec 10, 2007

What have you tried so far?

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]

MillerH · Dec 10, 2007

As Kevin prompted, show us what you've tried thus far. Also, it would probably help if you wrote out the exact data structure that you are hoping to obtain. Probably something like this:

Code:

my %hash = (
	'raf_group'       => 'first line',
	'licoln_group'    => 'second line',
	'start_raf_group' => 'fitth line',
	'muf_group'       => 'raf, lee you',
);

- Miller

rafiu · Dec 11, 2007

So far, below is what I have:
open(my $fh, '<', $filename) or die "can't open $filename: $!";
my %hash;
while (my $new_info = <$fh>)
{
    # pattern-match each line from the leading "." to the closing parenthesis;
    # "\.", "\(" and "\)" are escaped so they match literal characters
    if ($new_info =~ /^(\.(?:raf|licoln)\w*)\s*(\(.*\))/)
    {
        # the first half (e.g. ".raf_group") becomes the key and the
        # second half (e.g. "(first line)") becomes the value
        my ($key_info, $value_info) = ($1, $2);
        $hash{$key_info} = $value_info;
    }
}
close($fh);
The overall hash will look like this:

%hash = (
'.raf_group' => '(first line)',
'.licoln_group' => '(second line)',
'.licoln_raf_group' => '(fifth line)',
'.raf_group' => '(raf, lee you)',
);

Below is the modified file:
.raf_group(first line);
.licoln_group(second line);
.licoln_raf_group(fitth line); //templated
wired .raf_group_firm ({help me please}); //templated
.raf_group (raf, lee you)

MillerH · Dec 11, 2007

Something like the following should work. The only tricky part of this regex is knowing which special characters need to be escaped.

Also, note that you have two lines with the key '.raf_group'. In a simple hash, the second one will overwrite the first.

Code:

use strict;

my %hash = ();

while (<DATA>) {
	next if ! /^(\.\w*)\s*(\(.*\))/;
	my ($key, $val) = ($1, $2);
	
	$hash{$key} = $val;
}

use Data::Dumper;
print Dumper(\%hash);

__DATA__
.raf_group(first line);
.licoln_group(second line);
.licoln_raf_group(fitth line); //templated
wired .raf_group_firm     ({help me please});   //templated
.raf_group        (raf, lee you)  

------------------------------------------------------------
Pragmas (perl 5.8.8) used:
- strict: Perl pragma to restrict unsafe constructs
Core (perl 5.8.8) modules used:
- Data::Dumper: stringified perl data structures, suitable for both printing and eval

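Both remarks in Miller's reply (escaping special characters, and the repeated '.raf_group' key) can be seen in isolation with a minimal sketch. It assumes the sample lines sit in an array (`@lines` is an illustrative name, not from the thread) instead of the `__DATA__` section, and reuses the same pattern. An unescaped `.` matches any character, which is why the pattern writes `\.`; and because the last line repeats `.raf_group`, only its value survives in the hash.

```perl
use strict;
use warnings;

# Illustrative copy of the sample lines from the __DATA__ section above.
my @lines = (
    '.raf_group(first line);',
    '.licoln_group(second line);',
    '.licoln_raf_group(fitth line); //templated',
    'wired .raf_group_firm     ({help me please});   //templated',
    '.raf_group        (raf, lee you)  ',
);

# Escaping: "." alone matches any character, "\." only a literal dot.
my $unescaped_hit = ('xraf_group' =~ /^.raf_group/)  ? 1 : 0;   # "x" satisfies "."
my $escaped_hit   = ('xraf_group' =~ /^\.raf_group/) ? 1 : 0;   # needs a real "."

# Same pattern as above: key = ".name", value = "(" up to the last ")".
my %hash;
for my $line (@lines) {
    next unless $line =~ /^(\.\w*)\s*(\(.*\))/;
    $hash{$1} = $2;    # a repeated key silently replaces the earlier value
}

printf "unescaped: %d, escaped: %d\n", $unescaped_hit, $escaped_hit;
print "$_ => $hash{$_}\n" for sort keys %hash;
```

Five lines go in, but only three keys come out, and `.raf_group` holds `(raf, lee you)`.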

rafiu · Dec 11, 2007

Thank you, I will try it out.

rafiu · Dec 14, 2007

I was able to pattern-match what I want, but when I tried to put the data into the hash table I did not get the result I wanted. Below is the original file. I would like to save the first half of each line as the key and the other half as the value. But some of the lines have no space between the two halves, and the code you gave me won't work on them because the two parts run together. For example, the first line below won't split into two because there is no space between the first word and the rest. The result I would like is:
.lo_gr8 => (w_c1_3_n[1:8])
.lo_gr0 => (c_h1_6_i[80:0]),
.up_gr0 => (p_n[47:40]), // LPC

###########################################################
.lo_gr8(w_c1_3_n[1:8]),

.lo_gr0(c_h1_6_i[80:0]),

.up_gr7(p_h_4_n[47:40]),

.up_gr2(p_0_d_n[100:32]),

.lo_gr0 ( c_h1_6_i[80:0] ),

.lo_gr1 ( p_h_4_[47:40] ),

.up_gr2 ( c_h1_6_[80:0] ),

.lo_gr0 (p_4_n[47:40]),

.lo_gr9 ({a_y, p_h_4_n, go, p_h_4_n[47:40]}),

.lo_gr3( p_h_n[90:40] ),

.up_gr0 (p_n[47:40]), // LPC
.lo_gr8(_4_n[47:40]),

KevinADC · Dec 14, 2007

Miller's regexp contains \s*, which means zero or more whitespace characters, so lines with no space should still parse correctly. The problem might be the square brackets, which denote a character class in a Perl regex and which you did not include in your original post.

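Kevin's point about `\s*` can be checked directly. A minimal sketch, assuming a few of the posted lines are held in an array (`@lines` and `@pairs` are illustrative names): Miller's pattern splits lines with and without a space before the `(`. Square brackets in the *data* need no special handling, because `.*` matches them like any other character; only brackets written inside a pattern form a character class.

```perl
use strict;
use warnings;

# A few lines from the posted file, with and without a space before "(".
my @lines = (
    '.lo_gr8(w_c1_3_n[1:8]),',
    '.lo_gr0 ( c_h1_6_i[80:0] ),',
    '.lo_gr9 ({a_y, p_h_4_n, go, p_h_4_n[47:40]}),',
    '.up_gr0 (p_n[47:40]), // LPC',
);

my @pairs;
for my $line (@lines) {
    # \s* allows zero spaces; the brackets in the data are matched by ".*",
    # and the greedy capture ends at the last ")", dropping ", // LPC".
    next unless $line =~ /^(\.\w*)\s*(\(.*\))/;
    push @pairs, [ $1, $2 ];
}

print "$_->[0] => $_->[1]\n" for @pairs;
```

Padding inside the parentheses, as in `( c_h1_6_i[80:0] )`, is kept as-is by this pattern.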

rafiu · Dec 14, 2007

Actually, I did not use Miller's pattern; I used a different one that works for what I need. The only problem now is splitting each line into two and storing the halves in a hash. With this code I was able to split the lines that have a space in them, but not the lines that have no space in between:

$info = (split ' ', $_)[1];

This gave me the first half of the sentence for the lines with a space, and the whole sentence for the lines without one.
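The reason `split ' '` behaves this way: with `' '` as the pattern, `split` breaks the line only on runs of whitespace, so a line with no space comes back as a single field. A sketch of one alternative (an assumption, not code from this thread; `split_at_paren` is a hypothetical helper name): split on optional whitespace followed by a lookahead for `(`, with a limit of 2, so the line is cut just before the first parenthesis whether or not a space precedes it.

```perl
use strict;
use warnings;

# Hypothetical helper: cut a line just before its first "(".
# The lookahead (?=\() matches the position before "(" without consuming it,
# so the "(" stays in the value; the limit of 2 stops after the first cut.
sub split_at_paren {
    my ($line) = @_;
    return split /\s*(?=\()/, $line, 2;
}

my $spaced   = '.lo_gr0 (c_h1_6_i[80:0]),';
my $unspaced = '.lo_gr8(w_c1_3_n[1:8]),';

# split ' ' only breaks on whitespace: 2 fields vs. 1 field.
my @spaced_fields   = split ' ', $spaced;
my @unspaced_fields = split ' ', $unspaced;

my ($k1, $v1) = split_at_paren($spaced);
my ($k2, $v2) = split_at_paren($unspaced);

print scalar(@spaced_fields), " vs ", scalar(@unspaced_fields), " fields\n";
print "$k1 => $v1\n$k2 => $v2\n";
```

The trailing comma stays in the value with this approach; a capture group that ends at `\)` avoids that.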

travs69 · Dec 14, 2007

Post your code/regex.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[noevil]

Travis - Those who say it cannot be done are usually interrupted by someone else doing it; Give the wrong symptoms, get the wrong solutions;

rafiu · Dec 14, 2007

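use strict;
use warnings;

my %hash = ();
open(my $fh, '<', "c:\\sample.txt") or die "can't open c:\\sample.txt: $!";
while (my $line = <$fh>)
{
    # "\(" and "\)" match literal parentheses; an unescaped "(" only starts
    # a capture group, so the value must be written as (\(.*\)).
    next if $line !~ /^(\.\w*)\s*(\(.*\))/;
    # copy the captures right away, before another successful match replaces them
    my ($key, $val) = ($1, $2);
    # take out the whitespace padding inside the parentheses,
    # e.g. "( c_h1_6_i[80:0] )" becomes "(c_h1_6_i[80:0])"
    $val =~ s/^\(\s+/(/;
    $val =~ s/\s+\)$/)/;
    $hash{$key} = $val;
}
close($fh);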

KevinADC · Dec 14, 2007

You have repeating keys in the file, so it won't work with a one-dimensional hash.

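One common way around repeating keys is a hash of arrays: each key maps to an array reference, and every value seen for that key is pushed onto it, so nothing is overwritten. A minimal sketch, assuming Miller's pattern and a few of the posted lines (`@lines` and `%groups` are illustrative names):

```perl
use strict;
use warnings;

# A few posted lines in which ".lo_gr0" and ".lo_gr8" repeat.
my @lines = (
    '.lo_gr0(c_h1_6_i[80:0]),',
    '.lo_gr0 ( c_h1_6_i[80:0] ),',
    '.lo_gr0 (p_4_n[47:40]),',
    '.lo_gr8(w_c1_3_n[1:8]),',
    '.lo_gr8(_4_n[47:40]),',
);

my %groups;    # key => [ value, value, ... ] in file order
for my $line (@lines) {
    next unless $line =~ /^(\.\w*)\s*(\(.*\))/;
    # autovivification creates the array the first time a key is pushed to
    push @{ $groups{$1} }, $2;
}

for my $key (sort keys %groups) {
    print "$key => ", join(' | ', @{ $groups{$key} }), "\n";
}
```

Each value can then be read back as `$groups{'.lo_gr0'}[0]`, `$groups{'.lo_gr0'}[1]`, and so on.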

rafiu · Jan 3, 2008

Does anyone know the best way to pattern-match and capture the data from the opening delimiter to the closing delimiter? See the sample data below:

/* its a marvelous day and the weather is warm
now is the time to settle our differences
and embrace each other with one love */

I have several blocks of data like this in a couple of files that I would like to grep. Could you please help? Thanks.
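One way to do this in Perl, sketched under the assumption that each file is small enough to read into memory at once: read the whole file into one string, then use a non-greedy match with the `/s` modifier so that `.` also matches newlines. `/\*(.*?)\*/` stops at the first closing `*/`, so each comment is captured separately. The `$text` string below is a hypothetical stand-in for one file's contents.

```perl
use strict;
use warnings;

# Stand-in for one file's contents; a real script could read a whole file
# in slurp mode, e.g. my $text = do { local $/; <$fh> };
my $text = <<'END';
code before
/* its a marvelous day and the weather is warm
now is the time to settle our differences
and embrace each other with one love */
more code /* second comment */ after
END

my @comments;
# /s lets "." match newlines; "*?" is non-greedy, so each match stops at the
# first "*/"; /g in a while loop steps through the matches one at a time.
while ($text =~ m{/\*(.*?)\*/}gs) {
    push @comments, $1;
}

print scalar(@comments), " comment(s) found\n";
print "---\n$_\n" for @comments;
```

Without `/s`, the multi-line comment would not match at all; without the `?`, a greedy `.*` would run from the first `/*` to the last `*/` and merge both comments into one.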
