Program using pattern matching

Status
Not open for further replies.

rafiu

Programmer
Jul 3, 2007
14
US
I'm working on a data mining script that will save some patterns in a hash table. Below is a sample of the file that I'm working on:

.raf_group(first line);
.licoln_group(second line);
.start_raf_group(fitth line); //templated
wired .raf_group_firm ({help me please}); //templated
.muf_group (raf, lee you)


I would like to pattern match each line that starts with a "." and stop the match at the closing parenthesis ")". Then I would like to split each line and store the first part as a key and the second part as a value in a hash table. Thanks in advance for your help.
 
What have you tried so far?

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
As Kevin prompted, show us what you've tried thus far. Also, it would probably help if you wrote out the exact data structure that you are hoping to obtain. Probably something like this:

Code:
my %hash = (
	'raf_group'       => 'first line',
	'licoln_group'    => 'second line',
	'start_raf_group' => 'fitth line',
	'muf_group'       => 'raf, lee you',
);

- Miller
 
So far, below is what I have:
open (FILE, $filename) or die " can't";
while (<FILE>)
{
if ( /(.raf | .licoln)_group*\)/) #pattern matching each line from the begining to the closing parenthesis
$new_info = $_;
$key_info= (split '', $new_info)[1]; #trying to split the each line into two so that the first half will be save as the key and the second half as the value
$value_info = (split '',$new_info)[2];
%hash{$key_info}= value_info;
The overall hash should look like this:

%hash = (
'.raf_group' => '(first line)',
'.licoln_group' => '(second line)',
'.licoln_raf_group' => '(fifth line)',
'.raf_group' => '(raf, lee you)',
);


Below is the modified file:
.raf_group(first line);
.licoln_group(second line);
.licoln_raf_group(fitth line); //templated
wired .raf_group_firm ({help me please}); //templated
.raf_group (raf, lee you)





 
Something like the following should work. The only difficult part of this regex is knowing which special characters need to be escaped.

Also, note that your file has two lines that produce the key '.raf_group'. The second one will overwrite the first in a simple hash data structure.

Code:
use strict;

my %hash = ();

while (<DATA>) {
	next if ! /^(\.\w*)\s*(\(.*\))/;
	my ($key, $val) = ($1, $2);
	
	$hash{$key} = $val;
}

use Data::Dumper;
print Dumper(\%hash);

__DATA__
.raf_group(first line);
.licoln_group(second line);
.licoln_raf_group(fitth line); //templated
wired .raf_group_firm     ({help me please});   //templated
.raf_group        (raf, lee you)  
------------------------------------------------------------
Pragmas (perl 5.8.8) used :
  • strict - Perl pragma to restrict unsafe constructs
Core (perl 5.8.8) Modules used :
  • Data::Dumper - stringified perl data structures, suitable for both printing and eval

- Miller
 
I was able to pattern match what I want, but when I tried to put the data into the hash table I did not get the result that I wanted. Below is the original file. I would like to save the first half of each line as the key and the other half as the value. But some of the data has no space between the two halves, and the code you gave me won't work on it. For example, the first line below won't split into two because there is no space between the first word and the rest. The result that I would like to have is:
.lo_gr8 => (w_c1_3_n[1:8])
.lo_gr0 => (c_h1_6_i[80:0]),
.up_gr0 => (p_n[47:40]), // LPC



###########################################################
.lo_gr8(w_c1_3_n[1:8]),

.lo_gr0(c_h1_6_i[80:0]),

.up_gr7(p_h_4_n[47:40]),

.up_gr2(p_0_d_n[100:32]),

.lo_gr0 ( c_h1_6_i[80:0] ),

.lo_gr1 ( p_h_4_[47:40] ),

.up_gr2 ( c_h1_6_[80:0] ),

.lo_gr0 (p_4_n[47:40]),

.lo_gr9 ({a_y, p_h_4_n, go, p_h_4_n[47:40]}),

.lo_gr3( p_h_n[90:40] ),

.up_gr0 (p_n[47:40]), // LPC
.lo_gr8(_4_n[47:40]),




 
Miller has \s* in his regexp, which means zero or more whitespace characters, so if there are no spaces it should still parse correctly. The problem might be the square brackets, which denote a character class in Perl regexes, and which you did not include in your original post.
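
The point about \s* can be checked with a short sketch. It runs the pattern from Miller's reply against three lines copied from the data above (the array @lines and the list @pairs are names made up for this illustration). Square brackets are only special inside the pattern itself; in the input text they are ordinary characters that ".*" matches like any other.

```perl
use strict;
use warnings;

# Lines copied from the posted data: no space, spaces, and braces.
my @lines = (
    '.lo_gr8(w_c1_3_n[1:8]),',
    '.lo_gr0 ( c_h1_6_i[80:0] ),',
    '.lo_gr9 ({a_y, p_h_4_n, go, p_h_4_n[47:40]}),',
);

my @pairs;    # hypothetical holder for the captured key/value pairs
for my $line (@lines) {
    # Same pattern as in Miller's reply; \s* also matches an empty gap.
    next unless $line =~ /^(\.\w*)\s*(\(.*\))/;
    push @pairs, [$1, $2];
    print "$1 => $2\n";
}
```

All three lines should match, so if the real script fails, the cause is probably somewhere other than the missing space or the brackets.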

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
Actually, I did not use Miller's pattern matching; I used a different one that works for what I need. The only problem now is to split each line into two and store the parts in a hash. With this code I was able to split the lines that have spaces in them, but not the lines that have no space in between.
$info = (split ' ', $_)[1]; //this gave me the first half of the sentence (for the lines with a space) and the whole sentence for the lines without a space.
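
One possible way around the missing space, offered only as a sketch: split at the first "(" instead of at whitespace. The lookahead (?=\() keeps the parenthesis in the second field, and the limit of 2 stops split from cutting the line any further. The names @lines, $rest and %hash are chosen for this illustration, and the two sample lines come from the posted data.

```perl
use strict;
use warnings;

my @lines = (
    '.lo_gr8(w_c1_3_n[1:8]),',
    '.up_gr0 (p_n[47:40]), // LPC',
);

my %hash;
for my $line (@lines) {
    # Split into at most 2 fields at the optional whitespace before the first "(".
    my ($key, $rest) = split /\s*(?=\()/, $line, 2;
    next unless defined $rest;
    # Keep only the parenthesised part; the trailing comma and comment are dropped.
    my ($val) = $rest =~ /^(\(.*\))/;
    next unless defined $val;
    $hash{$key} = $val;
}

print "$_ => $hash{$_}\n" for sort keys %hash;
```

Because ".*" is greedy, a trailing comment that itself contains ")" would be pulled into the value, so this assumes comments like "// LPC" have no parentheses.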
 
Post your code/regex.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Travis - Those who say it cannot be done are usually interrupted by someone else doing it; Give the wrong symptoms, get the wrong solutions;
 

use strict;

my %hash = ();
open (DATA, "c:\\sample.txt") or die "can't open c:\\sample.txt: $!";
while (<DATA>)
{
next if ! /^(\.\w*)\s*(\(.*\))/;
#print "$_\n";
#the next 3 lines are for taking whitespace out of the data
my $data = $_;
$data =~ s/\s+//;
$data =~ s/^\s+$//;
#below is where I want to split the line into two
my ($key, $val) = ($1, $2);
$hash{$key} = $val;

}



 
You have repeating keys in the file so it won't work with a one dimensional hash.
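
If every value for a repeated key needs to be kept, one common option is a hash of arrays. The following is a minimal sketch under that assumption, reusing the pattern from the code above on four lines from the posted data; @lines is a stand-in for reading the real file.

```perl
use strict;
use warnings;

# Stand-in for the file contents; .lo_gr8 and .lo_gr0 each appear twice.
my @lines = (
    '.lo_gr8(w_c1_3_n[1:8]),',
    '.lo_gr0(c_h1_6_i[80:0]),',
    '.lo_gr0 (p_4_n[47:40]),',
    '.lo_gr8(_4_n[47:40]),',
);

my %hash;
for my $line (@lines) {
    next unless $line =~ /^(\.\w*)\s*(\(.*\))/;
    # Each key maps to an array reference; the first push creates it.
    push @{ $hash{$1} }, $2;
}

for my $key (sort keys %hash) {
    print "$key => ", join(' | ', @{ $hash{$key} }), "\n";
}
```

With this layout a later line no longer overwrites an earlier one; each key simply collects its values in file order.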

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
Does anyone know the best way to pattern match and capture the data from the beginning delimiter to the ending delimiter?
See the sample data below.



/* its a marvelous day and the weather is warm
now is the time to settle our differences
and embrace each other with one love */

I have a couple of blocks like this in a couple of files that I would like to grep. Could you please help? Thanks.
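
A possible approach, sketched under the assumption that each file fits in memory: read the whole file into one string and use a non-greedy match with the /s modifier so that "." also matches newlines. Here $text is a stand-in for the file contents, and the lines around the sample block are invented for the illustration.

```perl
use strict;
use warnings;

# Stand-in for one file read in a single piece.
my $text = <<'END';
some code;
/* its a marvelous day and the weather is warm
now is the time to settle our differences
and embrace each other with one love */
more code; /* second block */
END

# /s lets "." match newlines; ".*?" is non-greedy, so each match
# stops at the first "*/" instead of running on to the last one.
my @blocks = $text =~ m{/\*(.*?)\*/}sg;

print "--- block ---\n$_\n" for @blocks;
```

For a real file, the string could be filled by undefining the input record separator (local $/;) before reading from the filehandle, which makes a single read return the entire file.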
 