Reading a file by block, optimizing reg exp 4

Status
Not open for further replies.

dmazzini

Programmer
Joined
Jan 20, 2004
Messages
480
Location
US
Hi guys

I have a file that looks like:

Code:
[WBTS]
RncId
WBTSId
template_name
template_set
ATMInterfaceID
COCOId
VCI
VPI
siteId
name
BTSAdditionalInfo
WBTSChangeOrigin
ManagedBy
UserDefinedState
status
last_modified
last_actual_import

[WCEL]
RncId
WBTSId
LcrId
template_name
template_set
CId
FachDataAllowedTotal
LAC
SAC
URAId1
URAId2
URAId3
URAId4
URAId5
URAId6
URAId7
URAId8
MaxNumbUECMcoverHO
name
CellAdditionalInfo
SACB
WCELChangeOrigin
status
last_modified	
last_actual_import

[WCEL_AC]
RncId
WBTSId
LcrId
template_name
template_set
...
....

I have to parse this file and build a hash keyed first by object class and then by parameter name.

Here is part of the code:

Code:
    my %PARAM;
    open IN, $default_parameters_to_exclude_file or die $!;
    local $/ = "\n\n";
    while (my $intext = <IN>) {  
        $intext = trim($intext);  chomp $intext;
        if ($intext =~ /^\[(\S+)\]\n(.*)/s) {
            my ($objectclass,$parameters) = ($1,$2);
            foreach my $parameter(split /\n/,$parameters){
                    $parameter =trim ($parameter);
                    chomp $parameter;
                    @{$PARAM{$objectclass}{$parameter}}=1;
            }   
        }   
    }

    close IN;

The condition used in another subroutine:

Code:
next if exists $PARAM{$objectclass}{$parameter};


I reused the regexp from an old Tek-Tips post. I am wondering whether I can optimize it so that it does not need the split function.
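
A side note on the block reading itself, as a minimal sketch rather than a drop-in replacement: setting $/ to the empty string (paragraph mode) makes Perl treat any run of blank lines as one separator, whereas "\n\n" splits on exactly two newlines, so an extra blank line between blocks would leave a stray leading newline on the next record. The sample text and the parse_blocks helper below are invented for illustration, and parameter names are assumed to contain no whitespace.

```perl
use strict;
use warnings;

# Hypothetical sample: note the extra blank line between the two blocks.
my $text = "[WBTS]\nRncId\nWBTSId\n\n\n[WCEL]\nRncId\nLcrId\n";

# parse_blocks is an illustrative helper, not part of the original script.
sub parse_blocks {
    my ($input) = @_;
    my %param;
    open my $in, '<', \$input or die $!;   # read from an in-memory string
    local $/ = '';                          # paragraph mode: blank-line runs separate records
    while (my $block = <$in>) {
        # The first line is the [class] header, the rest are parameter names.
        next unless $block =~ /^\[(\S+)\]\n(.*)/s;
        my ($objectclass, $parameters) = ($1, $2);
        # split ' ' skips leading whitespace and splits on whitespace runs
        # (assumes parameter names contain no whitespace).
        $param{$objectclass}{$_} = 1 for split ' ', $parameters;
    }
    close $in;
    return \%param;
}

my $PARAM = parse_blocks($text);
print join(',', sort keys %{ $PARAM->{$_} }), "\n" for sort keys %$PARAM;
```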




dmazzini
GSM/UMTS System and Telecomm Consultant

 
Actually, the hash assignment is:

Code:
$PARAM{$objectclass}{$parameter}=1;



dmazzini
GSM/UMTS System and Telecomm Consultant

 
All you need is these two lines?

[WCEL_AC]
RncId

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
What about this (untested):
Code:
my %PARAM;
open IN,$default_parameters_to_exclude_file or die$!;
local($_);
my$objectclass;
while(<IN>){  
  chomp;
  s/^\s+//;
  s/\s+$//;
  if(/^\[(\S+)\]$/){
    $objectclass=$1;
  }else{
    $PARAM{$objectclass}{$_}=1 if$_&&$objectclass;
  }   
}     
close IN;
I didn't use your trim function, as it is easily replaced by regexps.
Also, some more care about the integrity of your data may be in order. For example, the regexp /^\[(\S+)\]$/ will not match if there is whitespace within the brackets.
And excuse my concise way of writing...
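
As a sketch of the kind of extra care mentioned above (the pattern and the sample header lines are illustrative assumptions, not taken from the real file), a header regexp can tolerate blanks inside the brackets and leave them out of the captured name:

```perl
use strict;
use warnings;

# Invented lines: the second and third headers have blanks inside the brackets.
my @lines = ('[WBTS]', '[ WCEL ]', '[WCEL_AC ]', 'RncId');

my @classes;
for my $line (@lines) {
    # \s* on both sides absorbs blanks; ([^\]\s]+) captures the bare name.
    if ($line =~ /^\[\s*([^\]\s]+)\s*\]$/) {
        push @classes, $1;
    }
}
print "$_\n" for @classes;
```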

prex1
 
Hi Kevin

Actually, I want to associate the values inside the brackets [] with their parameters, e.g.:

Code:
$PARAM{WBTS}{RncId}=1;
$PARAM{WBTS}{WBTSId}=1;
$PARAM{WBTS}{template_name}=1;
....
....
$PARAM{WCEL}{RncId}=1;
$PARAM{WCEL}{WBTSId}=1;
....
....

Prex1, thanks for your post, it works fine. :-)

Regarding the trim function, I have it because I need to trim in many different parts of the script; it uses the same regexps that you used.
The original script does a lot of other things; this is just one part of it.

The main reason I posted this thread is that I would like to learn how to parse this file with a regex that divides it into blocks of data, e.g. $1=WBTS and then $2=all parameters belonging to WBTS, then the next record... you know, Data Munging kind of stuff...



dmazzini
GSM/UMTS System and Telecomm Consultant
 
More ideas are welcome :-)

dmazzini
GSM/UMTS System and Telecomm Consultant

 
You could process each line within a block (or even skip breaking the input into blocks) while keeping track of the class name. The following works with the sample data you provided.

Code:
my %PARAM;
while (<DATA>) {
    if (/^\s*\[(\w+)\]\s*$/) {          # header line such as [WBTS]
        my $class = $1;                 # save the capture before matching again
        PARAMS: while (my $line = <DATA>) {
            last PARAMS if $line =~ /^\s*$/;    # a blank line ends the block
            chomp $line;
            $PARAM{$class}{$line} = 1;
        }
    }
}
 
dmazzini,
I'd actually work pretty much like you did, except for not using split (which is what you're asking for).

Code:
my %PARAM;
{
  local $/ = "\n\n";
  while(<DATA>) {
    chomp;
    s/^\s+//;
    s/\s+$//;
    my ($objectclass,$params) = /^\[(.+)\]\n(.*)/s;
    $PARAM{$objectclass}{$1}=1 while($params =~ /(.+?)\n/sg);
  }
}

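In scalar context, the /g modifier makes each evaluation of the match continue from where the previous one ended, so the while loop steps through the matches one at a time.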
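
A small standalone sketch of that behaviour, with an invented string and variable names; pos() is used only to show where the next search would begin:

```perl
use strict;
use warnings;

# Invented sample string; every line here ends with a newline.
my $params = "RncId\nWBTSId\nLcrId\n";

my @seen;
while ($params =~ /(.+)\n/g) {
    # Without /s, "." stops at a newline, so $1 holds one line per iteration.
    push @seen, $1;
    print "matched '$1', next search starts at offset ", pos($params), "\n";
}
print scalar(@seen), " lines captured\n";
```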
 
You could modify the last regex in the while statement to:
Code:
/\s+(.+?)\s+\n/sg
if you want to trim whitespace from each parameter line also (which you probably DO want to do).
 
Thanks everybody for your posts.

I was looking more closely at brigmar's solution: one regexp per block of data.

Brigmar, your last regular expression did not work:

Code:
/\s+(.+?)\s+\n/sg

probably because it requires whitespace at the beginning of the line. But your first post did the trick. Combining "while" with a regexp is something I would never have thought of. Cool :-)
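
One likely reason for the failure, sketched under the assumption that most parameter lines carry no leading or trailing blanks: \s+ demands at least one whitespace character on each side, while a * quantifier also accepts none. The pattern below is one possible variant (not brigmar's), anchored per line with /m, and the sample string is invented.

```perl
use strict;
use warnings;

# Invented sample with a mix of clean and padded lines, and no final newline.
my $params = "RncId\n  WBTSId \nlast_modified\t\nlast_actual_import";

my %seen;
# /m lets ^ and $ match at every line boundary; [ \t]* tolerates zero or
# more blanks, and the lazy (\S.*?) leaves trailing blanks to [ \t]*.
while ($params =~ /^[ \t]*(\S.*?)[ \t]*$/mg) {
    $seen{$1} = 1;
}
print join("\n", sort keys %seen), "\n";
```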





dmazzini
GSM/UMTS System and Telecomm Consultant

 
I like to use labels for this type of task:

Code:
use strict;
use warnings;
use Data::Dumper;
my $default_parameters_to_exclude_file = 'c:/test/test.txt';
my %PARAM;
open IN, $default_parameters_to_exclude_file or die "$!";
OUTTER: while ( my $intext = <IN>) {
   $intext = trim($intext);
   if ($intext =~ /^\[(\S+)\]/) {
      my $objectclass = $1;
      INNER: while (my $param = <IN>) {
         next OUTTER if ($param =~ /^\s*$/);
         $param = trim($param);
         chomp $param;
         $PARAM{$objectclass}{$param}=1;
      }
   }
}
close IN;
print Dumper \%PARAM;
sub trim {
   my $t = shift;
   $t =~ s/^\s+//;
   $t =~ s/\s+$//;
   return $t;
}
------------------------------------------------------------
Pragmas (perl 5.8.8) used :
  • strict - Perl pragma to restrict unsafe constructs
  • warnings - Perl pragma to control optional warnings
Core (perl 5.8.8) Modules used :
  • Data::Dumper - stringified perl data structures, suitable for both printing and eval



------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
Thanks Kevin for your reply too! I was waiting for it :-)
It worked well!

I appreciate all the posts.

Just FYI, Brigmar: while testing the hash structure built by your code, I saw that the last parameter of each object class, "last_actual_import", is never captured.


Code:
use strict;
use warnings;
use Data::Dumper;

my %PARAM;
{
  local $/ = "\n\n";
  while(<DATA>) {
    chomp;
    s/^\s+//;
    s/\s+$//;
    my ($objectclass,$params) = /^\[(.+)\]\n(.*)/s;
    $PARAM{$objectclass}{$1}=1 while($params =~ /(.+?)\n/sg);
  }
}

print Dumper \%PARAM;

__DATA__

# 3G Variable Parameters per Object Class
# RF Datafill should not be compared against Default parameters for list below



[WBTS]
RncId
WBTSId
template_name
template_set
ATMInterfaceID
COCOId
VCI
VPI
siteId
name
BTSAdditionalInfo
WBTSChangeOrigin
ManagedBy
UserDefinedState
status
last_modified
last_actual_import

[WCEL]
RncId
WBTSId
LcrId
template_name
template_set
CId
FachDataAllowedTotal
LAC
SAC
URAId1
URAId2
URAId3
URAId4
URAId5
URAId6
URAId7
URAId8
MaxNumbUECMcoverHO
name
CellAdditionalInfo
SACB
WCELChangeOrigin
status
last_modified	
last_actual_import

[WCEL_AC]
RncId
WBTSId
LcrId
template_name
template_set
status
last_modified
last_actual_import

[WCEL_PS]
RncId
WBTSId
LcrId
template_name
template_set
status
last_modified
last_actual_import

[ADJS]
RncId
WBTSId
LcrId 
template_name
template_set
AdjsCI
AdjsLAC
AdjsRAC
AdjsRNCid
AdjsScrCode
NrtHopsIdentifier
RtHopsIdentifier
TargetCellDN
name
ADJSChangeOrigin
HSDPAHopsIdentifier
RTWithHSDPAHopsIdentifier
status
last_modified
last_actual_import

[ADJG]
RncId
WBTSId
LcrId
template_name
template_set
AdjgBCC
AdjgBCCH
AdjgCI
AdjgLAC
NrtHopgIdentifier
RtHopgIdentifier
TargetCellDN
name
ADJGChangeOrigin
status
last_modified
last_actual_import

[ADJW]
bscId
bcfId	
btsId
template_name
template_set
AdjwCId
lac 
rncId	
scramblingCode	
sac	
targetCellDN	
uarfcn	
name 
status
last_modified
last_actual_import


Cheers



dmazzini
GSM/UMTS System and Telecomm Consultant

 
As coded, the last parameter of each block is lost: the regex requires a newline after every name, and the chomp plus the trailing-whitespace trim remove the final one. Maybe this will correct that problem (without /s, "." stops at each newline, so every line is captured whole):

$PARAM{$objectclass}{$1}=1 while($params =~ /(.+)/g);

but honestly, using split() is probably more efficient than the above "while" loop, which has to evaluate the condition before each iteration. It would be interesting to benchmark the two methods.
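
For anyone who wants to try that comparison, here is a rough sketch using the core Benchmark module. The 25 generated parameter names, the two sub names and the iteration count are arbitrary assumptions, and the timings will vary by machine and Perl version, so no result is claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Invented block of 25 parameter names, standing in for one [class] section.
my $params = join "\n", map { "param$_" } 1 .. 25;

sub with_split {
    my %h;
    $h{$_} = 1 for split /\n/, $params;
    return \%h;
}

sub with_regex {
    my %h;
    # Without /s, (.+) captures one whole line per iteration.
    $h{$1} = 1 while $params =~ /(.+)/g;
    return \%h;
}

# Sanity check: both approaches must build the same set of keys.
my $same = join(',', sort keys %{ with_split() })
        eq join(',', sort keys %{ with_regex() });
print "same keys: ", ($same ? "yes" : "no"), "\n";

# Run each sub 50_000 times and print a comparison table.
cmpthese(50_000, { split => \&with_split, regex => \&with_regex });
```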


------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 