
parsing input 2

Status
Not open for further replies.

columb

I'm using Perl to report on users on an AIX system. The input is
Code:
foreach ( `lsuser -a id pgrp gecos ALL` )
For those who don't know AIX, the input lines look something like
Code:
fblogs id=12345 pgrp=usergroup gecos=Fred Blogs
This is what I'm trying to parse, but you should note that the 'gecos' field may or may not exist.
My first thought was to use something like
Code:
my ( $username, @data ) = split /\s+/;
foreach ( @data )
  {
   my ( $key, $value ) = split /=/;
  $userdata{$key} = $value;
  }
This works fine except that the gecos field has a space in it. I've racked my brains trying to find a way to parse it. Any ideas would be very welcome.

Thanks

Ceci n'est pas une signature
Columb Healy
 
You can pass a LIMIT to split(). Limit the split to four fields so that gecos=Fred Blogs is treated as a single key/value pair.
Code:
chomp;   # with a LIMIT, split keeps the trailing newline in the last field
my ( $username, @data ) = split ( /\s+/, $_, 4 );
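
For reference, a minimal sketch of the whole loop with the LIMIT applied. It assumes each line ends with a newline (as backtick output does) and that gecos, when present, is always the last field. The sample lines and the %users hash are made up for illustration and stand in for the real lsuser output.

```perl
use strict;
use warnings;

# Hypothetical sample lines standing in for `lsuser -a id pgrp gecos ALL`.
my @lines = (
    "fblogs id=12345 pgrp=usergroup gecos=Fred Blogs\n",
    "nogecos id=12346 pgrp=staff\n",
);

my %users;
foreach (@lines) {
    chomp;    # otherwise the newline stays attached to the last field
    my ( $username, @data ) = split /\s+/, $_, 4;
    foreach my $pair (@data) {
        # A LIMIT of 2 keeps any '=' inside the value intact.
        my ( $key, $value ) = split /=/, $pair, 2;
        $users{$username}{$key} = $value;
    }
}

print "$users{fblogs}{gecos}\n";    # Fred Blogs
```

A line without gecos simply yields three fields, so no gecos key is created for that user.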


--------------------------------------------------------------------------
I never set a goal because u never know whats going to happen tommorow.
 
Thanks spookie, have a star!

Ceci n'est pas une signature
Columb Healy
 
Another approach:

Code:
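# Mark the whitespace before each key= with a placeholder,
# then split on the placeholder instead of on whitespace.
s/\s+(\w+=)/;;$1/g;
my ( $username, @data ) = split /;;/;
foreach ( @data ) {
   my ( $key, $value ) = split /=/, $_, 2;
   $userdata{$key} = $value;
}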

Replace ';;' with any character(s) not likely to be found in the string being processed.
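
If no safe placeholder is available, one possible variant is to split directly on whitespace that is followed by a key= token, using a lookahead so the key itself is not consumed. This is only a sketch; it assumes keys consist of word characters and that no value contains text of the form " word=".

```perl
use strict;
use warnings;

my $line = 'fblogs id=12345 pgrp=usergroup gecos=Fred Blogs';

# Split only on whitespace that is immediately followed by "key=".
# The lookahead (?=...) checks for the key without consuming it.
my ( $username, @data ) = split /\s+(?=\w+=)/, $line;

my %userdata;
foreach my $pair (@data) {
    my ( $key, $value ) = split /=/, $pair, 2;
    $userdata{$key} = $value;
}

print "$userdata{gecos}\n";    # Fred Blogs
```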


------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
Or just use a regular expression and accomplish what you want in a single line:

Code:
my $line = 'fblogs id=12345 pgrp=usergroup gecos=Fred Blogs';

my %userdata = $line =~ m/(\w+)=([^=]*\w(?!\w|=))/g;

# And the output is:
use Data::Dumper;
print Dumper(\%userdata);

# $VAR1 = {
#          'gecos' => 'Fred Blogs',
#          'pgrp' => 'usergroup',
#          'id' => '12345'
#        };

- Miller
 
Miller
Thanks for a concise and elegant solution. However, there is one major problem: I can't deconstruct the regular expression, and I refuse to use code I don't understand!

I can see that the first set of brackets captures the word characters before the equals sign, and that the second set starts with multiple characters which are not '=', but then I get lost. In particular, the nested brackets, '(?!\w|=)', leave me baffled. I've spent hours reading the perlre pod and I'm getting nowhere. Can I bother you for a little more help?

Ceci n'est pas une signature
Columb Healy
 
Hello Columb,

Actually, I can simplify my regex slightly, which might make it easier to understand:

Code:
my $line = 'fblogs id=12345 pgrp=usergroup gecos=Fred Blogs';

my %userdata = $line =~ m/(\w+)=([^=]*)(?=\s|$)/g;

# And the output is:
use Data::Dumper;
print Dumper(\%userdata);

# $VAR1 = {
#          'gecos' => 'Fred Blogs',
#          'pgrp' => 'usergroup',
#          'id' => '12345'
#        };

The main thing that you need to understand is zero-width assertions. My first regex used a zero-width negative lookahead assertion; this latest one uses a zero-width positive lookahead assertion. Both are documented in perlre, in the section on lookaround assertions.


In English, my regex translates to:

Code:
my %userdata = $line =~ m{
	(\w+)			# One word (captured).
	=			# Followed by an equal sign.
	([^=]*)			# Followed by as many non "equal sign"
				# characters [remember greedy matching]
				# (captured).
	(?=\s|$)		# That are then followed by either a
				# space or end of string. [In other words,
				# your record separator].
}xg;
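
To show how this might be applied to the real input, here is a small sketch that runs the same pattern line by line, including a line with no gecos field. The sample lines and the %users hash are assumptions for illustration. Note the chomp: without it, the trailing newline would end up in the last value, because [^=]* happily matches a newline.

```perl
use strict;
use warnings;

# Hypothetical sample lines standing in for the lsuser output.
my @lines = (
    "fblogs id=12345 pgrp=usergroup gecos=Fred Blogs\n",
    "nogecos id=12346 pgrp=staff\n",
);

my %users;
foreach my $line (@lines) {
    chomp $line;
    my ($username) = $line =~ /^(\S+)/;    # first token is the user name
    # In list context, /g returns every capture as a flat key/value list.
    my %userdata = $line =~ m/(\w+)=([^=]*)(?=\s|$)/g;
    $users{$username} = \%userdata;
}

print "$users{fblogs}{gecos}\n";    # Fred Blogs
print "$users{nogecos}{pgrp}\n";    # staff
```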

Anyway, if that doesn't make it clear enough, I can try to explain some more.

- Miller
 
Thanks Miller

I've got it now.

Ceci n'est pas une signature
Columb Healy
 