parsing input 2

columb · Apr 2, 2007

I'm using perl to report on users on an AIX system. The input is

Code:

foreach ( `lsuser -a id pgrp gecos ALL` )

For those who don't know AIX the input lines look something like

Code:

fblogs id=12345 pgrp=usergroup gecos=Fred Blogs

This is what I'm trying to parse, but note that the 'gecos' field may or may not exist.
My first thought was to use something like

Code:

my ( $username, @data ) = split /\s+/;
foreach ( @data )
  {
   my ( $key, $value ) = split /=/;
   $userdata{$key} = $value;
  }

This works fine except that the gecos field has a space in it, so 'Fred' and 'Blogs' end up as separate elements. I've racked my brains trying to find a way to parse it. Any ideas would be very welcome.

Thanks

Ceci n'est pas une signature
Columb Healy

spookie · Apr 2, 2007

A LIMIT can be specified when using split(). Limit the split to four fields so that gecos=Fred Blogs is treated as a single key/value pair.

Code:

my ( $username, @data ) = split ( /\s+/,$_,4 ) ;

http://perldoc.perl.org/functions/split.html
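
A fuller sketch of the LIMIT approach, assuming every line has the shape `username id=... pgrp=... [gecos=...]` in that order, as described above. The sample lines and the `parse_line` helper name are made up for illustration. Note the chomp: lines read from backticks keep their trailing newline, which would otherwise end up inside the last value.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one lsuser-style line into (username, hashref).
sub parse_line {
    my ($line) = @_;
    chomp $line;    # backtick output keeps its trailing newline

    # LIMIT of 4: username, id=..., pgrp=..., and everything else
    # (the gecos field, spaces included, if it is present).
    my ( $username, @data ) = split /\s+/, $line, 4;

    my %userdata;
    foreach my $pair (@data) {
        # LIMIT of 2 so an '=' inside the value is left alone.
        my ( $key, $value ) = split /=/, $pair, 2;
        $userdata{$key} = $value;
    }
    return ( $username, \%userdata );
}

# Made-up sample input; the second line has no gecos field.
my @lines = (
    "fblogs id=12345 pgrp=usergroup gecos=Fred Blogs\n",
    "jdoe id=12346 pgrp=staff\n",
);

foreach my $line (@lines) {
    my ( $username, $userdata ) = parse_line($line);
    my $gecos = exists $userdata->{gecos} ? $userdata->{gecos} : '(none)';
    print "$username: id=$userdata->{id} pgrp=$userdata->{pgrp} gecos=$gecos\n";
}
```

This relies on gecos being the last attribute on the line; if the attribute order changes, the LIMIT trick no longer protects the embedded space.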

--------------------------------------------------------------------------
I never set a goal because u never know whats going to happen tommorow.

columb · Apr 2, 2007

Thanks spookie, have a star!

Ceci n'est pas une signature
Columb Healy

KevinADC · Apr 2, 2007

Another approach:

Code:

# Mark the whitespace in front of each key= with ';;' first,
# so the space inside the gecos value is left alone.
s/\s+(\w+=)/;;$1/g;
my ( $username, @data ) = split /;;/;
foreach ( @data ) {
   my ( $key, $value ) = split /=/;
   $userdata{$key} = $value;
}

Replace ';;' with any character(s) not likely to be found in the string being processed.

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]

MillerH · Apr 2, 2007

Or just use a regular expression and accomplish what you want in a single line:

Code:

my $line = 'fblogs id=12345 pgrp=usergroup gecos=Fred Blogs';

my %userdata = $line =~ m/(\w+)=([^=]*\w(?!\w|=))/g;

# And the output is:
use Data::Dumper;
print Dumper(\%userdata);

# $VAR1 = {
#          'gecos' => 'Fred Blogs',
#          'pgrp' => 'usergroup',
#          'id' => '12345'
#        };

- Miller

columb · Apr 4, 2007

Miller
Thanks for a concise and elegant solution. However, there is one major problem: I can't deconstruct the regular expression, and I refuse to use code I don't understand!

I can see that the first set of brackets captures the word characters before the equals sign, and that the second set starts with multiple characters which are not '=', but then I get lost. In particular, the nested brackets - '(?!\w|=)' - leave me baffled. I've spent hours reading the perlre pod and I'm getting nowhere. Can I bother you for a little more help?

Ceci n'est pas une signature
Columb Healy

MillerH · Apr 4, 2007

Hello Columb,

Actually, I can simplify my regex slightly, which might make it easier to understand:

Code:

my $line = 'fblogs id=12345 pgrp=usergroup gecos=Fred Blogs';

my %userdata = $line =~ m/(\w+)=([^=]*)(?=\s|$)/g;

# And the output is:
use Data::Dumper;
print Dumper(\%userdata);

# $VAR1 = {
#          'gecos' => 'Fred Blogs',
#          'pgrp' => 'usergroup',
#          'id' => '12345'
#        };

The main thing that you need to understand is zero width assertions. My first regex did a zero width negative lookahead assertion. This latest one does a zero width positive lookahead assertion. The documentation for this feature can be found here:

http://perldoc.perl.org/perlretut.html#Looking-ahead-and-looking-behind
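
As a minimal sketch of what "zero width" means: a lookahead tests the text that follows the current position without consuming it, so that text never becomes part of the match. The strings below are made up purely for illustration.

```perl
use strict;
use warnings;

my $text = 'foobar foobaz';

# Positive lookahead (?=...): match 'foo' only where 'bar' follows.
my @followed = $text =~ /foo(?=bar)/g;

# Negative lookahead (?!...): match 'foo' only where 'bar' does NOT follow.
my @not_followed = $text =~ /foo(?!bar)/g;

# Either way only 'foo' itself is matched; the text after it is
# inspected but not consumed.
print scalar(@followed),     " match(es) of 'foo' followed by 'bar'\n";
print scalar(@not_followed), " match(es) of 'foo' not followed by 'bar'\n";

# With /g in scalar context, pos() reports where the match ended:
# right after 'foo', in front of the 'bar' that was only looked at.
my $end_offset;
if ( $text =~ /foo(?=bar)/g ) {
    $end_offset = pos($text);
    print "match ended at offset $end_offset\n";
}
```

Because the lookahead consumes nothing, the whitespace that ends one key=value pair is still available to the regex engine when it goes looking for the next pair.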

In English, my regex translates to:

Code:

my %userdata = $line =~ m{
	(\w+)			# One Word (Captured).
	=				# Followed by an equal sign.
	([^=]*)			# Followed by as many non "equal sign"
					# characters [remember greedy matching]
					# (captured).
	(?=\s|$)		# That are then followed by either a
					# space or end of string. [In other words,
					# your record separator].
}xg;
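
A sketch of the same regex applied to several lines at once. The sample lines stand in for the `lsuser` output and are made up, as is the `%users` structure. The username is picked off separately because the regex only captures key=value pairs, and each line is chomped first, since otherwise the trailing newline would be captured as part of the last value.

```perl
use strict;
use warnings;

# Made-up sample lines standing in for `lsuser -a id pgrp gecos ALL` output.
my @lines = (
    "fblogs id=12345 pgrp=usergroup gecos=Fred Blogs\n",
    "jdoe id=12346 pgrp=staff\n",
);

my %users;    # username => { key => value, ... }
foreach my $line (@lines) {
    chomp $line;    # otherwise the newline ends up inside the last value

    # The first whitespace-delimited word is the username.
    my ($username) = $line =~ /^(\S+)/;

    # In list context with /g, the match returns every captured
    # (key, value) pair in order, which initialises the hash directly.
    my %userdata = $line =~ m{
        (\w+)        # key
        =
        ([^=]*)      # value: greedy run of characters other than an equal sign
        (?=\s|$)     # must stop just before whitespace or end of string
    }xg;

    $users{$username} = \%userdata;
}

foreach my $username ( sort keys %users ) {
    my $data  = $users{$username};
    my $gecos = exists $data->{gecos} ? $data->{gecos} : '(none)';
    print "$username: id=$data->{id} pgrp=$data->{pgrp} gecos=$gecos\n";
}
```

One limitation to keep in mind: a value that itself contains an equal sign would not be captured correctly, since the value pattern excludes that character.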

Anyway, if that doesn't make it clear enough, I can try to explain some more.

- Miller

columb · Apr 4, 2007

Thanks Miller

I've got it now.

Ceci n'est pas une signature
Columb Healy
