How to Split Fields of Data in an input file

ljsmith91 · Jan 26, 2005

New to Perl; hopefully this is simple.

I have a text file with many lines of data, each broken up into about 25 columns (fields). The fields are separated by spaces and are fixed-length across the file, but some of the fields are blank (all spaces).

I need to write a Perl script that creates a report on ONLY specific fields (e.g. fields 1, 2, 5, 7, 17, 19, 22). How best can I assign the fields to scalars in order to print them?

Example Input file:

Server Location BkUp Date time mbytes files
serv1 Atl yes 01/22/05 10:01 569333 5621
serv2 Chi no 23:00 3218 112
serv3 LA yes 01/22/05 34233 1231

Ideally I would like to identify each field as I read it. I used split to break the fields up after opening the file, but then noticed the blank fields and had to look for another solution. Being a newbie, I am not sure what that might be. Here is the split code:

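while (my $line = <INPUTFILE>) {
    # split on runs of whitespace; blank fields vanish, so later indexes shift
    my @fields = split(' ', $line);
    my $field_serv = $fields[0];
    my $field_loc  = $fields[1];
    my $field_bkup = $fields[2];
    my $field_date = $fields[3];
    # ... and so on for the remaining fields
}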

How do I break the fields up and assign them to variables as each line is read? What is the best way of doing this, given that split doesn't appear to provide a solution because of the occasional blank fields?

Any feedback would be great. Thanks.

PaulTEG · Jan 26, 2005

This data is more than likely tab-delimited.

Code:

@data=split /\t/, $line;

consider this feedback

--Paul

ljsmith91 · Jan 26, 2005

PaulTEG,

Thanks... I tested and it is not tab-delimited. Again, the fields are fixed-length, although my example file does not show that. So I could determine that columns 1-8 are the server field, etc., but I don't know how to code that in Perl for each field. I am hoping there is some easy way of doing what I am struggling so much with.

tchatzi · Jan 26, 2005

How many spaces are between 'serv1' and 'Atl'?

How many between 'Atl' and 'yes'?

............

How many between '569333' and '5621'?

These are the things that have to be determined!

KevinADC · Jan 26, 2005

If you are unsure how many spaces there are, you can use the + quantifier, which means one or more:

@data=split /\s+/, $line;

It will also split on tabs.

ljsmith91 · Jan 26, 2005

The problem is that there is no way of knowing how many spaces lie between the fields, especially when some of the fields contain empty data (all spaces). The field length is fixed across the file for every field, BUT the data in each field is variable-length, so there is no way of determining the space counts. It doesn't make sense to try anyway: when there are empty fields, I would be pulling data from one field thinking it was for another.
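To make that misalignment concrete, here is a minimal sketch using two of the sample rows. The aligned spacing is an assumption taken from the code-tagged version of the data later in this thread; the point is only that splitting on whitespace drops the blank Date field, so every later index shifts left by one.

```perl
use strict;
use warnings;

# Two sample rows; in the serv2 row the Date column is all spaces.
# (Column spacing is assumed; it does not affect a whitespace split.)
my @lines = (
    'serv1  Atl      yes    01/22/05 10:01   569333 5621',
    'serv2  Chi      no              23:00   3218   112',
);

my @counts;
for my $line (@lines) {
    my @fields = split ' ', $line;      # split on runs of whitespace
    push @counts, scalar @fields;

    # Index 3 is supposed to be Date, but for serv2 it holds the time.
    my $date = defined $fields[3] ? $fields[3] : '';
    print "fields=", scalar(@fields), " date='$date'\n";
}
```

The first row yields seven fields and the second only six, with the time value sitting where the date was expected.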

So is there no way of parsing the data in a file with Perl by assigning columns 1-7 to $field1, 8-13 to $field2, etc.?

I guess this is what I need; I just have no clue whether it's doable at this point.

Thanks all.

mikevh · Jan 26, 2005

ljsmith91,

If your data fields are fixed-length, the substr() function is the way to go. You'll need to know the column where each field begins and the maximum length of each field. (String indexing, like array indexing, is zero-based in Perl, so the first column is 0.) E.g.

Code:

while (my $line = <INPUTFILE>){
  my $field_serv = substr($line, 0, ??);
  my $field_loc  = substr($line, ??, ??);
  my $field_bkup = substr($line, ??, ??);
  my $field_date = substr($line, ??, ??);
  ...
}

Doing it this way will work, but it's very tedious. (I've done it many times, unfortunately.)
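As an illustration only, here is a sketch of the same idea with concrete numbers. The start positions and widths are assumptions read off the aligned sample data in this thread (each width includes the padding up to the next column), and the record is built with sprintf so the alignment is unambiguous; a real file will need its own offsets.

```perl
use strict;
use warnings;

# Assumed layout (0-based start / width): Server 0/7, Location 7/9,
# BkUp 16/7, Date 23/9, time 32/8, mbytes 40/7, files 47/5.
# Build one aligned sample record with a blank Date field.
my $line = sprintf '%-7s%-9s%-7s%-9s%-8s%-7s%-5s',
    'serv2', 'Chi', 'no', '', '23:00', '3218', '112';

my $field_serv = substr($line,  0, 7);
my $field_loc  = substr($line,  7, 9);
my $field_bkup = substr($line, 16, 7);
my $field_date = substr($line, 23, 9);   # all spaces in this record

# Fixed-width slices keep their padding, so strip trailing blanks.
s/\s+$// for $field_serv, $field_loc, $field_bkup, $field_date;

print "serv=[$field_serv] loc=[$field_loc] bkup=[$field_bkup] date=[$field_date]\n";
```

Because each field is taken by position, the blank Date comes back as an empty string instead of shifting the other fields.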

Here's a somewhat more streamlined way that uses an array of starting positions and field lengths and a hash.

Code:

#!perl
use strict;
use warnings;

#start position, length for data fields
my @cols =([0,5], [7,3], [16,3], [23,8], [32,5], [40,6], [47,4]);

my %h;
my @headers;
while (my $line = <DATA>) {
    chomp $line;
    if ($. == 1) {
        @headers = split /\s+/, $line;
        next;
    }
    for (my $i=0; $i<@headers; $i++) {
        my ($start, $len) = @{$cols[$i]};
        $h{$headers[$i]} = substr($line, $start, $len); 
        $h{$headers[$i]} =~ s/\s+$//; #trim trailing blanks
    }
    for my $k (@headers) {
        print qq($k => $h{$k}\n);
    }
    print "\n";
}
    
__DATA__
Server Location BkUp   Date     time    mbytes files
serv1  Atl      yes    01/22/05 10:01   569333 5621
serv2  Chi      no              23:00   3218   112
serv3  LA       yes    01/22/05         34233  1231

Output with the data you posted:

Code:

Server => serv1
Location => Atl
BkUp => yes
Date => 01/22/05
time => 10:01
mbytes => 569333
files => 5621

Server => serv2
Location => Chi
BkUp => no
Date => 
time => 23:00
mbytes => 3218
files => 112

Server => serv3
Location => LA
BkUp => yes
Date => 01/22/05
time => 
mbytes => 34233
files => 1231

HTH

P.S. Next time you need to post fixed-length data, put it inside code tags to preserve the column positions. If you don't know what I mean by "code tags," click on "Process TGML" to find out about this and other options you can use when posting here.

mikevh · Jan 27, 2005

Here's a slightly different version that factors the line-parsing and blank-trimming out into subroutines.

Code:

#!perl
use strict;
use warnings;

# The last field is 6 wide (columns 46-51) so the header "files" is not cut to "file".
my @fieldwidths = (6, 9, 7, 9, 8, 7, 6);

my %h;
my @headers;
while (my $line = <DATA>) {
    chomp $line;
    my @data = map {trim($_)} parsefixed($line, @fieldwidths);
    if ($. == 1) {
        @headers = @data;
        next;
    }
    @h{@headers} = @data;
    for my $k (@headers) {
        print qq($k => $h{$k}\n);
    }
    print "\n";
}

sub parsefixed {
    #Parse a line of fixed-width data into an array based on @fieldwidths.
    my ($line, @fieldwidths) = @_;
    my @data;
    my $start = 0;
    for (my $i=0; $i<@fieldwidths; $i++) {
        my $len = $fieldwidths[$i];
        $data[$i] = substr($line, $start, $len);
        $start += $len;
    }
    return @data;
}

sub trim {
    #Trim leading/trailing whitespace;
    local $_ = shift;
    s/^\s+//;
    s/\s+$//;
    $_;
}
        
__DATA__
Server Location BkUp   Date     time    mbytes files
serv1  Atl      yes    01/22/05 10:01   569333 5621
serv2  Chi      no              23:00   3218   112
serv3  LA       yes    01/22/05         34233  1231

Same output as before.
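Since the original question was about reporting on ONLY certain fields, here is a rough sketch of one way to do that once a record is in a hash: a hash slice pulls just the wanted columns. The widths, the @wanted list, and the sample record are all assumptions for illustration (here each width covers a column plus its trailing padding), not the layout of the real file.

```perl
use strict;
use warnings;

# Hypothetical layout and report columns; adjust both to the real file.
my @headers = qw(Server Location BkUp Date time mbytes files);
my @widths  = (7, 9, 7, 9, 8, 7, 5);
my @wanted  = qw(Server BkUp files);

# One aligned sample record (Date left blank).
my $line = sprintf '%-7s%-9s%-7s%-9s%-8s%-7s%-5s',
    'serv2', 'Chi', 'no', '', '23:00', '3218', '112';

my %h;
my $start = 0;
for my $i (0 .. $#headers) {
    my $value = substr($line, $start, $widths[$i]);
    $value =~ s/\s+$//;               # drop the padding
    $h{ $headers[$i] } = $value;
    $start += $widths[$i];
}

# Hash slice: only the wanted columns, in report order.
my $report = join ' | ', @h{@wanted};
print "$report\n";
```

Changing the report then only means editing @wanted; the parsing loop stays the same.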

mikevh · Jan 27, 2005

The final refinement. (I hope.) Here's an improved version of parsefixed which allows @fieldwidths to be a flat list with just fieldwidths (as in the last example posted), or a list of array references with [start, length] pairs. parsefixed uses the ref function to determine which it is.

Code:

sub parsefixed {
    # Parse a line of fixed-width data into an array based on @fieldwidths.
    # @fieldwidths may be a flat list with just widths, e.g. 
    # (6, 9, 7, 9, 8, 7, 6)
    # or a list of array refs with [start,len] elems, e.g.
    # ([0,6], [7,9], [16,7], [23,9], [32,8], [40,7], [47,5])

    my ($line, @fieldwidths) = @_;
    #is this a flat list or a list of array refs?
    my $isref = ref($fieldwidths[0]);
    my @data;
    my $start = 0;
    my $len;
    for (my $i=0; $i<@fieldwidths; $i++) {
        if ($isref) {
            ($start, $len) = @{$fieldwidths[$i]};
        } else {
            $len = $fieldwidths[$i];
        }
        $data[$i] = substr($line, $start, $len);
        $start += $len unless $isref;
    }
    return @data;
}

That's it. I'm gonna go read a book or something ...

ljsmith91 · Jan 27, 2005

mikevh,

It does look more tedious than I was hoping but at least there is a way and that's what's important. Thanks for showing me that way and for making it so easy.

I think I understand everything you are doing, and that is a good thing. Sometimes I cannot learn until I actually see examples. It was great of you to lay it all out there for me. Now I see why you're rated as one of the best so often.

Have a great day. Thanks.

ljs

mikevh · Jan 27, 2005

You're welcome. Good luck.

mlibeson · Jan 27, 2005

Instead of using substr(), you could use unpack(), which is more efficient.

Michael Libeson
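A minimal sketch of the unpack() approach, under the same assumed column layout as the sample data in this thread: the template string is an assumption (each A<n> covers one column plus its trailing padding), and pack with the same template is used here only to build an aligned test record. The 'A' format strips trailing spaces, so no separate trim is needed.

```perl
use strict;
use warnings;

# Assumed template: one A<n> per column, widths including padding.
# 'A' returns each field with trailing spaces removed.
my $template = 'A7 A9 A7 A9 A8 A7 A5';

# pack with the same template builds a space-padded sample record
# in which the Date field is blank.
my $line = pack $template, 'serv2', 'Chi', 'no', '', '23:00', '3218', '112';

my ($serv, $loc, $bkup, $date, $time, $mbytes, $files) =
    unpack $template, $line;

print "serv=[$serv] date=[$date] time=[$time] files=[$files]\n";
```

One template replaces the whole list of substr() calls, and the blank Date still comes back as an empty string in its own slot.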
