Strange split question . . .

youthman · May 5, 2005

I have a database that is not delimited in any way other than by field size. (For example, column 1 is 14 characters, column 2 is 5 characters, column 3 is 140, column 4 is 2, column 5 is 1, etc.) Is there a split command that will take the various lengths and put each field into a different column variable?

Thanks for your help!
The Youthman

dmazzini · May 5, 2005

youthman

Can you be more specific? I don't really understand what you want to do.

Cheers!

dmazzini
GSM System and Telecomm Consultant

youthman · May 5, 2005

Basically, I have a database that has 950 characters in each line. I need each line split into 50 columns based on the number of characters. Column 1 is the first 14 characters, column 2 is the next 5 characters, column 3 is the next 140 characters, and so on until you reach the end of the row (character 949); the 950th character is a line feed. I need each "column" placed into a different variable: column 1 (the first 14 characters) = $column1, column 2 (characters 15-19) = $column2, etc.

Does that explain it better?

dmazzini · May 5, 2005

You should use the Perl function "substr" and assign its result to a variable.

Some examples:

substr("Once upon a time", 3, 4); # returns "e up"
substr("Once upon a time", 7); # returns "on a time"
substr("Once upon a time", -6, 5); # returns "a tim"

The drawback is that you need to work out a different offset and length for each substr call, e.g.

$column1 = substr($line, 0, 14);     # characters 1-14
print "$column1\n";

$column2 = substr($line, 14, 5);     # characters 15-19
print "$column2\n";

$column3 = substr($line, 19, 140);   # characters 20-159
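
The offsets above (0, 14, 19, ...) are just running totals of the field widths, so they can be computed in a loop instead of typed by hand. A minimal sketch, assuming only the first five widths given in the question; the sample record and the @widths and @columns names are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical layout: only the first five widths come from the question.
my @widths = (14, 5, 140, 2, 1);

# Hypothetical sample record, space-padded to the full 162 characters.
my $line = sprintf '%-14s%-5s%-140s%-2s%-1s',
    'ACCT0000000001', '12345', 'Some description', 'TX', 'Y';

my @columns;
my $offset = 0;
for my $width (@widths) {
    # Take $width characters starting at the current offset...
    push @columns, substr($line, $offset, $width);
    # ...then advance the offset past this field.
    $offset += $width;
}

# Field 3 is skipped here only because it is 140 characters wide.
print "[$_]\n" for @columns[0, 1, 3, 4];
```

The fields land in an array rather than in $column1, $column2, and so on, which may be easier to handle with 50 columns. Note that substr keeps any padding spaces in each field.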

dmazzini
GSM System and Telecomm Consultant

KevinADC · May 5, 2005

Use substr() as dmazzini suggests to build a list from each line of the file. As far as I am aware, there is no split()-like function that does what you want.

mikevh · May 5, 2005

Here's something I wrote a while back that you may find useful. The heart of this is the parsefixed routine. You pass it a line of fixed-width data and a list with the starting positions and lengths of your data fields, and it returns the fields in an array.

This program assumes the first line of input is a header with the names of the data fields. The main while (<DATA>) loop builds an array of hashes using the field names as the hash keys.

Working with fixed-width data is very tedious. Maybe this will make it a bit less painful.

Code:

#!perl
use strict;
use warnings;

my @cols =([0,6], [7,9], [16,7], [23,9], [32,8], [40,7], [47,5]);
#my @fieldwidths = (7, 9, 7, 9, 8, 7, 5);  # equivalent flat list; widths include the separating spaces

my @arr;
my %h;
my @headers;
while (my $line = <DATA>) {
    chomp $line;
    my @data = map {trim($_)} parsefixed($line, @cols);
    if ($. == 1) {
        @headers = @data;
        next;
    }
    @h{@headers} = @data;
    push @arr, { %h };
}

for my $href (@arr) {
    for my $h (@headers) {
        print qq($h => $href->{$h}\n);
    }
    print "\n";
}
    

sub parsefixed {
    # Parse a line of fixed-width data into an array based on @fieldwidths.
    # @fieldwidths may be a flat list with just widths (fields are then
    # assumed to be contiguous, starting at position 0), e.g.
    # (7, 9, 7, 9, 8, 7, 5)
    # or a list of array refs with [start,len] elems, e.g.
    # ([0,6], [7,9], [16,7], [23,9], [32,8], [40,7], [47,5])

    my ($line, @fieldwidths) = @_;
    #is this a flat list or a list of array refs?
    my $isref = ref($fieldwidths[0]); 
    my @data;
    my $start = 0;
    my $len;
    for (my $i=0; $i<@fieldwidths; $i++) {
        if ($isref) {
            ($start, $len) = @{$fieldwidths[$i]};
        } else {
            $len = $fieldwidths[$i];
        }
        $data[$i] = substr($line, $start, $len);
        $start += $len unless $isref;
    }
    return @data;
}

sub trim {
    #Trim leading/trailing whitespace;
    local $_ = shift;
    s/^\s+//;
    s/\s+$//;
    $_;
}
        
__DATA__
Server Location BkUp   Date     time    mbytes files
serv1  Atl      yes    01/22/05 10:01   569333 5621
serv2  Chi      no              23:00   3218   112
serv3  LA       yes    01/22/05         34233  1231

Output:

Code:

Server => serv1
Location => Atl
BkUp => yes
Date => 01/22/05
time => 10:01
mbytes => 569333
files => 5621

Server => serv2
Location => Chi
BkUp => no
Date => 
time => 23:00
mbytes => 3218
files => 112

Server => serv3
Location => LA
BkUp => yes
Date => 01/22/05
time => 
mbytes => 34233
files => 1231

youthman · May 5, 2005

Thanks a bunch to all of you gurus! Both of these have solved my problem. I will have to decide later which is most effective on the file, but either one works regardless. No further replies are required for this thread!

TrojanWarBlade · May 6, 2005

I can't believe you guys keep recommending "substr".
It is THE most unmaintainable way to write this kind of code.
To do this properly you should use "unpack".
The fundamental difference is that with "substr" you specify character positions, whereas with "unpack" you specify field widths.
If you ever need to change the size of a field, with "substr" you must change EVERY position in EVERY "substr" call from that field on. With "unpack", you merely change the width of that particular field.
Also, to make things even easier to maintain, you should create the unpack format string only once and reuse it throughout your code, so you only ever have to change one line if anything alters. Indeed, you could even write the code to read the field widths from a file or other external source and generate the "unpack" format string on the fly (again, once only).

Trojan.
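
The unpack approach described above could look something like this. It is a minimal sketch, assuming only the first five widths from the original question; the sample record and the @widths, $template and @columns names are hypothetical. The "A" format takes the given number of characters and strips trailing spaces from each field.

```perl
use strict;
use warnings;

# Hypothetical layout: only the first five widths come from the question.
# This list could equally be read from a file or other external source.
my @widths = (14, 5, 140, 2, 1);

# Build the unpack template once: "A14 A5 A140 A2 A1".
my $template = join ' ', map { "A$_" } @widths;

# Hypothetical sample record, space-padded to the full 162 characters.
my $line = sprintf '%-14s%-5s%-140s%-2s%-1s',
    'ACCT0000000001', '12345', 'Some description', 'TX', 'Y';

# One call splits the whole line; "A" drops trailing padding per field.
my @columns = unpack $template, $line;

print "[$_]\n" for @columns;
```

Changing a field's size then means editing one number in @widths; no other offsets need to be recalculated.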
