RegEx - get data from a point not at the beginning of the line.

JustKIDn · Sep 8, 2003

Hi all,

I've inherited this goofy file!

I can get everything else, but I can't seperate the address info.

Example;

Code:

lastname               Firstname M status 12Oct44   age
123 MapleGrove Lane    Firstname F status 12Oct44   age
New York, NY 12345     Firstname M status 12Oct44   age
123-555-4567

As you can see, it's mostly in columns. But the address is stuffed in there.

I have a line something like this;

my($col1, $col2, $col3, $col4, $col5, $col6) = split(/\s+/, $_,6);

This works great for the first row. But when it drops to the other lines everything changes.

You can't count on every address having the same number of spaces in them. And sometimes there is only an incomplete address or it's completely missing.

The firstname field starts at about the 40th column.
Is there some way I can grab everything up to the firstname as one whole column, and then each of the following columns after that?

tgus

____________________________
Families can be together forever...

usige · Sep 8, 2003

Assuming that the first column is fixed length you could do something like this:

Code:

my $col1 = substr($_, 0, 23, '');
my ($col2, $col3, $col4, $col5, $col6) = split(/\s+/);

Just set the third arg in substr to whatever the length of the first column is.

Hope that helps.

JustKIDn · Sep 8, 2003

Thanks usige,

You know, if I had this problem in some other languages, it wouldn't be a problem. I'd just use substr().
But in Perl you generally think in terms of a regex to get the job done. [hammer]

I was just thinking about the possibility of substr(), and I was just checking to see if I had received an answer yet. And if not, I was going to ask if using substr() would be a good idea.

I'm going to try it now. I'll let you know how it turns out.

Thanks again!

tgus

____________________________
Families can be together forever...

JustKIDn · Sep 8, 2003

usige,

That worked good. I had to modify it a bit.

my $col1 = substr($_, 0, 40, '');
my ($col0, $col2, $col3, $col4, $col5, $col6) = split(/\s+/);

The way it's written, it gobbles up the first 40 characters of $_ when it adds them to col1.

It has to do with the '' . But it didn't work well without it so I left them in. The real problem with the quotes is. Somehow it also gobbles up the characters that were to go into the $col2 field.

To remedy this I added another 'dummy' field at the beginning of the list called $col0. $col0 now ends up empty, but all the others end up with the correct data. Go figure!

Then the only other thing was to strip off all the blank spaces at the end of $col1.

$col1 =~ s/\s+$//;

Now my output looks like this;

lastname Firstname M status 12Oct44 age
lastname Firstname F status 01Apr45 age
lastname Firstname M status 10Jun95 age
lastname Firstname F status 31Nov97 age
123 MapleGrove Lane New York, NY 12345 123-555-4567

I'm thinking about adding commas after everything.

What I'm after is to get this data into a format I can import into a database.

Each grouping like this is one family. So I'll have a lot of repeating looking lines like this (except that the names will be real).

So my question is; do you think this is ok? Or is there a better way to format the data to make it easier to import?

Thanks!

tgus

____________________________
Families can be together forever...

PaulTEG · Sep 9, 2003

Assuming addresses are unique to identify families, you can use the address ID as a unique key for the address table, and select all the family members based on the addrID

[family_members]
addrID,lastname, Firstname, M, status, 12Oct44, age
addrID,lastname, Firstname, F, status, 01Apr45, age
addrID,lastname, Firstname, M, status, 10Jun95, age
addrID,lastname, Firstname, F, status, 31Nov97, age
[/family_members]
[address_table]
addrID,123 MapleGrove Lane, New York, NY 12345, 123-555-4567
[/address_table]

select * from family_members where addrID=[address_table.addrID]

HTH
--Paul

PaulTEG · Sep 9, 2003

I meant to say use DBD::CSV, or do a CSV import into a database
--Paul

JustKIDn · Sep 10, 2003

Thanks Paul,

I don't know if you posted this before I thought of it. But that's exactly what I did!

Except that instead of calling it addrID, I called it famID. Same thing.

It was easier than I thought it would be. And to add the number, I just created a counter ($famID) that was incremented every time the Lastname changed. Which was also easy because the file that was given to me also had dotted (---) lines between each family group.

I thought I was going to have to do a lot of manual data massaging. But once again, Perl made it easy!

Thanks guys! [penguin]

P.S. If anyone wants to see the code I used for this. Just post here and I'll put it up.

tgus

____________________________
Families can be together forever...

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

RegEx - get data from a point not at the beginning of the line.

JustKIDn

MIS

usige

Technical User

JustKIDn

MIS

JustKIDn

MIS

PaulTEG

Technical User

PaulTEG

Technical User

JustKIDn

MIS

Similar threads

Part and Inventory Search

Sponsor