INTELLIGENT WORK FORUMS
FOR COMPUTER PROFESSIONALS

Log In

Come Join Us!

Are you a
Computer / IT professional?
Join Tek-Tips Forums!
  • Talk With Other Members
  • Be Notified Of Responses
    To Your Posts
  • Keyword Search
  • One-Click Access To Your
    Favorite Forums
  • Automated Signatures
    On Your Posts
  • Best Of All, It's Free!

*Tek-Tips's functionality depends on members receiving e-mail. By joining you are opting in to receive e-mail.

Posting Guidelines

Promoting, selling, recruiting, coursework and thesis posting is forbidden.

Jobs

awk help with closing multiple named pipes

awk help with closing multiple named pipes

(OP)
Hi,
I'm new to awk, but confident it can do what I want. I just got to tell it the right thing. I hope I can get your help.

I'm trying to load a multi-record layout formatted file into my database. I'm using greenplum (a varient of postgres) and the loader will process the input based on the specific column layout. A loader can only specify 1 layout, so I'm trying to use awk to split the file into different streams - 1 for each record type - that will feed concurrently executing loaders. The solution works as long as I'm using regular files, but because of the variable file size I'd prefer to use named pipes.

When using named pipes, the solution seems to be stalling. the loaders run indefinitely and no data actually gets loaded. I think my issue is that the awk program is not closing the named pipes when all of the file data has been read.

here's the details
the file has 3 record types - indicated by the 1st byte of the record. The values are H (header), E (Trailer/ending), 0 (data).

The loaders use a feature called external table which specifies the location and layout of the source data. For each named pipe I've created an external table def. The named pipe is referenced in the location parameter eg
create external table ex_header (record_cd varchar(1), id varchar(20), agency varchar(40), filedate date, filetime time)
location ('gpfdist::localhost:9001/BN.H);


and the commands
mkfifo /tmp/BN.H
mkfifo /tmp/BN.0
mkfifo /tmp/BN.E

#start gpfdist
gpfdist -d /tmp/ -p 9001 -l gpfdist.log & ## gpfdist will read whatever file is specified by the external table def. It can read different files concurrently.

awk '{print $0 >"/tmp/BN."substr($0,1,1)} END {close("/tmp/BN.H") close("/tmp/BN.0") close("/tmp/BN.E") }' filename
# for each record in filename, print the record to a filename derived from the 1st byte of the record. At end close the explicit filenames

in sql I have
insert into header select * from ex_header;
insert into detail select * from ex_detail;
insert into trailer select * from ex_trailer;
the 3 inserts are running concurrently from different db sessions.

the inserts just sit there and run forever. eventually I have to kill the inserts, and the awk


In the awk guide for close(file|command) is says that
"Its value must exactly match the string that was used to open the file or start the command"
so I also tried awk '{print $0 >"/tmp/BN."substr($0,1,1)} END {close("/tmp/BN"substr($0,1,1) }' filename
with the same results. in this case, I believe that the substring won't work, since it's executing at the end when $0 has no data. even if it does, it would only be the last record.


also, the file is sorted, so that H record is first, followed by 0's, then finally E.

any suggestions would really be appreciated

doug L

RE: awk help with closing multiple named pipes

I have never used gpfdist, but with my "programmers hat" on, this is what I would look at.

First of all does the location gpfdist::localhost:9001 specified in the create external table command definitely point to /tmp on the server?

Also what happens when you try and read the named pipes with a regular unix utiliy instead of using gpfdist. DO you get any data back from that ?

In the gpfdist command should the path of the -d option be /tmp/ or just /tmp or does it matter ?

I'm from an Oracle background and in there we now have multi-table inserts, does your DB have that? - it will be under INSERT ALL command
if it does exist you could use that and not need any pipes. Probably not quite as fast as parallel load with gpfdist but in Oracle
its efficient and only one table scan required.

In order to understand recursion, you must first understand recursion.

Red Flag This Post

Please let us know here why this post is inappropriate. Reasons such as off-topic, duplicates, flames, illegal, vulgar, or students posting their homework.

Red Flag Submitted

Thank you for helping keep Tek-Tips Forums free from inappropriate posts.
The Tek-Tips staff will check this out and take appropriate action.

Reply To This Thread

Posting in the Tek-Tips forums is a member-only feature.

Click Here to join Tek-Tips and talk with other members!

Resources

Close Box

Join Tek-Tips® Today!

Join your peers on the Internet's largest technical computer professional community.
It's easy to join and it's free.

Here's Why Members Love Tek-Tips Forums:

Register now while it's still free!

Already a member? Close this window and log in.

Join Us             Close