Break a file into multiple files

(OP)
I have a file that begins with some header records, each starting with "H". I want to write those headers to the top of every output file; the output files are created based on field 2 of the non-H records.
Everything works the way I want except for getting the multiple H records printed to each file.

Any help is appreciated.

Here's the data file:
H1
H 2
H 3
H
H 44
GROUP1 AA 1
GROUP1 AA 2
GROUP1 AA 10
GROUP1 BB 1
GROUP1 CC 1
GROUP2 AA 3
GROUP2 AA 4
GROUP3 AA 5
GROUP10 AA 5

and here's my script so far:
BEGIN {
# if (substr($0,1,1) == "H"){print $0 > $2".DAT"}
old=$2
}
{
new=$2
#if (substr($0,1,1) != "H")
{
if (new != old) { print $0 >> $2".DAT"}
if (new == old) { print $0 >> $2".DAT"}
old=new
}
}

RE: Break a file into multiple files

Hi

For that sample input, how many output files should be created, what should they be called, and what should they contain?

Feherke.
http://feherke.github.com/

RE: Break a file into multiple files

(OP)
Thanks Feherke for responding.
The script would create 3 files - AA.DAT, BB.DAT AND CC.DAT.
Each file contains the records that relate to the field 2 values.

Thanks

RE: Break a file into multiple files

Hi

You mean something like this ?

CODE --> Awk

awk '!/^H/{print>$2".DAT"}' /input/file
Tested with gawk and mawk.
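
For readers new to the idiom, the one-liner above can be spelled out as in the sketch below. It runs against a cut-down copy of the sample data from the first post. The scratch directory and the file name input.txt are assumptions made for the demonstration, not part of the original posts.

```sh
# Work in a scratch directory (assumption: mktemp -d is available).
cd "$(mktemp -d)" || exit 1

# Cut-down copy of the sample data from the first post.
cat > input.txt <<'EOF'
H1
H 2
GROUP1 AA 1
GROUP1 AA 2
GROUP1 BB 1
GROUP2 AA 3
EOF

awk '
# Skip header records; write every other record to <field 2>.DAT.
# Within one awk run, ">" truncates a file only when it is first opened;
# later prints to the same name append to it.
!/^H/ { print > ($2 ".DAT") }
' input.txt

cat AA.DAT
```

With this input, AA.DAT receives the three AA records and BB.DAT the single BB record; the H lines are dropped.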

Feherke.
http://feherke.github.com/

RE: Break a file into multiple files

(OP)
Thanks Feherke.
This works, except that I would still like to pass the H records at the beginning of the input file to each of the output files.

Thanks again.

RE: Break a file into multiple files

(OP)
Feherke/others
I changed the code and can now get the "H" records printed at the top of each output file, but now I only get the last record of each group (based on field 2 of the data).
Here is the input file I am using:
H1
H2
H3
H
H 44
GROUP1 aa 1
GROUP1 aa 2
GROUP1 aa 10
GROUP1 bb 1
GROUP1 bb 2
GROUP2 cc 3
GROUP2 cc 4
GROUP3 dd 5
GROUP10 ee 1
GROUP10 ee 2
GROUP10 ee 3
GROUP1 aa 11
GROUP1 aa 22
GROUP1 aa 100
and here is my script:

FNR==1 {
hdr_count = 0;
while (substr($1,1,1) == "H") {
header[++hdr_count] = $0;
getline;
}
}
substr($1,1,1) != "H" {
if (filename != "") close(filename);
filename = $2 ".dat";
for (h=1; h<=hdr_count; h++)
print header[h] > filename;

}
{
print $0 > filename;
}


Here is the output I am currently getting: only the last record, rather than all records, for each group based on the field 2 identifier. For example, for identifier aa the output file aa.dat contains:
H1
H2
H3
H
H 44
GROUP1 aa 100

RE: Break a file into multiple files

Hi

Quote (mrr)

The script would create 3 files - AA.DAT, BB.DAT AND CC.DAT.
Each file contains the records that relate to the field 2 values.
From this I understand that you do not want the header lines anywhere.

Quote (mrr)

This works, except that I would still like to pass the H records at the beginning of the input file to each of the output files.
From this I understand that you want the header lines everywhere.

To correct your latest code, just remove the call of the close() function.

But I am not sure whether it will give what you want. I suppose you want the header lines at the beginning of each file, while your code will insert the header lines before every record. So here is what I would do :

CODE --> Awk

awk '/^H/{h=h$0ORS;next}!f[$2]{printf"%s",h>$2".DAT";f[$2]=1}{print>$2".DAT"}' /input/file
Tested with gawk and mawk.
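
The same logic, spelled out with comments, might look like the sketch below. The array name seen stands in for f in the one-liner, and the scratch directory and input.txt are assumptions made for the demonstration.

```sh
# Work in a scratch directory (assumption: mktemp -d is available).
cd "$(mktemp -d)" || exit 1

# Small sample with interleaved keys, modelled on the data in this thread.
cat > input.txt <<'EOF'
H1
H 2
GROUP1 aa 1
GROUP1 bb 1
GROUP1 aa 2
EOF

awk '
# Header records: append to the buffer h, then skip to the next record.
/^H/ { h = h $0 ORS; next }

# First time this key is seen: write the buffered headers to its file.
!seen[$2] { printf "%s", h > ($2 ".DAT"); seen[$2] = 1 }

# Every data record goes to the file named after field 2.
{ print > ($2 ".DAT") }
' input.txt

cat aa.DAT
```

Because no file is closed between records, each output file is opened once, gets the headers once, and then collects its records even when the keys are interleaved in the input.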

Feherke.
http://feherke.github.com/

RE: Break a file into multiple files

(OP)
Thanks Feherke,

This part of my code works properly, with the H records being passed to all output files, but I can't get the non-H records to print entirely for each output file:

FNR==1 {
hdr_count = 0;
while (substr($1,1,1) == "H")
{
header[++hdr_count] = $0;
getline;
}
}
substr($1,1,1) != "H"
{
if (filename != "") close(filename);
filename = $2 ".dat";
for (h=1; h<=hdr_count; h++)
print header[h] > filename;
}

taking the input of:
H1
H2
H3
H
H 44
GROUP1 aa 1
GROUP1 aa 2
GROUP1 aa 10
GROUP1 bb 1
GROUP1 bb 1
ROUP1 aa 11
GROUP1 aa 22
GROUP1 aa 100

I want the output to have 2 files and they would look like:
file aa.dat:
H1
H2
H3
H
H 44
GROUP1 aa 1
GROUP1 aa 2
GROUP1 aa 10
GROUP1 aa 22
GROUP1 aa 100

and file bb.dat would look like:
H1
H2
H3
H
H 44
GROUP1 bb 1
GROUP1 bb 1

I just can't seem to get the code right to print the data records for each output file.

Thanks again.

RE: Break a file into multiple files

Hi

Quote (mrr)

I can't get the non-H records to print entirely for each output file
That is why I wrote in my previous post :

Quote (Feherke)

To correct your latest code, just remove the call of the close() function.

Thanks to the output you posted, we now have the answer to my next question :

Quote (Feherke)

I suppose you want the header lines at the beginning of each file, while your code will insert the header lines before every record.
So your code has one more glitch. It needs an additional array in which to store the already opened files, so that the headers are output only before writing the first record. ( That is what the f array in my latest code is for. )
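
Applied to the script in the question, both corrections might look like the sketch below. It keeps the header, hdr_count and filename names from the posted script; the array name opened, the script name split.awk and the file input.txt are hypothetical, and the getline loop is replaced by a plain pattern to keep the example short.

```sh
# Work in a scratch directory (assumption: mktemp -d is available).
cd "$(mktemp -d)" || exit 1

cat > input.txt <<'EOF'
H1
H2
GROUP1 aa 1
GROUP1 bb 1
GROUP1 aa 11
EOF

# split.awk is a hypothetical file name for this sketch.
cat > split.awk <<'EOF'
# Collect header records in input order.
/^H/ { header[++hdr_count] = $0; next }

{
    filename = $2 ".dat"
    # Write the headers only the first time this file name is used.
    if (!(filename in opened)) {
        for (h = 1; h <= hdr_count; h++)
            print header[h] > filename
        opened[filename] = 1
    }
    # No close() here: reopening a file with ">" after close() truncates
    # it, which is why only the last record survived before.
    print $0 > filename
}
EOF

awk -f split.awk input.txt
cat aa.dat
```

With this input, aa.dat holds the two headers followed by both aa records, and bb.dat holds the two headers followed by the single bb record.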

Feherke.
http://feherke.github.com/

RE: Break a file into multiple files

CODE --> bash

awk '/GROUP/{print>$2".DAT"}' /input/file;for f in *.DAT; do grep ^H /input/file |cat - $f > tmp;cat tmp>$f;done

RE: Break a file into multiple files

(OP)
I would like to have this code in a file so I can re-run it with the -f option rather than typing it on the command line.
I now have the code printing the H records, but they are only printed to the first output file and not to the following files.

Here's my current code:
{
hdr_count = 0;
while (substr($1,1,1) == "H")
{ header[++hdr_count] = $0
getline
}
{ if (substr($1,1,1) != "H")
if (filename != "") close(filename)
filename = $2 ".dat"
for (h=1; h<=hdr_count; h++)
print header[h] > filename; print $0 >> filename}
}

I can't seem to get it to work if I remove the close function.
Here's my latest test data:
H1
H2
H 3
H
H 44
GROUP1 aa 1
GROUP1 aa 2
GROUP1 aa 10
GROUP1 bb 1
GROUP1 bb 10
GROUP1 aa 11
GROUP2 aa 22
GROUP2 aa 100

Here's my current output for file aa.dat and it looks good....
H1
H2
H 3
H
H 44
GROUP1 aa 1
GROUP1 aa 2
GROUP1 aa 10
GROUP1 aa 11
GROUP2 aa 22
GROUP2 aa 100

And here is bb.dat, without the H header records I want to include:

GROUP1 bb 1
GROUP1 bb 10

Thanks to all for the assistance on this...

RE: Break a file into multiple files

Hi

Quote (mrr)

I would like to have this code in a file so I can re-run it with the -f option rather than typing it on the command line.

Then put it in a file :

CODE --> mrr.awk

/^H/{h=h$0ORS;next}!f[$2]{printf"%s",h>$2".dat";f[$2]=1}{print>$2".dat"}
And run it :

CODE --> line

awk -f mrr.awk /input/file

Or give it execute permission :

CODE --> mrr.awk

#!/usr/bin/awk -f

/^H/ {
  h=h $0 ORS
  next
}

! f[$2] {
  printf "%s",h > ($2 ".dat")
  f[$2]=1
}

{
  print > ($2 ".dat")
}
And run it :

CODE --> line

./mrr.awk /input/file

Note that you may need to edit the shebang line in case your awk executable is installed elsewhere.
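
One way to find the path to put in the shebang line, assuming a POSIX shell, is to ask the shell where it finds awk:

```sh
# Print the full path of the awk that the shell would run.
command -v awk
```

Whatever path is printed replaces /usr/bin/awk in the first line of mrr.awk.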

Feherke.
http://feherke.github.com/
