Break up large data file to smaller ones

Status
Not open for further replies.

new2unix

Programmer
Feb 5, 2001
143
US
Hi,

I have a large data file on an AIX server: around 1.2 million lines/records, about 900 MB in size. The goal is to break the file up into smaller files whose names start with the prefix "data_", followed by a sequence number ("01", "02", ...) and ending with ".dat". Each file should contain a maximum of 50,000 lines; the last file may have fewer. Other than just using a loop to count to 50,000 over and over, is there a more efficient and reliable way to break up the large file while keeping track of the correct starting line for the next data file?

Thanks

Mike
 
If the records in the file are all the same length, you might be able to compute each chunk's starting byte offset from the record length and seek straight to it. Other than that, I think you may be stuck with the loop.

Tracy Dryden
tracy@bydisn.com

Meddle not in the affairs of dragons,
For you are crunchy, and good with mustard.
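
For what it's worth, the fixed-length idea above could be sketched in shell roughly like this. It is only an illustration under the assumption that every record is exactly the same number of bytes (newline included); `extract_chunk` and its arguments are made-up names, not anything from the thread.

```shell
# Hypothetical helper: print one chunk of a fixed-length-record file.
# Arguments: input file, record length in bytes (including the newline),
# records per chunk, zero-based chunk index.
extract_chunk() {
    input=$1
    reclen=$2
    per_chunk=$3
    index=$4
    # One dd block equals one whole chunk, so skipping "index" blocks
    # seeks directly to the chunk's first byte without reading earlier lines.
    dd if="$input" bs=$((reclen * per_chunk)) skip="$index" count=1 2>/dev/null
}
```

For example, `extract_chunk bigfile.dat 750 50000 2 > data_03.dat` would write the third chunk, assuming 750-byte records; both the file name and the record length here are placeholders.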
 
Thanks!

The lines are variable length, so I guess I will just have to do it the hard way. :)

Mike
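
For variable-length lines, the counting loop does not have to be written by hand. A minimal sketch with awk, which reads the file once and switches output files every N lines, might look like the following. `split_by_lines` is a hypothetical helper name, and the sketch assumes the local awk can handle the longest line in the file (some older awk versions limit record length).

```shell
# Hypothetical helper: split a file into data_01.dat, data_02.dat, ...
# Arguments: input file, maximum lines per output file (default 50000).
split_by_lines() {
    input=$1
    max=${2:-50000}
    awk -v max="$max" '
        # On the first line of every chunk, close the previous output
        # file and open the next numbered one.
        (NR - 1) % max == 0 {
            if (out != "") close(out)
            out = sprintf("data_%02d.dat", ++n)
        }
        # Every line goes to whichever file is currently open.
        { print > out }
    ' "$input"
}
```

Calling `split_by_lines bigfile.dat 50000` (the input name is a placeholder) writes the pieces into the current directory; because awk tracks the line number itself, there is no separate bookkeeping for where the next file starts.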
 
If your platform supports the split command, you can use that. From the manual page for the Sun version of split:

NAME
split - split a file into pieces

SYNOPSIS
split [-linecount | -l linecount] [-a suffixlength] [file [name]]

split -b n[k | m] [-a suffixlength] [file [name]]

DESCRIPTION
The split utility reads file and writes it in linecount-line
pieces into a set of output-files. The name of the first
output-file is name with aa appended, and so on
lexicographically, up to zz (a maximum of 676 files). The
maximum length of name is 2 characters less than the maximum
filename length allowed by the filesystem. See statvfs(2).
If no output name is given, x is used as the default
(output-files will be called xaa, xab, and so forth).


Happy coding, NEIL
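
Applied to the original question, a rough sketch could look like this. It uses only the `-l` and `-a` options shown in the manual page above; since that version of split produces letter suffixes (aa, ab, ...), the pieces are renamed afterwards to get the numbered ".dat" names. `split_and_rename` and the temporary prefix `data_tmp_` are names made up for the example, and the sketch assumes no other files matching `data_tmp_??` exist in the current directory.

```shell
# Hypothetical helper: split a file, then rename the pieces to
# data_01.dat, data_02.dat, ...
# Arguments: input file, lines per piece (default 50000).
split_and_rename() {
    input=$1
    lines=${2:-50000}
    # Produces data_tmp_aa, data_tmp_ab, ... in the current directory.
    split -l "$lines" -a 2 "$input" data_tmp_ || return 1
    n=1
    # The shell expands the glob in sorted order, so aa, ab, ac, ...
    # map to 01, 02, 03, ...
    for piece in data_tmp_??; do
        mv "$piece" "$(printf 'data_%02d.dat' "$n")"
        n=$((n + 1))
    done
}
```

With 1.2 million lines and 50,000 lines per piece, `split_and_rename bigfile.dat 50000` (the input name is a placeholder) should yield 24 files, well under the 676-file limit of a two-letter suffix.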
 
Good answer, Neil! It had occurred to me that a split program might be useful, but I didn't know that there was a Unix version of that type of program.

Tracy Dryden
tracy@bydisn.com

Meddle not in the affairs of dragons,
For you are crunchy, and good with mustard.
 