split text file
(OP)
I have a text file containing many multi-line messages, each beginning with XX and ending with a closing bracket ")".
I'd like to split it so that each message is saved to a separate file (e.g. file names with an increasing number),
so file1 will have:
XX sfjsklf
sfsfsfsf
(sdfsdfsf
sfsfsf
gfdfgdgdghdh)
file2:
XX 902wriwirj
sdfs
sdfsf
(sdfsdfs
sfsf)
etc.
RE: split text file
Is there unwanted content between the sections? If not, this will produce file01, file02 and so on (plus a potentially empty file00 holding the content before the first section):
CODE
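The command posted here was not preserved. The file names described (file00, file01, ...) match the default numbering of GNU csplit, so a minimal sketch under that assumption could look like the following. The sample data and the name input.txt are hypothetical, and '{*}' is a GNU coreutils extension; this is not necessarily the command that was originally posted.

```shell
# Work in a scratch directory so the generated files do not clutter anything.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample input: one line of preamble, then two messages.
printf '%s\n' 'preamble' \
  'XX first' '(a' 'b)' \
  'XX second' '(c)' > input.txt

# Start a new output file at every line beginning with "XX ".
# -s suppresses the byte counts, -f sets the file name prefix,
# '{*}' repeats the pattern as often as it matches (GNU extension).
csplit -s -f file input.txt '/^XX /' '{*}'

ls file*   # file00 (the preamble), file01, file02
```

Because csplit cuts only at the start of a matching line, anything between the closing ")" and the next XX line stays in the message's file.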
Feherke.
feherke.ga
RE: split text file
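There is unwanted content between the messages, and there might also be some unwanted trailing text after the closing ")" on the same line.
Sample of wanted and unwanted content (the green/red highlighting of the original post is lost here: the wanted part runs from the line starting with XX up to and including the closing ")", everything else is unwanted):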
sfkjskfsdf
sfsfsf
sfsf
XX sdfsf
sfsf
sdfs
sdfsfsf
(sfsfsfsf
sdfsf
sfsf
sfsfsfs)fgkjsfsdflk
wrwrewr
werwerweer
wrwrwerwre
RE: split text file
In that case I would go with Awk:
CODE
Feherke.
RE: split text file
By the way, if I would like to pipe each message to a command (instead of writing it to a file), is this OK, or is something superfluous there? It seems to work, but I'd like to be sure the approach is sound:
gawk -v RS='(^|\n)XX [^)]+\\)' 'RT{sub(/^\n/,"",RT);cmd="wc -l";print RT | cmd; close(cmd)}'
Also, how could I modify it so that awk runs another command with input redirection, like:
cmd < message
(where cmd may be given some options)
RE: split text file
Yes, that is the way to run an external command and pass it input.
I am not sure what you are asking there, but I assume you would like bidirectional communication with the external command. That is a GNU Awk-only feature:
CODE
Though if you really just want the line count, it is better to compute it in Awk itself:
CODE
Feherke.
RE: split text file
With your first command I can use the created files for further processing with a for loop:
for i in file*; do somecommand -o sss < "$i"; done
I was thinking of implementing this directly in the awk command, so the for loop would not be needed at all.
RE: split text file
In such cases a \0 delimiter is usually used, hoping that the text to process does not contain it:
CODE
Feherke.
RE: split text file
I have also tried adding leading zeroes to the counter in the file names. Could you tell me why file_79 is listed twice, and how to start from file_01 (and not file_00)?
In my example 80 files should be created, starting from file_01.
$ gawk -v RS='(^|\n)XX [^)]+\\)' 'RT{sub(/^\n/,"",RT);file=sprintf("%s_%02d","file",n++);print RT>file};{print file}' ddddd|head -5
file_00
file_01
file_02
file_03
file_04
$ gawk -v RS='(^|\n)XX [^)]+\\)' 'RT{sub(/^\n/,"",RT);file=sprintf("%s_%02d","file",n++);print RT>file};{print file}' ddddd|tail -5
file_76
file_77
file_78
file_79
file_79
$
RE: split text file
Then either initialize it with n=1 or pre-increment it with ++n.
Such RT-based solutions tend to produce one record with an empty RT at one end of the processing. To avoid writing it, I asked Awk to write only when RT is not empty. You asked it to {print file} unconditionally, even when the block that writes to the file was skipped, so the last file name is printed once more for that final record; that is why file_79 appears twice.
Feherke.