split text file


(OP)
I have a text file with many lines containing many messages, each beginning with XX and ending with a closing bracket ")".

I'd like to split it so that each message is saved to a separate file (e.g. file names with an increasing number).

so file1 will have:
XX sfjsklf
sfsfsfsf
(sdfsdfsf
sfsfsf
gfdfgdgdghdh)

file2:
XX 902wriwirj
sdfs
sdfsf
(sdfsdfs
sfsf)


etc.

RE: split text file

Hi

Is there unwanted content between the sections? If not, this will produce file01, file02 and so on (plus a potentially empty file00 holding whatever precedes the first section):

CODE

csplit -f file /path/to/input '/^XX /' '{*}' 
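
To see where the empty file00 comes from, here is a minimal sketch, assuming GNU csplit and some made-up sample data (the scratch directory and the file name `input` are only for illustration):

```shell
# Work in a scratch directory so the generated pieces do not clutter anything.
dir=$(mktemp -d) && cd "$dir" || exit 1

# Made-up input: two messages, nothing before the first XX line.
printf '%s\n' 'XX first' 'body (a' 'b)' 'XX second' '(c)' > input

# -s suppresses the byte counts; '{*}' repeats the pattern until end of input.
csplit -s -f file input '/^XX /' '{*}'

ls            # input plus file00 (empty), file01 and file02
cat file02    # the second message
```

GNU csplit also accepts -z (--elide-empty-files), which drops empty pieces such as file00.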

Feherke.
feherke.ga

RE: split text file

(OP)
Yes, there are some unwanted lines in between (to be ignored), and there might also be some unwanted trailing text after the closing ")", on the same line.

Sample of wanted and unwanted text (originally marked green and red; the wanted part runs from the XX line through the closing ")"):

sfkjskfsdf
sfsfsf
sfsf

XX sdfsf
sfsf
sdfs
sdfsfsf
(sfsfsfsf
sdfsf
sfsf
sfsfsfs)
fgkjsfsdflk
wrwrewr
werwerweer
wrwrwerwre




RE: split text file

Hi

In that case I would go with Awk :

CODE

awk -v RS='(^|\n)XX [^)]+\\)' 'RT{sub(/^\n/,"",RT);print RT>"file"++n}' /path/to/input 
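
RT is specific to GNU Awk. For an awk without it, the same splitting can be approximated line by line. This is a rough sketch of a different technique, not the command above, and the sample data and file names are made up:

```shell
dir=$(mktemp -d) && cd "$dir" || exit 1

# Made-up input with unwanted lines and trailing text after the closing ")".
printf '%s\n' 'junk' 'XX one' '(a' 'b) trailing' 'noise' 'XX two' '(c)' > input

awk '
/^XX / && !inmsg { inmsg = 1; out = "file" (++n) }   # a message starts here
inmsg {
    p = index($0, ")")            # position of the first ")" on this line, 0 if none
    if (p) {                      # last line of the message: keep text up to ")"
        print substr($0, 1, p) > out
        close(out)
        inmsg = 0
    } else {
        print > out
    }
}' input

cat file1
cat file2
```

Lines outside a message are simply skipped, and anything after the first ")" of a message is cut off.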

Feherke.
feherke.ga

RE: split text file

(OP)
brilliant! thank you very much.

By the way, if I want to pipe each message to a command (instead of writing it to a file), is this OK, or is something superfluous there? It seems to work, but I'd like to be sure the approach is sound.

gawk -v RS='(^|\n)XX [^)]+\\)' 'RT{sub(/^\n/,"",RT);cmd="wc -l";print RT | cmd; close(cmd)}'

Also, how could I modify it so that awk runs another command with input redirection, like:

cmd < message

(cmd could be given with some options)

RE: split text file

Hi

Yes, that is the way to run an external command and pass it input.

I am not sure what you are asking there, but I assume you would like bidirectional communication with the external command. That is a GNU Awk only feature :

CODE

gawk -v RS='(^|\n)XX [^)]+\\)' 'RT{sub(/^\n/,"",RT);cmd="wc -l"; print RT |& cmd; close(cmd,"to"); cmd |& getline c; close(cmd,"from"); print c}' /path/to/input 

Though if you really just want the line count, it is better to compute it in Awk itself :

CODE

gawk -v RS='(^|\n)XX [^)]+\\)' 'RT{gsub(/^\n/,"",RT);print split(RT,a,"\n")}' /path/to/input 

Feherke.
feherke.ga

RE: split text file

(OP)
Sorry for explaining my goal badly.

With your first command I can use the created files for further processing with a for loop:

for i in file*; do somecommand -o sss < "$i"; done

I was thinking of implementing it directly in the awk command, so that no for loop is needed at all.

RE: split text file

Hi

In such cases a \0 delimiter is usually used, in the hope that the text to process does not contain it :

CODE

gawk -v RS='(^|\n)XX [^)]+\\)' -v ORS='\0' 'RT{gsub(/^\n/,"",RT);print RT}' /path/to/input |
while IFS= read -r -d '' s; do
    echo "--=[$s]=--"
done 
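
The loop body can feed each message to a command on standard input, which is what `somecommand -o sss < file` did in the for loop. A small sketch, assuming bash (for `read -d`); `produce` and `per_message` are hypothetical helpers, and `produce` only stands in for the gawk stage with made-up data:

```shell
# Stand-in for the gawk stage: two NUL-terminated messages.
produce() { printf 'XX one\n(a\nb)\0XX two\n(c)\0'; }

# Run the given command (with its options) once per message,
# passing the message on the command's standard input.
per_message() {
    while IFS= read -r -d '' msg; do
        printf '%s\n' "$msg" | "$@"
    done
}

# wc -l is only a placeholder for the real command, e.g. somecommand -o sss.
produce | per_message wc -l
```

Because each command reads from its own pipe, it cannot swallow the remaining messages that the loop has not read yet.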

Feherke.
feherke.ga

RE: split text file

(OP)
thank you.

I have also tried to add leading zeroes to the counter in the file names. Could you tell me why file_79 appears twice, and how to start from file_01 (and not file_00)?
In my example, 80 files should be created, starting from file_01.

$ gawk -v RS='(^|\n)XX [^)]+\\)' 'RT{sub(/^\n/,"",RT);file=sprintf("%s_%02d","file",n++);print RT>file};{print file}' ddddd|head -5
file_00
file_01
file_02
file_03
file_04
$ gawk -v RS='(^|\n)XX [^)]+\\)' 'RT{sub(/^\n/,"",RT);file=sprintf("%s_%02d","file",n++);print RT>file};{print file}' ddddd|tail -5
file_76
file_77
file_78
file_79
file_79
$

RE: split text file

Hi

Then either initialize it with n=1 or preincrement it with ++n.

Such RT-based solutions tend to produce an empty record at one end of the processing. To skip it, I asked Awk to write only when RT is not empty. You asked it to {print file} unconditionally, even when the block that writes to the file was skipped, so the last file name is printed once more.
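
Put together, both changes could look like the sketch below. It assumes GNU Awk is installed as gawk; the sample data is made up, and the close(file) call is an addition of this sketch to keep the number of open files low on large inputs:

```shell
dir=$(mktemp -d) && cd "$dir" || exit 1

# Made-up input: two messages surrounded by unwanted text.
printf '%s\n' 'junk' 'XX one' '(a' 'b) tail' 'XX two' '(c)' 'more junk' > ddddd

gawk -v RS='(^|\n)XX [^)]+\\)' '
RT {
    sub(/^\n/, "", RT)
    file = sprintf("file_%02d", ++n)   # pre-increment: numbering starts at 01
    print RT > file
    close(file)                        # addition: release the file handle
    print file                         # inside the RT block: once per message
}' ddddd
```

The trailing record after the last message has an empty RT, so the block is skipped for it and no file name is repeated.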

Feherke.
feherke.ga
