Tek-Tips is the largest IT community on the Internet today!

Members share and learn, making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!


Search and Replace


roneo (Programmer; joined Feb 6, 2002; 4 messages; location: US)
I am trying to write a Unix shell script command that cleans up XML files by eliminating the whitespace between nodes. For example:
<PERSON>
<NAME>James</NAME>
</PERSON>

the command should produce:
<PERSON><NAME>James</NAME></PERSON>

Does anyone have any clues?
Thanks
 
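The tr command may be useful, but since tr only works on individual characters, I'm sure awk or sed can do this better.

Some examples:
To remove all newlines (octal 012):
tr -d '\012' <infile >outfile

To remove all spaces:
tr -d ' ' <infile >outfile

Please note that the < and > redirections are required, because tr reads only from standard input and writes only to standard output.

See the man pages for details.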
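Since the question is really about whitespace *between* tags, a minimal sketch that combines the ideas above is to join all lines with tr and then let sed delete only the whitespace sitting between a > and the next <. Unlike `tr -d ' '`, this leaves the space inside text content such as "James Brown" alone. The function name `clean_xml` is hypothetical, and the sketch assumes a sed that accepts input without a trailing newline (GNU sed does) and that no text content needs its own line breaks preserved, because the whole file ends up on one line.

```shell
#!/bin/sh
# clean_xml is a hypothetical helper name: XML on stdin, cleaned XML on stdout.
clean_xml() {
    # 1) tr deletes every newline (octal \012), joining the input into one line.
    # 2) sed deletes any run of whitespace sitting between '>' and '<'.
    tr -d '\012' | sed 's/>[[:space:]]*</></g'
}

# Example: the indentation between tags goes away, the space in "James Brown" stays.
printf '<PERSON>\n  <NAME>James Brown</NAME>\n</PERSON>\n' | clean_xml
echo    # tr also removed the final newline, so print one
```

For a file, the same pipeline would be used with redirections, e.g. `clean_xml <infile >outfile`.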
 
Hi Roneo,

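The following awk script eliminates the whitespace between nodes, as you want (I hope). When a line ends with a node and the next non-blank line starts with a node, it joins the two lines and drops the blank lines in between; other lines are printed unchanged.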

awk -f CleanUp.awk XMLfile

----- XMLfile -----
Your <B>identification</B> :
<IDENT>

<NAME>James Brown</NAME>
No more informations
</IDENT>
-------------------


----- Result -----
Your <B>identification</B> :
<IDENT><NAME>James Brown</NAME>
No more informations
</IDENT>
------------------


----- CleanUp.awk -----


# Suppress spaces between consecutive nodes
# on the same line

{
    gsub(">[ \t]*<", "><", $0)
}

# After a line ending with a node,
# memorize an empty line and go to the next line

AfterNode && /^[ \t]*$/ {
    Memo[++MemoCnt] = $0
    next
}

# After a line ending with a node,
# merge that previous line with the current one
# if the current line starts with a node

AfterNode && /^[ \t]*</ {
    gsub("^[ \t]*<", "<", $0)
    $0 = Node $0
    MemoCnt = 0
    AfterNode = 0
}

# After a line ending with a node,
# print all memorized lines if the
# current line does not start with a node

AfterNode && /^[ \t]*[^<]/ {
    for (im = 1; im <= MemoCnt; im++)
        print Memo[im]
    MemoCnt = 0
    AfterNode = 0
}

# When a node ends the line,
# memorize it (it will be printed later)
# and go to the next line

/>[ \t]*$/ {
    AfterNode = 1
    MemoCnt = 0
    Memo[++MemoCnt] = $0
    gsub(">[ \t]*$", ">", $0)
    Node = $0
    next
}

# No special case: print the current line

{
    print $0
}

# At end of file, if we are after a line ending
# with a node, print all memorized lines

END {
    if (AfterNode) {
        for (im = 1; im <= MemoCnt; im++)
            print Memo[im]
    }
}

-----------------------
Jean Pierre.
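If the files are small enough to hold in memory, a shorter alternative to the line-by-line script (a different technique, not Jean Pierre's method) is to read the whole file into one awk string and collapse every run of spaces, tabs and newlines between > and < with a single gsub. On the sample XMLfile this should give the same Result shown above, because the only gaps it touches there sit between two tags. The function name `clean_xml_awk` is hypothetical.

```shell
#!/bin/sh
# clean_xml_awk is a hypothetical helper name; it reads the whole input into memory.
clean_xml_awk() {
    awk '
        # Accumulate every input line, restoring the newline that awk strips.
        { buf = buf $0 "\n" }
        END {
            # Collapse any run of spaces, tabs and newlines between ">" and "<".
            gsub(/>[ \t\n]*</, "><", buf)
            printf "%s", buf
        }
    ' "$@"
}

# Same input as the XMLfile sample in the thread.
printf 'Your <B>identification</B> :\n<IDENT>\n\n<NAME>James Brown</NAME>\nNo more informations\n</IDENT>\n' | clean_xml_awk
```

The trade-off is memory for simplicity: CleanUp.awk streams the file a line at a time, while this version keeps the entire file in one string until END.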
 
Wow! Thanks, man. I hope you didn't just write all of that yourself... I appreciate your help and will test it out.
Later,
 
