newline in pattern

cmeyers · Jul 6, 2001

I'm trying to snag multiple lines of data by including the newline character in my pattern:

Code:

gawk '/<NAME>.*<\/NAME>.*<SEQ>.*<\/SEQ>/' file.xml

NAME and SEQ are each on a line of their own.

There may or may not be another line in between:
<DESC>Description Here</DESC>

My hope was that ".*" would get the description line and the newlines on either side. My understanding was that "." matched even a newline. Arnold Robbins...where are you?

I spent some time (not a lot) searching the forum and the FAQ but came up empty.

Ultimately I would like to perform a global substitution once I can get the proper pattern to match.

Regards,
CM >

:O>

Krunek · Jul 7, 2001

Hello, CraigMan!

A few words about metacharacters in regular expressions:

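. matches any single character. In awk that includes newline, but with the default record separator (RS) each record is one line with its newline removed, so there is never a newline for it to match.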

.* matches zero or more of any character, so it also matches the empty string.

That's the theory.
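A small sketch can check the theory inside awk itself by testing strings rather than input lines. The function name dot_demo is invented for this illustration. gawk documents that "." matches a newline; the sketch assumes your awk behaves the same way, which is the usual behaviour but worth verifying on older implementations.

```shell
# dot_demo is a hypothetical name used only for this illustration.
dot_demo() {
    awk 'BEGIN {
        s = "first\nsecond"            # a string that contains a newline
        print (s ~ /first.second/)     # 1: "." matched the newline
        print ("" ~ /^.*$/)            # 1: ".*" matches zero characters
        print ("" ~ /./)               # 0: "." needs exactly one character
    }'
}
dot_demo
```

The point is that the regex engine is not what stops a match across lines; what matters is whether the string being tested contains a newline at all.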

I tried it on this file:

awk 1977

c 1971

Command awk '/./' file gives this result:

awk 1977
c 1971

Command awk '/.*/' file gives this result:

awk 1977

c 1971

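Conclusion: /./ needs at least one character, so it skips the empty line, while /.*/ also matches the empty string, so it matches every line. Neither pattern ever sees a newline, because awk reads the file one line at a time.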
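Counting hits instead of printing lines makes the difference between the two commands easier to see. This is a sketch; count_matches is a hypothetical helper that feeds the same three-line sample (middle line empty) to awk with the pattern you pass in.

```shell
# count_matches is hypothetical: run one pattern over the sample, count hits.
count_matches() {
    printf 'awk 1977\n\nc 1971\n' | awk "$1"' { n++ } END { print n + 0 }'
}
count_matches '/./'     # prints 2: the empty line has no character for "." to match
count_matches '/.*/'    # prints 3: ".*" matches the empty line too
```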

I hope this helps.

KP.

cmeyers · Jul 7, 2001

You are right. I misunderstood "."
But...".*" does not work as I would expect.

My file will either look like this:
<NAME>Name Here</NAME>
<DESC>Description Here</DESC>
<SEQ>Sequence Here</SEQ>
or like this:
<NAME>Name Here</NAME>
<SEQ>Sequence Here</SEQ>
This is the code again:

Code:

gawk '/<NAME>.*<\/NAME>.*<SEQ>.*<\/SEQ>/' file.xml

Whether or not there is a Description line, I expect the pattern to match the two or three lines above, but it matches nothing.

However, I do appreciate you clearing up my major metacharacter misconception!

Regards,
CraigMan >

:O>

Krunek · Jul 9, 2001

CraigMan,

To match the two or three lines above, try this:

awk '/<NAME>/, /<\/SEQ>/' file.xml

or this

awk '/NAME/, /SEQ/' file.xml

This is a range pattern: /NAME/, /SEQ/ prints every line from a line matching NAME through the next line matching SEQ, inclusive.
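Here is a minimal sketch of the range pattern on a made-up sample with one record that has a DESC line, one that does not, and a stray line between them that no range covers. The helper sample_xml and its tag contents are hypothetical.

```shell
# sample_xml is a hypothetical generator for the test input.
sample_xml() {
    printf '%s\n' '<NAME>one</NAME>' '<DESC>first</DESC>' '<SEQ>AAA</SEQ>' \
        '<OTHER>outside any range</OTHER>' \
        '<NAME>two</NAME>' '<SEQ>CCC</SEQ>'
}
# Each range starts at a <NAME> line and ends at the next </SEQ> line, inclusive.
sample_xml | awk '/<NAME>/, /<\/SEQ>/'
```

Because each range restarts at the next <NAME> line, the stray <OTHER> line is skipped and the optional DESC line needs no special handling.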

Bye!

KP.

Krunek · Jul 9, 2001

BTW, at the moment I am trying to parse XML with awk. I do it character by character (not line by line!).

KP.

pkiller · Jul 9, 2001

If each tag is alone on its line, this should work:

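/<NAME>/  { collect = 1 }                      # a record starts
collect   { x = (x == "") ? $0 : x RS $0 }     # append the line, RS (newline) between lines
/<\/SEQ>/ { collect = 0; print x; x = "" }     # the record ends: print it
END       { if (x != "") print x }             # flush a record that was never closed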
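The collected string is also what makes the original regex usable. A sketch, assuming each record begins with a <NAME> line and ends with a </SEQ> line: count_full_blocks (a hypothetical name) joins each record with "\n" and tests the regex from the first post against the joined string, where "." is free to match the embedded newlines.

```shell
# count_full_blocks is hypothetical: count records the original regex matches
# once their lines are joined into one newline-separated string.
count_full_blocks() {
    awk '
    /<NAME>/  { collect = 1; x = "" }
    collect   { x = (x == "") ? $0 : x "\n" $0 }
    /<\/SEQ>/ {
        if (collect && x ~ /<NAME>.*<\/NAME>.*<SEQ>.*<\/SEQ>/)
            n++
        collect = 0; x = ""
    }
    END { print n + 0 }'
}
printf '%s\n' '<NAME>a</NAME>' '<DESC>d</DESC>' '<SEQ>s</SEQ>' \
              '<NAME>b</NAME>' '<SEQ>t</SEQ>' | count_full_blocks    # prints 2
```

The same joined string is where a global substitution with gsub() could be applied, since it now holds the whole record.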

cya

--
pkiller

cmeyers · Jul 9, 2001

Thanks for the input.

I'm familiar with both methods (pattern ranges //,// and setting/unsetting flags).

I'll probably use a pattern range since it's more concise.

However, I'm still disappointed and mystified.

Why does ".*" fail to match what it ought to match?

It's a matter of principle.

CraigMan >

:O>

flogrr · Jul 9, 2001

Hi CraigMan,

To answer your question, awk in all its forms is a
line-oriented interpreter, as you may know. When the
second line is read, it replaces the first one in $0,
so unless you saved the first line in a variable it is
no longer available.

By default the record separator RS is a newline: awk
reads one line per record and strips the newline, so a
regex tested against $0 only ever sees one line. Move
off the line and you are done whether you were
finished or not!
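A quick sketch shows this on a two-line sample (two_lines is a hypothetical helper): with the default RS each record is a single line without its newline, so the regex from the first post has nothing to match across.

```shell
# two_lines is a hypothetical name for the sample input.
two_lines() { printf '%s\n' '<NAME>n</NAME>' '<SEQ>s</SEQ>'; }

# With the default RS, each record is one line and the newline is not kept:
two_lines | awk '{ printf "record %d: [%s]\n", NR, $0 }'
# record 1: [<NAME>n</NAME>]
# record 2: [<SEQ>s</SEQ>]

# So a regex that needs both lines in the same record never matches:
two_lines | awk '/<NAME>.*<\/NAME>.*<SEQ>.*<\/SEQ>/'    # prints nothing
```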

To string multiple lines together you can concatenate
them into one string, as my example code does. If you
want to keep the line breaks, concatenate a "\n"
between the lines; awk's "." will then match it.

Example:

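nawk '
# a non-empty line is appended to the group being built
!/^$/ { line = line $0; next }

# an empty line ends the group: print it and start over
{ if (line != "") print line; line = "" }

# print the last group even if the file does not end with an empty line
END { if (line != "") print line }
' inputfile > outputfile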

Hope this helps you!

flogrr
flogr@yahoo.com

cmeyers · Jul 10, 2001

I get it now. Thanks.

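I also tried using Krunek's advice and set RS="" for character-by-character parsing. (RS="" actually puts awk in paragraph mode: a record runs until the next blank line, so a file without blank lines is read as one huge record.)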

What I find interesting is this method seems to choke on large files. The result is a core dump.
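For comparison, here is a sketch of what RS="" does when the input suits it, assuming the NAME/DESC/SEQ groups are separated by blank lines (the sample is made up). Each group becomes one record with its newlines inside, so the original regex matches the whole group. paragraph_demo is a hypothetical name; it prints the record number and how many lines each matching record holds. A file without blank lines becomes a single record, which fits the trouble with large files described above.

```shell
# paragraph_demo is hypothetical; the sample assumes blank lines between groups.
paragraph_demo() {
    printf '%s\n' '<NAME>a</NAME>' '<SEQ>s</SEQ>' '' \
        '<NAME>b</NAME>' '<DESC>d</DESC>' '<SEQ>t</SEQ>' |
    awk 'BEGIN { RS = "" }    # paragraph mode: a blank line ends each record
         /<NAME>.*<\/NAME>.*<SEQ>.*<\/SEQ>/ {
             n = split($0, lines, "\n")    # the record still contains its newlines
             print NR ": " n " lines"
         }'
}
paragraph_demo
```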

cmeyers · Jul 11, 2001

I had to abandon RS="". I should have realized the core dump I was getting was due to the limit awk has on the number of characters in an input record.

gawk just dumped core on me; nawk gave me a meaningful error message.

Krunek: What is your method for parsing XML character-by-character?

CraigMan >

:O>

Krunek · Jul 12, 2001

Hi!

You are right, CraigMan. I forgot this. Sorry.
Traditional awk implementations have some fixed limits, for example:
100 fields
3000 chars per input record
3000 chars per output record
1024 chars per field

I am trying to make a simple, generic XML parser
in awk for small XML files. A valid XML document
can look like this, all on one line:

<root><tag>data</tag><tag>data</tag></root>

For a document like this I suggest
character-by-character parsing.
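A minimal sketch of character-by-character scanning in awk, using substr() to step through each record one character at a time. xml_tokens is a hypothetical helper, not Krunek's parser; it only separates tags from text and ignores attributes, comments, CDATA, and entities.

```shell
# xml_tokens is hypothetical: print each tag and each run of text on its own line.
xml_tokens() {
    awk '{
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            if (c == "<") {                   # a tag starts: flush pending text
                if (text != "") { print "TEXT " text; text = "" }
                intag = 1; tag = ""
            } else if (c == ">" && intag) {   # the tag ends
                print "TAG " tag; intag = 0
            } else if (intag) {
                tag = tag c
            } else {
                text = text c
            }
        }
    }
    END { if (text != "") print "TEXT " text }'
}
echo '<root><tag>data</tag><tag>data</tag></root>' | xml_tokens
```

On the sample document this prints eight lines, starting with "TAG root", "TAG tag", "TEXT data", "TAG /tag".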

You can also see this thread:

http://www.tek-tips.com/gviewthread.cfm/lev2/4/lev3/32/pid/426/qid/88595

Creating an XML document with awk, on the other
hand, is pretty easy. My gift for you and other
awkers: an awk program that generates an XML file
from a text file whose fields are separated by spaces:

# to_xml.awk - converts text data to xml format
# to_xml.awk - croatian: pohrana tekstovnih podataka u xml-zapis ("storing text data in XML form")
# Kruno Peter, kruno_peter@yahoo.com
# awk, Public Domain, March 2001, last update: July 2001
# Jesus loves you.

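BEGIN { print "<?xml version=\"1.0\"?>" }

# open the root element, named after the input file
NR == 1 { print "<file filename=\"" FILENAME "\">" }

# every non-empty line becomes a <row>, each field a numbered <dataN> element
! /^$/ {
    print "<row>"
    for (i = 1; i <= NF; i++)
        print " <data" i ">" $i "</data" i ">"
    print "</row>"
}

END { print "</file>" }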
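One caveat when feeding arbitrary text into a program like this: a field containing &, <, or > produces XML that is not well formed. A possible fix, sketched here as a hypothetical xml_escape function (not part of the program above), is to escape those characters before wrapping each field, replacing & first so the new entities are not escaped twice.

```shell
# xml_escape_fields is hypothetical: wrap each field in <dataN> tags, escaped.
xml_escape_fields() {
    awk '
    function xml_escape(s) {
        gsub(/&/, "\\&amp;", s)    # "\\&" in the replacement is a literal "&"
        gsub(/</, "\\&lt;", s)
        gsub(/>/, "\\&gt;", s)
        return s
    }
    { for (i = 1; i <= NF; i++) print "<data" i ">" xml_escape($i) "</data" i ">" }'
}
echo 'AT&T a<b' | xml_escape_fields
```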

If the input file "data.txt" looks like this:

Sinisa python 29
Krunek awk 30

Output will be:

<?xml version="1.0"?>
<file filename="data.txt">
<row>
 <data1>Sinisa</data1>
 <data2>python</data2>
 <data3>29</data3>
</row>
<row>
 <data1>Krunek</data1>
 <data2>awk</data2>
 <data3>30</data3>
</row>
</file>

Bye!

KP.

cmeyers · Jul 12, 2001

Krunek,

Thanks for the gift!

I appreciate the parsing advice as well as the XML generator.

CraigMan

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!
