
looping through file with unwanted \n's

Status
Not open for further replies.

emilybartholomew

Technical User
Aug 3, 2001
82
US
I have lots of files, which are in the format:

"date", "message"
"date", "message"
.
.
.

I am looping through all files (INFILE) and putting each line of the file into a database table with the following code:
while (my $line = <INFILE>){
  my @line = split( /","/, $line);
  $message_date = $line[ 0 ];
  $message_text = $line[ 1 ];
  #etc......
}

Unfortunately, sometimes the "message" part of the csv files has a newline in it, so I lose the rest of the message. I'd like to somehow get rid of the unnecessary newlines and be able to capture the whole message for each date.
 
One suggestion first. If you don't use @line throughout the rest of the while loop then you can instead do
Code:
while (my $line = <INFILE>){
  my ($message_date,$message_text) = split( /","/, $line);
 
#etc......
}
Avoiding unnecessary temporary variables is always a plus.

As for the newlines, do you mean that the message spans more than one line? like
Code:
"date", "message"
"date", "a really, really long message 
that spans many lines"
so that the entire message is not contained in $line?

Or more that you get all of the message into $line but it contains newline chars? like
Code:
"date", "message"
"date", "a long message\n with a newline"
It makes a difference in how you deal with it. I don't know how csv data is stored.

jaa
 
It's the second way.

When I open the file in Excel (which parses the csv at the comma), the part of the message after the newline is in a separate row.
 
btw-
thanks for the hint on the first part. I've taught myself Perl, and I'm sure my style is all wrong.
 
That's pretty straightforward. Something like
Code:
$message_text =~ tr/\n/ /;
# Or
$message_text =~ s/\n/ /g;
will replace each newline with a space. You can replace it with any character you wish; it depends on what you're doing with it afterward.

jaa
 
Actually, the problem is in the while(<INFILE>) part, because it doesn't pick up the next line of the message. I wasn't clear before. It's not the character "\n". The rest of the message is actually on a different line.
 
Does the message text contain quotes around it like you have posted?

Or what is the format of the date?

Basically, you need to check each line to see if it's complete. You want to look for something that tells you that you have the entire message. If the message text is quoted, you can test whether the current line ends with a " (/"$/) or perhaps (/"\s*$/). If not, you need to get the next line and append it. If the text isn't quoted, you need to test the next line to see if it starts with a date, perhaps.

There are always pitfalls. Your message text could contain " as part of the text and that could happen to fall at the end of the current line. Or your message could contain a date in it that happens to start at the beginning of a line. Chances of that are pretty small, though.

In short, find something that uniquely identifies a complete line and test each line as you read it in.
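
For the quoted case, here is a rough sketch of the "keep appending until the line is complete" idea. It assumes every record looks like "date", "message" and that a closing " at the end of a line marks the end of the message, so it is still open to the pitfall mentioned above. read_records is a made-up helper name, and joining the pieces with a space is just one choice.

```perl
use strict;
use warnings;

# Hypothetical helper: collect [date, message] pairs from a filehandle,
# joining physical lines until the buffered text ends with a closing quote.
sub read_records {
    my ($fh) = @_;
    my @records;
    my $buffer = '';
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;                        # drop the line ending
        $buffer = length $buffer ? "$buffer $line" : $line;
        next unless $buffer =~ /"\s*$/;              # not complete yet, read more
        # Assumed layout: "date", "message" (whitespace after the comma optional)
        if ($buffer =~ /^\s*"(.*?)"\s*,\s*"(.*)"\s*$/s) {
            push @records, [ $1, $2 ];
        }
        $buffer = '';                                # start the next record
    }
    warn "incomplete record at end of input\n" if length $buffer;
    return @records;
}
```

You would call it as my @records = read_records(\*INFILE); and then loop over @records, taking $_->[0] as the date and $_->[1] as the message text.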

jaa

 