Contact US

Log In

Come Join Us!

Are you a
Computer / IT professional?
Join Tek-Tips Forums!
  • Talk With Other Members
  • Be Notified Of Responses
    To Your Posts
  • Keyword Search
  • One-Click Access To Your
    Favorite Forums
  • Automated Signatures
    On Your Posts
  • Best Of All, It's Free!

*Tek-Tips's functionality depends on members receiving e-mail. By joining you are opting in to receive e-mail.

Posting Guidelines

Promoting, selling, recruiting, coursework and thesis posting is forbidden.

Students Click Here

Reading Character-Delimited Files That Contain New-Line Characters

Reading Character-Delimited Files That Contain New-Line Characters

Reading Character-Delimited Files That Contain New-Line Characters

I am writing a SAS program that must read in an Excel file.  My team has Perl code that converts the Excel file to a CSV file that our SAS programs are able to read.  Now, I've done this in the past with no problems.  However, this time around the program must read in an Excel file with cell values that may contain new-line characters.  

Although I still consider myself relatively new to SAS, I believe I have figured out the issue I am having.  When an INPUT statement is called, an entire line is read into a buffer.  It determines the end of each line by a new-line character.  Since some of the cells in the Excel input file may contain new-line characters, an observation is split.  When this occurs, the left-part of the observation is read in, but the right-part is read in on the next iteration as a seperate observation.  Let me illustrate with example data, where '◙' will represent a new-line character:

Example Excel File
Field 1   Field 2             Field 3                     Field 4
415       Smith, John G   troublesome◙field     orange
416       White, Bob M    another value           blue
417       Rogers, Bill J     a third record           green

If there is no new-line character in Field 3, normally the INPUT statement would read in all of the first obersvation ('415' through 'orange' values).  However, since this Excel file may have a new-line character in the field, what happens is the INPUT statement reads in only part of the first observation ('415' through 'troublesome').  Then for the second iteration, the INPUT statement continues reading values from 'field' as if it has already reached the second line.

Is there any way I can read in entire observations whether there is a new-line character in a field or not, without having to modify the Perl code to remove the character?  There has to be a way to let SAS know how many fields are in each observation and/or to ignore certain new-line characters.  Any help will be very much appreciated!!!  My current code is below, in case you would like to read/use it.  Thanks, everyone :)

data file_data; infile csv_file dlm=&delimiter_cmd dsd truncover missover firstobs=2;
     keep record_date_seconds record_date record_date_formatted;
     input id:10. employee_name:$8. employee_id:$10. job_title:$50. location:$50.
             location:$50. reporting_manager:$50. ticket_number:$15. record_date_seconds;
     record_date = mdy(01,01,1900) + record_date_seconds - 2;
     record_date_formatted = put(record_date, worddate18.);

RE: Reading Character-Delimited Files That Contain New-Line Characters

Hi Apoc,
  This is a problem you'll no doubt stumble over many times, it's a problem that's plagued me for years.  As far as I can tell, there's no way to overcome this in SAS.
My method for dealing with this is to use a smart text editor (such as VEdit) to strip these characters out.
Generally the character that Excel puts in is a line feed character (referred to as LF or OA in Hex), so what I do is a global find and replace to remove ALL LF characters.
Then I do a second find and replace to replace of carriage return characters (CR or 0D in hex) with CRLF (0A0D) to restore the end of lines correctly.
This seems to work prety robustly.

However, if you have some decent PERL programmer there that can update the PERL program, it might be easier for them to amend the program to remove LF characters from the fields before writing them out.  I'm afraid I can't help with that as I don't know PERL (beyond some exposure to regular expressions).

Sorry I can't be more help, if someone else has a better answer though, I'd love to hear it as I have to deal with this problem fairly frequently with data sent from clients.

Business Analyst, Code Monkey, Data Wrangler.
SAS Guru.

RE: Reading Character-Delimited Files That Contain New-Line Characters

Chris, I'm sorry to hear your frustrations, but at least I know I'm not the only person :)  For now, I've copied and updated the Perl program in a personal folder.  I've never really programmed in Perl, but it appears my additional code is working.  Unless someone else has another answer (that apparently will help both of us), perhaps I'll just end up overwriting the Perl program that is currently in use.  Hopefully it won't take too long to have it overwritten (moved to production).

I'm still willing to hear any other suggestions.  Thanks!

Red Flag This Post

Please let us know here why this post is inappropriate. Reasons such as off-topic, duplicates, flames, illegal, vulgar, or students posting their homework.

Red Flag Submitted

Thank you for helping keep Tek-Tips Forums free from inappropriate posts.
The Tek-Tips staff will check this out and take appropriate action.

Reply To This Thread

Posting in the Tek-Tips forums is a member-only feature.

Click Here to join Tek-Tips and talk with other members! Already a Member? Login

Close Box

Join Tek-Tips® Today!

Join your peers on the Internet's largest technical computer professional community.
It's easy to join and it's free.

Here's Why Members Love Tek-Tips Forums:

Register now while it's still free!

Already a member? Close this window and log in.

Join Us             Close