INTELLIGENT WORK FORUMS
FOR COMPUTER PROFESSIONALS

Log In

Come Join Us!

Are you a
Computer / IT professional?
Join Tek-Tips Forums!
  • Talk With Other Members
  • Be Notified Of Responses
    To Your Posts
  • Keyword Search
  • One-Click Access To Your
    Favorite Forums
  • Automated Signatures
    On Your Posts
  • Best Of All, It's Free!

*Tek-Tips's functionality depends on members receiving e-mail. By joining you are opting in to receive e-mail.

Posting Guidelines

Promoting, selling, recruiting, coursework and thesis posting is forbidden.

Jobs

Remove "extra" crlf in windows CSV file.

Remove "extra" crlf in windows CSV file.

(OP)
Hi,

I am doing a conversion from Sale Force to Oracle on a Windows platform. I saw where I can get a windows version of awk to run on windows from Sourceforge. I do not know anything about awk other than it is a Unix command that does string manipulation.

Background:
The Accounts (customer name and address) table that I was given has CRLF and LF buried in the data. CRLF is also used to terminate the record in Windows. Here are a couple of examples of my data:

20,"10","1449 Jupiter Park Dr., <LF>
Unit #19","","Jupiter","FL","33458"<CRLF>

49,"10","420 E. Aviation Dr., <CRLF>
Suite 110","","Tucson","AZ","85714"<CRLF>

The data should be:

20,"10","1449 Jupiter Park Dr.,Unit #19","","Jupiter","FL","33458"<CRLF>
49,"10","420 E. Aviation Dr.,Suite 110","","Tucson","AZ","85714"<CRLF>

Questions:
Given that I have NO knowledge of awk, (I should be able to install it and get it running):
1. Can awk remove the CRLF and LF that are imbedded in the data but not remove the CRLF that terminate the record? The CRLF and LF that need to be removed will be within quotes, if that helps.
2. If awk can do that above, can you please supply the awk command to do it. Please remember that I have not used awk before.

Thanks,

Dennis

RE: Remove "extra" crlf in windows CSV file.

(OP)
PH,

Ah - how obvious (not).
I will start playing with your command and see if I can figure out what you are doing and then I will try it.

Can your recommend a good manual / document / webpage that will help me learn awk?

Thanks for your help.

Dennis

RE: Remove "extra" crlf in windows CSV file.

Hi PHV,

It's very nice, but IMO this sequence

CODE

awk '{nq=gsub(/"/,"&");if(nq%2){x=$0;sub(/\r/,"");getline;$0=x $0}}1' input 

is to be changed to

CODE

awk '{nq=gsub(/"/,"&");if(nq%2){sub(/\r/,"");x=$0;getline;$0=x $0}}1' input 

RE: Remove "extra" crlf in windows CSV file.

Quote (PHV)


But I'm pretty sure that in a windows environment the \r is eaten by awk as the \n.

I'm not sure. I have awk version which comes with MSYS
and when the densolis' file is in format DOS (CRLF) then it outputs only this

CODE

$ awk '{nq=gsub(/"/,"&");if(nq%2){x=$0;getline;$0=x $0}}1' densolis.txt
 Unit #19","","Jupiter","FL","33458"
 Suite 110","","Tucson","AZ","85714" 

but when the densolis' file is in format UNIX (LF) the the output is OK:

CODE

$ awk '{nq=gsub(/"/,"&");if(nq%2){x=$0;getline;$0=x $0}}1' densolis.txt
20,"10","1449 Jupiter Park Dr.,  Unit #19","","Jupiter","FL","33458"
49,"10","420 E. Aviation Dr.,  Suite 110","","Tucson","AZ","85714" 

Maybe there are other Windows versions of awk...

I have one more question:

Could you explain please, what does mean the last 1 at the end of your script, i.e.

CODE

awk '{...{...}}1' input 

I tried out that it has the functionality like

CODE

awk '{...{...;print}}' input 
or

CODE

awk '{...{...}print}' input 

but I don't understand why the 1 is placed at the end after the block

CODE

awk '{...}1' input 

RE: Remove "extra" crlf in windows CSV file.

(OP)
Hi,

As I said earlier, I do not know how to use awk.

I downloaded gawk from SourceForge and install installed it. I entered the following and got the following error:


C:\Renco\VAX\TXT Files\awk>gawk '{nq=gsub(/"/,"&");if(nq%2){x=$0;getline;$0=x $0
}}1' input
gawk: '{nq=gsub(//,
gawk: ^ invalid char ''' in expression
'");if(nq%2){x=$0;getline;$0=x $0}}1' input ' is not recognized as an internal o
r external command,
operable program or batch file.

C:\Renco\VAX\TXT Files\awk>awk '{nq=gsub(/"/,"&");if(nq%2){x=$0;getline;$0=x $0}
}1' input opportunity_awk.csv
awk: '{nq=gsub(//,
awk: ^ invalid char ''' in expression
'");if(nq%2){x=$0;getline;$0=x $0}}1' input opportunity_awk.csv' is not recogniz
ed as an internal or external command,
operable program or batch file.


So my question is, what am I doing wrong? Like I said, I barely know how to spell awk.

Thanks,

Dennis

RE: Remove "extra" crlf in windows CSV file.

(OP)
I download gawk from http://gnuwin32.sourceforge.net/packages/gawk.htm.
here is some actual data. Even thought the data is on Windows, the data file terminate with a LF. Or at least that what I can figure out from displaying the data in hex in notepad++

Here is the header line and one line of bad data followed by a good line of data.

"Id","IsDeleted","AccountId","IsPrivate","Name","Description","StageName","StageSortOrder","Amount","Probability","ExpectedRevenue","TotalOpportunityQuantity","CloseDate","Type","NextStep","LeadSource","IsClosed","IsWon","ForecastCategory","ForecastCategoryName","CampaignId","HasOpportunityLineItem","Pricebook2Id","OwnerId","CreatedDate","CreatedById","LastModifiedDate","LastModifiedById","SystemModstamp","LastActivityDate","LastStageChangeDate","FiscalYear","FiscalQuarter","PrimaryPartnerAccountId","PartnerAccountId","EAU__c","Target_Price_at_EAU__c","Renco_Forecast__c","End_Product__c","Contract_Manufacturer__c","Sample_Approved__c","Reason_for_Lost__c","Opportunity_Notes__c","Outside_Rep2__c","Job_Suffix__c","Terr2__c","Terr3__c","OEM_Account__c","Next_Action_Due_By__c","Next_Action_Due__c","Next_Action__c","Previous_Action__c","SC__c","SLS__c","OEM_Rep__c","PL_Rep__c","RD_Rep__c","Commission__c","Approved_Sample__c","Order_Complete_Date__c","Notes__c"
"006A000000AuE5KIAV","0","001A000000TuWIGIA3","0","Component Distriburtors Inc.-Inductor-RL-1950 RL-1952","Ship to Customer - Zurvahn Corp, FL
End Customer - Accu-Sort Systems
Project - Danaher","Quote Request","4","","65","","","2011-01-13 00:00:00","New Project","","","0","0","Pipeline","Pipeline","000000000000000AAA","","","005A0000000TXpKIAW","2011-01-06 19:22:33","005A0000000TXpKIAW","2013-11-18 14:34:05","005A0000000TARMIA4","2013-11-18 14:34:12","","","2010","1","","","150","0.000","2. Pipeline (25%)","","","0","","Ship to Customer - Zurvahn Corp, FL
End Customer - Accu-Sort Systems
Project - Danaher","","","","","","","","","","","","","","","","","",""


The above is the header and a line of bad data. The data should be one complete line. Here is a good line of data:

"006A000000Av3sCIAR","0","001A000000TuVh5IAF","0","Aquacal-Transformer-RL-2260-130-24","Renco is raising the sales price.","Quote Request","4","","65","","","2011-01-19 00:00:00","Existing Project","Provide Quote","Outside Sales Rep","0","0","Pipeline","Pipeline","000000000000000AAA","","","005A0000000TXolIAG","2011-01-12 21:30:03","005A0000000TXolIAG","2013-11-18 14:34:05","005A0000000TARMIA4","2013-11-18 14:34:12","2011-03-26 00:00:00","","2010","1","","","4500","13.650","4. Best Case (75%)","Salt system for pool and spa","","0","","Renco is raising the sales price.","","","","","","","","","","","","","","","","","",""

Dennis

RE: Remove "extra" crlf in windows CSV file.

(OP)
Ok, I did what you said above. I then ran the command and all of the output come to the screen and my file was left untouched.

Is this what is supposed to happen? Or am I supposed to redirect the output to a file?

Like I said, I am just learning how to spell akw.

Thanks

RE: Remove "extra" crlf in windows CSV file.

(OP)
ok - did that and it output the data to a file. Interesting this is when I looked at the output, the new file was bigger in size according to the directory.
Original file 9,214k and had 18,548 lines
New file is 9,232k and has 18,411 lines.

Also, there are multiple CRLF in a given record. The above process only removed the first. So do I have to run this process multiple time, once on each new output file until I remove all of the CR LF?

Also, some lines have just a entry LF in the data. How does the process know which LF is extra and which is the actual "end of record marker"?

Thanks,

RE: Remove "extra" crlf in windows CSV file.

(OP)
One other question. How do I replace the extra Cr LF with a ", ". Can I do the same thing with the extra "LF"?

Thanks,

Red Flag This Post

Please let us know here why this post is inappropriate. Reasons such as off-topic, duplicates, flames, illegal, vulgar, or students posting their homework.

Red Flag Submitted

Thank you for helping keep Tek-Tips Forums free from inappropriate posts.
The Tek-Tips staff will check this out and take appropriate action.

Reply To This Thread

Posting in the Tek-Tips forums is a member-only feature.

Click Here to join Tek-Tips and talk with other members!

Resources

Close Box

Join Tek-Tips® Today!

Join your peers on the Internet's largest technical computer professional community.
It's easy to join and it's free.

Here's Why Members Love Tek-Tips Forums:

Register now while it's still free!

Already a member? Close this window and log in.

Join Us             Close