×
INTELLIGENT WORK FORUMS
FOR COMPUTER PROFESSIONALS

Log In

Come Join Us!

Are you a
Computer / IT professional?
Join Tek-Tips Forums!
  • Talk With Other Members
  • Be Notified Of Responses
    To Your Posts
  • Keyword Search
  • One-Click Access To Your
    Favorite Forums
  • Automated Signatures
    On Your Posts
  • Best Of All, It's Free!
  • Students Click Here

*Tek-Tips's functionality depends on members receiving e-mail. By joining you are opting in to receive e-mail.

Posting Guidelines

Promoting, selling, recruiting, coursework and thesis posting is forbidden.

Students Click Here

Jobs

CRLF in middle of pipe delimited file

CRLF in middle of pipe delimited file

CRLF in middle of pipe delimited file

(OP)
Hi,

I have a client who has sent in a pipe delimited file however one of the fields they sent seems to containg CRLF in the middle of the field. The end of record marker is also a CRLF.

What are your thoughts on the best way to handle this?

If we know number of columns from header could then read line by line and concatenate string until # of fields is hit and move on.

Any other way to do this that you can think of?

Thanks.

Swi

RE: CRLF in middle of pipe delimited file

I would think the simplest method would be to eliminate all the CRLF characters using the replace method, then re-add them where they need to go (if you're talking about string manipulation that is)

For example, a multi line text file, with each line being a record and the CRLF at the end of the line to mark the end:

open "mytextfile.txt" as input as 1
open "newtextfile.txt" as output as 2
while eof(1) = false

line input #1,origregord 'stored the line in a string
newline = replace(origrecord,"CRLF","") 'cuts out all instances of CRLF in the line
newline = newline + "CRLF"
print #2,newline

wend
close 1
close 2

This would only work if the records were individual lines in the file and could be read separately.
If it's all together in one string of characters, and the delimiter characters for each record are also in the records themselves, then I'm not sure how you'd distinguish them apart.

------------------------------------
yinyang Over 30 years of programming, and still learning every day! yinyang

RE: CRLF in middle of pipe delimited file

(OP)
Unfortunately it is all in one string of characters. So the end of record marker is a CRLF but also several fileds within the file also contain CRLFs. Seems like these file fields came from a free form text field from an online form or database field. Thanks.

Swi

RE: CRLF in middle of pipe delimited file

Yeah, that's a MESS.

I wonder if your client could do a find/replace in their software to weed out all instances of the CRLF characters in the fields and then re-export to file.... Hmmmm....

------------------------------------
yinyang Over 30 years of programming, and still learning every day! yinyang

RE: CRLF in middle of pipe delimited file

(OP)
Yeah, I've asked that but they refused. sad Thanks.

Swi

RE: CRLF in middle of pipe delimited file

If we could see an example of the file, that might help suggest a solution

RE: CRLF in middle of pipe delimited file

Is there a way to identify the first column (e.g. an ID field that can be matched to a Regex)?

"Knowledge is power. Information is liberating. Education is the premise of progress, in every society, in every family." (Kofi Annan)
Oppose SOPA, PIPA, ACTA; measures to curb freedom of information under whatever name whatsoever.

RE: CRLF in middle of pipe delimited file

Valid CSV can contain line breaks in the fields, when the fields are enclosed in the quotes
- for example like this:

line_breaks.csv

CODE

"aa"|"B
b"|"C
C"|"d
D"|"e
E"
foo|bar|baz|spam|eggs 

File like this can be handled well. You can write a small utility which turns your CSV with line breaks into a CSV without line breaks. There are modules for CSV processing for many languages available. Here a short script in python

CODE

import csv, re

# original CSV for reading
csv_file_inp = open("line_breaks.csv","r")
csv_reader = csv.reader(csv_file_inp, delimiter='|')

# corrected CSV for writing
csv_file_out = open("line_breaks_corrected.csv","w")
csv_writer = csv.writer(csv_file_out, delimiter="|")

for row in csv_reader:
  print "reading original row = %s" % row
  # remove line break from every field
  for field in row:
     idx = row.index(field)
     #print "field = '%s'" % field
     # remove line breaks
     field_corrected = re.sub('\n','', field)
     #print "field_corrected = '%s'" % field_corrected
     row[idx] = field_corrected
  print "writing corrected row = %s " % row
  csv_writer.writerow(row)

# at end
csv_file_inp.close()
csv_file_out.close() 

Now when I run this script

CODE

$ python line_breaks.py
reading original row = ['aa', 'B\nb', 'C\nC', 'd\nD', 'e\nE']
writing corrected row = ['aa', 'Bb', 'CC', 'dD', 'eE'] 
reading original row = ['foo', 'bar', 'baz', 'spam', 'eggs']
writing corrected row = ['foo', 'bar', 'baz', 'spam', 'eggs'] 

I get as result the CSV file without line breaks:

line_breaks_corrected.csv

CODE

aa|Bb|CC|dD|eE
foo|bar|baz|spam|eggs 

RE: CRLF in middle of pipe delimited file

This is one of the reasons I asked for an example of the CSV file

RE: CRLF in middle of pipe delimited file

Yesterday I didn't have access to windows, so I could try it only with Linux and Python.
In windows for example powershell have the import-csv cmdlet with the CSV parsing ability.
Now I tried this:

line_breaks.ps1

CODE

# -- functions
function remove_line_break($input_string) {
  $output_string = [RegEx]::replace($input_string,"\r\n","")
  return $output_string
}

function create_csv_line($input_array) {
  $output_string = [String]::join( "|", $input_array)
  return $output_string  
}

# -- main
$csv_input = "line_breaks.csv"
$csv_output = "line_breaks_corrected.csv"

$csv_header = "c1", "c2", "c3", "c4", "c5"
$csv_lines = import-csv -path $csv_input -delimiter "|" -header $csv_header

$line_num = 0
foreach($line in $csv_lines){
  $line_num++
  #echo $line.c1, $line.c2, $line.c3, $line.c4, $line.c5
  # remove line breaks from columns
  $col1 = remove_line_break($line.c1)
  $col2 = remove_line_break($line.c2)
  $col3 = remove_line_break($line.c3)
  $col4 = remove_line_break($line.c4)
  $col5 = remove_line_break($line.c5)
  #echo $col1, $col2, $col3, $col4, $col5

  # create corrected cvs line
  $csv_line_corrected = create_csv_line($col1, $col2, $col3, $col4, $col5)
  #echo $csv_line_corrected

  # write to file
  if ($line_num -eq 1) {
    # replace file
    $csv_line_corrected | out-file $csv_output -encoding default 
  }
  else {
    # append to file
    $csv_line_corrected | out-file $csv_output -encoding default -append
  } 
} 

running this script from powershell

CODE

PS C:\mikrom\Work> .\line_breaks.ps1 
or from windows command line running in my directory

CODE

C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe  -Windowstyle Hidden -ExecutionPolicy Bypass -STA -File ".\line_breaks.ps1" 
I get the similar result as before with python

RE: CRLF in middle of pipe delimited file

A few days ago I started to learn the R programming language.
The solution of the problem given above is in R very straightforward.

line_breaks.R

CODE

# run:
# Rscript line_breaks.R 

# read CSV input file
csv <- read.csv(file = "line_breaks.csv", sep="|", header=FALSE)

# create matrix from CSV data
csv_matrix <- as.matrix(csv)

print("Original CSV data:")
print(csv_matrix) 

for (i in 1:dim(csv_matrix)[1]) {
  for (j in 1:dim(csv_matrix)[2]) {
    # correct every matrix element
    csv_matrix[i,j] <- gsub("\n","",csv_matrix[i,j])
  }
}

print("Corrected CSV data:")
print(csv_matrix)

# write corrected data into CSV output file
write.table(csv_matrix, file = 'line_breaks_corrected.csv',
  quote = FALSE, sep = "|", row.names = FALSE, col.names = FALSE) 

Running the script

CODE

$ Rscript line_breaks.R
[1] "Original CSV data:"
     V1    V2     V3     V4     V5    
[1,] "aa"  "B\nb" "C\nC" "d\nD" "e\nE"
[2,] "foo" "bar"  "baz"  "spam" "eggs"
[1] "Corrected CSV data:"
     V1    V2    V3    V4     V5    
[1,] "aa"  "Bb"  "CC"  "dD"   "eE"  
[2,] "foo" "bar" "baz" "spam" "eggs" 
creates the CSV file with line breaks removed.


RE: CRLF in middle of pipe delimited file

Ok, since no response on the "If we could see an example of the file" request perhaps you could tell us which version of Excel you have - since Excel 2016 can actually import files like the one mikrom is using perfectly OK … (I suspect Excel 2010 can as well, but I no longer have a copy readily to hand that I can test this on)

RE: CRLF in middle of pipe delimited file

@strongm,
It looks like OP is not interested anymore, maybe the client has sent a simpler CSV file in the meantime.
But still, this thread can be a good guide for someone else with a similar problem.

Red Flag This Post

Please let us know here why this post is inappropriate. Reasons such as off-topic, duplicates, flames, illegal, vulgar, or students posting their homework.

Red Flag Submitted

Thank you for helping keep Tek-Tips Forums free from inappropriate posts.
The Tek-Tips staff will check this out and take appropriate action.

Reply To This Thread

Posting in the Tek-Tips forums is a member-only feature.

Click Here to join Tek-Tips and talk with other members!

Close Box

Join Tek-Tips® Today!

Join your peers on the Internet's largest technical computer professional community.
It's easy to join and it's free.

Here's Why Members Love Tek-Tips Forums:

Register now while it's still free!

Already a member? Close this window and log in.

Join Us             Close