How to handle hidden characters?

lcs01 · Nov 30, 2006

Hi, Experts,

I am having trouble parsing a client ASCII file because I don't know how to handle the hidden characters in it.

Below are the details of two files, file1 and file2. file1 is a client file; file2 was edited in-house with vi on Linux (Ubuntu).

Code:

[% 569] => file file1 file2
text/plain; charset=us-ascii
text/plain; charset=us-ascii
[% 570] => ls -l file1 file2
-rw-r--r-- 1 nobody nobody 24 2006-11-30 10:41 file1
-rw-r--r-- 1 nobody nobody 22 2006-11-30 10:41 file2
[% 571] => cat file1
name=test
endCaseInfo
[% 572] => cat file2
name=test
endCaseInfo

So, as you can see, file1 has two more characters than file2, even though cat shows the same content for both.

I wrote a small Perl program, tt.pl, to parse these two files:

Code:

#!/usr/bin/perl
use strict;
use warnings;

# Print each line of a file three ways: its length after chomp,
# the line itself, and the line followed by "##" so that anything
# invisible at the end of the line has a visible marker after it.
sub dump_file {
  my ($srce) = @_;
  open(my $fh, '<', $srce) || die "Can not open file '$srce': $!\n";
  while (<$fh>) {
    chomp($_);                # removes the trailing $/ ("\n" by default)
    my $len = length($_);
    print "\$len = $len\n";
    print "$_\n";
    print "$_##\n";
  }
  close($fh);
}

dump_file("file1");
print "\n";
dump_file("file2");
exit;

Here is the output:

Code:

[% 573] => ./tt.pl
$len = 10
name=test
##me=test
$len = 12
endCaseInfo
##dCaseInfo

$len = 9
name=test
name=test##
$len = 11
endCaseInfo
endCaseInfo##

Could someone please tell me what kind of hidden characters are in file1 and how to handle them? Many thanks!

stevexff · Nov 30, 2006

Looks like file1 was created on Windows, with CRLF (\r\n) at the end of each line. file2 was created on your Linux box with vi, and hence uses only the LF character (\n) to mark the end of each line. To check, make a copy of file1 and run it through dos2unix, which you ought to have on your distro, to see if that fixes it...
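If you would rather have the parser cope with either line ending than convert the file first, one option is to strip an optional carriage return together with the newline instead of relying on chomp alone. This is only a minimal sketch: the sub name read_lines and the in-memory sample strings (standing in for file1 and file2) are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: read all lines from a filehandle, removing either
# a CRLF ("\r\n", Windows) or a bare LF ("\n", Unix) line ending.
sub read_lines {
  my ($fh) = @_;
  my @lines;
  while (my $line = <$fh>) {
    $line =~ s/\r?\n\z//;    # drop an optional CR together with the LF
    push @lines, $line;
  }
  return @lines;
}

# In-memory sample data standing in for file1 (CRLF) and file2 (LF).
my @samples = ("name=test\r\nendCaseInfo\r\n", "name=test\nendCaseInfo\n");
for my $text (@samples) {
  open(my $fh, '<', \$text) or die "Can not open in-memory file: $!\n";
  for my $line (read_lines($fh)) {
    my $len = length($line);
    print "\$len = $len\n";
    print "$line##\n";
  }
  close($fh);
}
```

With this, both samples report $len = 9 and $len = 11, and the "##" marker lands after the text in both cases.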

Steve


KevinADC · Nov 30, 2006

Yeah, probably one file has '\r\n' line endings and the other has '\n'.
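That would also explain the numbers from tt.pl. chomp removes only the trailing $/ ("\n" by default), so with a '\r\n' ending the '\r' stays in the string and is counted by length (10 instead of 9). When such a line is printed followed by "##", the '\r' sends the terminal cursor back to the start of the line, so "##" overwrites the first two characters, which is how "##me=test" appeared. A small sketch, using sample strings rather than the real files:

```perl
use strict;
use warnings;

# The same line as read from a CRLF file (file1) and an LF file (file2).
my $from_file1 = "name=test\r\n";
my $from_file2 = "name=test\n";

# chomp strips only the trailing $/ ("\n" by default), so the "\r" stays.
chomp(my $c1 = $from_file1);
chomp(my $c2 = $from_file2);

printf "file1: length %d, ends with CR: %s\n", length($c1), ($c1 =~ /\r\z/ ? "yes" : "no");
printf "file2: length %d, ends with CR: %s\n", length($c2), ($c2 =~ /\r\z/ ? "yes" : "no");
```

This prints a length of 10 with a trailing CR for the file1 line and a length of 9 without one for the file2 line.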

- Kevin, perl coder unexceptional!

lcs01 · Nov 30, 2006

It does indeed have '\r\n' line endings. I just learned that I can use 'hexdump -c $filename' to see hidden characters.
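If you want the same kind of visibility from inside a Perl script, one approach is to escape control characters before printing them. This is a rough sketch and not a full hexdump replacement: the sub names show_hidden and escape_char are hypothetical, and only ASCII control characters (plus DEL) are made visible.

```perl
use strict;
use warnings;

# Readable names for the most common control characters.
my %NAMES = ("\r" => '\r', "\n" => '\n', "\t" => '\t');

# Hypothetical helper: turn one control character into a printable
# escape such as \r or \x01.
sub escape_char {
  my ($c) = @_;
  return exists $NAMES{$c} ? $NAMES{$c} : sprintf('\\x%02x', ord $c);
}

# Hypothetical helper: return a copy of $s with every ASCII control
# character (0x00-0x1f and 0x7f) replaced by its visible escape.
sub show_hidden {
  my ($s) = @_;
  $s =~ s/([\x00-\x1f\x7f])/escape_char($1)/ge;
  return $s;
}

# Sample line as it would be read from a CRLF file such as file1.
print show_hidden("name=test\r\n"), "\n";   # prints name=test\r\n
```

To apply it to a whole file you could loop with while (<$fh>) { print show_hidden($_), "\n" }, which shows each line's ending explicitly.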

lcs01 · Nov 30, 2006

dos2unix would also fix it. Thank you both.
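If dos2unix is not installed on some box, the core of what it does here (turning each CRLF pair into a single LF) can be sketched in Perl. This is a minimal illustration under stated assumptions, not a replacement for the tool: the sub name crlf_to_lf, the file names, and the scratch directory used for the demo are all hypothetical. It reads and writes with the :raw layer so no line-ending translation happens behind your back.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: copy $in to $out, turning each CRLF line ending
# into a plain LF. A lone CR not followed by LF is left untouched.
sub crlf_to_lf {
  my ($in, $out) = @_;
  open(my $ifh, '<:raw', $in)  or die "Can not open file '$in': $!\n";
  open(my $ofh, '>:raw', $out) or die "Can not open file '$out': $!\n";
  while (my $line = <$ifh>) {
    $line =~ s/\r\n\z/\n/;     # only a CR immediately before the LF is removed
    print {$ofh} $line;
  }
  close($ifh);
  close($ofh) or die "Can not close file '$out': $!\n";
}

# Demo: build a CRLF sample like file1 in a scratch directory that is
# deleted when the script exits, then convert it.
my $dir = tempdir(CLEANUP => 1);
my $src = "$dir/file1";
my $dst = "$dir/file1.unix";

open(my $fh, '>:raw', $src) or die "Can not open file '$src': $!\n";
print {$fh} "name=test\r\nendCaseInfo\r\n";
close($fh);

crlf_to_lf($src, $dst);
printf "%d bytes -> %d bytes\n", -s $src, -s $dst;   # prints 24 bytes -> 22 bytes
```

The byte counts match the ls output above: the two lines lose one '\r' each. For a quick in-place edit, the one-liner perl -pi -e 's/\r\n/\n/' file1 does the same job.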
