Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations TouchToneTommy on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Perl with CSV file 3

Status
Not open for further replies.

Cybershmuck

Technical User
Jan 21, 2005
21
Hi,

I’ve got a CSV file with the following information that I want to cycle through each line and split the values up by commas. The first line is the header record

BSB,Account Number, Amount (Gross),Account Name,Memo

62919,10213139,$203.40,A Wiesner,
112879,121473373,$125.55,AJ Bagala,
244000,106685045,"$7,605.40",Dell Australia Pty Ltd,
83091,466155928,$287.10,Digiguard.net Pty Ltd,
,,
,,
,,

How do I skip lines with blanks and how do I strip off trailing commas of each record?
Here’s my attempt of the code:

Code:
my $fieldnames = 1;
my $header = <IN> if $fieldnames;
@data =  <IN>;
foreach my $currentline (@data)
{

#Need some code her to strip off trailing commas

next if (index($currentline, ",,") > -1); #Skip any lines with just ,, in it	

($bsb, $acct_num, $amt, $acct_name, $memo) = split /,/,  
$currentline;

Thanks in advance :)
 
Take a look at the Text::CSV_XS module. Most of the work for you has been taken care of in that module.

- Rieekan
 
Routine got from perl cookbook:
It works very well

#!/opt/nokianms/bin/perl
open (CHECKBOOK, "test.csv");
open( OUTFILE, "> text.txt" )
or die( "Can not open test.txt: $!" );

while ($file = <CHECKBOOK>) {
@fields = parse_csv($file);
$first_field=$fields[0];
$second_field=$fields[1];
$third_field=$fields[2];
$four_field=$fields[3];
print OUTFILE "FIRSTFIELD=$first_field,SECONDFIELD:$second_field,THIRDFIELD:$third_field\n";
}


sub parse_csv {
my $text = shift; # record containing comma-separated values
my @new = ();
push(@new, $+) while $text =~ m{
"([^\"\\]*(?:\\.[^\"\\]*)*)",?
| ([^,]+),?
| ,
}gx;
push(@new, undef) if substr($text, -1,1) eq ',';
return @new; # list of values that were comma-separated
}


close(CHECKBOOK);
close(OUTFILE);
 
To remove leading commas:

$_ =~ s/^[\s*\,]+//;

To remove trailing commas:

$_ =~ s/[\s*\,]+$//;

To skip blank lines:

next if (/^[\s*\,]*$/);


Michael Libeson
 
Awk:
Code:
# Set field-separator.
BEGIN { FS=OFS="," }

1==NR { header=$0; next }

# Remove trailing commas.
{ sub(/,+$/, "") }
# Skip empty lines.
0==NF { next } 

{ # To ease parsing, quote all fields.
  quote_all()
  gsub(/^"|"$/, "")
  split($0, a, /","/ )
  bsb = a[1]
  acct_num = a[2]
  amt = a[3]
  acct_name = a[4]
  memo = a[5]
}

function quote_all(     i,q )
{ q = "\""
  while (++i <= NF)
  { if ( $i ~ /^"/ )  q = ""
    $i = q $i q
    if ( $i ~ /"$/ )  q = "\""
  }
}
 
Cool thanks for your input ppl. I'll chekc it out :)
 
bluegroper,

Yeah if they're not 6 digits then I automatically add a leading 0 to the number
 
Ok now I’ve got something here that’s sort of working but not really that reliable. Need an effective way to strip all trailing commas.
At the moment I use :

next if (index($currentline, ",,") == 0 );

to skip each record that starts with “,,”. I need a way to skip a record if any number of commas appear in anywhere in the line not just at index = 0.
However, I can not just use:

Next if (index($currentline, “,,”) > -1)

incase it skips records with traling commas after four valid field values.

Code:
my $fieldnames = 1;
my $header = <IN> if $fieldnames;

@data = <IN>;

foreach my $currentline (@data)
{
	next if $currentline < 1; #Skip blank lines

	
	# Skips the record if it starts with ";;"
	next if (index($currentline, ",,") == 0 );

	
	($bsb, $acct_num, $amt, $acct_name, $memo) = split /,/, $currentline;
}
 
Code:
#!perl
use strict;
use warnings;

my @data = <DATA>;
[b]chomp @data;[/b]
my $fieldnames = 1;
[b]my $header = shift @data if $fieldnames;[/b]

foreach my $currentline (@data)
{
    [b]$currentline =~ s/,+$//; #get rid of trailing commas
    next unless $currentline; #Skip blank lines[/b]
    my ($bsb, $acct_num, $amt, $acct_name, $memo) = split /,/, $currentline;
}

__DATA__
BSB,Account Number, Amount (Gross),Account Name,Memo

62919,10213139,$203.40,A Wiesner,
112879,121473373,$125.55,AJ Bagala,
244000,106685045,"$7,605.40",Dell Australia Pty Ltd,
83091,466155928,$287.10,Digiguard.net Pty Ltd,
,,
,,
,,
HTH

 
Does mike's code handle quoted fields that contain commas?

A Ruby version that does:
Code:
class String
  def parse_csv
    a = self.scan(
      /(?:"((?:(?:\\.)*[^"\\]*)*)"|([^,]*))(?:,|$)/
      ).flatten
    a.delete(nil)
    a
  end
end
$, = '|'
ARGF.each_line { | line |
  line.chomp!
  # First line is header.
  if 1 == $.
    puts line;  next
  end
  ary = (line+",,,,").parse_csv
  # If none of the 1st 4 items are empty strings...
  if ary.index("") > 3
    bsb, acct_num, amt, acct_name, memo = ary
    print bsb, acct_num, amt, acct_name, memo + "\n"
  end
}


 
Fair point, cntr. This handles quoted fields that contain commas:
Code:
#!perl
use strict;
use warnings;
use Text::CSV;

my $header;
my $fieldnames = 1;

my $csv = Text::CSV->new();
while (<DATA>) {
    chomp;
    s/,+$//; #get rid of trailing commas
    /^\s*$/ && next; #Skip blank lines
    unless ($csv->parse($_)) {
        warn qq(Error trying to parse "$csv->error_input"\n);
        next;
    }
    if ($. == 1 && $fieldnames) {
        $header = $csv->string();
    } else {
        my ($bsb, $acct_num, $amt, $acct_name, $memo) = $csv->fields();
    }
}

__DATA__
BSB,Account Number, Amount (Gross),Account Name,Memo

62919,10213139,$203.40,A Wiesner,
112879,121473373,$125.55,AJ Bagala,
244000,106685045,"$7,605.40",Dell Australia Pty Ltd,
83091,466155928,$287.10,Digiguard.net Pty Ltd,
,,
,,
,,
I see you now post Ruby solutions as well as awk. Any chance you'll ever post anything in Perl in this forum?

 
Code:
;
$Perl?;
@What is $that?;
;
A <language>?;
;;
 
cntr,

come on in, the water's lovely

--Paul B-)

cigless ...
 
cntr, I think the variable a isn't needed
in String#parse_csv:

Code:
class String
  def parse_csv
    self.scan(
      /(?:"((?:(?:\\.)*[^"\\]*)*)"|([^,]*))(?:,|$)/
      ).flatten.compact
  end
end
 
langi said:
I think the variable a isn't needed
You're right! And it's good to meet a Ruby-user here.

PaulTEG said:
come on in, the water's lovely
Paul, that looks more like Awk or Ruby than Perl. It should be something like:

;$come on <IN>,; $$the @water's $lovely.;
 
cntr,

you really need to get out more

--Paul

cigless ...
 
Lol this thread has gone of the rails..

Ok here's something I need to do. I need to get rid of all commas in a currency value

eg

244000,106685045,"$7,605.40",Dell Australia Pty Ltd,

At the moment I can only script out one comma but if the figure reaches > 1000000 then another comma will be added.
Need to an effective way to script out all commes for a currency figure.

Thanks in advance :)
 
Code:
gsub(/,/, "", currency)
And here's an improved csv parser in Awk. The only thing it doesn't handle is records that contain linefeeds.
Code:
{
  parse_csv( $0, rec )
  printf "["
  sep = ""
  for (i=1;i in rec; i++)
  { printf "%s<%s>", sep, rec[i]
    sep = ", "
  }
  print "]" 
}

function parse_csv( str, array,    field,i )
{ split( "", array )
  str = str ","
  while ( match(str,
    /[ \t]*("[^"]*(""[^"]*)*"|[^,]*)[ \t]*,/) )
  { field = substr( str, 1, RLENGTH )
    gsub( /^[ \t]*"?|"?[ \t]*,$/, "", field )
    gsub( /""/, "\"", field )
    array[++i] = field
    str = substr( str, RLENGTH + 1 )
  }
}
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top