Perl with CSV file 3

Cybershmuck · Jan 21, 2005

Hi,

I’ve got a CSV file with the following information that I want to cycle through each line and split the values up by commas. The first line is the header record

BSB,Account Number, Amount (Gross),Account Name,Memo

62919,10213139,$203.40,A Wiesner,
112879,121473373,$125.55,AJ Bagala,
244000,106685045,"$7,605.40",Dell Australia Pty Ltd,
83091,466155928,$287.10,Digiguard.net Pty Ltd,
,,
,,
,,

How do I skip lines with blanks and how do I strip off trailing commas of each record?
Here’s my attempt of the code:

Code:

my $fieldnames = 1;
my $header = <IN> if $fieldnames;
@data =  <IN>;
foreach my $currentline (@data)
{

#Need some code her to strip off trailing commas

next if (index($currentline, ",,") > -1); #Skip any lines with just ,, in it	

($bsb, $acct_num, $amt, $acct_name, $memo) = split /,/,  
$currentline;

Thanks in advance

Rieekan · Jan 21, 2005

Take a look at the Text::CSV_XS module. Most of the work for you has been taken care of in that module.

- Rieekan

dmazzini · Jan 21, 2005

Routine got from perl cookbook:
It works very well

#!/opt/nokianms/bin/perl
open (CHECKBOOK, "test.csv");
open( OUTFILE, "> text.txt" )
or die( "Can not open test.txt: $!" );

while ($file = <CHECKBOOK>) {
@fields = parse_csv($file);
$first_field=$fields[0];
$second_field=$fields[1];
$third_field=$fields[2];
$four_field=$fields[3];
print OUTFILE "FIRSTFIELD=$first_field,SECONDFIELD:$second_field,THIRDFIELD:$third_field\n";
}

sub parse_csv {
my $text = shift; # record containing comma-separated values
my @new = ();
push(@new, $+) while $text =~ m{
"([^\"\\]*(?:\\.[^\"\\]*)*)",?
| ([^,]+),?
| ,
}gx;
push(@new, undef) if substr($text, -1,1) eq ',';
return @new; # list of values that were comma-separated
}

close(CHECKBOOK);
close(OUTFILE);

mlibeson · Jan 21, 2005

To remove leading commas:

$_ =~ s/^[\s*\,]+//;

To remove trailing commas:

$_ =~ s/[\s*\,]+$//;

To skip blank lines:

next if (/^[\s*\,]*$/);

Michael Libeson

cntr · Jan 21, 2005

Awk:

Code:

# Set field-separator.
BEGIN { FS=OFS="," }

1==NR { header=$0; next }

# Remove trailing commas.
{ sub(/,+$/, "") }
# Skip empty lines.
0==NF { next } 

{ # To ease parsing, quote all fields.
  quote_all()
  gsub(/^"|"$/, "")
  split($0, a, /","/ )
  bsb = a[1]
  acct_num = a[2]
  amt = a[3]
  acct_name = a[4]
  memo = a[5]
}

function quote_all(     i,q )
{ q = "\""
  while (++i <= NF)
  { if ( $i ~ /^"/ )  q = ""
    $i = q $i q
    if ( $i ~ /"$/ )  q = "\""
  }
}

bluegroper · Jan 21, 2005

Be careful because I recall that BSB's MUST be 6 digits.

Cybershmuck · Jan 21, 2005

Cool thanks for your input ppl. I'll chekc it out

Cybershmuck · Jan 21, 2005

bluegroper,

Yeah if they're not 6 digits then I automatically add a leading 0 to the number

Cybershmuck · Jan 22, 2005

Ok now I’ve got something here that’s sort of working but not really that reliable. Need an effective way to strip all trailing commas.
At the moment I use :

next if (index($currentline, ",,") == 0 );

to skip each record that starts with “,,”. I need a way to skip a record if any number of commas appear in anywhere in the line not just at index = 0.
However, I can not just use:

Next if (index($currentline, “,,”) > -1)

incase it skips records with traling commas after four valid field values.

Code:

my $fieldnames = 1;
my $header = <IN> if $fieldnames;

@data = <IN>;

foreach my $currentline (@data)
{
	next if $currentline < 1; #Skip blank lines

	
	# Skips the record if it starts with ";;"
	next if (index($currentline, ",,") == 0 );

	
	($bsb, $acct_num, $amt, $acct_name, $memo) = split /,/, $currentline;
}

mikevh · Jan 22, 2005

Code:

#!perl
use strict;
use warnings;

my @data = <DATA>;
[b]chomp @data;[/b]
my $fieldnames = 1;
[b]my $header = shift @data if $fieldnames;[/b]

foreach my $currentline (@data)
{
    [b]$currentline =~ s/,+$//; #get rid of trailing commas
    next unless $currentline; #Skip blank lines[/b]
    my ($bsb, $acct_num, $amt, $acct_name, $memo) = split /,/, $currentline;
}

__DATA__
BSB,Account Number, Amount (Gross),Account Name,Memo

62919,10213139,$203.40,A Wiesner,
112879,121473373,$125.55,AJ Bagala,
244000,106685045,"$7,605.40",Dell Australia Pty Ltd,
83091,466155928,$287.10,Digiguard.net Pty Ltd,
,,
,,
,,

HTH

cntr · Jan 22, 2005

Does mike's code handle quoted fields that contain commas?

A Ruby version that does:

Code:

class String
  def parse_csv
    a = self.scan(
      /(?:"((?:(?:\\.)*[^"\\]*)*)"|([^,]*))(?:,|$)/
      ).flatten
    a.delete(nil)
    a
  end
end
$, = '|'
ARGF.each_line { | line |
  line.chomp!
  # First line is header.
  if 1 == $.
    puts line;  next
  end
  ary = (line+",,,,").parse_csv
  # If none of the 1st 4 items are empty strings...
  if ary.index("") > 3
    bsb, acct_num, amt, acct_name, memo = ary
    print bsb, acct_num, amt, acct_name, memo + "\n"
  end
}

mikevh · Jan 22, 2005

Fair point, cntr. This handles quoted fields that contain commas:

Code:

#!perl
use strict;
use warnings;
use Text::CSV;

my $header;
my $fieldnames = 1;

my $csv = Text::CSV->new();
while (<DATA>) {
    chomp;
    s/,+$//; #get rid of trailing commas
    /^\s*$/ && next; #Skip blank lines
    unless ($csv->parse($_)) {
        warn qq(Error trying to parse "$csv->error_input"\n);
        next;
    }
    if ($. == 1 && $fieldnames) {
        $header = $csv->string();
    } else {
        my ($bsb, $acct_num, $amt, $acct_name, $memo) = $csv->fields();
    }
}

__DATA__
BSB,Account Number, Amount (Gross),Account Name,Memo

62919,10213139,$203.40,A Wiesner,
112879,121473373,$125.55,AJ Bagala,
244000,106685045,"$7,605.40",Dell Australia Pty Ltd,
83091,466155928,$287.10,Digiguard.net Pty Ltd,
,,
,,
,,

I see you now post Ruby solutions as well as awk. Any chance you'll ever post anything in Perl in this forum?

cntr · Jan 23, 2005

Code:

;
$Perl?;
@What is $that?;
;
A <language>?;
;;

mikevh · Jan 23, 2005

perldoc perl

PaulTEG · Jan 23, 2005

cntr,

come on in, the water's lovely

--Paul B-)

cigless ...

langi · Jan 29, 2005

cntr, I think the variable a isn't needed
in String#parse_csv:

Code:

class String
  def parse_csv
    self.scan(
      /(?:"((?:(?:\\.)*[^"\\]*)*)"|([^,]*))(?:,|$)/
      ).flatten.compact
  end
end

cntr · Jan 29, 2005

langi said:
I think the variable a isn't needed

You're right! And it's good to meet a Ruby-user here.

PaulTEG said:
come on in, the water's lovely

Paul, that looks more like Awk or Ruby than Perl. It should be something like:

;$come on <IN>,; $$the @water's $lovely.;

PaulTEG · Jan 29, 2005

cntr,

you really need to get out more

--Paul

cigless ...

Cybershmuck · Feb 10, 2005

Lol this thread has gone of the rails..

Ok here's something I need to do. I need to get rid of all commas in a currency value

eg

244000,106685045,"$7,605.40",Dell Australia Pty Ltd,

At the moment I can only script out one comma but if the figure reaches > 1000000 then another comma will be added.
Need to an effective way to script out all commes for a currency figure.

Thanks in advance

cntr · Feb 10, 2005

Code:

gsub(/,/, "", currency)

And here's an improved csv parser in Awk. The only thing it doesn't handle is records that contain linefeeds.

Code:

{
  parse_csv( $0, rec )
  printf "["
  sep = ""
  for (i=1;i in rec; i++)
  { printf "%s<%s>", sep, rec[i]
    sep = ", "
  }
  print "]" 
}

function parse_csv( str, array,    field,i )
{ split( "", array )
  str = str ","
  while ( match(str,
    /[ \t]*("[^"]*(""[^"]*)*"|[^,]*)[ \t]*,/) )
  { field = substr( str, 1, RLENGTH )
    gsub( /^[ \t]*"?|"?[ \t]*,$/, "", field )
    gsub( /""/, "\"", field )
    array[++i] = field
    str = substr( str, RLENGTH + 1 )
  }
}

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Perl with CSV file 3

Technical User

Programmer

Programmer

Programmer

Instructor

Technical User

Technical User

Technical User

Technical User

Programmer

Instructor

Programmer

Instructor

Programmer

Technical User

Programmer

Instructor

Technical User

Technical User

Instructor

Similar threads

Log in

Part and Inventory Search

Sponsor