Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations TouchToneTommy on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Parsing CSV file 2

Status
Not open for further replies.

stevio

Vendor
Jul 24, 2002
78
AU
Have script which parses a CSV file which looks like the following

Date and Time Number Total I/O /sec More columns
08 09 2007 18:00:00 00:00 2.2733333
08 09 2007 18:00:00 00:01 0.033333335
08 09 2007 18:00:00 00:02 0.036666665
08 09 2007 18:00:00 00:03 1.1433333
08 09 2007 18:00:00 00:04 0
08 09 2007 18:00:00 00:05 0
08 09 2007 18:00:00 00:06 10.453333
08 09 2007 18:00:00 00:07 2.26
08 09 2007 18:00:00 00:08 1.0166667
08 09 2007 18:00:00 00:09 441.14
08 09 2007 18:00:00 00:0A 0.006666667
.....
08 09 2007 18:05:00 00:00 2.9333334
08 09 2007 18:05:00 00:01 0.39333335
08 09 2007 18:05:00 00:02 0.096666664
08 09 2007 18:05:00 00:03 1.2966666
08 09 2007 18:05:00 00:04 0.093333334
08 09 2007 18:05:00 00:05 1.8733333
08 09 2007 18:05:00 00:06 8.63
08 09 2007 18:05:00 00:07 2.9
08 09 2007 18:05:00 00:08 1.27
08 09 2007 18:05:00 00:09 1.1766666
08 09 2007 18:05:00 00:0A 0.33
.....
etc etc

The thing that changes is the time stamp, the number column has the format xx:yy, where xx is fixed starting at 00 until yy(hex) gets to FF in hex, and then it turns to 01 and so on. It then starts from 01:00


Code:
 #!/usr/bin/perl
    use Text::CSV_XS;
    my $file = 'mycsv.csv';
    my $csv = Text::CSV_XS->new();

    open (CSV, "<", $file) or die $!;
    
    while (<CSV>) {       
    next if ($. == 1);
    if ($csv->parse($_)) {
    @columns =  $csv->fields();                                 
        $i = 0;
        while ($i <= 10) {
        
        $hexval = uc(sprintf("%x",$i)); #uppercase hex
        if (length($hexval) eq 1){ # if $hexval is length  of 1, then pad with '0'
        $hexval = uc(sprintf("0%x",$i));
        }                  
         $val = "00".":".$hexval; #concat 00:$hexval
         if ($columns[1] eq $val){
             print "$columns[0]\t$columns[1]\t$columns[2]\n";
         # want to process some data here
          }
         $i++;                                                             
        }
               
      }
      else {
      my $err = $csv->error_input;
      print "Failed to parse line: $err";
      }
         
   
   }
    
   close CSV;

The script works, but not in the format that I want. Currently, it is producing the data in following format with only the columns I want

Date and Time Number Total I/O /sec
08 09 2007 18:00:00 00:00 2.2733333
08 09 2007 18:00:00 00:01 0.033333335
08 09 2007 18:00:00 00:02 0.036666665
08 09 2007 18:00:00 00:03 1.1433333
08 09 2007 18:00:00 00:04 0
08 09 2007 18:00:00 00:05 0
08 09 2007 18:00:00 00:06 10.453333
08 09 2007 18:00:00 00:07 2.26
08 09 2007 18:00:00 00:08 1.0166667
08 09 2007 18:00:00 00:09 441.14
08 09 2007 18:00:00 00:0A 0.006666667
.....
08 09 2007 18:05:00 00:00 2.9333334
08 09 2007 18:05:00 00:01 0.39333335
08 09 2007 18:05:00 00:02 0.096666664
08 09 2007 18:05:00 00:03 1.2966666
08 09 2007 18:05:00 00:04 0.093333334
08 09 2007 18:05:00 00:05 1.8733333
08 09 2007 18:05:00 00:06 8.63
08 09 2007 18:05:00 00:07 2.9
08 09 2007 18:05:00 00:08 1.27
08 09 2007 18:05:00 00:09 1.1766666
08 09 2007 18:05:00 00:0A 0.33
.....

But what I want is to have all the same number values together eg

Date and Time Number Total I/O /sec
08 09 2007 18:00:00 00:00 2.2733333
08 09 2007 18:05:00 00:00 2.9333334
08 09 2007 18:10:00 00:00 3.2300034
....
08 09 2007 18:00:00 00:01 0.033333335
08 09 2007 18:05:00 00:01 0.39333335
08 09 2007 18:10:00 00:01 0.5500034

Any ideas how I can do this?

Any help would be appreciated as it is frustrating me no end.
 
Here's a quick example that may give you a way to go.
Code:
my %data;
while (<DATA>) {
	chomp(my @columns = split /\s{2,}/, $_); # @columns =  $csv->fields();
	push @{$data{$columns[1]}}, [@columns[0..2]];
}

# After parsing the text file 
my @hex_list = map { sprintf '%02X', $_ } 0..255;
foreach my $x (@hex_list) {
	foreach my $y (@hex_list) {
		if (defined $data{"$x:$y"}) {
			foreach my $record (@{$data{"$x:$y"}}) {
				print join("\t", @$record), "\n";
			}
			print "\n";
		}
	}
}
 
You need either use your code, but doing repetitive opens and reads of the file at every new value of [tt]$i[/tt] (not efficient), or you must read the whole file into an array indexed with the number column as the key and collecting all the lines with the same number column into subarrays.
How big are your files?
There are other (minor) performance issues with your code:
-to skip the first line just say [tt]$_=<CSV>;[/tt] after the [tt]open[/tt]
-to format [tt]$i[/tt] into the number column simply use
[tt]$val=sprintf("%04X",$i);
substr($val,2,0,':');[/tt]
-to reverse
[tt]substr($val,2,1,'');
$i=hex$val;[/tt]
BTW can't see the commas in the original data?

Franco
: Online engineering calculations
: Magnetic brakes for fun rides
: Air bearing pads
 
rharsh, I've tried your sample code and I can't get it to work. It returns nothing.

prex1, the files range from 500KB to 1MB, so not huge. yeah I realize I need to index by the number column, just not sure how to do it.

commas in original data - the original data was a csv file, I just posted it wrong.

Date and Time,Number,Total I/O /sec,....
08 09 2007 18:00:00,00:00,2.2733333
08 09 2007 18:00:00,00:01,0.033333335

 
See if this helps:
Code:
use Text::CSV_XS;
open IN, "< sample.csv" or die "cannot open input file.\n$!";
my $header = <IN>;
my $csv = Text::CSV_XS->new;
my %data;

while (my $line = <IN>) {
	if ($csv->parse($line)) {
		my @fields = $csv->fields();
		push @{$data{$fields[1]}}, [@fields[0..2]];
	} else {
		warn "Failed to parse line: $line";
	}
}

my @hex_list = map { sprintf '%02X', $_ } 0..255;
foreach my $x (@hex_list) {
	foreach my $y (@hex_list) {
		if (defined $data{"$x:$y"}) {
			foreach my $record (@{$data{"$x:$y"}}) {
				print join("\t", @$record), "\n";
			}
			print "\n";
		}
	}
}
And sample.csv looks like:
Code:
Date and Time,Number,Total I/O /sec
08 09 2007 18:00:00,00:00,2.2733333
08 09 2007 18:00:00,00:01,0.033333335
... and so on.
 
rharsh, awesome, works like a charm. Have a star :)
 
Great, I'm glad it worked for you and thanks for the star.
 
Hi all,

Just revisiting this again, but need some rework

The required out now needs to be like:

Original data

Code:
Date and Time          Number    Total I/O /sec  Reads/Sec more n columns
08 09 2007 18:00:00    00:00    2.2733333        0
08 09 2007 18:00:00    00:01    0.033333335      123
08 09 2007 18:00:00    00:02    0.036666665      65
08 09 2007 18:00:00    00:03    1.1433333        34
08 09 2007 18:00:00    00:04    0                34
08 09 2007 18:00:00    00:05    0                12
08 09 2007 18:00:00    00:06    10.453333        0
08 09 2007 18:00:00    00:07    2.26             0
08 09 2007 18:00:00    00:08    1.0166667        0
08 09 2007 18:00:00    00:09    441.14           0
08 09 2007 18:00:00    00:0A    0.006666667      0
.....
08 09 2007 18:05:00    00:00    2.9333334        23
08 09 2007 18:05:00    00:01    0.39333335       5
08 09 2007 18:05:00    00:02    0.096666664      etc
08 09 2007 18:05:00    00:03    1.2966666
08 09 2007 18:05:00    00:04    0.093333334
08 09 2007 18:05:00    00:05    1.8733333
08 09 2007 18:05:00    00:06    8.63
08 09 2007 18:05:00    00:07    2.9
08 09 2007 18:05:00    00:08    1.27
08 09 2007 18:05:00    00:09    1.1766666
08 09 2007 18:05:00    00:0A    0.33
.....
etc etc

Required data
Code:
Total I/O /Sec
Date and Time          00:00      00:01       00:02 ...
08 09 2007 18:00:00    2.2733333  0.033333335
08 09 2007 18:05:00    2.9333334  0.39333335
....etc

Reads/Sec
Date and Time          00:00      00:01   00:02 ...
08 09 2007 18:00:00    0          123
08 09 2007 18:05:00    23         5
...etc

next column

So, all I'm doing is using the number(or whatever is in the second column) as the column headings. The data itself needs to come from each column of the original data.

Hope this makes sense.

Any help would be much appreciated. Stevio
 
Did you try modifying the code yourself? What are you having probelms with?
 
I've tried modifying the code. The only thing I managed to get working was

Code:
push @{$data{$fields[1]}}, [@fields[0..5]];

to include more columns

but I can't work out how to transpose the individual column data and organise by the column 2 (whatever it is)

So far I've put some pseudocode:
Code:
foreach my $records(@{$data{$fields[1]}}){
    #make array of column 2
    #print column 2 as heading separated by tabs
    #print each column row - stuck here
  
}

Can't wrap my head around re-org that data into the right format.
 
You need to create a hash of hashes, where the first key is the first column, and the subkey the second one. Column headings need be stored in a separate hash, that can be used also to determine column widths.
The following code will handle the column widths to correctly align them. You'll need to add some headings and determine how many data to print.
Code:
my($key1,$key2,%data,%collen);
while(<DATA>){
  $key1=substr($_,0,19);
  $key2=substr($_,23,5);
  $_=substr($_,32);
  $data{$key1}{$key2}=[split];
  my@lengths=map{length}@{$data{$key1}{$key2}};
  for(@lengths){$_=5 if$_<5}
  if(exists$collen{$key2}){
    for(my$i=0;$i<@lengths;$i++){
      $collen{$key2}[$i]=$lengths[$i]if$lengths[$i]>$collen{$key2}[$i];
    }
  }else{
    $collen{$key2}=\@lengths;
  }
}
for(my$i=0;$i<@{$collen{$key2}};$i++){
  print"Date and Time          ";
  for(sort keys%collen){
    printf"%-$collen{$_}[$i]s ",$_;
  }
  print"\n";
  for $key1(sort keys%data){
    print$key1,'    ';
    for(sort keys%collen){
      printf"%-$collen{$_}[$i]s ",$data{$key1}{$_}[$i];
    }
    print"\n";
  }
  print"\n";
}
__DATA__
08 09 2007 18:00:00    00:00    2.2733333        0
08 09 2007 18:00:00    00:01    0.033333335      123
08 09 2007 18:00:00    00:02    0.036666665      65
08 09 2007 18:00:00    00:03    1.1433333        34
08 09 2007 18:00:00    00:04    0                34
08 09 2007 18:00:00    00:05    0                12
08 09 2007 18:00:00    00:06    10.453333        0
08 09 2007 18:00:00    00:07    2.26             0
08 09 2007 18:00:00    00:08    1.0166667        0
08 09 2007 18:00:00    00:09    441.14           0
08 09 2007 18:00:00    00:0A    0.006666667      0
08 09 2007 18:05:00    00:00    2.9333334        23
08 09 2007 18:05:00    00:01    0.39333335       5
08 09 2007 18:05:00    00:02    0.096666664      etc
08 09 2007 18:05:00    00:03    1.2966666
08 09 2007 18:05:00    00:04    0.093333334
08 09 2007 18:05:00    00:05    1.8733333
08 09 2007 18:05:00    00:06    8.63
08 09 2007 18:05:00    00:07    2.9
08 09 2007 18:05:00    00:08    1.27
08 09 2007 18:05:00    00:09    1.1766666
08 09 2007 18:05:00    00:0A    0.33

Franco
: Online engineering calculations
: Magnetic brakes for fun rides
: Air bearing pads
 
hi prex1,

thanks, awesome job. Eventually want to take each block of data and create Excel tabs for each one (probably Perl OLE) so I can graph the result, but this has gone a long way to help.

Code:
Date and Time          00:00 00:01 00:02 00:03 00:04 00:05 00:06 00:07 00:08 00:09 00:0A
08 09 2007 18:00:00    0     123   65    34    34    12    0     0     0     0     0
08 09 2007 18:05:00    23    5     etc
 
my mistake, the original format was a csv file:

Code:
Date and Time,Number,Total I/O /sec,Reads/Sec
08 09 2007 18:00:00,00:00,2.2733333,0
08 09 2007 18:00:00,00:01,0.033333335,123
08 09 2007 18:00:00,00:02,0.036666665,65
08 09 2007 18:00:00,00:03,1.1433333,34
08 09 2007 18:00:00,00:04,0,34

so will have to integrate rharsh's original script to read the csv with this script....
 
You just need to replace the first 5 lines after [tt]while(<DATA>){[/tt] by
Code:
  chomp;
  my@row=split/,/;
  $key1=shift@row;
  $key2=shift@row;
  $data{$key1}{$key2}=\@row;
  my@lengths=map{length}@row;
Note however that this won't work if in the file you have quoted strings possibly containing commas. Otherwise you can go on with it.

Franco
: Online engineering calculations
: Magnetic brakes for fun rides
: Air bearing pads
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top