Parsing CSV file 2

stevio · Nov 26, 2008

Have script which parses a CSV file which looks like the following

Date and Time Number Total I/O /sec More columns
08 09 2007 18:00:00 00:00 2.2733333
08 09 2007 18:00:00 00:01 0.033333335
08 09 2007 18:00:00 00:02 0.036666665
08 09 2007 18:00:00 00:03 1.1433333
08 09 2007 18:00:00 00:04 0
08 09 2007 18:00:00 00:05 0
08 09 2007 18:00:00 00:06 10.453333
08 09 2007 18:00:00 00:07 2.26
08 09 2007 18:00:00 00:08 1.0166667
08 09 2007 18:00:00 00:09 441.14
08 09 2007 18:00:00 00:0A 0.006666667
.....
08 09 2007 18:05:00 00:00 2.9333334
08 09 2007 18:05:00 00:01 0.39333335
08 09 2007 18:05:00 00:02 0.096666664
08 09 2007 18:05:00 00:03 1.2966666
08 09 2007 18:05:00 00:04 0.093333334
08 09 2007 18:05:00 00:05 1.8733333
08 09 2007 18:05:00 00:06 8.63
08 09 2007 18:05:00 00:07 2.9
08 09 2007 18:05:00 00:08 1.27
08 09 2007 18:05:00 00:09 1.1766666
08 09 2007 18:05:00 00:0A 0.33
.....
etc etc

The thing that changes is the time stamp, the number column has the format xx:yy, where xx is fixed starting at 00 until yy(hex) gets to FF in hex, and then it turns to 01 and so on. It then starts from 01:00

Code:

 #!/usr/bin/perl
    use Text::CSV_XS;
    my $file = 'mycsv.csv';
    my $csv = Text::CSV_XS->new();

    open (CSV, "<", $file) or die $!;
    
    while (<CSV>) {       
    next if ($. == 1);
    if ($csv->parse($_)) {
    @columns =  $csv->fields();                                 
        $i = 0;
        while ($i <= 10) {
        
        $hexval = uc(sprintf("%x",$i)); #uppercase hex
        if (length($hexval) eq 1){ # if $hexval is length  of 1, then pad with '0'
        $hexval = uc(sprintf("0%x",$i));
        }                  
         $val = "00".":".$hexval; #concat 00:$hexval
         if ($columns[1] eq $val){
             print "$columns[0]\t$columns[1]\t$columns[2]\n";
         # want to process some data here
          }
         $i++;                                                             
        }
               
      }
      else {
      my $err = $csv->error_input;
      print "Failed to parse line: $err";
      }
         
   
   }
    
   close CSV;

The script works, but not in the format that I want. Currently, it is producing the data in following format with only the columns I want

Date and Time Number Total I/O /sec
08 09 2007 18:00:00 00:00 2.2733333
08 09 2007 18:00:00 00:01 0.033333335
08 09 2007 18:00:00 00:02 0.036666665
08 09 2007 18:00:00 00:03 1.1433333
08 09 2007 18:00:00 00:04 0
08 09 2007 18:00:00 00:05 0
08 09 2007 18:00:00 00:06 10.453333
08 09 2007 18:00:00 00:07 2.26
08 09 2007 18:00:00 00:08 1.0166667
08 09 2007 18:00:00 00:09 441.14
08 09 2007 18:00:00 00:0A 0.006666667
.....
08 09 2007 18:05:00 00:00 2.9333334
08 09 2007 18:05:00 00:01 0.39333335
08 09 2007 18:05:00 00:02 0.096666664
08 09 2007 18:05:00 00:03 1.2966666
08 09 2007 18:05:00 00:04 0.093333334
08 09 2007 18:05:00 00:05 1.8733333
08 09 2007 18:05:00 00:06 8.63
08 09 2007 18:05:00 00:07 2.9
08 09 2007 18:05:00 00:08 1.27
08 09 2007 18:05:00 00:09 1.1766666
08 09 2007 18:05:00 00:0A 0.33
.....

But what I want is to have all the same number values together eg

Date and Time Number Total I/O /sec
08 09 2007 18:00:00 00:00 2.2733333
08 09 2007 18:05:00 00:00 2.9333334
08 09 2007 18:10:00 00:00 3.2300034
....
08 09 2007 18:00:00 00:01 0.033333335
08 09 2007 18:05:00 00:01 0.39333335
08 09 2007 18:10:00 00:01 0.5500034

Any ideas how I can do this?

Any help would be appreciated as it is frustrating me no end.

rharsh · Nov 26, 2008

Here's a quick example that may give you a way to go.

Code:

my %data;
while (<DATA>) {
	chomp(my @columns = split /\s{2,}/, $_); # @columns =  $csv->fields();
	push @{$data{$columns[1]}}, [@columns[0..2]];
}

# After parsing the text file 
my @hex_list = map { sprintf '%02X', $_ } 0..255;
foreach my $x (@hex_list) {
	foreach my $y (@hex_list) {
		if (defined $data{"$x:$y"}) {
			foreach my $record (@{$data{"$x:$y"}}) {
				print join("\t", @$record), "\n";
			}
			print "\n";
		}
	}
}

prex1 · Nov 26, 2008

You need either use your code, but doing repetitive opens and reads of the file at every new value of [tt]$i[/tt] (not efficient), or you must read the whole file into an array indexed with the number column as the key and collecting all the lines with the same number column into subarrays.
How big are your files?
There are other (minor) performance issues with your code:
-to skip the first line just say [tt]$_=<CSV>;[/tt] after the [tt]open[/tt]
-to format [tt]$i[/tt] into the number column simply use
[tt]$val=sprintf("%04X",$i);
substr($val,2,0,':');[/tt]
-to reverse
[tt]substr($val,2,1,'');
$i=hex$val;[/tt]
BTW can't see the commas in the original data?

Franco

http://www.xcalcs.com

: Online engineering calculations

http://www.megamag.it

: Magnetic brakes for fun rides

http://www.levitans.com

: Air bearing pads

stevio · Dec 1, 2008

rharsh, I've tried your sample code and I can't get it to work. It returns nothing.

prex1, the files range from 500KB to 1MB, so not huge. yeah I realize I need to index by the number column, just not sure how to do it.

commas in original data - the original data was a csv file, I just posted it wrong.

Date and Time,Number,Total I/O /sec,....
08 09 2007 18:00:00,00:00,2.2733333
08 09 2007 18:00:00,00:01,0.033333335

rharsh · Dec 1, 2008

See if this helps:

Code:

use Text::CSV_XS;
open IN, "< sample.csv" or die "cannot open input file.\n$!";
my $header = <IN>;
my $csv = Text::CSV_XS->new;
my %data;

while (my $line = <IN>) {
	if ($csv->parse($line)) {
		my @fields = $csv->fields();
		push @{$data{$fields[1]}}, [@fields[0..2]];
	} else {
		warn "Failed to parse line: $line";
	}
}

my @hex_list = map { sprintf '%02X', $_ } 0..255;
foreach my $x (@hex_list) {
	foreach my $y (@hex_list) {
		if (defined $data{"$x:$y"}) {
			foreach my $record (@{$data{"$x:$y"}}) {
				print join("\t", @$record), "\n";
			}
			print "\n";
		}
	}
}

And sample.csv looks like:

Code:

Date and Time,Number,Total I/O /sec
08 09 2007 18:00:00,00:00,2.2733333
08 09 2007 18:00:00,00:01,0.033333335
... and so on.

stevio · Dec 1, 2008

rharsh, awesome, works like a charm. Have a star

rharsh · Dec 1, 2008

Great, I'm glad it worked for you and thanks for the star.

stevio · May 5, 2009

Hi all,

Just revisiting this again, but need some rework

The required out now needs to be like:

Original data

Code:

Date and Time          Number    Total I/O /sec  Reads/Sec more n columns
08 09 2007 18:00:00    00:00    2.2733333        0
08 09 2007 18:00:00    00:01    0.033333335      123
08 09 2007 18:00:00    00:02    0.036666665      65
08 09 2007 18:00:00    00:03    1.1433333        34
08 09 2007 18:00:00    00:04    0                34
08 09 2007 18:00:00    00:05    0                12
08 09 2007 18:00:00    00:06    10.453333        0
08 09 2007 18:00:00    00:07    2.26             0
08 09 2007 18:00:00    00:08    1.0166667        0
08 09 2007 18:00:00    00:09    441.14           0
08 09 2007 18:00:00    00:0A    0.006666667      0
.....
08 09 2007 18:05:00    00:00    2.9333334        23
08 09 2007 18:05:00    00:01    0.39333335       5
08 09 2007 18:05:00    00:02    0.096666664      etc
08 09 2007 18:05:00    00:03    1.2966666
08 09 2007 18:05:00    00:04    0.093333334
08 09 2007 18:05:00    00:05    1.8733333
08 09 2007 18:05:00    00:06    8.63
08 09 2007 18:05:00    00:07    2.9
08 09 2007 18:05:00    00:08    1.27
08 09 2007 18:05:00    00:09    1.1766666
08 09 2007 18:05:00    00:0A    0.33
.....
etc etc

Required data

Code:

Total I/O /Sec
Date and Time          00:00      00:01       00:02 ...
08 09 2007 18:00:00    2.2733333  0.033333335
08 09 2007 18:05:00    2.9333334  0.39333335
....etc

Reads/Sec
Date and Time          00:00      00:01   00:02 ...
08 09 2007 18:00:00    0          123
08 09 2007 18:05:00    23         5
...etc

next column

So, all I'm doing is using the number(or whatever is in the second column) as the column headings. The data itself needs to come from each column of the original data.

Hope this makes sense.

Any help would be much appreciated. Stevio

rharsh · May 5, 2009

Did you try modifying the code yourself? What are you having probelms with?

stevio · May 5, 2009

I've tried modifying the code. The only thing I managed to get working was

Code:

push @{$data{$fields[1]}}, [@fields[0..5]];

to include more columns

but I can't work out how to transpose the individual column data and organise by the column 2 (whatever it is)

So far I've put some pseudocode:

Code:

foreach my $records(@{$data{$fields[1]}}){
    #make array of column 2
    #print column 2 as heading separated by tabs
    #print each column row - stuck here
  
}

Can't wrap my head around re-org that data into the right format.

prex1 · May 6, 2009

You need to create a hash of hashes, where the first key is the first column, and the subkey the second one. Column headings need be stored in a separate hash, that can be used also to determine column widths.
The following code will handle the column widths to correctly align them. You'll need to add some headings and determine how many data to print.

Code:

my($key1,$key2,%data,%collen);
while(<DATA>){
  $key1=substr($_,0,19);
  $key2=substr($_,23,5);
  $_=substr($_,32);
  $data{$key1}{$key2}=[split];
  my@lengths=map{length}@{$data{$key1}{$key2}};
  for(@lengths){$_=5 if$_<5}
  if(exists$collen{$key2}){
    for(my$i=0;$i<@lengths;$i++){
      $collen{$key2}[$i]=$lengths[$i]if$lengths[$i]>$collen{$key2}[$i];
    }
  }else{
    $collen{$key2}=\@lengths;
  }
}
for(my$i=0;$i<@{$collen{$key2}};$i++){
  print"Date and Time          ";
  for(sort keys%collen){
    printf"%-$collen{$_}[$i]s ",$_;
  }
  print"\n";
  for $key1(sort keys%data){
    print$key1,'    ';
    for(sort keys%collen){
      printf"%-$collen{$_}[$i]s ",$data{$key1}{$_}[$i];
    }
    print"\n";
  }
  print"\n";
}
__DATA__
08 09 2007 18:00:00    00:00    2.2733333        0
08 09 2007 18:00:00    00:01    0.033333335      123
08 09 2007 18:00:00    00:02    0.036666665      65
08 09 2007 18:00:00    00:03    1.1433333        34
08 09 2007 18:00:00    00:04    0                34
08 09 2007 18:00:00    00:05    0                12
08 09 2007 18:00:00    00:06    10.453333        0
08 09 2007 18:00:00    00:07    2.26             0
08 09 2007 18:00:00    00:08    1.0166667        0
08 09 2007 18:00:00    00:09    441.14           0
08 09 2007 18:00:00    00:0A    0.006666667      0
08 09 2007 18:05:00    00:00    2.9333334        23
08 09 2007 18:05:00    00:01    0.39333335       5
08 09 2007 18:05:00    00:02    0.096666664      etc
08 09 2007 18:05:00    00:03    1.2966666
08 09 2007 18:05:00    00:04    0.093333334
08 09 2007 18:05:00    00:05    1.8733333
08 09 2007 18:05:00    00:06    8.63
08 09 2007 18:05:00    00:07    2.9
08 09 2007 18:05:00    00:08    1.27
08 09 2007 18:05:00    00:09    1.1766666
08 09 2007 18:05:00    00:0A    0.33

Franco

http://www.xcalcs.com

: Online engineering calculations

http://www.megamag.it

: Magnetic brakes for fun rides

http://www.levitans.com

: Air bearing pads

stevio · May 6, 2009

hi prex1,

thanks, awesome job. Eventually want to take each block of data and create Excel tabs for each one (probably Perl OLE) so I can graph the result, but this has gone a long way to help.

Code:

Date and Time          00:00 00:01 00:02 00:03 00:04 00:05 00:06 00:07 00:08 00:09 00:0A
08 09 2007 18:00:00    0     123   65    34    34    12    0     0     0     0     0
08 09 2007 18:05:00    23    5     etc

stevio · May 7, 2009

my mistake, the original format was a csv file:

Code:

Date and Time,Number,Total I/O /sec,Reads/Sec
08 09 2007 18:00:00,00:00,2.2733333,0
08 09 2007 18:00:00,00:01,0.033333335,123
08 09 2007 18:00:00,00:02,0.036666665,65
08 09 2007 18:00:00,00:03,1.1433333,34
08 09 2007 18:00:00,00:04,0,34

so will have to integrate rharsh's original script to read the csv with this script....

prex1 · May 7, 2009

You just need to replace the first 5 lines after [tt]while(<DATA>){[/tt] by

Code:

  chomp;
  my@row=split/,/;
  $key1=shift@row;
  $key2=shift@row;
  $data{$key1}{$key2}=\@row;
  my@lengths=map{length}@row;

Note however that this won't work if in the file you have quoted strings possibly containing commas. Otherwise you can go on with it.

Franco

http://www.xcalcs.com

: Online engineering calculations

http://www.megamag.it

: Magnetic brakes for fun rides

http://www.levitans.com

: Air bearing pads

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Parsing CSV file 2

stevio

Vendor

rharsh

Technical User

prex1

Programmer

stevio

Vendor

rharsh

Technical User

stevio

Vendor

rharsh

Technical User

stevio

Vendor

rharsh

Technical User

stevio

Vendor

prex1

Programmer

stevio

Vendor

stevio

Vendor

prex1

Programmer

Similar threads

Part and Inventory Search

Sponsor