Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations TouchToneTommy on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Parse Excel File with Dates and Non-english characters

Status
Not open for further replies.

svar

Programmer
Aug 12, 2001
350
GR
The task here is reading an excell file, process some fields and export with Write::Excel. I have two problems with this:
1) Dates These are read as a number and there is no $oWkC attribute that would signal it should be read as a date
Indeed, although it appears correctly as say 20/11/2007 in the excel file, the 'Type' attribute is not a Date, it is
2)Non-english characters. How can these be read and written to in a consistent manner?
The code:
Code:
use strict;
use Storable;
use Spreadsheet::WriteExcel;
require('dumpvar.pl');
 $main::vers=Spreadsheet::WriteExcel->VERSION;
my $fh;my $str;
my $excfile=shift;
if(-e $excfile){
    # Requires perl 5.8 or later


my $outfile='NEW'.$excfile;

  $main::workbook = Spreadsheet::WriteExcel->new("$outfile");
 #$main::workbook = Spreadsheet::WriteExcel->new(\*STDOUT);

   $main::format      = $main::workbook->addformat();
   $main::format0      = $main::workbook->addformat();
    $main::format0->set_bold();
 $main::format1 = $main::workbook->addformat(num_format => '@');
&excparse($excfile);
}else{die "nonexistent file $excfile\n";}

sub excparse{
use strict;
use Spreadsheet::ParseExcel;
my $file=shift ;
my $iR;my $oWkS; my $oWkC;my $debugflag=1;#set to 0  for production runs
 my $oExcel = new Spreadsheet::ParseExcel;
my $el;
my $oBook = $oExcel->Parse($file ) ;
if($debugflag){    print "=========================================\n";
    print "FILE  :", $oBook->{File} , "\n";
    print "COUNT :", $oBook->{SheetCount} , "\n";
    print "AUTHOR:", $oBook->{Author} , "\n";
print "filename-= $file obook=$oBook\n"; } ;
my @format_check=();
my %headers;
my $lastsheet=-1;
#binmode STDOUT, ":utf8";


 for(my $iSheet=0; $iSheet < $oBook->{SheetCount} ; $iSheet++){ 
if($iSheet >$lastsheet){
if($main::vers>0.40){
$main::worksheet{$iSheet} = $main::workbook->add_worksheet($iSheet);}
else{$main::worksheet{$iSheet}=$main::workbook->addworksheet($iSheet);}
$main::worksheet{$iSheet}->activate();}


  $oWkS = $oBook->{Worksheet}[$iSheet];


for(my $iR = $oWkS->{MinRow} ;
            defined $oWkS->{MaxRow} && $iR <= $oWkS->{MaxRow} ; $iR++) {
print "ir=$iR\n";

   for(my $iC = $oWkS->{MinCol} ;
             defined $oWkS->{MaxCol} && $iC <= $oWkS->{MaxCol} ;
$iC++) {
my $oWkC = $oWkS->{Cells}[$iR][$iC];

if($oWkC){ $el=$oWkC->{'Val'}; 

$el=$oWkC->{'_Value'} if (  $oWkC->{'Type'} eq 'Date'
 or $el=~/[\r\n]/ );#### check here!!!
if($el=~/[\r\n]/){print "el=$el\n";

}

$el=~s/^\s+//;
$el=~s/[\r\n]/ /g;#get rid of new lines ;

           }else{$el='';}#if oWkC  
#print "ic=$iC owkc= $oWkC ir=$iR el=$el\n" if $debugflag;


if($iR){ 
#if($iC>13){
#$main::worksheet{$iSheet}->write_utf16be_string($iR,$iC,$el,$main::format0);
 
#$main::worksheet{$iSheet}->write_utf1be($iR,$iC,$el,$main::format0);

 




 
unless($el=~/(.*)\/(.*)\/(.*)/){
$main::worksheet{$iSheet}->write_string($iR,$iC,$el,$main::format1);}
else{  my $format=$main::format1; $format->set_num_format('mm/dd/yy'); 
$main::worksheet{$iSheet}->write_string($iR,$iC,$el,$format);     }
}
#}
else{
 $main::worksheet{$iSheet}->write($iR,$iC,$el,$main::format0);}

goto I10 if $iR==15;
 }#foreach iC
  }#for iR

$lastsheet=$iSheet; 
    }#foreach sheet

I10:$main::workbook->close() or print "Error closing file: $!";
 
If you expect dates in certain fields, the [tt]ExcelLocaltime()[/tt] method from Spreadsheet::parseExcel::Utility ( will convert them into a useful form. If you don't know where they are but know that they are displayed correctly from within excel, you could look at the formatted values ([tt]Value()[/tt] rather than [tt]Val()[/tt]) and use a regex to recognise dates.

Yours,

fish

[&quot;]As soon as we started programming, we found to our surprise that it wasn't as easy to get programs right as we had thought. Debugging had to be discovered. I can remember the exact instant when I realized that a large part of my life from then on was going to be spent in finding mistakes in my own programs.[&quot;]
--Maur
 
Can you be a bit more explicit on the 'non english' characters bit, I'm not sure what you mean by that.

For the dates a more specific regex might be better than the type
Code:
unless($el=~/(\d\d\)/(\d\d\)/\d\d(\d\d/){
$el1="$1/$2/$3";
$main::worksheet{$iSheet}->write_string($iR,$iC,$el1,$main::format1);}
...

Not tested, just clutching ...

Paul
------------------------------------
Spend an hour a week on CPAN, helps cure all known programming ailments ;-)
 
Ok, thanks: Now
-Non-english characters, I mean a different alphabet. The file was written by using both english and the local language character set.

Paul, if I can get the parsed entry value in a date format, I can surely write it anyway I choose. The problem is that $el for something that reads 22/12/2006 on the sheet itself is 39073,57431(which is Val)
Value and Val show the same thing, by the way.
 
why not just write out the value and let excel look after the formatting? Or this might be what you're looking for?

On the local language, if it's in there, it should be recoverable, what problems are you getting?

Paul
------------------------------------
Spend an hour a week on CPAN, helps cure all known programming ailments ;-)
 
Because it does not look after the formatting.
I am reading the value and then printing it to a different excel file using Spreadsheet::WriteExcel
Since $el is a number, that's all that gets printed, unless WriteExcel has some quirks on writing that i am unaware of.
I will look at Datetime, both this and the Utility solution
look like they can do the job.
The problem with the local language are similar: Parse excel reads a string of (ASCII and) non-ASCII characters and that's also what gets printed when I call the write method with WriteExcel.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top