I am processing an Excel file for input and searching for matching substrings in two other Excel files using OLE calls. Currently I am reading each Excel input row (similar to the "Excel - Perl?" post by safra on 4/24) individually, from row 1 up to a defined upper limit (the observed number of rows in the file plus room for growth), discarding empty rows as I go. I use this method because I cannot assume that the first empty row marks the end of the file, and I do not know of a call that returns the last populated row of an Excel file.
The problem with this approach is its inefficiency: it takes 97 minutes to run through an input file of ~220 rows while checking it against the two other files (~2000 and ~12000 rows respectively). The same job takes only 46 seconds when the files are converted to CSV and processed as text! I intend to change the program to read the entire content of each Excel file into memory during initialization to cut the processing time. If anyone knows of a method or has developed a nice algorithm that returns the largest non-empty row and column numbers of an Excel file (or knows of a better approach than reading the files into memory at initialization), please respond.
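For what it's worth, Excel's COM object model does expose what you're after: a worksheet's `UsedRange` property bounds the rectangle of cells Excel considers in use, so its last row and column are the largest populated row and column numbers. Better still, fetching `{Value}` on a multi-cell range returns the whole block as one array-of-arrays in a single cross-process OLE call, which should address the per-cell overhead that is likely eating your 97 minutes. A minimal sketch with Win32::OLE (the file path and worksheet index are placeholders; adjust for your files):

```perl
use strict;
use warnings;
use Win32::OLE;

# Hypothetical path/sheet -- substitute your own.
my $excel = Win32::OLE->GetActiveObject('Excel.Application')
         || Win32::OLE->new('Excel.Application', 'Quit')
         or die "Cannot start Excel: ", Win32::OLE->LastError;
my $book  = $excel->Workbooks->Open('C:\\data\\input.xls');
my $sheet = $book->Worksheets(1);

# Largest populated row/column via UsedRange.
my $used     = $sheet->UsedRange;
my $last_row = $used->{Row}    + $used->{Rows}->{Count}    - 1;
my $last_col = $used->{Column} + $used->{Columns}->{Count} - 1;
print "Data extends to row $last_row, column $last_col\n";

# One OLE call pulls the entire used range into memory as a
# reference to an array of row arrayrefs -- far cheaper than
# one OLE round trip per cell.
my $data = $used->{Value};
for my $row (@$data) {
    next unless grep { defined && length } @$row;   # skip empty rows
    # ... search this row for your substrings ...
}

$book->Close(0);
```

One caveat: `UsedRange` can overshoot if cells were formatted or once held data and were later cleared, so keep your empty-row check inside the loop. Reading each file's `UsedRange->{Value}` once at initialization, as you planned, should get you much closer to your CSV timings.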