MS Word and PERL

Status
Not open for further replies.

vne147

Technical User
May 10, 2006
6
US
Hello everyone. I am attempting to write a Perl script that will open a .doc file and parse it based on certain criteria. So far I am able to open the document, split it into paragraphs, and write each paragraph to a file using the code below. My problem is that within a paragraph there are portions that are bold and/or italic, and I want to flag those somehow. I can't figure that part out. I have no experience with VB, so the MS Word help files didn't help me very much. I'm really drowning here. Someone please help! Thanks in advance.

My Code:

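use strict;
use warnings;
use Win32::OLE;
use Win32::OLE::Enum;

# Attach to the Word document through OLE automation.
my $document = Win32::OLE->GetObject("file.doc") or die "Cannot open file.doc";
open(FH, ">output.txt") or die "Cannot open output.txt: $!";

# Walk the Paragraphs collection one paragraph at a time.
my $paragraphs = $document->Paragraphs();
my $enumerate = new Win32::OLE::Enum($paragraphs);

while (defined(my $paragraph = $enumerate->Next()))
{
    my $text = $paragraph->{Range}->{Text};
    $text =~ s/[\n\r]//g;    # strip paragraph marks
    $text =~ s/\x0b/\n/g;    # turn vertical tabs (manual line breaks) into newlines
    print FH "$text\n\n";
}
close(FH);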
 
You seem to be pulling only the raw text. Are there format placeholders, or is there more documentation on Win32::OLE you can point us to?

--Paul

Paul
------------------------------------
Spend an hour a week on CPAN, helps cure all known programming ailments ;-)
 
I haven't done this type of thing too much, but what about adding a block like this in your while loop:

Code:
if ( $paragraph->{Range}->{Bold} < 0 ) {
  $text = "BOLD: $text";
}
if ( $paragraph->{Range}->{Italic} < 0 ) {
  $text = "ITALIC: $text";
}
if ( $paragraph->{Range}->{Underline} > 0 ) {
  $text = "UL: $text";
}

Basically, if the range is bold, the "Bold" value will be -1. Italic will be -1 if the range is italicized, and Underline will be 1 if it has a single underline.
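For what it's worth, here is a minimal sketch of how those raw values could be interpreted. It assumes Word reports 9999999 (its wdUndefined constant) when a range mixes settings; the helper name toggle_state and the labels are made up for illustration and are not part of Win32::OLE or Word.

```perl
use strict;
use warnings;

# Word's wdUndefined constant: returned when a range mixes settings.
use constant WD_UNDEFINED => 9999999;

# Hypothetical helper: map a raw Bold/Italic value to a label.
sub toggle_state {
    my ($value) = @_;
    return 'off'   if !defined($value) || $value == 0;
    return 'mixed' if $value == WD_UNDEFINED;
    return 'on';    # -1 (or any other non-zero value) is treated as set
}

print toggle_state(-1), "\n";        # on
print toggle_state(0), "\n";         # off
print toggle_state(9999999), "\n";   # mixed
```

In the loop you would pass it something like $paragraph->{Range}->{Bold} and only prepend "BOLD: " when the result is 'on'.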

Not sure if that's what you're looking for or not... let me know.

Brian
 
PaulTEG, thanks for the reply. The Win32::OLE documentation that comes with the standard PERL distribution is all that I have. Actually, $paragraph or $word in the script below stores all text and associated properties. I figured out a way to do it last night after much headache but it's still very inefficient because it looks at one word at a time. I'm working on making it quicker. If anyone has any ideas they would be much appreciated. Thanks. Here's the code.

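use strict;
use warnings;
use Win32::OLE;
use Win32::OLE::Enum;

my $document = Win32::OLE->GetObject("file.doc") or die "Cannot open file.doc";
open(FH, ">output.txt") or die "Cannot open output.txt: $!";

# Walk the Words collection; each item carries its own Font properties.
my $words = $document->Words();
my $enumerate = new Win32::OLE::Enum($words);

while (defined(my $word = $enumerate->Next()))
{
    my $bold   = $word->{Font}->{Bold};
    my $italic = $word->{Font}->{Italic};
    my $text   = $word->{Text};

    # Prefix the word with a tag for each attribute that is set.
    my $tags = ($bold ? "<B>" : "") . ($italic ? "<I>" : "");
    print FH "$tags$text\n\n";
}
close(FH);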
 
BrianAtWork,

Thanks for the input. I did consider that approach, but since I originally parsed the document into paragraphs, part of a paragraph could be italicized and part not, etc. If that's the case, $paragraph->{Range}->{Italic} returns 9999999 (Word's wdUndefined value) instead of 0 or -1. I got around that by parsing the document into words, but it's incredibly slow that way. Thanks again.
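One possible way to cut down the per-word work, sketched under assumptions rather than tested against Word: check the paragraph-level values first, and only drop down to individual words when a paragraph reports the mixed value. The helper names tags_for and tag_paragraph, and the $words_of callback, are hypothetical. Plain hashes stand in for the OLE objects so the sketch runs without Word.

```perl
use strict;
use warnings;

use constant WD_UNDEFINED => 9999999;   # Word's "mixed formatting" value

# Hypothetical helper: build the <B>/<I> prefix from a Font-like object.
sub tags_for {
    my ($font) = @_;
    return ($font->{Bold} ? "<B>" : "") . ($font->{Italic} ? "<I>" : "");
}

# Hypothetical helper: return the tagged chunks for one paragraph.
# $words_of is a callback returning the paragraph's word ranges; it is
# only called when the paragraph-level Bold or Italic value is mixed.
sub tag_paragraph {
    my ($paragraph, $words_of) = @_;
    my $range = $paragraph->{Range};
    my $font  = $range->{Font};

    my $mixed = grep { defined($_) && $_ == WD_UNDEFINED }
                $font->{Bold}, $font->{Italic};

    # Uniform paragraph: one chunk, no per-word lookups needed.
    return (tags_for($font) . $range->{Text}) unless $mixed;

    # Mixed paragraph: fall back to the word-by-word approach.
    return map { tags_for($_->{Font}) . $_->{Text} } $words_of->($paragraph);
}

# Stand-in data (plain hashes) so the sketch runs without Word.
my $plain = { Range => { Text => "All bold.", Font => { Bold => -1, Italic => 0 } } };
print join("", tag_paragraph($plain, sub { () })), "\n";   # <B>All bold.
```

With real documents, the callback might be something like sub { Win32::OLE::Enum->new($_[0]->{Range}->{Words})->All }, which is an untested assumption. A word whose formatting changes partway through would still report the mixed value and be tagged as set, the same as in the word-by-word script.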
 