Hi friends,
I am having a special problem again.
Background: I need to handle Word documents with loads of comments. I need to remove these comments for further processing, but I also need them back in the finished document exactly the way they were, i.e. same relative position, same relative scope, same ID, same author etc. (They are Doc 2 Help comments that denote conditional text.)
This challenge has nearly driven me insane; I tried various approaches until I finally had the saving inspiration:
I save as RTF, identify the comments via RegExp, store the comment's RTF code in a database & replace the comment with a placeholder in the RTF code.
So far, so good.
Alas, the RTF opens fine in WordPad or in an RTF box control, but neither in MS Word nor in OpenOffice.
Word shows "an error occurred while opening the file"; OO tells me "format error at pos abc,xyz", with abc,xyz being the exact position of the very last "}".
This is what I do:
A typical starting anchor of a comment (an "annotation" in RTF) looks like this:
Code:
{\*\atrfstart 234643803}
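I use a regular expression to locate the anchor, extract the number (the ID for database storage) and replace the anchor with a placeholder:
Code:
opStr = Replace(opStr, mat.Value, "{<COMID=" & mat.SubMatches(2) & ">}")
I do this in order to have an identifiable placeholder in the file as displayed in Word, because I need to assign a special style to it.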
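For anyone who wants to try this step outside my VBScript, here is a minimal Python sketch of the same idea. The pattern and the helper name `mask_start_anchors` are assumptions for illustration only; my real pattern has more capture groups (hence `SubMatches(2)`) and is not shown here.

```python
import re

# Assumed, simplified pattern: matches a start anchor such as
# {\*\atrfstart 234643803} and captures the numeric ID.
START_ANCHOR = re.compile(r"\{\\\*\\atrfstart (\d+)\}")


def mask_start_anchors(rtf):
    """Replace each start anchor with a {<COMID=id>} placeholder.

    Returns the masked RTF text and the list of IDs found, in document order.
    """
    ids = []

    def _swap(match):
        ids.append(match.group(1))
        return "{<COMID=" + match.group(1) + ">}"

    return START_ANCHOR.sub(_swap, rtf), ids


# Sample taken from the anchor shown above, followed by dummy text.
sample = r"{\*\atrfstart 234643803}Paragraph text"
masked, found = mask_start_anchors(sample)
print(masked)
print(found)
```

The placeholder keeps its own pair of braces, so the number of opened and closed groups does not change in this step.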
I then do the same for the code closing the annotation, which generally looks like this (I've only removed some customer-specific details):
Code:
{\cs74\fs16\thestyle5784459 {\*\atrfend 234643803}{\*\atnid CText}{\*\atnauthor Conditional Text}\chatn
{\*\annotation{\*\atnref 234643803}{\*\atndate 651637481}\pard\plain \s75\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f1\fs20\lang1033\langfe1033\cgrid\langnp1033\langfenp1033 {\field\fldlock{\*\fldinst {
\cs74\fs16\someid9595509 \{[blue]serial-number-like combination 1234-ABC4-B222 etc.[/blue]\}}}{\fldrslt }}{\cs74\fs16\someid9595509\chatn }{\thestyle9595509 Pla}{\someid9595509t}{\someid9595509 form: Printed Manual}}}
Same here: I apply a RegExp and replace with a placeholder.
The text in RTF code before replacement:
Code:
{\*\atrfstart 234643803}[blue]Paragraph text[/blue]{\cs74\fs16\thestyle5784459 {\*\atrfend 234643803}[blue]bla yadda...[/blue]
...Printed Manual}}}
And after my treatment:
Code:
{<COMID=234643803>}[blue]Paragraph text[/blue]{</234643803>}
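To verify that the masking loses nothing, a round trip could be sketched like this in Python. This is not my production code: the dictionary only stands in for my database, the `restore` helper is hypothetical, and the closing block in the sample is heavily abridged.

```python
import re

# Matches either placeholder form: {<COMID=id>} (start) or {</id>} (end).
PLACEHOLDER = re.compile(r"\{<COMID=(\d+)>\}|\{</(\d+)>\}")


def restore(masked, store):
    """Put the stored RTF fragments back in place of their placeholders."""

    def _put_back(match):
        if match.group(1) is not None:
            return store[("start", match.group(1))]
        return store[("end", match.group(2))]

    # A function replacement is inserted literally, so the backslashes
    # in the stored RTF are not interpreted as regex escapes.
    return PLACEHOLDER.sub(_put_back, masked)


# Abridged sample; the real closing block is much longer.
original = (
    r"{\*\atrfstart 234643803}Paragraph text"
    r"{\cs74\fs16 {\*\atrfend 234643803}{\*\atnid CText}}"
)
store = {
    ("start", "234643803"): r"{\*\atrfstart 234643803}",
    ("end", "234643803"): r"{\cs74\fs16 {\*\atrfend 234643803}{\*\atnid CText}}",
}
masked = "{<COMID=234643803>}Paragraph text{</234643803>}"

round_trip_ok = restore(masked, store) == original
print(round_trip_ok)
```

If the restored text differs from the original RTF, the first differing position shows which replacement took too much or too little.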
I tried this with two or three commented paragraphs masked this way and it worked nicely. The problem seems to occur only when I process the entire document.
I also checked my database entries to see whether more than the intended code portions had been extracted: negative...
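Since OO complains at the very last "}", one thing worth ruling out is unbalanced groups. That is only a guess on my part, not a confirmed cause. Below is a minimal Python sketch of such a check; the function name `brace_depth_report` is made up, and it assumes that skipping the character after each backslash is enough to handle `\{`, `\}` and `\\` (it ignores `\bin` data).

```python
def brace_depth_report(rtf):
    """Return (final_depth, first_underflow_pos) for an RTF string.

    final_depth is 0 when every opened group is closed again.
    first_underflow_pos is the index of the first '}' that closes more
    groups than were opened, or None if that never happens.
    """
    depth = 0
    underflow = None
    i = 0
    while i < len(rtf):
        ch = rtf[i]
        if ch == "\\":
            # Skip the backslash and the next character, so that the
            # escaped braces \{ and \} are not counted as groups.
            i += 2
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0 and underflow is None:
                underflow = i
        i += 1
    return depth, underflow


# Tiny made-up samples, not taken from my real file.
balanced = r"{\rtf1 {<COMID=1>}text \{literal\}{</1>}}"
broken = r"{\rtf1 text}}"
print(brace_depth_report(balanced))
print(brace_depth_report(broken))
```

Running this on the file before and after masking would show whether the replacements change the group depth somewhere along the way.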
I don't know what could be corrupting the RTF. Do Word/OO "see" the tags and assume the file is XML?
P.S: The document contains 177 comments spread over ~ 90 pages.
Thanks for any hint!
Cheers,
MiS
"We had to turn off that service to comply with the CDA Bill."
- The Bastard Operator From Hell