
Direct RTF code manipulation - format error

Status
Not open for further replies.

MakeItSo (Programmer)
Joined: Oct 21, 2003; Messages: 3,316; Location: DE
Hi friends,

I've run into an unusual problem again.
Background: I need to handle Word documents with loads of comments. I have to remove these comments for further processing, but I also need them back in the finished document exactly as they were, i.e. with the same relative position, relative scope, ID, author, etc. (They are Doc 2 help comments denoting conditional text.)

This challenge has nearly driven me insane. I tried various approaches until I finally had the saving idea:

I save the document as RTF, identify the comments via RegExp, store each comment's RTF code in a database, and replace the comment with a placeholder in the RTF code.
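A minimal sketch of that stash-and-restore round trip, written in Python rather than the VBA used in this thread. It assumes the standard-library `sqlite3` module and a hypothetical `comments` table keyed by annotation ID plus part (`start`/`end`); the post doesn't say which database or schema was actually used, and `open_store`, `stash` and `restore` are illustrative names only.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    # Hypothetical schema: one row per stashed RTF fragment, keyed by the
    # annotation ID plus which part it is ("start" anchor or "end" block).
    conn = sqlite3.connect(":memory:")  # in-memory DB, just for the sketch
    conn.execute(
        "CREATE TABLE comments ("
        " id TEXT NOT NULL, part TEXT NOT NULL, rtf TEXT NOT NULL,"
        " PRIMARY KEY (id, part))"
    )
    return conn


def stash(conn: sqlite3.Connection, doc: str, comment_id: str, part: str,
          fragment: str, placeholder: str) -> str:
    """Store the exact RTF fragment, then swap its first occurrence for a placeholder."""
    if fragment not in doc:
        raise ValueError(f"fragment for {comment_id}/{part} not found")
    conn.execute(
        "INSERT OR REPLACE INTO comments (id, part, rtf) VALUES (?, ?, ?)",
        (comment_id, part, fragment),
    )
    return doc.replace(fragment, placeholder, 1)


def restore(conn: sqlite3.Connection, doc: str, comment_id: str, part: str,
            placeholder: str) -> str:
    """Swap the placeholder back for the RTF fragment stored under (id, part)."""
    row = conn.execute(
        "SELECT rtf FROM comments WHERE id = ? AND part = ?",
        (comment_id, part),
    ).fetchone()
    if row is None:
        raise KeyError((comment_id, part))
    return doc.replace(placeholder, row[0], 1)
```

Storing the fragment byte-for-byte is what lets the comment come back with its original ID, author and scope; the placeholder only has to be unique and brace-balanced.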
So far, so good.
Alas, the RTF opens fine in WordPad or in an RTF box control, but not in MS Word or OpenOffice.
Word reports "An error occurred while opening the file"; OO reports "format error at pos abc,xyz", where abc,xyz is the exact position of the very last "}".

This is what I do:
A typical start anchor of a comment (an "annotation" in RTF) looks like this:
Code:
{\*\atrfstart 234643803}
I use a regular expression to locate the anchor, extract the number (the ID used as the database key) and replace the anchor with a placeholder:
Code:
opStr = Replace(opStr, mat.Value, "{<COMID=" & mat.SubMatches(2) & ">}")
I do this so that there is an identifiable placeholder in the file as displayed in Word, since I need to assign a special style to it.
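For comparison, the same start-anchor replacement sketched with Python's `re` module. The pattern is an assumption: the original VBA pattern isn't shown (it evidently has more capture groups, since the code reads `SubMatches(2)`). Wrapping the tag in `{...}` keeps the RTF group braces balanced; `mask_start_anchors` is a hypothetical helper name.

```python
from __future__ import annotations

import re

# Assumed pattern: matches a start anchor such as {\*\atrfstart 234643803}
# and captures the numeric annotation ID.
START_ANCHOR = re.compile(r"\{\\\*\\atrfstart (\d+)\}")


def mask_start_anchors(rtf: str) -> tuple[str, dict[str, str]]:
    """Replace every start anchor with {<COMID=id>}; return the removed anchors by ID."""
    removed: dict[str, str] = {}

    def _sub(m: re.Match) -> str:
        removed[m.group(1)] = m.group(0)  # keep the exact original RTF
        return "{<COMID=" + m.group(1) + ">}"

    return START_ANCHOR.sub(_sub, rtf), removed
```

Passing a function to `re.sub` lets one pass both collect the originals and emit the placeholders, instead of matching and then calling `Replace` per match.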

I then do the same for the code that closes the annotation, which generally looks like this (I've only removed some customer-specific details):
Code:
{\cs74\fs16\thestyle5784459 {\*\atrfend 234643803}{\*\atnid CText}{\*\atnauthor Conditional Text}\chatn 
{\*\annotation{\*\atnref 234643803}{\*\atndate 651637481}\pard\plain \s75\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f1\fs20\lang1033\langfe1033\cgrid\langnp1033\langfenp1033 {\field\fldlock{\*\fldinst {
\cs74\fs16\someid9595509 \{serial-number-like combination 1234-ABC4-B222 etc.\}}}{\fldrslt }}{\cs74\fs16\someid9595509\chatn }{\thestyle9595509 Pla}{\someid9595509t}{\someid9595509 form: Printed Manual}}}

Same here: I apply a RegExp and replace the match with a placeholder.
The RTF code before replacement:
Code:
{\*\atrfstart 234643803}Paragraph text{\cs74\fs16\thestyle5784459 {\*\atrfend 234643803}bla yadda...
...Printed Manual}}}

And after my treatment:
Code:
{<COMID=234643803>}Paragraph text{</234643803>}

I've tried this with two or three commented paragraphs masked this way, and it worked nicely. The problem only occurs when I process the entire document.
I also checked my database entries to see whether more than the intended code portions had been extracted: they hadn't.


I don't know what would corrupt the RTF. Do Word/OO "see" the tags and assume the file is XML?

P.S.: The document contains 177 comments spread over about 90 pages.

Thanks for any hints!

Cheers,
MiS

"We had to turn off that service to comply with the CDA Bill."
- The Bastard Operator From Hell
 
Update: Problem solved.
It had nothing to do with my using angle brackets to create identifiable "tags".
The problem was that I replaced too large a portion. I now leave the opening "{\cs74\fs16\thestyle5784459" group in place and replace only this part:
Code:
{\*\atrfend 234643803}{\*\atnid CText}{\*\atnauthor Conditional Text}\chatn
{\*\annotation{\*\atnref 234643803}{\*\atndate 651637481}\pard\plain \s75\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f1\fs20\lang1033\langfe1033\cgrid\langnp1033\langfenp1033 {\field\fldlock{\*\fldinst {
\cs74\fs16\someid9595509 \{serial-number-like combination 1234-ABC4-B222 etc.\}}}{\fldrslt }}{\cs74\fs16\someid9595509\chatn }{\thestyle9595509 Pla}{\someid9595509t}{\someid9595509 form: Printed Manual}}
This leaves me with:
Code:
{\cs74\fs16\thestyle5784459 {<COMID:123456>}}
Now the code works just fine.
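A sketch of that narrower cut, using a brace-counting scan in place of the regular expression (a different technique from the RegExp used in the thread): it removes everything from `{\*\atrfend ID}` through the end of the following `{\*\annotation ...}` group and leaves the enclosing `{\cs74...}` group, including its closing brace, untouched. The function names and the placeholder format are illustrative assumptions, and `\bin` data is not handled.

```python
from __future__ import annotations


def find_group_end(rtf: str, open_pos: int) -> int:
    """Return the index of the '}' that closes the group opened at open_pos."""
    if rtf[open_pos] != "{":
        raise ValueError("open_pos must point at '{'")
    depth = 0
    i = open_pos
    while i < len(rtf):
        ch = rtf[i]
        if ch == "\\":
            i += 2  # skip escapes such as \{ \} \\ (and control-word starts)
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    raise ValueError("unbalanced group")


def mask_annotation_end(rtf: str, comment_id: str, placeholder: str) -> tuple[str, str]:
    """Cut from the atrfend anchor through the end of the following annotation
    group; return (new_rtf, removed_span). Any enclosing group stays intact."""
    start = rtf.find("{\\*\\atrfend " + comment_id + "}")
    if start < 0:
        raise ValueError(f"no atrfend anchor for {comment_id}")
    ann = rtf.find("{\\*\\annotation", start)
    if ann < 0:
        raise ValueError(f"no annotation group for {comment_id}")
    end = find_group_end(rtf, ann)
    span = rtf[start:end + 1]
    return rtf[:start] + placeholder + rtf[end + 1:], span
```

Because the removed span always starts and ends on group boundaries, the text left behind keeps the same brace balance as before the cut.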
:-)
