RichTextBox - RTF-Header ignored 2

Status
Not open for further replies.

MakeItSo

Hi friends,

once again I'm stuck with a simple (?) problem:
I am doing a lot of processing of Word files. Mostly simple stuff, macro programming, applying styles to certain sections, iterating over all files in folder/subfolders, etc.

One process however is quite... work-intensive and is being done on sometimes thousands of files.
Doing this with a Word macro is rather... inefficient...
It DOES work, but my Word is then blocked for half an hour or more.

This is why I'm trying to move this process entirely away from Word. The process involves three different types of files; two are text formats, one is rich text.
I got a grip on the text files, and am now transforming them to XML, which works quite nicely.
The rich text files however pose problems, as I need to apply MS Word formatting to these - but don't want that to be done in Word as it's too damn slow.

I am trying this with a RichTextBox in VB6. Alas, it seems the RTFBox won't let me define my RTF header the way I like. This is what I tried (I only removed some of the formatting for readability's sake):
Code:
Dim rtf As String

rtf = "{\rtf1\ansi\ansicpg1252\uc1\deff1 {\fonttbl {\f1 \fmodern\fcharset0\fprq1 Courier New;}{\f2 \fswiss\fcharset0\fprq2 Arial;}{\f3 \froman\fcharset0\fprq2 Times New Roman;}}" & _
    "{\colortbl \red0\green0\blue0; ... }" & _
    "{\stylesheet {\s0 \sb80\slmult1\widctlpar\fs20\f1 \snext0 Normal{\s0 \sb80\slmult1\widctlpar\fs20\f1\highlight7 \snext0 Normal;} ... }}\viewkind4\viewscale100\pard\plain" & vbNewLine
outRTF.SelRTF = rtf & vbNewLine & "\cs5 Cuckoo!\par}}}"
Now, this is my debug.Print of the TextRTF after this code has run:
Code:
{\rtf1\ansi\ansicpg1252\deff0\deflang1031{\fonttbl{\f0\fmodern\fprq1\fcharset0 Courier New;}{\f1\fnil\fcharset0 MS Sans Serif;}}
\viewkind4\uc1\pard\f0\fs24 Cuckoo!
\par \pard\f1\fs17 
\par }

As you can see, the RTFBox only took the first font of my definition (MS Sans Serif is the box's default).
The colortable, the stylesheet and the character style of my "Cuckoo!" are gone!

Why?

Do any of you have an idea what I am doing wrong?
Or does a RichTextBox simply not ALLOW such RTF coding?
I haven't found anything satisfactory on Googling this issue.

Thanks!!
Makey

"We had to turn off that service to comply with the CDA Bill."
- The Bastard Operator From Hell
 
The RichTextBox control only handles a small subset of the full RTF specification.

There are later versions of the underlying DLL used to support RTF, but (a.) RichTextBox only uses the oldest one, (b.) Word may use a "richer" subset of the full spec than any of the DLLs can handle.
 
You might want to have a look at accessing TOM directly. This allows you to use RichEdit 4.1, which is a lot more feature-rich than RichEdit 1.0, the latest (and only) version the VB6 RTF control supports. For more info, check the lower half of thread222-1383606. While strongm is the expert on this, I'll have a go at summarizing.

TOM (Text Object Model) is the underlying object model implemented by a rich edit control. VB6's RTF control implements v1.0. There have been several revisions since then, with added functionality that isn't directly available to the VB6 RTF control.
The latest system version (as opposed to Office-specific versions) of the rich edit control is 4.1, whose DLL is msftedit.dll. This version is exposed through the IRichEditOle interface, which "exposes the COM functionality of a rich edit control." Unfortunately, you can't directly create a reference to this interface in VB: "Dim x As IRichEditOle" will fail whether you have set a reference to the RTF control or not. This is because IRichEditOle doesn't support a dual interface (a "dual interface" implements both IUnknown and IDispatch) as is required in VB. Looking at the reference, you will see that it only implements IUnknown. So, you have to go about it using the API, sending a message to the control to return the interface polymorphically to an IUnknown object.

strongm's code does this with these lines:
Code:
    Dim myIUnknown As IUnknown
    Dim tomDoc As ITextDocument
    SendMessage rtbText.hwnd, EM_GETOLEINTERFACE, 0&, myIUnknown
    Set tomDoc = myIUnknown
The first line defines an IUnknown object. This object can point to any COM object, since by definition all COM objects inherit from IUnknown.

The second line defines an ITextDocument object. This is the top-level object in the TOM.

In the third line, the SendMessage function is getting the rich text control's COM interface and having the IUnknown object reference point to it. The IUnknown object is now referencing an instance of the IRichEditOle interface, the one exposed (although not to the VB6 environment) by the RTF control.

The last line sets the ITextDocument object reference to be equal to the IUnknown object reference. This causes the latter to call its QueryInterface method, which will return ITextDocument as one of the interfaces it supports and therefore allow the reference to be added, exposing the functionality of the ITextDocument interface through the tomDoc variable.

In this way the variables give access to the RTF control's hidden (from VB) ITextDocument interface as defined in msftedit.dll, and thereby to RichEdit 4.1.
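
For anyone trying the four lines above in a fresh project, here is a minimal sketch of the declarations they depend on. It assumes a project reference to the TOM type library ("tom") and a RichTextBox named rtbText. GetTomDocument is a hypothetical helper name, not something from strongm's code.

```vb
Option Explicit

' lParam is declared "As Any" and passed by reference, so the control
' can write an interface pointer straight into an object variable.
Private Declare Function SendMessage Lib "user32" Alias "SendMessageA" _
    (ByVal hwnd As Long, ByVal wMsg As Long, _
     ByVal wParam As Long, lParam As Any) As Long

Private Const WM_USER As Long = &H400
Private Const EM_GETOLEINTERFACE As Long = WM_USER + 60

' Hypothetical helper: returns the control's ITextDocument,
' or Nothing if the control did not hand back an interface.
Private Function GetTomDocument(ByVal hwndRich As Long) As ITextDocument
    Dim unk As IUnknown

    SendMessage hwndRich, EM_GETOLEINTERFACE, 0&, unk
    If Not unk Is Nothing Then
        ' Assigning to a typed variable makes VB call QueryInterface.
        Set GetTomDocument = unk
    End If
End Function
```

Usage would then be something like "Set tomDoc = GetTomDocument(rtbText.hwnd)". If the control does not support ITextDocument, the Set inside the helper should raise a type mismatch error, so wrap the call in error handling if that matters.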
 
Hi Bob,

thanks for that hint. It will sure be very helpful for future "migrations". :)
For this special project, I have meanwhile managed to circumvent the problem for now by creating an XML file from the RTF file with placeholders remaining in the RTF file.

So now my RTF file looks like this:
Code:
<placeholder id="1">
<placeholder id="2">
etc.
While my "XML" looks like this:
Code:
<?xml version="1.0" encoding="UTF-8"?>
<document>
	<file original="blabla.rtf">
		<placeholder id="1">here comes my text</placeholder>
		<placeholder id="2">etc.</placeholder>
	</file>
</document>

As I do not create this XML via DOM but as a plain text file via fso, I now need to transform it into true UTF-8 encoding.
For that I use - tadaa: TOM.
:)

Alas, it don't work.
:-(

1) My RTFBox is on Form1, the processing code is in a module
2) In another module I have an Enum with the proper Internet encoding values:
Code:
Public Enum Codings
    utf8 = 65001
    koi8 = 21866
    shiftjis = 932
    gb2312 = 936
    latin = 1252
End Enum

as well as a procedure "ToOtherCodepage", like this:
Code:
Public Sub ToOtherCodepage(Datei As String, FromCP As Long, ToCP As Long)
    Dim myIUnknown As IUnknown
    Dim TextRange As ITextRange
    Dim tomDoc As ITextDocument
    
    SendMessage Form1.inRTF.hwnd, EM_GETOLEINTERFACE, 0&, myIUnknown
    Set tomDoc = myIUnknown
    tomDoc.Open Datei, tomText, FromCP   ' runtime error 5 is raised on this line
    tomDoc.Save Datei, tomText + tomCreateAlways, ToCP
End Sub
From my processing module, I call the procedure:
Code:
ToOtherCodepage sf.Path & "\RTFs.xml", Codings.latin, Codings.utf8

But I get runtime error 5, "Invalid procedure call or argument", at the tomDoc.Open line.
Hovering over it in debug mode shows the proper values (filename, tomText as 2, FromCP as 1252).

Why the error? Why doesn't it work?
Reference to TOM is set of course.

 
Ok, the first thing is that you have succeeded in accessing the TOM library. :)

As for the Open method, I would start with this:

Code:
Public Sub ToOtherCodepage(Datei As Variant, FromCP As Long, ToCP As Long)   ' Datei was declared As String

Here's the signature for the Open method:
Code:
STDMETHODIMP Open(          VARIANT *pVar,
    long Flags,
    long CodePage
);
The variant argument doesn't mean that you can send an argument of any type, such as string. Rather, it's a way of overloading the method to accept multiple file types. You create a pointer that can point to any type of file, and pass another parameter telling which type of file you're pointing to.

See if that works. If not, I would try setting the last argument to 0 and see if the codepage argument is looking for something unusual.
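
If you would rather keep the String parameter, another option is to copy the file name into a local Variant before calling Open and Save. This is an untested sketch. It assumes the same Form1.inRTF control and TOM reference as your procedure, and that the method simply wants a Variant that holds the string directly.

```vb
Option Explicit

Private Declare Function SendMessage Lib "user32" Alias "SendMessageA" _
    (ByVal hwnd As Long, ByVal wMsg As Long, _
     ByVal wParam As Long, lParam As Any) As Long

Private Const EM_GETOLEINTERFACE As Long = &H43C   ' WM_USER + 60

Public Sub ToOtherCodepage(ByVal Datei As String, _
                           ByVal FromCP As Long, ByVal ToCP As Long)
    Dim myIUnknown As IUnknown
    Dim tomDoc As ITextDocument
    Dim vDatei As Variant

    SendMessage Form1.inRTF.hwnd, EM_GETOLEINTERFACE, 0&, myIUnknown
    Set tomDoc = myIUnknown

    ' Hand TOM a real Variant that contains the file name,
    ' instead of a String variable passed by reference.
    vDatei = Datei
    tomDoc.Open vDatei, tomText, FromCP
    tomDoc.Save vDatei, tomText + tomCreateAlways, ToCP
End Sub
```

The call site stays the same: ToOtherCodepage sf.Path & "\RTFs.xml", Codings.latin, Codings.utf8.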


 
Bob, you're da man!
strongm introduced me to TOM and I used it successfully several times now. I always called it directly though, not as a separate function. I never knew!
Code:
Public Sub ToOtherCodepage(Datei As Variant, FromCP As Long, ToCP As Long)
This was all it took!!


Works fine now, thanks a million!

 
Yes, thanks to your promotion of TOM in these realms, strongm!

TOM ranks amongst the best TIPs I ever got here in these forums.
For text encoding conversions under VB6, it is literally invaluable - and extremely fast!

 
I could just feel strongm waiting to see if I would figure this out before he spoke up. :) Glad to see you got it working, makeitso. It's nice to see this in action, looks very elegant. Thanks also to strongm for "banging on" about this until people started actually using it!

I suspect that there are situations where this will be useful in the .Net world as well...
 