David,
Thanks for the input. I appreciate it!
I was hoping to avoid the cutting/pasting approach because I am trying to build a tool that can convert the 100 or so Word docs I have been given in a batch. The tool will also be used for future one-time conversions. I am also trying to use as little of the Office API as possible, since one of the development goals is to not need the Word API available for the conversion (not likely, I am finding). I am having quite a bit of success with everything but the tables by looking for patterns in the formatting, e.g. converting:
Title: This document
Purpose: The purpose is something useful.
Into something like:
<Title>This document</Title>
<Purpose>The purpose is something useful.</Purpose>
This XML format can be used for various applications. The conversion looks for patterns in the text: find a newline, followed by text, followed by the character ":", and use that text as the element name. Then I take all the text up to the next match of the pattern and use it as the "inner text" of the element.
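To make the pattern above concrete, here is a minimal sketch in Python. It is not my actual tool, just an illustration of the idea. The names FIELD and labeled_text_to_xml are made up for this example, and it assumes each label is a single word at the start of a line that is also a valid XML element name.

```python
import re
from xml.sax.saxutils import escape

# Assumption: a label is one word at the start of a line, followed by ":".
FIELD = re.compile(r"^([A-Za-z_][\w.-]*):[ \t]*", re.MULTILINE)

def labeled_text_to_xml(text):
    """Turn 'Label: value' runs into <Label>value</Label> elements."""
    matches = list(FIELD.finditer(text))
    elements = []
    for i, match in enumerate(matches):
        # The inner text runs from the end of this label to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        inner = text[match.end():end].strip()
        name = match.group(1)
        # Escape &, < and > so the inner text cannot break the XML.
        elements.append(f"<{name}>{escape(inner)}</{name}>")
    return "\n".join(elements)

sample = "Title: This document\nPurpose: The purpose is something useful.\n"
print(labeled_text_to_xml(sample))
```

One caveat with this sketch: any body line that happens to start with a word and a colon (say "Note: ...") would be treated as a new element, so a real tool would likely need a list of known labels.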
If you understand regular expressions, the above explanation probably seems very basic; if not, it probably seems a little rough. Anyway, the point is that I am doing this by reading in the "text" of the document and looking for the patterns. However, when I get to the tables, I get lost because I don't know which "invisible characters" (i.e. which Unicode or ANSI characters) Word uses to separate columns and rows (probably a newline for rows), so I don't know what pattern to look for.
That is why I am looking for a resource that can tell me what MS Word uses to identify the different parts of a table, or any formatting, for that matter. These come out as the infamous little "blocks" when you look at a Word document in a text editor before converting it.
If anyone can tell me of an easier way to do this that relies less on the Office programs themselves, I would really appreciate it! Right now, because I don't know the encoding pattern, I can see two choices:
1. Reverse engineer and "figure it out"
2. Use the "table" objects in the Word API (not desirable, since I want to avoid needing Word installed on the machine running the conversion program).
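For option 1, a rough starting point might be to count the non-printable characters in the extracted text and see which ones cluster around the tables. This is only a sketch: the function name survey_control_chars is made up, and the sample string below is invented for illustration. It does not claim to be what Word actually emits.

```python
import unicodedata
from collections import Counter

def survey_control_chars(text):
    """Return sorted (code point, count) pairs for control/format characters."""
    # Unicode categories Cc (control) and Cf (format) cover tabs, newlines and
    # the other characters that show up as "blocks" in a text editor.
    counts = Counter(
        ch for ch in text if unicodedata.category(ch) in ("Cc", "Cf")
    )
    return sorted((f"U+{ord(ch):04X}", n) for ch, n in counts.items())

# Hypothetical sample text, not real Word output.
sample = "a\tb\x07\r\nc\x07"
for code_point, count in survey_control_chars(sample):
    print(code_point, count)
```

Running this over the text just before, inside, and just after a table should show which characters appear only in the table, which would be the candidates for column and row separators.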
Anyway, I hope this verbose explanation of what I am trying to do inspires someone! Thank you so very much in advance!
B.J.