Large number of Word files read

Status
Not open for further replies.

ericksoda

Programmer
Oct 22, 2001
60
US
Hello! I have a large number of Word documents (> 100,000) that have to be processed. For each one I need to:
1. Read the .doc file
2. Extract the account number, name and date
3. Convert it to RTF format
4. Write HL7 messages around the RTF component
5. Write the file

I have read the previous notes and tried using Word automation, but it is very slow. Is there a faster way to read a DOC file into a string variable, and then parse and convert it from there? Is there a toolkit I can use?

Any help or pointers gratefully accepted.

Thanks!

David
 
It is probably slow because the PC is short of memory or disk space. Opening large Word documents can consume a lot of both, especially if the documents contain graphics.

Also, beware: if a Word document has been assembled from bits and pieces created in different versions of Word, or even in other word processors, the file can end up much larger than its content warrants. If that is the case, deal with that problem first.

One way to handle this is to have a macro (in the template used by all these documents, if possible) that does what you need, saves the result as RTF, and closes the document. That way your VB code can just trigger the macro and jump to the next document. The danger here is running out of memory, but you can add a short wait every (say) fifth document.
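
The "wait every fifth document" idea can be sketched as a simple loop. This is only an illustration in Python, not VB; `process_with_pauses` and `handle` are hypothetical names, and `handle` stands in for whatever triggers the Word macro on one file. The pause length and interval are assumptions to tune.

```python
import time

def process_with_pauses(paths, handle, pause_every=5, pause_seconds=1.0,
                        sleep=time.sleep):
    """Call handle(path) for each path, pausing after every Nth document.

    `handle` is a placeholder for the per-document work (for example,
    triggering the Word macro). `sleep` is injectable so the loop can be
    tested without actually waiting.
    """
    if pause_every < 1:
        raise ValueError("pause_every must be at least 1")
    done = 0
    for path in paths:
        handle(path)
        done += 1
        # Give Word some breathing room after every Nth document.
        if done % pause_every == 0:
            sleep(pause_seconds)
    return done
```

The same structure carries over to a VB loop with a counter and a `Mod` check.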

If the account number can be stored systematically in a document property when a document is created or edited, this can speed up the process a lot. But if the documents already exist and the account numbers and dates are only in the document body, then this suggestion is not useful :-(

If converting the documents to RTF is independent of extracting the account numbers and dates, you can perform the extractions first and then go through the documents again to save them as RTF, in case you need the first result more urgently.

Sorry, but I did not properly understand items 4 and 5.


Eman_2005
Technical Communicator
 
Eman_2005,

Thanks for your reply! The documents I am opening are mostly small (100K or so). The whole process only takes a couple of seconds per file, so it is not long unless you are processing 100,000 files or more. And every time I talk to them there are more files. Now it looks like maybe 200,000 (still not sure).

The documents were all created by a single application - they were created by a transcription service from dictated tapes.

I like your concept of a stored macro that could extract the data and then store it in a document property. I had not thought of that. I will look into it. That makes tons of sense, and actually makes the rest easy.

My fourth item is that I have to put an HL7 wrapper around the RTF. The wrapper looks like this:
MSH|^~\&|SendingApp|0013001^DENDRITE|MIK MIK GUID|DENDRITE|20021011164000||MDM^M01|1349973632567480|P|2.3|||NE|NE
PID|1|5588|128-0013001|2288|
PV1|0001|O|^^^RVM^||||derick|||||||||||||||||||||||||||||||||||||||||||
TXA|0001|TN|TX|200501010000^D|||200501121512||||||||||DO|UC|||||
OBX|0001|TX|NO_REF||RTF-File Inserted Here.~|||||||||||

I need the HL7 so the Medical records application knows how to put it in the patient's chart.
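
For what it is worth, here is a minimal sketch of the wrapping step in Python, assuming the default HL7 v2 delimiters shown in the MSH line above (`|^~\&`). The function names are hypothetical. RTF is full of backslashes, so the payload has to be escaped before it goes into the OBX field; encoding line breaks as `\X0D\` and `\X0A\` is one possible choice and should be checked against what the receiving application accepts. The trailing empty fields from the sample OBX line are left off here.

```python
# HL7 v2 escape sequences for the default delimiters | ^ ~ \ &
# Line breaks are hex-encoded (an assumption) so they cannot end a segment.
_HL7_ESCAPES = {
    "\\": "\\E\\",
    "|": "\\F\\",
    "^": "\\S\\",
    "&": "\\T\\",
    "~": "\\R\\",
    "\r": "\\X0D\\",
    "\n": "\\X0A\\",
}

def hl7_escape(text):
    """Escape one character at a time, so nothing is escaped twice."""
    return "".join(_HL7_ESCAPES.get(ch, ch) for ch in text)

def build_obx(set_id, rtf):
    """Build an OBX segment laid out like the sample: OBX|id|TX|NO_REF||data."""
    fields = ["OBX", set_id, "TX", "NO_REF", "", hl7_escape(rtf)]
    return "|".join(fields)

def wrap_message(header_segments, rtf):
    """Join prebuilt MSH/PID/PV1/TXA segments and the OBX with carriage returns."""
    segments = list(header_segments) + [build_obx("0001", rtf)]
    return "\r".join(segments) + "\r"
```

For example, `hl7_escape("A|B")` gives `A\F\B`, so a stray pipe in the document text cannot be mistaken for a field separator.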

Again, thanks for all the help. I think I can do this now - although if you think of anything more, I will appreciate any pointers.

David
 
Another option is to group the documents into master documents and run the code on each master document.
Of course, I do not mean putting all 100,000 documents in one master document, but you could create groups of documents according to some criterion, or simply by quantity.
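
Grouping "just by quantity" could look like the sketch below (Python for illustration; `chunked` is a hypothetical helper and the group size is an assumption to tune). Each group would then feed one master document.

```python
def chunked(paths, size):
    """Yield successive lists of at most `size` paths."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(paths), size):
        yield paths[start:start + size]
```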

I'm afraid all this is more relevant to Word than it is to VB, so we are a little in the wrong place ;-)

Eman_2005
Technical Communicator
 