Removing FRAMES around Text in a scanned document

Status
Not open for further replies.

WildHare

MIS
Mar 1, 2002
1,801
US
Hi - a coworker of mine has been given a scanned Word document full of frames around all the chunks of text. Is there any way to remove the frames without losing the position of the text? Normally, when you remove a frame, the text in it simply shoots over to the left margin. In a government contract document, this is not a good thing.

If anyone can offer some assistance, both she and I will be very appreciative.

Thanks!

Jim Me? Ambivalent? Well, yes and no....
 

A scanned document, you say. What format? TIFF? Do you want to do this automatically or with user intervention? Have a programmer handy, because you're going to need one!

Or do you mean that you have a Word document that you want to modify?

 
It's all screwed up. Apparently, a printed version of this particular document (about 10 pages or so) is the only available version, because the person who wrote it, and who has an electronic copy, is not in the good graces of the firm for whom she used to work, and they don't want to call her and ask her to send them a diskette. Yeah.

So somebody scanned in the pages, and created 10 documents (one per page) and gave the files to an admin/secretary and said "Here - fix this by Friday."

The documents are nothing but frames (NOT text boxes) scattered all over, with typical government-contract-ese gibberish, wacky outlining ("A.1.B Contractor shall supply" etc. etc.) and, oddly enough, a "c" almost every place where there should be an "e". In Times Roman 12, a lowercase "c" looks almost exactly like a lowercase "e", so I had to magnify the damn page to 150% just to see the difference. Until then I couldn't understand why nearly every word was red-lined as misspelled. :) And is there such a word as "definitized"? This contract had it all over the place, but it sounds like government gobbledygook to me.
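For what it's worth, the "c"-for-"e" problem is regular enough that a script could take a first pass at it before anyone proofreads. Below is a minimal sketch in Python, not a tested tool: it assumes the text has already been pulled out of the frames as plain text, and `KNOWN_WORDS` is a tiny hypothetical stand-in for a real dictionary file. The function names are made up for illustration.

```python
from itertools import product

# Hypothetical stand-in for a real dictionary; swap in a proper word list.
KNOWN_WORDS = {"the", "contractor", "shall", "supply", "services", "price", "each"}


def fix_c_for_e(word, known=KNOWN_WORDS):
    """Return a repaired spelling if swapping some 'c' for 'e' yields a known word.

    Words that are already known, or that cannot be repaired, come back unchanged.
    """
    lower = word.lower()
    if lower in known:
        return word
    positions = [i for i, ch in enumerate(lower) if ch == "c"]
    # Try every combination of c -> e swaps (fine for ordinary word lengths).
    for choice in product("ce", repeat=len(positions)):
        letters = list(lower)
        for pos, ch in zip(positions, choice):
            letters[pos] = ch
        candidate = "".join(letters)
        if candidate in known:
            # Only a leading capital is preserved in this sketch.
            return candidate.capitalize() if word[0].isupper() else candidate
    return word


def fix_line(line):
    """Repair each whitespace-separated token, keeping trailing punctuation."""
    out = []
    for token in line.split():
        core = token.rstrip(".,;:")
        out.append(fix_c_for_e(core) + token[len(core):])
    return " ".join(out)
```

Calling `fix_line("Thc Contractor shall supply scrviccs.")` with the sample word list gives back the sentence with "The" and "services" repaired. Anything the dictionary doesn't recognize (like "definitized") is left alone, so a human still has to read the result.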

I do NOT know what scanning software was used, or what version of Word the OCR software prepared the scanned stuff for, but since frames were replaced by text boxes in Office 97, I'm assuming the OCR software is fairly out of date.

There is NO plain text in the documents. Every word is in one frame or another. Last night I tried just about everything I could think of to remove the frames and leave the text, and nothing worked. I even tried saving as RTF, WP for Windows, plain text, etc. Each save either converted everything to unformatted Courier 12 text or retained the frames.

I have a feeling we're looking at a re-typing job, which will ultimately probably take less time than messing with all those documents.

Thanks anyway...

Jim Me? Ambivalent? Well, yes and no....
 

Open the original TIFFs for editing with Imaging for Windows. Using the selection tool, drag over part of the frame, for example only the top edge, so that no words fall inside the selection. Click Cut (CTRL+X) and repeat until you have removed all of the frame's sides. Once done, save, then re-OCR if necessary.
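With a lot of pages, the same cut-out step could in principle be scripted instead of done by hand. Here is a rough sketch in Python of one way to do it. This is not what Imaging for Windows does internally. It assumes the page has already been loaded as a grid of 0/1 pixels (1 = black), it leaves out reading and writing the TIFF itself, and the function names and the `min_run` threshold are invented for illustration.

```python
def erase_long_runs(rows, min_run):
    """Return a copy of rows with each horizontal run of >= min_run black pixels whitened."""
    cleaned = [list(row) for row in rows]
    for row in cleaned:
        start = None
        # The extra 0 at the end closes any run that reaches the right edge.
        for x, pixel in enumerate(row + [0]):
            if pixel and start is None:
                start = x
            elif not pixel and start is not None:
                if x - start >= min_run:
                    row[start:x] = [0] * (x - start)
                start = None
    return cleaned


def transpose(rows):
    """Swap rows and columns so vertical lines can be treated as horizontal ones."""
    return [list(col) for col in zip(*rows)]


def remove_frame_lines(page, min_run):
    """Erase long horizontal and vertical lines, leaving shorter strokes (text) alone."""
    horizontal = erase_long_runs(page, min_run)
    vertical = transpose(erase_long_runs(transpose(page), min_run))
    # A pixel survives only if neither pass erased it.
    return [[h & v for h, v in zip(hrow, vrow)]
            for hrow, vrow in zip(horizontal, vertical)]
```

`min_run` has to be longer than the widest stroke in any character, or letters start disappearing. Real scans are also skewed and noisy, and this would wipe out underlines and table rules along with the frames, so treat it as a starting point only.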

Good Luck

 
I was not present at the scanning operation, and I'm not sure whether any of the original TIFF files are left over, but I will note your response in my little book of useful hints. Thanks much.

Jim Me? Ambivalent? Well, yes and no....
 