OCR PDF vb6 advice needed

Status
Not open for further replies.

royalcheese

Technical User
Dec 5, 2005
111
GB
Hi all

I would like to OCR PDF documents as part of a program.

Is it possible to do this using VB?

Thank you very much

Chris
 
It doesn't matter what format the doc is in when you OCR it, but actually writing an OCR app yourself would be very difficult. I haven't seen any free ones that work very well. I suppose that you get what you pay for, though.

-David
2006 Microsoft Valued Professional (MVP)
 
Essentially I have written a document uploader for our FTP site. This in turn makes the documents available on a web site, and PDF is a large part of our data. If they are word searchable, our web guy says that he can search on them.

I want the upload program to OCR PDFs before they get sent to the FTP site. I have the FTP working - took a while but satisfying :) - and I also have OCR of TIF files working.

I won't mind buying software, but I want it to be seamless, as in a user won't have to open Adobe and OCR. I just want a button click or a scheduled upload/OCR within the app.

Chris
 
There are plenty of tools, many free, that can word search PDFs.
 
But do they save the changes, so that the documents are word searchable when simply opened in Acrobat Reader?

Can you give an example of a free OCRer? I can't see any.
 
What changes? You haven't mentioned changes before. And I don't see how OCR would meet that requirement. I still think you need to explain exactly what it is you are trying to achieve before anyone here can really begin to provide help.
 
I want to OCR a PDF and save it so that the OCR is stored in the document (this is what I meant by changes), all from within my VB6 program.
 
We have various .PDF documents which are not OCRed (Optical Character Recognition). I want my program to put them through an OCR process so that they become word searchable, but I do not want to use an external program like Adobe to do this. I just want to be able to code it and press a button (which will take a file path) to OCR that PDF doc.
 
I am guessing that these are somewhat older PDF files that don't already have the text stored in them? These days, most PDF files store the text for searching, etc.

You say that you already are OCRing tif files, what does your code for that look like?
 
This uses Microsoft Office Document Imaging (MODI)

which was lifted from here


but this only does TIFs

Code:
          ' Requires a project reference to the Microsoft Office Document Imaging Type Library (MODI).
          Dim miDoc As MODI.Document

          Set miDoc = New MODI.Document
          ' Open the TIF: txtDirectory holds the folder path, fil the file name
          miDoc.Create frmupload.txtDirectory & fil

          ' Run OCR on the first page only, then save the recognised text back into the file
          miDoc.Images(0).OCR
          miDoc.Save

          ' Release the file handle and the COM object
          miDoc.Close
          Set miDoc = Nothing

The PDFs that we use are new and come from a scanner that runs them through and converts them to PDF (non-OCRed).
 
Is there a particular reason that you have to use PDFs?

It may make more sense to keep these documents in a text format, and then use one of the many free PDF converters to make the document a PDF for download purposes.

 
OK, went with Adobe Capture and a server . . . thanks for the help
 