OCR to get data in VB.

rssql · Mar 8, 2010

I'm trying to find a way to use a JPG or other image and OCR code in VB.NET.
I want to get data from the image which will have a part number in a specific location.
I want this part number placed in a VB variable so I can use it.

Any ideas?

vb5prgrmr · Mar 8, 2010

Using your friends (Yahoo, Google, Ask, Answers, Bing), you should be able to find several good-to-excellent OCR engines that work with WIA (Windows Image Acquisition; see Microsoft's documentation for more details on WIA).

Good Luck

PRPhx · Mar 11, 2010

They are out there. The company I got laid off from last summer used a couple of them (I don't remember the names, though). They aren't exactly cheap, and you'll go through a lot of trial and error to get good, reliable data.

PJA · Mar 23, 2010

Hi - I did the same thing - recognising invoice numbers on PDFs & then storing them away.

I got good results using Tesseract. It's open source (it used to be an HP research project, was released as open source, and is now looked after by Google). Search for "Tesseract OCR".

I used it as a stand-alone app - passed a trimmed TIFF to it & read the resulting text file. I believe there is an integrated .NET version of it.
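A rough sketch of that stand-alone approach, in Python for brevity (the same steps map onto launching a process from VB.NET). It assumes the tesseract executable is on the PATH; the function names and file names are made up for illustration. The one thing it relies on is that Tesseract's command line takes the image file followed by an output base name and adds ".txt" to that base name itself.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, exe="tesseract"):
    """Return the argument list for a stand-alone Tesseract run.

    'exe' is an assumption: it must point at (or resolve to) the
    Tesseract executable installed on the machine.
    """
    # Tesseract takes the input image, then an output base name;
    # it writes the recognised text to "<output_base>.txt".
    return [exe, str(image_path), str(output_base)]


def read_ocr_result(output_base):
    """Read the text file Tesseract wrote and trim surrounding whitespace."""
    result_file = Path(str(output_base) + ".txt")
    return result_file.read_text(encoding="utf-8", errors="replace").strip()


def ocr_image(image_path, output_base, exe="tesseract"):
    """Run Tesseract on one image and return the recognised text."""
    # check=True raises CalledProcessError if Tesseract exits with an error.
    subprocess.run(build_tesseract_command(image_path, output_base, exe), check=True)
    return read_ocr_result(output_base)
```

Something like ocr_image("invoice_crop.tif", "invoice_crop") would then hand back whatever text Tesseract found in the trimmed image, ready to drop into a variable.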

Some words of wisdom. By default, Tesseract only recognises basic text in a basic font. You can train it to recognise other fonts.

For me - I did some experimenting & then changed our default invoice to print the invoice number in a basic font. I then trimmed a section of the PDF & saved it as a TIFF to present to Tesseract.

With a successful result - I stored the PDF away. With an unsuccessful result - the code e-mails the PDF to the administrator & tells him to look at it manually.
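The success / failure decision can be sketched along these lines (Python again, standard library only). The invoice number format below is purely an assumption for illustration, so swap in whatever pattern your own part or invoice numbers follow; the route_invoice name and the action labels are likewise made up.

```python
import re

# Hypothetical format: 4 to 20 upper-case letters, digits or hyphens.
# This is an illustrative assumption, not the format used in the real system.
INVOICE_PATTERN = re.compile(r"[A-Z0-9-]{4,20}")


def route_invoice(ocr_text):
    """Decide what to do with a PDF based on the text the OCR step returned.

    Returns ("store", file_name) when the text looks like a valid number,
    otherwise ("manual_review", None) so a person can check the PDF.
    """
    candidate = ocr_text.strip()
    # fullmatch rejects stray characters anywhere in the OCR output.
    if INVOICE_PATTERN.fullmatch(candidate):
        return ("store", candidate + ".pdf")
    return ("manual_review", None)
```

Checking the OCR text against a strict pattern before trusting it is what makes the e-mail fallback useful: anything that doesn't look like a real number goes to a human instead of being filed under the wrong name.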

We do 20,000 printed invoices a month. In 18 months - not one has been e-mailed - and so far - when we reference the stored invoices - they are all named correctly.

Also - did you know there was an OCR tool that came with Office 2003 & 2007 (Microsoft Office Document Imaging)? AND did you know that as long as you own the relevant version of Office on the PC you run your code on, it's licence-legal?

It works very well - fully automated - it aligns the image & auto-adjusts it for best scan results. You can use it from within .NET code. I got demo code from the Code Project web site & went from there.

I got the results I needed for this project using Tesseract - but the MS Office OCR tool was very interesting...

Hope my rambling helps...

Regards

Paul
