Hi - I did the same thing: recognising invoice numbers on PDFs and then storing them away.
I got good results using Tesseract. It's open source (it started as an HP research project, was released as open source, and is now looked after by Google). Search for "Google Tesseract".
I used it as a standalone app: I passed a trimmed TIF to it and read the resulting text file. I believe there is an integrated .NET version of it.
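To give an idea of the standalone approach, here is a minimal sketch in Python (not the code I actually ran). It assumes the tesseract executable is on the PATH and relies on the command-line form "tesseract <image> <output base>", which writes its result to "<output base>.txt". The function names are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base):
    """Build the command line for the standalone tesseract executable."""
    # tesseract takes the input image and an output base name;
    # it appends ".txt" to the base name itself.
    return ["tesseract", str(image_path), str(output_base)]


def ocr_image(image_path, output_base):
    """Run Tesseract on an image and return the recognised text.

    Assumes the tesseract executable is installed and on the PATH.
    """
    subprocess.run(build_tesseract_command(image_path, output_base), check=True)
    # Read back the text file that tesseract produced.
    return Path(f"{output_base}.txt").read_text(encoding="utf-8").strip()
```

Calling ocr_image("invoice_number.tif", "invoice_number") (a hypothetical file name) would run the tool and hand back whatever text it found in the trimmed image.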
Some words of wisdom: by default, Tesseract only recognises basic text in a basic font. You can train it to recognise other fonts.
For me, I did some experimenting and then changed our default invoice to print the invoice number in a basic font. I then trimmed a section of the PDF and saved it as a TIF to present to Tesseract.
With a successful result, I stored the PDF away. With an unsuccessful result, the code e-mails the PDF to the administrator and tells him to look at it manually.
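The success/failure handling can be sketched roughly like this in Python. This is only an illustration under assumptions: the invoice number format ("INV-" plus six digits), the function names, and the notify_admin callback (which would wrap whatever e-mail sending you use) are all hypothetical, not what our system actually looks like.

```python
import re
import shutil
from pathlib import Path

# Hypothetical invoice number format: "INV-" followed by six digits.
INVOICE_PATTERN = re.compile(r"INV-\d{6}")


def extract_invoice_number(ocr_text):
    """Return the invoice number found in the OCR text, or None."""
    match = INVOICE_PATTERN.search(ocr_text)
    return match.group(0) if match else None


def route_invoice(pdf_path, ocr_text, archive_dir, notify_admin):
    """Archive the PDF under its invoice number, or pass it to notify_admin.

    notify_admin is a caller-supplied function (for example one that
    e-mails the PDF to the administrator for a manual look).
    """
    number = extract_invoice_number(ocr_text)
    if number is None:
        # OCR did not produce a usable number: flag it for a human.
        notify_admin(pdf_path)
        return None
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    # Name the stored copy after the recognised invoice number.
    target = archive_dir / f"{number}.pdf"
    shutil.copy2(pdf_path, target)
    return target
```

Checking the OCR text against a strict pattern before storing is what lets you trust the file names later: anything that doesn't match goes to a person instead of being filed under a wrong name.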
We do 20,000 printed invoices a month. In 18 months, not one has been e-mailed, and so far, whenever we reference the stored invoices, they are all named correctly.
Also, did you know there is an OCR tool that came with Office 2003 and 2007? And did you know that as long as you own the relevant version of Office on the PC you run your code on, using it is legal under the licence?
It works very well: it is fully automated, aligns the image, and auto-adjusts the image for best scan results. You can use it from within .NET code. I got demo code from the Code Project web site and went from there.
I got the results I needed for this project using Tesseract, but the MS Office OCR tool was very interesting...
Hope my rambling helps...
Regards
Paul