OCR Zoning - Field data for auto-sorting is not being considered in 1 of 2 possible cases.


(OP)
Hi there,

Here is an explanation of the problem: what the script is supposed to do and what it actually does.

We use document scanners to digitize and sort documents automatically: OCR zoning plus a VBScript put the PDFs into the correct subfolders on the server.

The script is supposed to check whether that zone contains valid content. If so, it files the document in the matching folder; if not, it puts it in a "to be sorted" folder.

When documents are scanned, 99% get sorted perfectly.
Documents that were rejected land in a specific "to be sorted" folder as TIFF and are re-injected manually. This happens when the initial process was interrupted by timeouts, network errors etc.

My guess - some steps between the "receiving Text Zone" and the "Text zone after space trim". It seems to have different impact on the paper scan and the processed tiff file.

I'll attach a RAR with the txt files (the post only allows one attachment, but it helps to see the difference between the two cases).

The same script is used for paper scans and for re-processing the TIFF files that fall into a "rejected" folder for unknown failure reasons.

For context: our references are built like this: 22-08-0155-01, meaning year, month, file number and version. Our server folders look like this:
\\blablabla\missions\2022\08\0155\ <=== the file is supposed to land here. 99% of the files land here after paper scanning.
\\blablabla\missions\to be sorted\ <=== here's where literally everything lands which is re-processed. I mean 10% would be okay because no input in OCR zoning etc, but currently, nothing gets sorted - BUT - see the logfile - you can see that the trimming works and the reference number is found but... "match not found."
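
The mapping from a reference to its target folder described above can be sketched in VBScript as follows. This is only an illustration under assumptions, not the code from the attached script: the function name FolderForReference is hypothetical, the pattern is anchored so that the whole string must be a reference, and all years are assumed to be 20xx as in the 22 -> 2022 example. WScript.Echo is used so the sketch can be run standalone with cscript.

```vbscript
Option Explicit

' Hypothetical helper: returns the target folder for a reference such as
' "22-08-0155-01", or an empty string if the text is not a valid reference.
Function FolderForReference(ByVal ref, ByVal root)
    Dim re, matches, parts
    Set re = New RegExp
    ' year - month - file number - version
    re.Pattern = "^(\d{2})-(\d{2})-(\d{4})-(\d{2})$"

    FolderForReference = ""
    Set matches = re.Execute(ref)
    If matches.Count = 1 Then
        Set parts = matches(0).SubMatches
        ' Assumption: every two-digit year belongs to 20xx.
        FolderForReference = root & "\20" & parts(0) & "\" & parts(1) & "\" & parts(2) & "\"
    End If
End Function

' A valid reference maps to year\month\file number under the root.
WScript.Echo FolderForReference("22-08-0155-01", "\\blablabla\missions")
' Anything else yields an empty string, i.e. "to be sorted".
WScript.Echo "[" & FolderForReference("~FRO::%OCRZone%.1~", "\\blablabla\missions") & "]"
```

With the example reference, the first call produces \\blablabla\missions\2022\08\0155\, and the second produces an empty result, which a caller could treat as the "to be sorted" case.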

Sorry for the tons of text - as it is very specific, I thought you might want plenty of information for isolating the potential error source.

About the contents of the RAR:
Error log tiff scan - the log I get when I re-process the document - look for the "no match found!" message. That's the issue I'm trying to figure out.
Success log paper scan - same script running - document scan - works like a charm.
Script - the magic happens here

Many thanks to those who took the time to read my post and still have the courage to look into the script & log files! ;)

TF

RE: OCR Zoning - Field data for auto-sorting is not being considered in 1 of 2 possible cases.

Quote:


My guess - some steps between the "receiving Text Zone" and the "Text zone after space trim". It seems to have different impact on the paper scan and the processed tiff file.
Have you looked into the script?
Between these 2 steps there is only trimming of the string ocrzone and removing spaces from it:

CODE

EKOManager.StatusMessage(">>> Received text zone: " & ocrzone)
	
	Dim sTxt
	Dim sTxt2
	Dim sTxt3
	Dim sTxt4
	
	ocrzone = Trim(ocrzone)
	ocrzone = Replace(ocrzone, " ", "")
	EKOManager.StatusMessage(">>> Text zone after space trim: " & ocrzone) 

Quote:


BUT - see the logfile - you can see that the trimming works and the reference number is found but... "match not found."
If you look into the files, the result "match found" appears when a pattern like 22-08-0091-01 is found in the string ocrzone:

CODE

Information	>>> Received text zone: 22-08-0091-01	08/18/2022 14:55:58
Information	>>> Text zone after space trim: 22-08-0091-01	08/18/2022 14:55:58
Information	        - match found: 22-08-0091-01	08/18/2022 14:55:58 
But the result "no match found!" appears when the string ocrzone contains no pattern like 22-08-0091-01 and instead contains ~FRO::%OCRZone%.1~:

CODE

Information	>>> Received text zone: ~FRO::%OCRZone%.1~	08/18/2022 15:01:58
Information	>>> Text zone after space trim: ~FRO::%OCRZone%.1~	08/18/2022 15:01:58
Information	        - no match found!	08/18/2022 15:01:58 
I guess that somewhere else in your process the variable ~FRO::%OCRZone%.1~ should be replaced by a value like 22-08-0091-01, but the replacement failed. Therefore your script gives the result "no match found!".
In the code you provided there is only one subroutine, but the error is not in this subroutine.
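
To make this case easier to spot in the log, the script could check for an unreplaced placeholder before it searches for the reference pattern. The following is a minimal sketch under assumptions: the helper name IsUnresolvedRRT is hypothetical, and the placeholder shape (a tilde, a short component name, two colons, arbitrary text, a closing tilde) is inferred only from the log lines above. Inside the real script the message would presumably go through EKOManager.StatusMessage as in the posted code; WScript.Echo is used here so the sketch runs standalone with cscript.

```vbscript
Option Explicit

' Hypothetical helper: True when the field still holds an unreplaced
' placeholder of the form ~XXX::...~ instead of recognized text.
Function IsUnresolvedRRT(ByVal value)
    Dim re
    Set re = New RegExp
    re.Pattern = "^~[A-Za-z0-9]+::.*~$"
    IsUnresolvedRRT = re.Test(Trim(value))
End Function

' Report which of the two situations a given field value represents.
Sub Report(ByVal value)
    If IsUnresolvedRRT(value) Then
        WScript.Echo value & " -> placeholder was not replaced before the script ran"
    Else
        WScript.Echo value & " -> looks like real OCR text"
    End If
End Sub

Report "~FRO::%OCRZone%.1~"   ' value received in the failing log
Report "22-08-0091-01"        ' value received in the successful log
```

A separate log message for this situation would distinguish "the OCR text contained no reference" from "the script never received the OCR text at all".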

RE: OCR Zoning - Field data for auto-sorting is not being considered in 1 of 2 possible cases.

In the successful case, with the result "match found", your log looks like this:

CODE

Information	OCR:  Performing zoned OCR...	08/18/2022 14:55:58
Information	OCR:  Adding zone RRT ~FRO::%OCRZone%.1~=22-08-0091-01...	08/18/2022 14:55:58 

In the case with the result "no match found!", your log looks like this:

CODE

Information	OCR:  Performing zoned OCR...	08/18/2022 15:01:58
Information	OCR:  Adding zone RRT ~FRO::%OCRzone%.1~=LL—UO—UU7 1—ti...	08/18/2022 15:01:58 

This implies that you should search for the error in the processing step that writes this into the log:

CODE

Information	OCR:  Adding zone 

This processing step comes before the VBScript processing - see the logs:

CODE

...
...
Information	OCR:  Extracting data from document page 1...	08/18/2022 15:01:58
Information	OCR:  Performing zoned OCR...	08/18/2022 15:01:58
Information	OCR:  Adding zone RRT ~FRO::%OCRzone%.1~=LL—UO—UU7 1—ti...	08/18/2022 15:01:58
...
...
Information	>>> Entering VB script ===============================================>	08/18/2022 15:01:58
Information	>>> Received text zone: ~FRO::%OCRZone%.1~	08/18/2022 15:01:58
Information	>>> Text zone after space trim: ~FRO::%OCRZone%.1~	08/18/2022 15:01:58
Information	        - no match found!	08/18/2022 15:01:58
...
... 

RE: OCR Zoning - Field data for auto-sorting is not being considered in 1 of 2 possible cases.

(OP)
Hi Mikrom, thank you for the feedback !

I did some quick tutorials and read an ebook to get some basic notions of VBS, as this is not my element at all. I now have some understanding of loops and essentials like Dim, FSO, Do While, Else etc., but this script goes a bit further, so my understanding is... well... moderate ;) On top of that, English is my 4th language and the one I learned last, and not so long ago, which adds some "challenge" to the whole situation. That's why I joined the forum, to look for some external knowledge and wisdom to help me out :)

I saw these lines; that's why I wrote that the paper scan results in "match found", and when I use this exact file for the "re-processing" test, I get the second logfile, which doesn't seem to obtain the same result in the OCR zone trimming and, as a consequence, gives a "no match found!" result. But... why? It's the same file, the same OCR zoning, the same script?

I'll check for the "adding zone" part and see what happens here and why.

Thank you for your precious time and help ! :)

TF

RE: OCR Zoning - Field data for auto-sorting is not being considered in 1 of 2 possible cases.

(OP)
Short update to this post ==> I think I might have found a lead.

When the image is scanned, it's first digitized into a TIFF. OCR zoning is done and the file is sorted properly.

For tests, I take the final file (which is now a PDF) and I wonder whether re-processing it redefines the margins, which would slightly shift the OCR zone, which is initially at the bottom right of the page. If 1 cm is cropped during the first pass, then another cm would be cropped on re-processing and the zone would be "cut in half" - so the PDF would give a bad result where the initial process worked perfectly.

My conclusion - if this is really the reason - is that for re-processing I would need another script that leaves out the section where the document is first scanned, transformed, cropped etc.

I'll have to check this in the software application itself (Nuance Kofax Autostore) to see if there is some "image processing" i can remove in the re-processing sequence.

Some tests will be made and I'll let you know about the results.

RE: OCR Zoning - Field data for auto-sorting is not being considered in 1 of 2 possible cases.

(OP)
update: I removed the "image processing" part from the "re-processing" to avoid cutting edges etc.

Result in the log file :

CODE --> file

Information	OCR:  Extracting data from document page 1...	08/19/2022 08:13:32
Information	OCR:  Performing zoned OCR...	08/19/2022 08:13:32
Information	OCR:  Adding zone RRT ~FRO::%OCRzone%.1~=22-08-0091-01...	08/19/2022 08:13:32
Information	OCR:  Extracting data from document page 2...	08/19/2022 08:13:32
Information	OCR:  Performing zoned OCR...	08/19/2022 08:13:32
Information	OCR:  Replacing Zone RRTs...	08/19/2022 08:13:32
Information	OCR:  Replacing RRT ~FRO::%OCRzone%.*~=22-08-0091-01...	08/19/2022 08:13:32
Information	OCR:  Replacing RRT ~FRO::%OCRzone%..pages~=1...	08/19/2022 08:13:32
Information	OCR:  Replacing RRT ~FRO::PagesCount~=2...	08/19/2022 08:13:32
Information	OCR:  Completed processessing of file C:\AC\TEMP\{CB7FE1F4-802F-4c5f-9A56-BA0B3EFDD244}\recovered scan.pdf	08/19/2022 08:13:32
Information	OCR:  1 of 1 documents are successfully processed	08/19/2022 08:13:32
Information	OCR:  Exiting...	08/19/2022 08:13:32
Information	Memory status  WorkingSetS: 94425088, PageFileUsage: 54054912, PageFaultCount: 41300, PeakWSS: 94429184, PeakPfU: 54054912, HandleC: 1036.	08/19/2022 08:13:32 
But then,

CODE --> file

Information	>>> Entering VB script ===============================================>	08/19/2022 08:13:32
Information	>>> Received text zone: ~FRO::%OCRZone%.1~	08/19/2022 08:13:32
Information	>>> Text zone after space trim: ~FRO::%OCRZone%.1~	08/19/2022 08:13:32
Information	        - no match found!	08/19/2022 08:13:32 

I mean.. how... why.. what? Part one tells me that it finds the OCR zone, which is 22-08-0091-01. That's the match. And then it "enters VB script" again? And this time it replaces the value with the initial text, which is ~FRO::%OCRZone%.1~

It's as if there were a syntax issue, with too many "" or ~~, making it use the raw text instead of the value stored inside the variable.

TL;DR - wasn't the cropping margins situation, still looking for the epiphany.
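
One detail that can be read off the log excerpts as pasted: in the re-processing run the OCR step writes the placeholder as ~FRO::%OCRzone%.1~ (lowercase z), while the script receives ~FRO::%OCRZone%.1~ (uppercase Z). In the successful paper-scan log both lines use the uppercase spelling. Whether the software treats these names case-sensitively is an assumption that is not confirmed anywhere in this thread, so the sketch below only demonstrates that the two strings differ in letter case and nothing else.

```vbscript
Option Explicit

Dim fromOcrStep, fromScript
fromOcrStep = "~FRO::%OCRzone%.1~"   ' as written by "Adding zone RRT" in the re-processing log
fromScript = "~FRO::%OCRZone%.1~"    ' as received by the VB script in the same run

' StrComp with vbBinaryCompare is case-sensitive, vbTextCompare is not.
If StrComp(fromOcrStep, fromScript, vbBinaryCompare) = 0 Then
    WScript.Echo "identical"
ElseIf StrComp(fromOcrStep, fromScript, vbTextCompare) = 0 Then
    WScript.Echo "differ only in letter case"
Else
    WScript.Echo "different names"
End If
```

Run with cscript, this reports that the names differ only in letter case, so comparing the zone name in the re-processing workflow with the one the script field refers to may be worth a look.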

RE: OCR Zoning - Field data for auto-sorting is not being considered in 1 of 2 possible cases.

Hi TomFra,

The script you posted contains only 2 subroutines:
1) MyScript_OnLoad
2) MyScript_OnUnload - but this is empty.

The subroutine MyScript_OnLoad works with the variable ocrzone. However, it is not clear from the code where the subroutine MyScript_OnLoad gets the variable ocrzone from ...

There should be another part of the VBScript which sets the variable ocrzone and then calls the subroutine MyScript_OnLoad, or I cannot think of a different way how the subroutine MyScript_OnLoad could be called from another program during the scanning process.

RE: OCR Zoning - Field data for auto-sorting is not being considered in 1 of 2 possible cases.

(OP)
Hi Mikrom,

I'll see if I can find out somehow.

Thank you for your time and efforts !

Tom

RE: OCR Zoning - Field data for auto-sorting is not being considered in 1 of 2 possible cases.

Hi TomFra,

I found some documentation here ... maybe it's the system you are using:
https://docshield.kofax.com/ControlSuite/en_US/1.2...
The documentation contains screenshots: the script is defined in the General tab
and the fields for the script are defined in the Fields tab. You can look there to see how your field ocrzone is defined.

But IMO, the best option would be to ask your software's support, or maybe there is a specialized forum (for example here https://community.kofax.com/s/?language=en_US ) where you can ask your questions, or maybe they have a mailing list.

RE: OCR Zoning - Field data for auto-sorting is not being considered in 1 of 2 possible cases.

(OP)
Hi Mikrom,

thank you very much!

I was looking here because it felt like a script problem, but even that is not certain.

I'll check the docs you linked and see if the community around that software might help, then.

Thank you !

Tom
