Tesseract OCR

Tesseract Handwriting OCR (optical character recognition) automatically extracts handwritten information from paper, scans, and other low-quality digital documents. The service extracts the raw text, without any layout information, and stores it in the Result property of the document.

Supported Content Types

Images
PDF

If you need to process a different type of content, you can convert it with the AiForged PDF Converter.

Possible use cases

Extract data from receipts
Visualize your receipts data
Apply RegEx patterns to the raw text of a document
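
As an illustration of the RegEx use case, the following is a minimal Python sketch that pulls a date and a total out of raw receipt text. The sample text, the field names, and both patterns are assumptions made for this example; real OCR output varies, so patterns normally need tuning against your own documents.

```python
import re

# Hypothetical raw text, standing in for the Result property of a processed receipt.
raw_text = """CORNER MARKET
Date: 2023-04-17
Milk 2.49
Bread 3.10
TOTAL 5.59
"""

# An ISO-style date (YYYY-MM-DD) anywhere in the text.
DATE_PATTERN = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")

# A line that starts with TOTAL followed by an amount with two decimals.
# MULTILINE makes ^ and $ match at each line, IGNORECASE accepts "Total".
TOTAL_PATTERN = re.compile(
    r"^\s*TOTAL\s+(\d+[.,]\d{2})\s*$", re.IGNORECASE | re.MULTILINE
)


def extract_receipt_fields(text: str) -> dict:
    """Return the first date and total found in the text, or None for each."""
    date_match = DATE_PATTERN.search(text)
    total_match = TOTAL_PATTERN.search(text)
    return {
        "date": date_match.group(1) if date_match else None,
        # Normalise a decimal comma to a decimal point.
        "total": total_match.group(1).replace(",", ".") if total_match else None,
    }


fields = extract_receipt_fields(raw_text)
print(fields)
```

Returning None for a missing field, rather than raising, makes it easy to spot documents where the OCR text did not contain the expected line.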

Service Setup

Open the Project Detail View of the project you would like to add the service to.
Click on the Add Service button in the command bar.
Select the Tesseract OCR Service.
A new Service Configuration Wizard will open. When navigating the Wizard, make sure to use the Next Step button in the command bar to save any changes made.
Step 1 - Service Config. Displays the service settings, which you can change as required.
Click the “Next” Action in the Action Bar to save the Service.
Step 2 - Verification. No verification required.

Service Configuration Settings

The Tesseract OCR Service can be configured to suit your requirements. The following settings are available:

Setting	Required Type	Description
ArchivingStrategy	Optional	Days before documents get deleted.
AccessKey	Optional	Override the Access Key to the configured Tesseract cloud service.
BaseURL	Optional	Override the URL to the configured Tesseract cloud service.
BatchSize	Hidden	Processing batch size.
CheckElectronic	Optional	Check if the document is electronic and extract text from there.
DocumentProcessedStatus	Optional	Document status used to denote that a document has been processed.
Enabled	Hidden	Enable or disable the service.
ExecuteBeforeProcess		When set up as a child service, specify whether this service should be executed before the parent service gets executed.
ExecuteAfterProcess		When set up as a child service, specify whether this service should be executed after the parent service gets executed.
IsVerification	Optional	Save the lines and words for verification.
Language	Optional	Hint at the language to use when performing OCR.
MinSize	Optional	Minimum size of the document to OCR.
Password	Optional	Used for service authentication. Custom Code can be used to set the password. Can be set per document.
RemoveComments	Optional	Remove human comments from a document.
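
To make the table easier to work with when planning a configuration, here is a small Python sketch that checks a set of proposed setting overrides against the names listed above. It is a local helper only and does not call any AiForged API. The function name, the example values, and the rule that Hidden settings are flagged are assumptions made for this illustration.

```python
# Setting names and their "Required Type" column, copied from the table above.
# An empty string means the table leaves the type blank.
KNOWN_SETTINGS = {
    "ArchivingStrategy": "Optional",
    "AccessKey": "Optional",
    "BaseURL": "Optional",
    "BatchSize": "Hidden",
    "CheckElectronic": "Optional",
    "DocumentProcessedStatus": "Optional",
    "Enabled": "Hidden",
    "ExecuteBeforeProcess": "",
    "ExecuteAfterProcess": "",
    "IsVerification": "Optional",
    "Language": "Optional",
    "MinSize": "Optional",
    "Password": "Optional",
    "RemoveComments": "Optional",
}


def check_overrides(overrides: dict) -> list:
    """Return a list of problems found in a proposed set of overrides."""
    problems = []
    for name in overrides:
        if name not in KNOWN_SETTINGS:
            # Most often a typo in the setting name.
            problems.append(f"unknown setting: {name}")
        elif KNOWN_SETTINGS[name] == "Hidden":
            # Assumption: Hidden settings are not meant to be changed by users.
            problems.append(f"hidden setting: {name}")
    return problems


# Hypothetical overrides; the values are placeholders, not documented formats.
proposed = {"Language": "eng", "BatchSize": 10, "Langauge": "eng"}
problems = check_overrides(proposed)
print(problems)
```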

Add and Process Documents

In the Tesseract OCR Service, click on the Inbox button.
Select the status to upload the documents with. Use None or Received for new documents that have not been processed yet.
Optionally, select a category if you know the category of the document. If you don’t want to select one, click on “No selection”.
Find the files on your local machine and upload them. The demo's test files can be found at the following link: Click here.
After all the documents have been uploaded, check the documents you want to process, then click on Processed Checked to process them.

Process only a few documents at a time, especially with a new service, so that you can confirm the results are what you expect before processing everything.
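
One way to follow this advice is to split the files into small groups before uploading them. The Python sketch below only groups file paths locally; the upload itself still happens through the Inbox as described above. The file names and the batch size of 5 are assumptions chosen for the example.

```python
from pathlib import Path


def in_batches(paths, batch_size=5):
    """Yield consecutive slices of at most batch_size paths."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(paths), batch_size):
        yield paths[start:start + batch_size]


# Hypothetical file names; replace with the documents you plan to upload.
files = [Path(f"receipt_{i:02d}.png") for i in range(1, 13)]

batches = list(in_batches(files, batch_size=5))
for number, batch in enumerate(batches, start=1):
    names = ", ".join(p.name for p in batch)
    print(f"Batch {number} ({len(batch)} files): {names}")
```

Reviewing the results of the first batch in the Outbox before uploading the next one keeps any configuration mistake limited to a handful of documents.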

View Processed Documents

In the Tesseract OCR Service click on the Outbox button.
You can view your processing results by opening a processed document for verification.
