Tesseract OCR
The Tesseract OCR (optical character recognition) service automatically extracts handwritten information from paper, scans, and other low-quality digital documents. The service extracts the raw text, without any layout information, and stores it in the Result property of the document.
Supported Content Types
- Images
If you need to use a different type of content, you can use the AiForged Pdf Converter.
Possible use cases
- Extract data from receipts
- Visualize your receipt data
- Apply RegEx patterns to the raw text of a document
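As an illustration of the RegEx use case, the following minimal sketch pulls a date and a total out of raw OCR text. The sample text, the patterns, and the `extract_receipt_fields` helper are all hypothetical and are not part of AIForged; in practice the text would come from the Result property of a processed document, and the patterns would need tuning to your own receipts.

```python
import re

# Hypothetical raw OCR text, standing in for a document's Result property.
raw_text = """ACME STORE
Date: 2023-04-17
Coffee      3.50
Sandwich    7.25
TOTAL      10.75
"""

# Illustrative patterns only; real receipts need patterns tuned to their layout.
DATE_PATTERN = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")
TOTAL_PATTERN = re.compile(r"^TOTAL\s+(\d+\.\d{2})\s*$", re.MULTILINE | re.IGNORECASE)


def extract_receipt_fields(text: str) -> dict:
    """Return the first date and total found in the text, or None where absent."""
    date_match = DATE_PATTERN.search(text)
    total_match = TOTAL_PATTERN.search(text)
    return {
        "date": date_match.group(1) if date_match else None,
        "total": float(total_match.group(1)) if total_match else None,
    }


fields = extract_receipt_fields(raw_text)
print(fields)
```

Because the extracted text carries no layout information, patterns that anchor on keywords such as "TOTAL" tend to be more robust than patterns that rely on column positions.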
Service Setup
- Open the Project Detail View of the project you would like to add the service to.
- Click on the Add Service button in the command bar.
- Select the Tesseract OCR Service.
A new Service Configuration Wizard will open. When navigating the wizard, make sure to use the Next Step button in the command bar to save any changes you make.
- Step 1 - Service Config. Displays the service settings; you can change them as required. Click the "Next" action in the action bar to save the service.
- Step 2 - Verification. No verification required.
Service Configuration Settings
The Tesseract OCR Service can be configured to suit your needs. The following settings are available:
Setting | Required Type | Description |
---|---|---|
ArchivingStrategy | Optional | Number of days before documents are deleted. |
AccessKey | Optional | Override the Access Key to the configured Tesseract cloud service. |
BaseURL | Optional | Override the URL to the configured Tesseract cloud service. |
BatchSize | Hidden | Processing batch size. |
CheckElectronic | Optional | Check if the document is electronic and extract text from there. |
DocumentProcessedStatus | Optional | Document status used to denote that a document has been processed. |
Enabled | Hidden | Enable or disable the service. |
ExecuteBeforeProcess | | When set up as a child service, specify whether this service should be executed before the parent service gets executed. |
ExecuteAfterProcess | | When set up as a child service, specify whether this service should be executed after the parent service gets executed. |
IsVerification | Optional | Save the lines and words for verification. |
Language | Optional | Hint at the language to use when performing OCR. |
MinSize | Optional | Minimum size of the document to OCR. |
Password | Optional | Used for service authentication. Custom Code can be used to set the password. Can be set per document. |
RemoveComments | Optional | Remove human comments from a document. |
Add and Process Documents
- In the Tesseract OCR Service, click on the Inbox button.
- Select the status to upload the documents with. Use status None or Received for new documents that have not been processed yet.
- Optionally select a category if you know the category of the document. If you don't want to select one, click on "No selection".
- Find the files on your local machine and upload them. The demo's test files can be found at the following link: Click here.
- After all the documents have been uploaded, check the documents you want to process, then click on Processed Checked to process them.
It is recommended to process only a few documents at a time, especially for a new service, so that you can verify you are getting the results you want before processing everything.
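If you prepare your uploads with a script, the small-batch recommendation above can be sketched as follows. This is only a local helper for grouping files before you upload them through the Inbox; the file names and the `batched` function are hypothetical, and nothing here calls AIForged.

```python
from itertools import islice
from pathlib import Path
from typing import Iterable, Iterator, List


def batched(items: Iterable[Path], size: int) -> Iterator[List[Path]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Hypothetical file names; replace with the images you intend to upload.
files = [Path(f"receipt_{i}.png") for i in range(1, 8)]

for number, batch in enumerate(batched(files, 3), start=1):
    # Upload and process one batch, review the results, then move on to the next.
    print(f"Batch {number}: {[p.name for p in batch]}")
```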
View Processed Documents
- In the Tesseract OCR Service click on the Outbox button.
- You can view the processing results by opening a processed document for verification.