Tesseract OCR
Tesseract Handwriting OCR (optical character recognition) automatically extracts handwritten information from paper, scans, and other low-quality digital documents. The raw text is extracted without any layout information and stored in the Result property of a document.
Supported Content Types
- Images
If you need to use a different type of content, you can use the AiForged Pdf Converter.
Possible use cases
- Extract data from receipts
- Visualize your receipts data
- Apply RegEx patterns to the raw text of a document
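As an illustration of the RegEx use case, the following minimal Python sketch pulls a date and a total out of raw receipt text. The sample text, the patterns, and the `extract_receipt_fields` helper are all hypothetical; the sample stands in for the contents of a document's Result property, and real OCR output will usually need more tolerant patterns.

```python
import re

# Hypothetical raw text, standing in for the Result property of a processed receipt.
raw_text = """ACME STORE
Date: 2023-04-17
Milk 2.49
Bread 1.99
TOTAL 4.48
"""

# Hypothetical patterns; receipt layouts vary and OCR output may contain noise.
DATE_PATTERN = re.compile(r"Date:\s*(\d{4}-\d{2}-\d{2})")
TOTAL_PATTERN = re.compile(r"^TOTAL\s+(\d+\.\d{2})$", re.MULTILINE | re.IGNORECASE)


def extract_receipt_fields(text: str) -> dict:
    """Return the date and total found in the raw text, or None where missing."""
    date_match = DATE_PATTERN.search(text)
    total_match = TOTAL_PATTERN.search(text)
    return {
        "date": date_match.group(1) if date_match else None,
        "total": float(total_match.group(1)) if total_match else None,
    }


fields = extract_receipt_fields(raw_text)
print(fields)
```

Because the raw text carries no layout information, the patterns rely on line content alone. Returning None for a missing field keeps a failed match visible instead of raising an error.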
Service Setup
- Open the Project Detail View of the project you would like to add the service to.
- Click on the Add Service button in the command bar.
- Select the Tesseract OCR Service.
- A new Service Configuration Wizard will open. When navigating the Wizard, use the Next Step button in the command bar to save any changes made.
- Step 1 - Service Config. Displays the service settings, which you can change as required.
- Click the "Next" Action in the Action Bar to save the Service.
- Step 2 - Verification. No verification required.
Service Configuration Settings
The Tesseract OCR Service can be configured to suit your workflow. The following settings are available:
Setting | Required Type | Description |
---|---|---|
ArchivingStrategy | Optional | Days before documents get deleted. |
AccessKey | Optional | Override the Access Key to the configured Tesseract cloud service. |
BaseURL | Optional | Override the URL to the configured Tesseract cloud service. |
BatchSize | Hidden | Processing batch size. |
CheckElectronic | Optional | Check if the document is electronic and extract text from there. |
DocumentProcessedStatus | Optional | Document status used to denote that a document has been processed. |
Enabled | Hidden | Enable or disable the service. |
ExecuteBeforeProcess | | When set up as a child service, specify whether this service should be executed before the parent service gets executed. |
ExecuteAfterProcess | | When set up as a child service, specify whether this service should be executed after the parent service gets executed. |
IsVerification | Optional | Save the lines and words for verification. |
Language | Optional | Hint at the language to use when performing OCR. |
MinSize | Optional | Minimum size of the document to OCR. |
Password | Optional | Used for service authentication. Custom Code can be used to set the password. Can be set per document. |
RemoveComments | Optional | Remove human comments from a document. |
Add and Process Documents
- In the Tesseract OCR Service click on the Inbox button.
- Select the Status to upload the documents with. Use Status None or Received for new documents that have not been processed yet.
- Optionally select a category if you know the category for the document. If you don't want to select one, click on "No selection".
- Find the files on your local machine and upload them. The demo's test files can be found at the following link: Click here.
- After all the documents have been uploaded, check the documents you want to process, then click on Processed Checked to process them.
Process only a few documents at a time, especially with a new service, so you can confirm that the results are what you expect before processing everything.
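One way to follow this advice is to plan the upload in small groups before opening the Inbox. The sketch below is a local planning helper only and does not call any AIForged API; the `small_batches` function, the batch size of 3, and the file names are hypothetical.

```python
from typing import Iterator, List, Sequence


def small_batches(file_names: Sequence[str], batch_size: int = 3) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size file names."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(file_names), batch_size):
        yield list(file_names[start:start + batch_size])


# Hypothetical scan file names waiting to be uploaded.
scans = [f"receipt_{n:02d}.png" for n in range(1, 8)]

plan = list(small_batches(scans, batch_size=3))
for number, batch in enumerate(plan, start=1):
    print(f"Batch {number}: {batch}")
```

Uploading and checking the first group before the rest gives an early signal about OCR quality without committing the whole set.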
View Processed Documents
- In the Tesseract OCR Service click on the Outbox button.
- You can view your processing results by opening a processed document for verification.