Tesseract Handwriting OCR (optical character recognition) automatically extracts handwritten text from paper documents, scans, and other low-quality digital documents. Only the raw text is extracted, without any layout information, and it is stored in the Result property of the document.
Supported Content Types
If you need to process a different type of content, you can convert it first with the AiForged Pdf Converter.
Possible use cases
- Extract data from receipts
- Visualize your receipt data
- Apply RegEx patterns to the raw text of a document.
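As an illustration of the RegEx use case, the sketch below applies a pattern to raw extracted text. It assumes the text is available as a plain string (for example, copied from a document's Result property); the sample text, the pattern, and the function name `find_totals` are hypothetical and not part of the AIForged API.

```python
import re

# Hypothetical raw OCR output, as it might appear in a document's Result property.
RAW_TEXT = """Corner Cafe
Coffee 2.50
Muffin 3.25
Total: 5.75
Date: 2023-04-12"""

# Illustrative pattern: the word "Total" (any case), an optional colon,
# optional whitespace, then an amount with two decimal places.
TOTAL_PATTERN = re.compile(r"Total:?\s*(\d+\.\d{2})", re.IGNORECASE)


def find_totals(raw_text: str) -> list[float]:
    """Return every total amount matched in the raw text, in order."""
    # findall returns the captured group for each match as a string.
    return [float(amount) for amount in TOTAL_PATTERN.findall(raw_text)]


if __name__ == "__main__":
    print(find_totals(RAW_TEXT))  # [5.75]
```

Because handwriting OCR can misread or vary characters, patterns applied to its output usually need some tolerance; here that means ignoring letter case and making the colon optional.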
Add the Service
- Open the Project Detail View of the project you would like to add the service to.
- Click the Add Service button in the command bar.
- Select the Tesseract OCR Service.
A new Service Configuration Wizard will open. When navigating the wizard, use the Next Step button in the command bar to save any changes you make.
- Step 1 - Service Config: displays the service settings, which you can change as required. Click the “Next” action in the action bar to save the service.
- Step 2 - Verification: no verification is required.
Service Configuration Settings
The Tesseract OCR Service can be configured to suit your needs. The following settings are available:

| Setting | Type | Description |
|---|---|---|
| ArchivingStrategy | Optional | Number of days before documents are deleted. |
| AccessKey | Optional | Overrides the access key for the configured Tesseract cloud service. |
| BaseURL | Optional | Overrides the URL of the configured Tesseract cloud service. |
| BatchSize | Hidden | Processing batch size. |
| CheckElectronic | Optional | Checks whether the document is electronic and, if so, extracts the text directly from it. |
| DocumentProcessedStatus | Optional | Document status used to denote that a document has been processed. |
| Enabled | Hidden | Enables or disables the service. |
| ExecuteBeforeProcess | | When set up as a child service, specifies whether this service should be executed before the parent service is executed. |
| ExecuteAfterProcess | | When set up as a child service, specifies whether this service should be executed after the parent service is executed. |
| IsVerification | Optional | Saves the lines and words for verification. |
| Language | Optional | A hint for the language to use when performing OCR. |
| MinSize | Optional | Minimum size of a document to OCR. |
| Password | Optional | Password used for service authentication. It can be set per document, for example with Custom Code. |
| RemoveComments | Optional | Removes human comments from a document. |
Add and Process Documents
- In the Tesseract OCR Service, click the Inbox button.
- Select the status to upload the documents with. Use the status None or Received for new documents that have not been processed yet.
- Optionally, select a category if you know the category of the document. If you don't want to select one, click “No selection”.
- Find the files on your local machine and upload them. The demo’s test files can be found at the following link: Click here.
- After all the documents have been uploaded, check the documents to be processed, then click Process Checked to process them.
We recommend processing only a few documents at a time, especially with a new service, so that you can confirm you receive the results you want before processing everything.
View Processed Documents
- In the Tesseract OCR Service, click the Outbox button.
- To view the processing results, open a processed document for verification.