Email
Use the Email Rule Type when a field should contain a valid email address. The rule validates the address structure, optionally enforces an expected domain, and can normalize common OCR-induced issues before the value is routed to downstream systems.
When to use
- Contact details on invoices, forms, onboarding documents, and correspondence.
- Fields expected to contain a single email address for notifications or routing.
- Scenarios where the email domain should match a known organization or tenant.
Open Field Configuration
See Field Rules (Rules Engine) for how to open the field configuration:
- From the document overlay (supported services), or
- From the Fields panel on the right sidebar.
Configure the Email rule
Core settings:
- Rule Type: Email
- Default Value: The value to return when nothing is extracted (e.g., Empty Text).
- Domain (optional): Set an expected domain to validate against (e.g., example.com).
  - You can also link this to another field so the expected domain is dynamic (e.g., supplier domain).
Validation performed by the rule includes:
- Presence of a single "@" separating local part and domain.
- At least one "." in the domain portion.
- Basic character and shape checks to reduce false positives.
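The checks listed above can be sketched in a few lines of Python. This is an illustration only, not the rule's actual implementation: the function name `looks_like_email` and the exact character sets are assumptions chosen for the example.

```python
import re

# Hypothetical character rules for illustration; the product's real checks may differ.
_LOCAL_RE = re.compile(r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~.-]+")
_LABEL_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def looks_like_email(value: str) -> bool:
    """Return True if value passes basic structural checks for an email address."""
    # Exactly one "@" must separate the local part from the domain.
    if value.count("@") != 1:
        return False
    local, domain = value.split("@")
    # Local part: allowed characters only, no leading, trailing, or consecutive dots.
    if not _LOCAL_RE.fullmatch(local):
        return False
    if local.startswith(".") or local.endswith(".") or ".." in local:
        return False
    # Domain: at least one "." and every dot-separated label must be well formed.
    labels = domain.split(".")
    if len(labels) < 2:
        return False
    return all(_LABEL_RE.fullmatch(label) for label in labels)
```

With these assumed rules, `looks_like_email("jane.doe@example.co.uk")` is accepted, while `john@example` (missing TLD) and `john..doe@example.com` (consecutive dots) are rejected.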
Dynamic domain validation
Use the link icon next to Domain to bind it to another field (e.g., "Company Domain"), so each document is validated against the correct expected domain.
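Conceptually, a linked Domain means the expected domain is read from the same document as the email. The sketch below shows that comparison under stated assumptions: the `fields` dictionary, its keys, and the `domain_matches` helper are hypothetical, and an exact, case-insensitive match is assumed (subdomains are not treated as matches).

```python
from typing import Optional


def domain_matches(email: str, expected_domain: Optional[str]) -> bool:
    """Compare the email's domain with an expected domain, ignoring case."""
    # No expected domain configured or linked: nothing to enforce.
    if not expected_domain:
        return True
    # Everything after the last "@" is treated as the domain.
    _, _, domain = email.rpartition("@")
    return domain.lower() == expected_domain.strip().lower()


# Hypothetical extracted values for one document; the linked field supplies the domain.
fields = {"Company Domain": "Example.com", "Contact Email": "sales@example.com"}
ok = domain_matches(fields["Contact Email"], fields.get("Company Domain"))
```

Here `ok` is `True`, whereas `info@examples.com` checked against the same linked value would not match.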
Input parsing and normalization (if applicable)
Recommended normalization steps before/with validation:
- Trim leading/trailing whitespace.
- Strip display names and any wrapping characters around the address (e.g., "John Doe john@example.com" → john@example.com).
- Normalize obvious OCR artifacts:
  - Replace "(at)" or "[at]" → "@"
  - Replace "(dot)" or "[dot]" → "."
  - Remove spaces around "@" and "." (e.g., "john @ example . com" → "john@example.com")
- Convert domain to lowercase; keep local part as-is (local-part is technically case-sensitive, but most systems treat it case-insensitively).
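The normalization steps above could be chained roughly as follows. This is a minimal sketch, assuming a hypothetical helper named `normalize_email_text`; the patterns cover only the bracketed "(at)"/"[at]" and "(dot)"/"[dot]" forms named in the list, and the step order is one reasonable choice rather than the product's documented behavior.

```python
import re


def normalize_email_text(raw: str) -> str:
    """Apply the recommended clean-up steps and return the candidate address."""
    # Trim leading/trailing whitespace.
    text = raw.strip()
    # Replace "(at)" / "[at]" with "@" and "(dot)" / "[dot]" with ".".
    text = re.sub(r"\s*[\(\[]\s*at\s*[\)\]]\s*", "@", text, flags=re.IGNORECASE)
    text = re.sub(r"\s*[\(\[]\s*dot\s*[\)\]]\s*", ".", text, flags=re.IGNORECASE)
    # Remove spaces around "@" and ".".
    text = re.sub(r"\s*@\s*", "@", text)
    text = re.sub(r"\s*\.\s*", ".", text)
    # Drop a display name: keep the last whitespace-separated token containing "@".
    tokens = [token for token in text.split() if "@" in token]
    if tokens:
        text = tokens[-1]
    # Strip wrapping characters and trailing punctuation.
    text = text.strip("<>()[]\"'.,;:")
    # Lowercase the domain only; the local part is kept as entered.
    local, sep, domain = text.rpartition("@")
    if not sep:
        return text
    return f"{local}@{domain.lower()}"
```

Collapsing spaces around "." is the most aggressive step here, since it can also join a sentence-ending period to the following word. A more conservative variant might apply it only to the text after the "@".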
Ambiguous or multiple values
If the field may contain multiple emails, split the value and either choose the first valid address or escalate to review. The Email rule expects a single address.
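A pre-processing step for this policy might look like the sketch below. The helper names `find_candidates` and `select_single`, the separator set (whitespace, commas, semicolons), and the `pick_first` flag are assumptions made for illustration.

```python
import re
from typing import List, Optional


def find_candidates(text: str) -> List[str]:
    """Split on whitespace, commas, and semicolons; keep tokens shaped like addresses."""
    candidates = []
    for token in re.split(r"[\s,;]+", text.strip()):
        # Remove wrapping characters and trailing punctuation from each token.
        token = token.strip("<>()[]\"'.")
        local, sep, domain = token.partition("@")
        if sep and local and "." in domain and "@" not in domain:
            candidates.append(token)
    return candidates


def select_single(text: str, pick_first: bool = False) -> Optional[str]:
    """Return one address, or None to signal that the value should go to review."""
    candidates = find_candidates(text)
    if len(candidates) == 1:
        return candidates[0]
    if candidates and pick_first:
        # Explicit policy: take the first candidate when several are present.
        return candidates[0]
    return None
```

Returning `None` by default keeps the escalate-to-review path as the fallback, so automatic selection only happens when a policy (`pick_first=True` in this sketch) has been chosen deliberately.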
Output and formatting (if applicable)
- Preserve the local part as entered (unless your policies require lowercasing).
- Lowercase the domain for consistency.
- Store the final value without display names or comments (just local@domain).
Internationalized domains
If you receive non-ASCII domains (IDNs), convert to a consistent representation (e.g., Punycode) in a downstream step if your target systems require it.
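As one possible downstream step, Python's built-in `idna` codec can produce the ASCII (Punycode) form of a domain. The wrapper `to_ascii_domain` is a hypothetical helper. Note that the built-in codec follows the older IDNA 2003 rules, so this is a sketch rather than a complete IDN solution.

```python
def to_ascii_domain(email: str) -> str:
    """Return the address with its domain converted to ASCII (Punycode) form."""
    local, sep, domain = email.rpartition("@")
    if not sep:
        # No "@" present: leave the value untouched for validation to reject.
        return email
    # The standard-library "idna" codec implements IDNA 2003; IDNA 2008 rules
    # would need a third-party package, which this sketch does not use.
    ascii_domain = domain.encode("idna").decode("ascii")
    return f"{local}@{ascii_domain}"
```

For example, `to_ascii_domain("info@bücher.example")` returns `info@xn--bcher-kva.example`, while an address whose domain is already ASCII is returned unchanged.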
HITL triggers and reviewer guidance
Trigger review when:
- The extracted value fails basic structure checks.
- The domain does not match the configured/linked expected domain.
- The value contains disallowed characters or looks obfuscated after normalization (e.g., "name at domain dot com" still unresolved).
- Multiple emails are present and an automatic selection policy isn't defined.
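The triggers above can be expressed as a function that returns machine-readable reasons, which also gives reviewers a logged cause for each escalation. The sketch below is illustrative: the function name, the reason codes, and the `multi_policy_defined` flag are assumptions, and the disallowed-character check is left out for brevity.

```python
import re
from typing import List, Optional

# Bare words "at" / "dot" that survived normalization suggest unresolved obfuscation.
_OBFUSCATED_RE = re.compile(r"\b(?:at|dot)\b", re.IGNORECASE)


def review_reasons(value: str, expected_domain: Optional[str] = None,
                   multi_policy_defined: bool = False) -> List[str]:
    """Return reason codes for human review; an empty list means no trigger fired."""
    reasons: List[str] = []
    tokens = [t for t in re.split(r"[\s,;]+", value.strip()) if t]
    emails = [t for t in tokens if "@" in t]
    if not emails:
        # Nothing address-like: distinguish obfuscation from plain invalid input.
        if _OBFUSCATED_RE.search(value):
            return ["unresolved_obfuscation"]
        return ["invalid_structure"]
    if len(emails) > 1 and not multi_policy_defined:
        reasons.append("multiple_emails")
    email = emails[0]
    local, _, domain = email.rpartition("@")
    # Basic structure: one "@", a non-empty local part, and a "." in the domain.
    if email.count("@") != 1 or not local or "." not in domain:
        reasons.append("invalid_structure")
    if expected_domain and domain.lower() != expected_domain.lower():
        reasons.append("domain_mismatch")
    return reasons
```

Under these assumptions, `review_reasons("info@examples.com", "example.com")` yields `["domain_mismatch"]`, and `review_reasons("name at domain dot com")` yields `["unresolved_obfuscation"]`.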
Suggested reviewer note (HITL)
"Verify the email address. If display names or extra text are present, extract only the mailbox (local@domain). Expected domain: example.com."
Examples
- Basic enforcement with fixed domain
  - Domain: example.com
  - Input: "Sales sales@example.com" → Output: sales@example.com (passes)
  - Input: "info@examples.com" → Escalate (domain mismatch)
- Normalizing obfuscated input
  - Input: "john (at) example (dot) com" → Normalize → john@example.com (passes)
- Removing display name and trailing punctuation
  - Input: "Accounts Payable ap@example.com." → Output: ap@example.com (passes)
Best practices
- Keep the field single-valued; handle multiple addresses upstream or via a dedicated parsing step.
- Use a dynamic Domain link when the expected domain varies by document (supplier/customer).
- Normalize domain casing; avoid altering the local part unless your systems require lowercasing.
- Treat obvious OCR substitutions conservatively; escalate when uncertain rather than guessing.
- Log the reason for failures (e.g., invalid domain) to speed reviewer decisions.
Testing checklist
- [ ] Typical valid addresses (john@example.com, jane.doe@example.co.uk).
- [ ] Display name formats ("Jane Doe jane@example.com").
- [ ] Obfuscated patterns ("name [at] domain [dot] com").
- [ ] Edge cases: consecutive dots, leading/trailing dots, missing TLD.
- [ ] Domain mismatch behavior when an expected domain is set or linked.
- [ ] Unicode/IDN scenarios if applicable to your documents.
Troubleshooting
- Value fails validation unexpectedly
  Check for hidden characters or spaces; ensure normalization removes display names and comments.
- Frequent domain mismatches
  Verify the configured Domain or linked field; some emails may come from third-party providers (e.g., gmail.com) rather than the entity's corporate domain.
- Multiple emails in one field
  Add a pre-processing step to split and select, or escalate when multiples are found.
- Obfuscated text still not recognized
  Expand your normalization patterns (e.g., variants of "at"/"dot") or send to HITL with clear reviewer guidance.
UI reference
- Base panel (Rule Type, Default Value, Domain with link option)