Text Analysis (String Rule Type)¶
Text Analysis enriches String fields with insights such as sentiment, language, entities, entity links, and key phrases. You can store these insights as metadata, or optionally use them to override the field's value and/or confidence.
Where to configure¶
Open the field, select the String rule, then expand the Text Analysis section.
Pair with String validation
Combine Text Analysis with String validations (length, regex, normalization) to catch both structural issues and content quality issues.
Culture and overrides¶
Use these controls to improve accuracy and decide whether analytics should replace the original value.
- Culture Language Code (optional): e.g., en, fr, es
- Culture Country Code (optional): e.g., US, ZA, DE
- Use Analytics Value (optional): override the extracted value with the analytics value
- Use Analytics Confidence (optional): override the extracted confidence with the analytics confidence
When to override
- Keep original text (disable overrides) for free-form notes.
- Override (enable) when the analytics produce a clear, structured value you want to store directly (e.g., a sentiment label or top key phrase).
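The two override switches can be modelled as a small function. This is a minimal sketch, not AIForged's implementation: `FieldResult` and `apply_overrides` are hypothetical names, and keeping the extracted value when the analysis returns nothing is an assumption (it mirrors the troubleshooting note that a confidence override only shows up when the provider returns a confidence).

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class FieldResult:
    """Hypothetical stand-in for an extracted String field."""
    value: str
    confidence: float


def apply_overrides(
    extracted: FieldResult,
    analytics_value: Optional[str],
    analytics_confidence: Optional[float],
    use_analytics_value: bool = False,
    use_analytics_confidence: bool = False,
) -> FieldResult:
    """Return the field as it would be stored after the optional overrides."""
    result = extracted
    # "Use Analytics Value": swap in the analytics value (e.g. a sentiment label).
    # Assumption: if the analysis produced no value, the extracted text is kept.
    if use_analytics_value and analytics_value is not None:
        result = replace(result, value=analytics_value)
    # "Use Analytics Confidence": swap in the provider's confidence, if it returned one.
    if use_analytics_confidence and analytics_confidence is not None:
        result = replace(result, confidence=analytics_confidence)
    return result


note = FieldResult(value="Invoice is overdue again. Very unhappy.", confidence=0.92)
# Free-form note: both overrides disabled, so the original text and confidence are stored.
print(apply_overrides(note, "Negative", 0.75))
# Structured label: store the sentiment label but keep the extraction confidence.
print(apply_overrides(note, "Negative", 0.75, use_analytics_value=True))
```

Because the result is a new frozen dataclass, the extracted field is never mutated, which makes it easy to log both the original and the stored version side by side.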
Analysis types and options¶
Each analysis type includes a "Select options" picker so you can choose which attributes to store. Select only what you need.
1) Sentiment Analysis¶
Options available:
- Text
- Sentiment
- PositiveScore
- NegativeScore
- NeutralScore
Typical usage:
- Store Sentiment and one or more scores for auditing.
- Optionally enable "Use Analytics Value" to set the field's value to the Sentiment label.
Example output (stored as metadata):
- Sentiment: Negative
- PositiveScore: 0.04
- NeutralScore: 0.21
- NegativeScore: 0.75
Thresholds and HITL
Define clear thresholds for review. Example: escalate when NegativeScore ≥ 0.6, or when the difference between the two highest scores is < 0.15 (uncertain sentiment).
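Both threshold rules can be expressed as one check over the three scores. A minimal sketch, assuming the scores arrive as a dict keyed by the option names PositiveScore, NeutralScore and NegativeScore; `needs_sentiment_review` is a hypothetical helper, and the default thresholds are simply the example values from this page.

```python
def needs_sentiment_review(
    scores: dict[str, float],
    negative_threshold: float = 0.6,
    min_margin: float = 0.15,
) -> bool:
    """Return True when a sentiment result should be routed to HITL review."""
    # Rule 1: strongly negative text always goes to a reviewer.
    if scores["NegativeScore"] >= negative_threshold:
        return True
    # Rule 2: if the two highest scores are close, the label is uncertain.
    ranked = sorted(
        (scores["PositiveScore"], scores["NeutralScore"], scores["NegativeScore"]),
        reverse=True,
    )
    return ranked[0] - ranked[1] < min_margin


# The example output above: NegativeScore 0.75 crosses the 0.6 threshold.
example = {"PositiveScore": 0.04, "NeutralScore": 0.21, "NegativeScore": 0.75}
print(needs_sentiment_review(example))
```

Keeping the thresholds as parameters lets you tune them per process without touching the rule itself.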
2) Language Detection¶
Options available:
- Language
- Code
Typical usage:
- Validate that text matches the expected language; use Code for consistent downstream logic.
Example (HITL trigger):
- Expected language code: en; detected code: es → send to review with message "Non-English text detected."
3) Named Entities¶
Extract entities like people, organizations, locations, dates, and amounts.
Options available (per entity):
- Text
- Category
- SubCategory
Example output (stored as metadata):
- Entity 1
- Text: AIForged
- Category: Organization
- Entity 2
- Text: Cerritos
- Category: Location
Use cases
- Flag PII (e.g., Person) for special handling.
- Validate presence of required entities (e.g., an Organization in a supplier note).
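Both use cases reduce to checks on the Category attribute of the stored entities. A minimal sketch, assuming each entity is a dict with the Text and Category options selected; the helper names are hypothetical, and the set of categories treated as PII is an assumption you should replace with the categories your provider actually returns and your policy defines.

```python
from typing import Iterable

# Assumption: categories your data policy treats as PII (adjust to your provider's names).
PII_CATEGORIES = frozenset({"Person", "PhoneNumber", "Email"})


def pii_entities(
    entities: list[dict[str, str]],
    pii_categories: Iterable[str] = PII_CATEGORIES,
) -> list[dict[str, str]]:
    """Return the entities whose Category marks them as PII."""
    wanted = set(pii_categories)
    return [entity for entity in entities if entity.get("Category") in wanted]


def missing_categories(
    entities: list[dict[str, str]], required: Iterable[str]
) -> set[str]:
    """Return the required categories that no entity in the text satisfies."""
    found = {entity.get("Category") for entity in entities}
    return set(required) - found


# The example output above, as a supplier note might produce it.
entities = [
    {"Text": "AIForged", "Category": "Organization"},
    {"Text": "Cerritos", "Category": "Location"},
]
print(pii_entities(entities))
print(missing_categories(entities, {"Organization"}))
```

An empty result from `missing_categories` means every required entity type was found; any returned category is a candidate reason for escalation.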
4) Entity Linking¶
Link entities to canonical entries in a knowledge base (public or internal).
Options available (per link):
- Name
- Id
- Url
- DataSource
Example output (stored as metadata):
- Link 1
- Name: South Africa
- Id: Q258
- Url: https://www.wikidata.org/wiki/Q258
- DataSource: Wikidata
Why linking matters
Linking turns ambiguous strings into stable identifiers, improving deduplication and reporting.
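Using the Id as the key is what makes deduplication and reporting reliable. A minimal sketch, assuming each link is a dict with the DataSource and Id options selected and that you track which document each link came from; `documents_per_entity` and the document IDs are hypothetical.

```python
from collections import defaultdict


def documents_per_entity(
    linked: list[tuple[str, dict[str, str]]],
) -> dict[tuple[str, str], set[str]]:
    """Group document IDs by the (DataSource, Id) pair of each entity link."""
    groups = defaultdict(set)
    for document_id, link in linked:
        # The (DataSource, Id) pair identifies the knowledge-base entry,
        # so repeated mentions of the same entity collapse into one key.
        groups[(link["DataSource"], link["Id"])].add(document_id)
    return dict(groups)


south_africa = {
    "Name": "South Africa",
    "Id": "Q258",
    "Url": "https://www.wikidata.org/wiki/Q258",
    "DataSource": "Wikidata",
}
linked = [("doc-001", south_africa), ("doc-002", south_africa), ("doc-002", south_africa)]
print(documents_per_entity(linked))
```

Including DataSource in the key avoids collisions when the same Id string appears in two different knowledge bases.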
5) Key Phrase Detection¶
Identify the most important phrase(s) in the text.
Options available:
- BestOption
- Combine
Common patterns:
- Use BestOption and enable "Use Analytics Value" to set the field to the top phrase when you want a concise label.
- Use Combine to store multiple phrases; keep original text as the field value and phrases as metadata.
Example output (stored as metadata):
- BestOption: invoice overdue
- Combine:
- invoice overdue
- payment terms
Keep payloads small
If you only need the top phrase for routing or tagging, prefer BestOption.
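The two patterns differ only in what becomes the field value and what goes to metadata. A minimal sketch; `store_key_phrases` and the "label"/"metadata" pattern names are hypothetical, and falling back to the original text when BestOption is empty is an assumption.

```python
from typing import Literal


def store_key_phrases(
    original_text: str,
    best_option: str,
    combined: list[str],
    pattern: Literal["label", "metadata"],
) -> tuple[str, dict[str, object]]:
    """Return (field value, metadata) for one of the two key-phrase patterns."""
    if pattern == "label":
        # BestOption + "Use Analytics Value": the top phrase becomes the value.
        # Assumption: keep the original text if no phrase was found.
        return best_option or original_text, {"BestOption": best_option}
    if pattern == "metadata":
        # Combine only: the original text stays; all phrases go to metadata.
        return original_text, {"Combine": list(combined)}
    raise ValueError(f"unknown pattern: {pattern!r}")


text = "Your invoice is overdue; please review the payment terms."
phrases = ["invoice overdue", "payment terms"]
print(store_key_phrases(text, "invoice overdue", phrases, "label"))
print(store_key_phrases(text, "invoice overdue", phrases, "metadata"))
```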
Filtering which results are saved¶
Use the "Select options" pickers within each analysis type to choose exactly which attributes are written back. This reduces payload size and keeps your data model clean.
Start minimal
Begin with just the attributes you'll use immediately (e.g., Sentiment + NegativeScore). Add more later as reporting needs grow.
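Conceptually, each picker acts as an allow-list over the attributes an analysis returns. A minimal sketch of that filtering; `select_attributes` is a hypothetical helper, not part of AIForged, and the sample text is illustrative.

```python
import json


def select_attributes(result: dict[str, object], selected: set[str]) -> dict[str, object]:
    """Keep only the attributes chosen in "Select options", in their original order."""
    return {name: value for name, value in result.items() if name in selected}


full = {
    "Text": "Third reminder and the invoice is still unpaid.",
    "Sentiment": "Negative",
    "PositiveScore": 0.04,
    "NeutralScore": 0.21,
    "NegativeScore": 0.75,
}
minimal = select_attributes(full, {"Sentiment", "NegativeScore"})
print(minimal)
# Compare serialized sizes to see how much the selection trims.
print(len(json.dumps(full)), "->", len(json.dumps(minimal)))
```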
Examples: escalation logic¶
- Sentiment-driven review
- If Sentiment = Negative and NegativeScore ≥ 0.6 → send to HITL.
- Language mismatch
- Expected: en; detected Code ≠ en → send to HITL with message "Non-English text detected."
- Missing required entity
- If no Organization entity found โ escalate.
- No meaningful key phrase
- If BestOption is empty and Combine holds fewer than 2 phrases → escalate for manual summary.
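The four rules can be combined into one routing function that returns every reason a document should go to HITL, so reviewers see why it was escalated. A minimal sketch, assuming the selected attributes are available as plain Python values; `escalation_reasons` and all reason messages other than "Non-English text detected." are hypothetical.

```python
from typing import Any, Optional


def escalation_reasons(
    sentiment: Optional[dict[str, Any]],
    expected_language: str,
    detected_code: Optional[str],
    entities: list[dict[str, str]],
    best_option: str,
    combined: list[str],
) -> list[str]:
    """Return the HITL reasons for one document; an empty list means it passes."""
    reasons: list[str] = []
    # Sentiment-driven review: Negative label with NegativeScore >= 0.6.
    if (
        sentiment is not None
        and sentiment.get("Sentiment") == "Negative"
        and float(sentiment.get("NegativeScore", 0.0)) >= 0.6
    ):
        reasons.append("Negative sentiment.")
    # Language mismatch: detected Code differs from the expected code.
    if detected_code is not None and detected_code != expected_language:
        if expected_language == "en":
            reasons.append("Non-English text detected.")
        else:
            reasons.append(f"Expected {expected_language}, detected {detected_code}.")
    # Missing required entity: no Organization in the text.
    if not any(entity.get("Category") == "Organization" for entity in entities):
        reasons.append("No Organization entity found.")
    # No meaningful key phrase: empty BestOption and fewer than 2 combined phrases.
    if not best_option and len(combined) < 2:
        reasons.append("No meaningful key phrase; manual summary needed.")
    return reasons


print(escalation_reasons(
    sentiment={"Sentiment": "Negative", "NegativeScore": 0.75},
    expected_language="en",
    detected_code="es",
    entities=[{"Text": "AIForged", "Category": "Organization"}],
    best_option="invoice overdue",
    combined=["invoice overdue", "payment terms"],
))
```

Collecting all reasons instead of stopping at the first match gives reviewers the full picture in a single pass.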
Performance and privacy¶
Performance
Enabling many analysis types at once increases processing time. Start with the one or two insights that deliver the most value.
Privacy
Named Entities and Entity Linking can reveal PII or connect to external sources. Ensure your data handling policy allows this, especially in production.
Troubleshooting¶
- Results look inconsistent across documents: set Culture Language/Country to guide disambiguation and formatting.
- Confidence overrides not reflected: ensure "Use Analytics Confidence" is enabled and that the provider returns a confidence for the selected analysis.
- Too many false positives in entities: limit stored attributes to Category only, or post-filter by the categories relevant to your process.
UI reference¶
The pickers below show the attributes available per analysis type.
Quick checklist¶
- [ ] Set Culture Language/Country when you know the locale.
- [ ] Select only the attributes you need in each "Select options" dropdown.
- [ ] Decide whether to override Value/Confidence or store analytics as metadata only.
- [ ] Define clear HITL thresholds (e.g., negative sentiment, language mismatch).
- [ ] Test with a small, varied sample and review both pass/fail paths.