DocumentDataType
Overview
The DocumentDataType enum identifies the kind of data held by an IDocumentData entry. A document can have multiple related IDocumentData entries, each tagged with one DocumentDataType value.
Underlying type: int
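For reference, the members listed below can be sketched as a C# declaration with explicit values. This is reconstructed from the values in this reference, not taken from the library's source. The helper class DocumentDataTypeConversions is hypothetical. Because the values are not contiguous (4 through 9 are unused), an int read from storage should be validated before it is cast:

```csharp
using System;

// Reconstructed from the documented member values; the real declaration
// may include attributes or XML doc comments omitted here.
public enum DocumentDataType : int
{
    Image = 0,       // original document payload
    Result = 1,      // outputs produced by processing
    Page = 2,        // per-page data derived from Image
    Backup = 3,      // backup copy for rollback or recovery
    Definition = 10, // definition artifacts (e.g., model definition, schema)
    Training = 11    // training data artifacts
}

// Hypothetical helper, not part of the documented API.
public static class DocumentDataTypeConversions
{
    // Converts a stored int to DocumentDataType, rejecting undefined values
    // such as 4-9 that a plain cast would silently accept.
    public static bool TryFromInt(int raw, out DocumentDataType type)
    {
        if (Enum.IsDefined(typeof(DocumentDataType), raw))
        {
            type = (DocumentDataType)raw;
            return true;
        }
        type = default;
        return false;
    }
}
```

A plain cast such as (DocumentDataType)5 compiles and runs without error, which is why the check is done explicitly.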
Member Reference
Image
Value: 0
Description: Original document payload (e.g., binary image/PDF bytes, audio).
Usage:
var images = module.GetDocumentData(doc, new List<DocumentDataType?> { DocumentDataType.Image });
var original = images?.FirstOrDefault();
if (original != null)
{
    // original.Data holds the source bytes (may be large)
    // original.ContentType indicates MIME type
}
When to use:
- To access the original file bytes for reprocessing, archiving, or downstream conversions.
Tip:
- Use Preview for thumbnails; stream large Data blobs to avoid memory pressure.
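A minimal sketch of the streaming advice, assuming Data is exposed as a byte[] (the actual type of IDocumentData.Data is not stated here). PayloadStreaming, CopyInChunks, and SaveToFile are hypothetical names. When Data is already an in-memory array, the gain is avoiding further full-size copies (for example MemoryStream.ToArray() or a Base64 string); chunked copying helps most when the source is itself a stream.

```csharp
using System;
using System.IO;

// Hypothetical helper; assumes IDocumentData.Data is a byte[].
public static class PayloadStreaming
{
    // Copies source to destination through a fixed-size buffer and returns
    // the number of bytes copied.
    public static long CopyInChunks(Stream source, Stream destination, int bufferSize = 81920)
    {
        if (source is null) throw new ArgumentNullException(nameof(source));
        if (destination is null) throw new ArgumentNullException(nameof(destination));
        if (bufferSize <= 0) throw new ArgumentOutOfRangeException(nameof(bufferSize));

        var buffer = new byte[bufferSize];
        long total = 0;
        int read;
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            destination.Write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    // Writes a byte[] payload to a file. MemoryStream wraps the existing
    // array without copying it, so no second full-size buffer is allocated.
    public static long SaveToFile(byte[] data, string path, int bufferSize = 81920)
    {
        if (data is null) throw new ArgumentNullException(nameof(data));
        using var source = new MemoryStream(data, writable: false);
        using var target = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.None, bufferSize);
        return CopyInChunks(source, target, bufferSize);
    }
}
```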
Result
Value: 1
Description: Outputs produced by processing (e.g., normalized PDFs, JSON results, exports).
Usage:
var results = module.GetDocumentData(doc, new List<DocumentDataType?> { DocumentDataType.Result });
var export = results?.FirstOrDefault(r => string.Equals(r.ContentType, "application/json", StringComparison.OrdinalIgnoreCase));
When to use:
- To fetch provider or pipeline outputs for integration, auditing, or export.
Tip:
- Inspect ContentType (e.g., application/json, application/pdf) to route handling.
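One way to route Result entries by ContentType, sketched as a hypothetical helper (ResultRouting and ResultFormat are illustrative names, not part of this API). Because MIME type and subtype compare case-insensitively and a value may carry parameters such as "; charset=utf-8", the sketch compares only the type/subtype part:

```csharp
using System;

// Hypothetical routing categories for Result entries.
public enum ResultFormat { Json, Pdf, Other }

public static class ResultRouting
{
    // Classifies a ContentType value. Only the "type/subtype" part before any
    // ';' parameters is compared, ignoring case.
    public static ResultFormat Classify(string contentType)
    {
        if (string.IsNullOrWhiteSpace(contentType)) return ResultFormat.Other;

        string mediaType = contentType.Split(';')[0].Trim();
        if (mediaType.Equals("application/json", StringComparison.OrdinalIgnoreCase)) return ResultFormat.Json;
        if (mediaType.Equals("application/pdf", StringComparison.OrdinalIgnoreCase)) return ResultFormat.Pdf;
        return ResultFormat.Other;
    }
}
```

A caller would typically switch on Classify(r.ContentType) to choose a JSON parser, a PDF handler, or a generic fallback.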
Page
Value: 2
Description: Per-page data derived from the Image (e.g., page images, OCR page text).
Usage:
var pages = module.GetDocumentData(doc, new List<DocumentDataType?> { DocumentDataType.Page });
var page0 = pages?.FirstOrDefault(p => p.Index == 0);
var text = page0?.Text; // OCR text if available
When to use:
- For page-level processing (thumbnails, per-page OCR, layout operations).
Tip:
- Use Index to select the correct page; Width/Height/Resolution can guide rendering.
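A sketch of Index-based page selection. PageEntry is a hypothetical stand-in carrying only the Index and Text members mentioned above, not the real IDocumentData interface. The sketch does not assume Page entries arrive in Index order, so it sorts explicitly before joining OCR text:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in exposing only the Page members discussed above.
public sealed record PageEntry(int Index, string Text);

public static class PageSelection
{
    // Sorts pages by Index without assuming the source order.
    public static List<PageEntry> InOrder(IEnumerable<PageEntry> pages) =>
        (pages ?? Enumerable.Empty<PageEntry>()).OrderBy(p => p.Index).ToList();

    // Returns the page with the given Index, or null if it is absent.
    public static PageEntry FindPage(IEnumerable<PageEntry> pages, int index) =>
        pages?.FirstOrDefault(p => p.Index == index);

    // Joins OCR text in page order, skipping pages with no text.
    public static string JoinText(IEnumerable<PageEntry> pages) =>
        string.Join("\n", InOrder(pages)
            .Select(p => p.Text)
            .Where(t => !string.IsNullOrEmpty(t)));
}
```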
Backup
Value: 3
Description: Backup copy of the original image/data for rollback or recovery.
Usage:
var backups = module.GetDocumentData(doc, new List<DocumentDataType?> { DocumentDataType.Backup });
When to use:
- To restore or compare when original content has been transformed.
Tip:
- Not all pipelines create backups; check for a null or empty result before use.
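Because a Backup may be absent, a comparison against the current Image should handle a null or empty result. A minimal sketch, assuming both payloads are available as byte[]; BackupCheck and BackupComparison are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum BackupComparison { NoBackup, Unchanged, Transformed }

// Hypothetical helper; assumes payloads are available as byte[].
public static class BackupCheck
{
    // Compares the current Image bytes with the first Backup payload.
    // backupPayloads may be null or empty when no backup was created.
    public static BackupComparison Compare(byte[] currentImage, IEnumerable<byte[]> backupPayloads)
    {
        byte[] backup = backupPayloads?.FirstOrDefault();
        if (backup is null) return BackupComparison.NoBackup;
        if (currentImage is null) return BackupComparison.Transformed;

        // Byte-for-byte comparison without allocating extra arrays.
        return currentImage.AsSpan().SequenceEqual(backup)
            ? BackupComparison.Unchanged
            : BackupComparison.Transformed;
    }
}
```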
Definition
Value: 10
Description: Definition artifacts (e.g., trained model definition, schema) generated by services.
Usage:
var defs = module.GetDocumentData(doc, new List<DocumentDataType?> { DocumentDataType.Definition });
var defBlob = defs?.FirstOrDefault();
When to use:
- To access model or definition outputs after training, for diagnostics or export.
Tip:
- Pair with Verification step artifacts to validate field mappings post-training.
Training
Value: 11
Description: Training data artifacts (e.g., labeled samples, provider-generated training outputs).
Usage:
var trainingData = module.GetDocumentData(doc, new List<DocumentDataType?> { DocumentDataType.Training });
When to use:
- To audit or migrate training assets, or to review the quality of labeled data.
Tip:
- Training data can be large and varied; filter by ContentType to process selectively.
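A sketch of selective processing by ContentType for Training (or Definition) artifacts, with support for wildcard ranges such as "image/*". ContentTypeFilter is a hypothetical helper; the wildcard syntax borrows the HTTP Accept-style media range convention as an illustrative choice, not something this API defines:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for selecting artifacts by ContentType.
public static class ContentTypeFilter
{
    // True if contentType matches range ("type/subtype", "type/*", or "*/*").
    // Parameters after ';' are ignored and the comparison is case-insensitive.
    public static bool Matches(string contentType, string range)
    {
        if (string.IsNullOrWhiteSpace(contentType) || string.IsNullOrWhiteSpace(range)) return false;

        string media = contentType.Split(';')[0].Trim();
        string r = range.Trim();
        if (r == "*/*") return true;
        if (r.EndsWith("/*", StringComparison.Ordinal))
        {
            string prefix = r.Substring(0, r.Length - 1); // keeps the '/', e.g. "image/"
            return media.StartsWith(prefix, StringComparison.OrdinalIgnoreCase);
        }
        return media.Equals(r, StringComparison.OrdinalIgnoreCase);
    }

    // Lazily keeps items whose content type matches at least one range.
    public static IEnumerable<T> Filter<T>(IEnumerable<T> items, Func<T, string> contentTypeOf, params string[] ranges) =>
        (items ?? Enumerable.Empty<T>()).Where(item => ranges.Any(range => Matches(contentTypeOf(item), range)));
}
```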
Best Practices
- Always filter queries by specific DocumentDataType values to reduce payload size and improve performance.
- Use Index for Page data to target the correct page; prefer Preview for UI thumbnails.
- Check ContentType before parsing Data to avoid decoding errors.
- For large binary payloads, stream or process in chunks to prevent memory spikes.
- Keep Result artifacts for audit and integration; consider retention policies for Backup and Training blobs.
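The retention advice could be expressed as a per-type policy. In this sketch the windows (30 days for Backup, 365 for Training) are placeholder values, not recommendations, and the creation timestamp is assumed to come from your own storage metadata, since this reference does not document one on IDocumentData. RetentionPolicy is a hypothetical name; the enum is repeated so the block compiles on its own:

```csharp
using System;
using System.Collections.Generic;

public enum DocumentDataType : int { Image = 0, Result = 1, Page = 2, Backup = 3, Definition = 10, Training = 11 }

// Hypothetical retention policy keyed by DocumentDataType.
public static class RetentionPolicy
{
    // Placeholder windows, not recommendations. Types absent from the map
    // (e.g. Result) are never reported as expired.
    private static readonly Dictionary<DocumentDataType, TimeSpan> Windows = new()
    {
        [DocumentDataType.Backup] = TimeSpan.FromDays(30),
        [DocumentDataType.Training] = TimeSpan.FromDays(365),
    };

    // True when an entry of the given type is older than its window.
    public static bool IsExpired(DocumentDataType type, DateTime createdUtc, DateTime nowUtc) =>
        Windows.TryGetValue(type, out var window) && nowUtc - createdUtc > window;
}
```

Leaving Result out of the map keeps those artifacts indefinitely, in line with the advice to retain them for audit and integration.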