
DocumentDataType

Overview

The DocumentDataType enum specifies the type of data associated with a document. A document can have multiple related IDocumentData entries, each with a specific type.

Underlying type: int
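For reference, the members listed on this page correspond to a declaration along the following lines. This is a sketch reconstructed from the documented names and values, not the library's actual source. The DocumentDataTypeInfo.TryFromInt helper is hypothetical. The values are not contiguous (nothing is listed for 4 to 9), so raw integers read from storage are worth validating before casting.

```csharp
using System;

// Sketch of the enum as documented on this page (underlying type int).
// The real declaration may carry attributes or additional members.
public enum DocumentDataType : int
{
    Image = 0,       // original document payload
    Result = 1,      // outputs produced by processing
    Page = 2,        // per-page data derived from the Image
    Backup = 3,      // backup copy of the original
    Definition = 10, // definition artifacts (e.g., model definition, schema)
    Training = 11    // training data artifacts
}

// Hypothetical helper, not part of the documented API.
public static class DocumentDataTypeInfo
{
    // Values 4-9 are not listed on this page, so check a raw integer
    // against the declared members instead of casting blindly.
    public static bool TryFromInt(int raw, out DocumentDataType type)
    {
        if (Enum.IsDefined(typeof(DocumentDataType), raw))
        {
            type = (DocumentDataType)raw;
            return true;
        }
        type = default;
        return false;
    }
}
```

Enum.IsDefined works here because the boxed value is an int, which matches the enum's underlying type.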


Member Reference

Image

Value: 0

Description:
Original document payload (e.g., binary image/PDF bytes, audio).

Usage:

var images = module.GetDocumentData(doc, new List<DocumentDataType?> { DocumentDataType.Image });
var original = images?.FirstOrDefault();
if (original != null)
{
    // original.Data holds the source bytes (may be large)
    // original.ContentType indicates the MIME type
}

When to use:

  • To access the original file bytes for reprocessing, archiving, or downstream conversions.

Tip

Use Preview for thumbnails; stream large Data blobs to avoid memory pressure.
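The streaming tip can be made concrete with a small helper. This sketch assumes Data is exposed as a byte[]. In that case the payload is already in memory, so the realistic goal is to avoid further copies (for example Base64 strings or MemoryStream.ToArray()) while writing it out. If the API offers a stream-based accessor, prefer that instead. BlobStreaming and its methods are hypothetical names.

```csharp
using System;
using System.IO;

// Hypothetical helper, assuming Data is a byte[].
public static class BlobStreaming
{
    // Wraps an existing byte[] in a read-only stream without copying it.
    public static Stream AsReadOnlyStream(byte[] data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        return new MemoryStream(data, writable: false);
    }

    // Copies the payload to a file in fixed-size chunks instead of building
    // intermediate representations of the whole blob.
    public static void CopyToFile(byte[] data, string path, int bufferSize = 81920)
    {
        using Stream source = AsReadOnlyStream(data);
        using var target = new FileStream(path, FileMode.Create, FileAccess.Write,
                                          FileShare.None, bufferSize);
        source.CopyTo(target, bufferSize);
    }
}
```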


Result

Value: 1

Description:
Outputs produced by processing (e.g., normalized PDFs, JSON results, exports).

Usage:

var results = module.GetDocumentData(doc, new List<DocumentDataType?> { DocumentDataType.Result });
var export = results?.FirstOrDefault(r => r.ContentType == "application/json");

When to use:

  • To fetch provider or pipeline outputs for integration, auditing, or export.

Tip

Inspect ContentType (e.g., application/json, application/pdf) to route handling.
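A minimal routing sketch, assuming ContentType is a plain MIME string. MIME type names are case-insensitive and may carry parameters such as "; charset=utf-8", so the value is normalized before matching. ResultKind and ResultRouter are hypothetical; substitute whatever handlers your integration needs.

```csharp
using System;

// Hypothetical handler categories for illustration only.
public enum ResultKind { Json, Pdf, Other }

public static class ResultRouter
{
    // Drops parameters after ';', trims, and lowercases the media type.
    public static string NormalizeMime(string contentType)
    {
        if (string.IsNullOrWhiteSpace(contentType)) return string.Empty;
        int semicolon = contentType.IndexOf(';');
        string bare = semicolon >= 0 ? contentType.Substring(0, semicolon) : contentType;
        return bare.Trim().ToLowerInvariant();
    }

    // Maps a Result item's ContentType to a handling category.
    public static ResultKind Classify(string contentType) =>
        NormalizeMime(contentType) switch
        {
            "application/json" => ResultKind.Json,
            "application/pdf" => ResultKind.Pdf,
            _ => ResultKind.Other
        };
}
```

For example, ResultRouter.Classify(r.ContentType) can drive a switch that sends JSON results to a deserializer and PDFs to storage.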


Page

Value: 2

Description:
Per-page data derived from the Image (e.g., page images, OCR page text).

Usage:

var pages = module.GetDocumentData(doc, new List<DocumentDataType?> { DocumentDataType.Page });
var page0 = pages?.FirstOrDefault(p => p.Index == 0);
var text = page0?.Text; // OCR text if available

When to use:

  • Page-level processing (thumbnails, per-page OCR, layout operations).

Tip

Use Index to select the correct page; Width/Height/Resolution can guide rendering.
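The following sketch shows page ordering and rendering guidance under stated assumptions. PageData is a hypothetical stand-in exposing Index, Text, Width, and Height, mirroring the members mentioned above. The helpers do not assume the query returns pages in order. FitScale computes a uniform scale factor that fits a page inside a thumbnail box without upscaling.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the page-related members of IDocumentData.
public sealed record PageData(int? Index, string Text, int Width, int Height);

public static class PageHelpers
{
    // Order by Index rather than relying on query order; entries
    // without an Index are placed last.
    public static List<PageData> InReadingOrder(IEnumerable<PageData> pages) =>
        (pages ?? Enumerable.Empty<PageData>())
            .OrderBy(p => p.Index ?? int.MaxValue)
            .ToList();

    // Joins OCR text from pages that have it, separated by blank lines.
    public static string FullText(IEnumerable<PageData> pages) =>
        string.Join("\n\n", InReadingOrder(pages)
            .Where(p => !string.IsNullOrEmpty(p.Text))
            .Select(p => p.Text));

    // Uniform scale that fits width x height inside maxWidth x maxHeight
    // without enlarging the page; returns 1.0 when dimensions are unknown.
    public static double FitScale(int width, int height, int maxWidth, int maxHeight)
    {
        if (width <= 0 || height <= 0) return 1.0;
        double scale = Math.Min((double)maxWidth / width, (double)maxHeight / height);
        return Math.Min(scale, 1.0);
    }
}
```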


Backup

Value: 3

Description:
Backup copy of the original image/data for rollback or recovery.

Usage:

var backups = module.GetDocumentData(doc, new List<DocumentDataType?> { DocumentDataType.Backup });
var backup = backups?.FirstOrDefault(); // null if no backup was created

When to use:

  • To restore or compare when original content has been transformed.

Tip

Not all pipelines create backups; null-check before use.
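Because a backup may be missing, a try-pattern keeps the null-check explicit. This sketch also shows one way to compare, as described above: a byte-for-byte check of the current payload against the backup. It assumes both payloads are available as byte[]; BackupHelpers is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helpers, not part of the documented API.
public static class BackupHelpers
{
    // The query may return null or an empty sequence when no backup exists;
    // report that as false instead of throwing.
    public static bool TryGetFirst<T>(IEnumerable<T> items, out T first) where T : class
    {
        first = items?.FirstOrDefault();
        return first != null;
    }

    // True when the current payload no longer matches the backup,
    // i.e. the original content has been transformed.
    public static bool ContentDiffers(byte[] current, byte[] backup)
    {
        if (current == null) throw new ArgumentNullException(nameof(current));
        if (backup == null) throw new ArgumentNullException(nameof(backup));
        return !current.AsSpan().SequenceEqual(backup);
    }
}
```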


Definition

Value: 10

Description:
Definition artifacts (e.g., a trained model definition or schema) generated by services.

Usage:

var defs = module.GetDocumentData(doc, new List<DocumentDataType?> { DocumentDataType.Definition });
var defBlob = defs?.FirstOrDefault();

When to use:

  • Accessing model/definition outputs after training for diagnostics or export.

Tip

Pair with artifacts from the Verification step to validate field mappings after training.


Training

Value: 11

Description:
Training data artifacts (e.g., labeled samples, provider-generated training outputs).

Usage:

var trainingData = module.GetDocumentData(doc, new List<DocumentDataType?> { DocumentDataType.Training });

When to use:

  • Auditing or migrating training assets; quality reviews of labeled data.

Tip

Training data can be large and varied; filter by ContentType to process selectively.
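A sketch of selective processing by ContentType. It takes a selector function instead of assuming a concrete IDocumentData shape, so it works with any item type. ContentTypeFilter is a hypothetical name. Parameters after ';' are ignored and comparisons are case-insensitive.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the documented API.
public static class ContentTypeFilter
{
    // Strips parameters such as "; charset=..." from a MIME string.
    private static string MediaType(string contentType) =>
        (contentType ?? string.Empty).Split(';')[0].Trim();

    // Keeps only items whose media type is in the accepted list.
    public static IEnumerable<T> WhereContentType<T>(
        IEnumerable<T> items, Func<T, string> contentTypeOf, params string[] accepted)
    {
        var allowed = new HashSet<string>(accepted, StringComparer.OrdinalIgnoreCase);
        return (items ?? Enumerable.Empty<T>())
            .Where(item => allowed.Contains(MediaType(contentTypeOf(item))));
    }

    // Counts items per media type for a quick inventory before processing.
    public static Dictionary<string, int> CountByContentType<T>(
        IEnumerable<T> items, Func<T, string> contentTypeOf) =>
        (items ?? Enumerable.Empty<T>())
            .GroupBy(item => MediaType(contentTypeOf(item)), StringComparer.OrdinalIgnoreCase)
            .ToDictionary(g => g.Key, g => g.Count(), StringComparer.OrdinalIgnoreCase);
}
```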


Best Practices

  • Always filter queries by specific DocumentDataType values to reduce payload size and improve performance.
  • Use Index for Page data to target the correct page; prefer Preview for UI thumbnails.
  • Check ContentType before parsing Data to avoid decoding errors.
  • For large binary payloads, stream or process in chunks to prevent memory spikes.
  • Keep Result artifacts for audit and integration; consider retention policies for Backup and Training blobs.
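The ContentType check from these practices can be combined with parsing in one guarded helper. This is a sketch using System.Text.Json and assumes Data is a UTF-8 byte[]. SafeParsing is a hypothetical name. Malformed payloads return false rather than throwing.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper, not part of the documented API.
public static class SafeParsing
{
    // True when the media type (ignoring parameters such as charset) is JSON.
    public static bool IsJson(string contentType) =>
        contentType != null &&
        contentType.Split(';')[0].Trim()
            .Equals("application/json", StringComparison.OrdinalIgnoreCase);

    // Parses Data only after ContentType says it is JSON; a malformed
    // payload is reported as false instead of surfacing an exception.
    public static bool TryParseJson(byte[] data, string contentType, out JsonDocument document)
    {
        document = null;
        if (data == null || !IsJson(contentType)) return false;
        try
        {
            document = JsonDocument.Parse(data);
            return true;
        }
        catch (JsonException)
        {
            return false;
        }
    }
}
```

JsonDocument is IDisposable, and when parsed from a byte array this way it reads directly from that array. Dispose the document when finished and do not modify the array while the document is in use.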