
DocumentDataType

Overview

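The DocumentDataType enum identifies the kind of data held by an IDocumentData entry. A document can have multiple related IDocumentData entries, each tagged with one DocumentDataType value; pass a list of these values to GetDocumentData to retrieve only the entries you need.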

Underlying type: int
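For orientation, the member values in this reference correspond to a declaration along the lines below. This is a reconstruction from the documented values, not the library's source, and DocumentDataTypeParsing.FromStoredValue is a hypothetical helper. Because the values jump from 3 (Backup) to 10 (Definition), casting an arbitrary stored int such as 4 yields a value with no named member; Enum.IsDefined rejects those.

```csharp
using System;

// Reconstructed from the documented member values; not the library's actual source.
public enum DocumentDataType
{
    Image = 0,
    Result = 1,
    Page = 2,
    Backup = 3,
    Definition = 10,
    Training = 11
}

// Hypothetical helper for values read back from storage or external input.
public static class DocumentDataTypeParsing
{
    // Returns null for ints that are not named members (e.g. 4 to 9, or anything above 11).
    public static DocumentDataType? FromStoredValue(int value)
    {
        if (!Enum.IsDefined(typeof(DocumentDataType), value))
        {
            return null;
        }
        return (DocumentDataType)value;
    }
}
```

Under these assumptions, FromStoredValue(10) returns DocumentDataType.Definition and FromStoredValue(4) returns null.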


Member Reference

Image

Value: 0

Description: Original document payload (e.g., binary image/PDF bytes, audio).

Usage:

var images = module.GetDocumentData(doc, new List<DocumentDataType?> { DocumentDataType.Image });
var original = images?.FirstOrDefault();
if (original != null)
{
    // original.Data holds the source bytes (may be large)
    // original.ContentType indicates MIME type
}

When to use:

  • To access the original file bytes for reprocessing, archiving, or downstream conversions.

Tip:

  • Use Preview for thumbnails; stream large Data blobs to avoid memory pressure.
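The streaming tip can be sketched as follows, assuming Data is exposed as a byte[] (as the comment in the Image example suggests). In that case the bytes are already in memory, so the saving comes from not making further copies (ToArray, Base64 strings) while writing them out: a read-only MemoryStream wraps the existing array, and Stream.CopyTo writes it in fixed-size chunks. PayloadExport, CopyPayload, and SaveOriginal are hypothetical names; if Data turns out to be a Stream, call CopyTo on it directly.

```csharp
using System;
using System.IO;

public static class PayloadExport
{
    // Hypothetical helper: copies a payload to any destination stream in chunks.
    // MemoryStream(byte[], bool) wraps the existing array without copying it.
    public static long CopyPayload(byte[] data, Stream destination, int bufferSize = 81920)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        if (destination == null) throw new ArgumentNullException(nameof(destination));

        using var source = new MemoryStream(data, writable: false);
        source.CopyTo(destination, bufferSize);
        return source.Length;
    }

    // Hypothetical usage: write the original payload straight to a file.
    public static void SaveOriginal(byte[] data, string path)
    {
        using var file = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.None);
        CopyPayload(data, file);
    }
}
```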

Result

Value: 1

Description: Outputs produced by processing (e.g., normalized PDFs, JSON results, exports).

Usage:

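var results = module.GetDocumentData(doc, new List<DocumentDataType?> { DocumentDataType.Result });
// Match the media type prefix so values such as "application/json; charset=utf-8" are also found
var export = results?.FirstOrDefault(r => r.ContentType != null
    && r.ContentType.StartsWith("application/json", StringComparison.OrdinalIgnoreCase));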

When to use:

  • To fetch provider or pipeline outputs for integration, auditing, or export.

Tip:

  • Inspect ContentType (e.g., application/json, application/pdf) to route handling.
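Routing on ContentType is more robust when the media type is normalized first: a value such as "application/json; charset=utf-8" fails an exact string comparison, and media types are case-insensitive. The sketch below strips parameters and lowercases before switching. ResultRouting, MediaTypeOf, and Route are hypothetical names, and the route labels are placeholders for real handlers.

```csharp
using System;

public static class ResultRouting
{
    // Reduces "Application/JSON; charset=utf-8" to "application/json".
    public static string MediaTypeOf(string? contentType)
    {
        if (string.IsNullOrWhiteSpace(contentType)) return string.Empty;
        int semicolon = contentType.IndexOf(';');
        string mediaType = semicolon >= 0 ? contentType.Substring(0, semicolon) : contentType;
        return mediaType.Trim().ToLowerInvariant();
    }

    // Placeholder routes; replace each label with a call to the matching handler.
    public static string Route(string? contentType) => MediaTypeOf(contentType) switch
    {
        "application/json" => "json",
        "application/pdf" => "pdf",
        "" => "unknown",
        _ => "binary"
    };
}
```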

Page

Value: 2

Description: Per-page data derived from the Image (e.g., page images, OCR page text).

Usage:

var pages = module.GetDocumentData(doc, new List<DocumentDataType?> { DocumentDataType.Page });
var page0 = pages?.FirstOrDefault(p => p.Index == 0);
var text = page0?.Text; // OCR text if available

When to use:

  • For page-level processing such as thumbnails, per-page OCR, or layout operations.

Tip:

  • Use Index to select the correct page; Width/Height/Resolution can guide rendering.
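A sketch of page-level selection, using a hypothetical PageInfo record in place of the real page entries. It assumes Index is zero-based (as `p.Index == 0` in the Page example suggests) and that Width/Height are pixel sizes. Pages are ordered by Index rather than by retrieval order, and the thumbnail size scales both dimensions by the same factor so the aspect ratio is preserved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the entries returned for DocumentDataType.Page.
public record PageInfo(int Index, int Width, int Height);

public static class PageHelpers
{
    // Returns pages in page order regardless of the order they were retrieved in.
    public static IReadOnlyList<PageInfo> InPageOrder(IEnumerable<PageInfo>? pages) =>
        (pages ?? Enumerable.Empty<PageInfo>()).OrderBy(p => p.Index).ToList();

    // Scales (width, height) to fit inside (maxWidth, maxHeight) without distortion or upscaling.
    public static (int Width, int Height) FitWithin(int width, int height, int maxWidth, int maxHeight)
    {
        if (width <= 0 || height <= 0) throw new ArgumentException("Width and height must be positive.");
        double scale = Math.Min(1.0, Math.Min((double)maxWidth / width, (double)maxHeight / height));
        return (Math.Max(1, (int)Math.Round(width * scale)), Math.Max(1, (int)Math.Round(height * scale)));
    }
}
```

For example, a 2480 x 3508 page fitted into a 200 x 200 box comes out at 141 x 200.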

Backup

Value: 3

Description: Backup copy of the original image/data for rollback or recovery.

Usage:

var backups = module.GetDocumentData(doc, new List<DocumentDataType?> { DocumentDataType.Backup });
var backup = backups?.FirstOrDefault(); // null when the pipeline created no backup

When to use:

  • To restore the original, or compare against it, after the content has been transformed.

Tip:

  • Not all pipelines create backups; check for a null or empty result before use.
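Because a pipeline may not create backups, the query can yield null or an empty list; `?.FirstOrDefault()` covers both cases. The sketch below then compares the backup bytes with the current image bytes to tell whether the content was transformed. It assumes Data is a byte[]; BackupCheck, FirstOrNull, and HasChanged are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BackupCheck
{
    // Returns the first element, or null when the sequence is null or empty.
    public static T? FirstOrNull<T>(IEnumerable<T>? items) where T : class => items?.FirstOrDefault();

    // true  -> current bytes differ from the backup (content was transformed)
    // false -> bytes are identical
    // null  -> no backup to compare against
    public static bool? HasChanged(byte[]? current, byte[]? backup)
    {
        if (backup == null) return null;
        if (current == null) return true;
        return !current.AsSpan().SequenceEqual(backup.AsSpan());
    }
}
```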

Definition

Value: 10

Description: Definition artifacts (e.g., trained model definition, schema) generated by services.

Usage:

var defs = module.GetDocumentData(doc, new List<DocumentDataType?> { DocumentDataType.Definition });
var defBlob = defs?.FirstOrDefault();

When to use:

  • To access model or definition outputs after training for diagnostics or export.

Tip:

  • Pair with the Verification step's artifacts to validate field mappings after training.
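One way to validate field mappings after training is a set comparison between the field names found in the definition and those expected by verification. How names are extracted from the Definition and Verification artifacts depends on their format, which this page does not specify, so the sketch takes plain name lists. FieldMappingCheck and Compare are hypothetical names, and case-insensitive matching is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FieldMappingCheck
{
    // Compares field names case-insensitively and reports mismatches in both directions.
    public static (IReadOnlyList<string> MissingFromDefinition, IReadOnlyList<string> UnexpectedInDefinition)
        Compare(IEnumerable<string> definitionFields, IEnumerable<string> expectedFields)
    {
        var defined = new HashSet<string>(definitionFields, StringComparer.OrdinalIgnoreCase);
        var expected = new HashSet<string>(expectedFields, StringComparer.OrdinalIgnoreCase);

        var missing = expected.Where(f => !defined.Contains(f))
                              .OrderBy(f => f, StringComparer.OrdinalIgnoreCase)
                              .ToList();
        var unexpected = defined.Where(f => !expected.Contains(f))
                                .OrderBy(f => f, StringComparer.OrdinalIgnoreCase)
                                .ToList();
        return (missing, unexpected);
    }
}
```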

Training

Value: 11

Description: Training data artifacts (e.g., labeled samples, provider-generated training outputs).

Usage:

var trainingData = module.GetDocumentData(doc, new List<DocumentDataType?> { DocumentDataType.Training });

When to use:

  • To audit or migrate training assets, or to review the quality of labeled data.

Tip:

  • Training data can be large and varied; filter by ContentType to process selectively.
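Filtering training artifacts by ContentType can be done with LINQ before any Data is parsed. The sketch uses a hypothetical Artifact record standing in for IDocumentData (assuming ContentType is a string and Data a byte[]), summarizes artifacts per media type for auditing, and selects only the ones a given step can handle. TrainingFilter, Summarize, and OfMediaTypes are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an IDocumentData entry.
public record Artifact(string? ContentType, byte[] Data);

public static class TrainingFilter
{
    // Drops parameters and normalizes case: "Application/JSON; charset=utf-8" -> "application/json".
    static string MediaTypeOf(string? contentType) =>
        (contentType ?? string.Empty).Split(';')[0].Trim().ToLowerInvariant();

    // Count and total bytes per media type, useful when auditing large, mixed training sets.
    public static Dictionary<string, (int Count, long Bytes)> Summarize(IEnumerable<Artifact> artifacts) =>
        artifacts.GroupBy(a => MediaTypeOf(a.ContentType))
                 .ToDictionary(g => g.Key, g => (Count: g.Count(), Bytes: g.Sum(a => (long)a.Data.Length)));

    // Only the artifacts whose media type the current step knows how to process.
    public static IEnumerable<Artifact> OfMediaTypes(IEnumerable<Artifact> artifacts, params string[] mediaTypes)
    {
        var wanted = new HashSet<string>(mediaTypes.Select(MediaTypeOf));
        return artifacts.Where(a => wanted.Contains(MediaTypeOf(a.ContentType)));
    }
}
```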

Best Practices

  • Always filter queries by specific DocumentDataType values to reduce payload size and improve performance.
  • Use Index for Page data to target the correct page; prefer Preview for UI thumbnails.
  • Check ContentType before parsing Data to avoid decoding errors.
  • For large binary payloads, stream or process in chunks to prevent memory spikes.
  • Keep Result artifacts for audit and integration; consider retention policies for Backup and Training blobs.
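The ContentType check from the list above can be combined with parsing so that binary payloads never reach a JSON parser. A minimal sketch using System.Text.Json, assuming Data holds UTF-8 bytes whenever ContentType is application/json or a "+json" suffix type; SafeParse and TryParseJson are hypothetical names.

```csharp
using System;
using System.Text.Json;

public static class SafeParse
{
    // True for "application/json" (any parameters or casing) and structured types like "application/vnd.api+json".
    static bool IsJson(string? contentType)
    {
        string mediaType = (contentType ?? string.Empty).Split(';')[0].Trim().ToLowerInvariant();
        return mediaType == "application/json" || mediaType.EndsWith("+json", StringComparison.Ordinal);
    }

    // Parses only when the declared content type is JSON; malformed JSON returns false instead of throwing.
    // The caller owns the returned JsonDocument and must dispose it.
    public static bool TryParseJson(string? contentType, byte[]? data, out JsonDocument? document)
    {
        document = null;
        if (data == null || !IsJson(contentType)) return false;
        try
        {
            document = JsonDocument.Parse(new ReadOnlyMemory<byte>(data));
            return true;
        }
        catch (JsonException)
        {
            return false;
        }
    }
}
```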