Microsoft Custom Text Classification
Overview
The Microsoft Custom Text Classification Service in AIForged leverages Azure Cognitive Services for Language to enable powerful, custom document classification. This cloud-based API allows you to train machine learning models on your own labeled data, providing flexibility and high accuracy for text classification tasks.
This service is ideal for organizations with unique document types, custom workflows, or specific classification needs that aren't addressed by out-of-the-box models.
How It Works
- Custom Model Training: Users upload their own labeled documents to train the model. Each document must be tagged with the correct class or category.
- Flexible Data Split: You can specify the percentage of documents used for training versus evaluation (recommended: 80% training, 20% testing).
- Dependent OCR Service: An OCR service is automatically created alongside this classification service to extract text from documents before classification. Default OCR settings are usually sufficient, but can be adjusted if needed.
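The service performs the training/evaluation split itself, so no code is needed for it. Purely as an illustration of what an 80/20 split of labeled documents means, here is a minimal Python sketch. The function name `split_by_category`, the file names, and the per-category splitting strategy are assumptions made for this example; they do not describe how AIForged or Azure implement the split internally.

```python
import random
from collections import defaultdict


def split_by_category(labeled_docs, train_fraction=0.8, seed=0):
    """Split (name, category) pairs into training and evaluation lists.

    Hypothetical helper: the split is done per category so that every
    category appears in both sets in roughly the same proportion.
    """
    by_category = defaultdict(list)
    for name, category in labeled_docs:
        by_category[category].append(name)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, evaluation = [], []
    for category, names in sorted(by_category.items()):
        names = sorted(names)
        rng.shuffle(names)
        cut = round(len(names) * train_fraction)
        train += [(n, category) for n in names[:cut]]
        evaluation += [(n, category) for n in names[cut:]]
    return train, evaluation


# Illustrative data: 50 documents in each of two made-up categories.
docs = [(f"invoice_{i}.docx", "invoice") for i in range(50)]
docs += [(f"receipt_{i}.docx", "receipt") for i in range(50)]

train, evaluation = split_by_category(docs)
print(len(train), len(evaluation))  # 80 20
```

Splitting within each category, rather than across the whole set, avoids the case where a small category ends up entirely in one of the two sets.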
Possible Use Cases
- Distinguishing between different types or variants of similar documents.
- Automating email or ticket triage.
- Custom document routing based on content.
Supported Content Types
- Word or Excel documents
- Images
- Text files
If your content is in a different format, use the AIForged PDF Converter as a pre-processor to generate a compatible file for classification.
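Before uploading a large batch, it can help to list the files that will probably need conversion first. The sketch below is a local pre-check only and does not call AIForged. The extension set is an assumption derived from the content types named on this page (Word, Excel, images, text, plus PDF, which the troubleshooting section lists as supported), so adjust it to match what your service actually accepts.

```python
from pathlib import Path

# Assumed mapping of the content types named on this page to file
# extensions. This list is illustrative, not an official AIForged list.
ASSUMED_EXTENSIONS = {
    ".doc", ".docx",                           # Word
    ".xls", ".xlsx",                           # Excel
    ".png", ".jpg", ".jpeg", ".tif", ".tiff",  # images
    ".txt",                                    # text files
    ".pdf",                                    # output of the PDF Converter
}


def needs_conversion(filenames):
    """Return the files whose extension is not in the assumed set."""
    return [
        name for name in filenames
        if Path(name).suffix.lower() not in ASSUMED_EXTENSIONS
    ]


print(needs_conversion(["scan.PNG", "notes.txt", "mail.msg", "page.html"]))
# ['mail.msg', 'page.html']
```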
Best Practices
- Upload at Least 50 Documents per Category: For best results, upload a minimum of 50 labeled documents for each category you want to train.
- Use Clear and Consistent Labels: Ensure each document is accurately tagged with its intended category.
- Recommended Data Split: Use 80% of your data for training and 20% for evaluation, so that the model is scored on documents it did not see during training. This helps reveal overfitting before you rely on the model.
- Diverse Examples: Include a variety of real-world examples for each category to help the model generalize.
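The 50-documents-per-category guideline above is easy to check before training. The following Python sketch counts labels and reports the categories that fall short. The helper name `underfilled_categories` and the sample labels are hypothetical, and the check runs on a plain list of labels rather than against the AIForged service.

```python
from collections import Counter

MIN_DOCS_PER_CATEGORY = 50  # guideline from the best practices above


def underfilled_categories(labels, minimum=MIN_DOCS_PER_CATEGORY):
    """Return {category: count} for categories below the minimum."""
    counts = Counter(labels)
    return {cat: n for cat, n in counts.items() if n < minimum}


# Illustrative labels, one entry per uploaded training document.
labels = ["invoice"] * 62 + ["receipt"] * 50 + ["contract"] * 17

print(underfilled_categories(labels))  # {'contract': 17}
```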
Service Setup
Follow these steps to add and configure the Microsoft Custom Text Classification Service to your agent:
- Open the Agent View
Navigate to the agent where you want to add the classification service.
- Add the Classification Service
Click the Add Service button.
- Select Service Type
Choose Microsoft Custom Text Classifier Service from the available service types.
- Configure the Service Wizard
Open the Service Configuration Wizard.
- Step 1: General Settings
Configure the core settings. Default settings are sufficient for most use cases.
- Set the Training/Testing Split Percentage as desired (default: 80% training, 20% testing).
- Step 2: Define Categories
Add user-defined categories for your classification task.
- Step 3: Upload Training Documents
- Click the Upload button or drag and drop files over the documents grid.
- Select the user-defined category for your uploads.
- Upload files for each category (recommended: at least 50 documents per category).
- Once all files are uploaded, review your settings.
- Click the Train button.
- In the dialog window, click Process (leave all settings as default).
- The training progress dialog will appear and close automatically when finished.
Tip: Training duration depends on the number of files uploaded.
- Step 4: Definition Document
After training, a definition document is created automatically.
- Step 5: Dependent OCR Service
The service configuration page for the dependent OCR service will open.
- Choose the appropriate OCR engine (Tesseract, Microsoft, or Google) if you need to adjust settings.
- For most use cases, default settings are sufficient.
Adding and Processing Documents
- Open Service
In your Microsoft Custom Text Classification Service, click the Upload button or drag and drop files over the document grid.
- Select Category (Optional)
If known, select the document category. Otherwise, click No category.
- Process Documents
After uploading, select the documents to process and click the Process button.
Tip: For new services, process a small batch first to verify results before processing everything.
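One simple way to choose that first small batch is to draw a random sample, so the trial run is not biased toward whichever files happen to be listed first. The sketch below is a minimal illustration: `pilot_batch`, the batch size of 10, and the file names are assumptions for this example, and the selection happens locally before anything is uploaded.

```python
import random


def pilot_batch(documents, size=10, seed=0):
    """Split documents into a small random pilot batch and the remainder.

    Hypothetical helper: original order is preserved within each list.
    """
    docs = list(documents)
    if len(docs) <= size:
        return docs, []
    rng = random.Random(seed)  # fixed seed keeps the choice reproducible
    chosen = set(rng.sample(range(len(docs)), size))
    pilot = [d for i, d in enumerate(docs) if i in chosen]
    rest = [d for i, d in enumerate(docs) if i not in chosen]
    return pilot, rest


all_docs = [f"doc_{i}.txt" for i in range(100)]
pilot, rest = pilot_batch(all_docs, size=10)
print(len(pilot), len(rest))  # 10 90
```

Process the pilot list first, review the predicted categories, and only then process the remainder.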
Troubleshooting Tips
- Model Accuracy Is Low:
- Ensure you have uploaded enough labeled documents per category (at least 50).
- Double-check that labels are accurate and unambiguous.
- Increase data diversity by including a variety of document types for each category.
- Training Fails or Is Slow:
- Make sure your files are in a supported format (PDF, Word, Excel, Image, or Text).
- Check for upload errors or corrupted documents.
- Large datasets may take longer; monitor the progress dialog for updates.
- OCR Issues:
- If text extraction is poor, try a different OCR engine or adjust OCR settings.
- Use clear, high-quality source documents for best OCR accuracy.