Preparation is where you turn raw data into a high-quality training set. The quality of your model depends heavily on the quality of this data.
Data Hygiene
- Avoid Overlap: Ensure your Fine-tune Dataset does not overlap with your Test Set. Train on one set of data and evaluate on a completely separate set; any overlap inflates test scores and hides how well the model actually generalizes.
- Curate for Quality: Remove incorrect, ambiguous, or low-value examples.
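
As a rough illustration of the overlap check, the sketch below drops any training example whose prompt also appears in the test set. It assumes each example is a dict with a `prompt` field and treats two prompts as identical after lowercasing and whitespace normalization; the field names and the `remove_overlap` helper are hypothetical and not part of any product API.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences do not hide duplicates."""
    return re.sub(r"\s+", " ", text.strip().lower())


def remove_overlap(train, test, key="prompt"):
    """Return (filtered_train, dropped_count), removing train examples found in test."""
    test_keys = {normalize(ex[key]) for ex in test}
    kept = [ex for ex in train if normalize(ex[key]) not in test_keys]
    return kept, len(train) - len(kept)


# Hypothetical sample data, for illustration only.
train = [
    {"prompt": "What is your refund policy?", "response": "Refunds are available within 30 days."},
    {"prompt": "How do I reset my password?", "response": "Use the reset link on the login page."},
]
test = [
    {"prompt": "  what is your REFUND policy? ", "response": "Refunds are available within 30 days."},
]

clean_train, dropped = remove_overlap(train, test)
print(f"Dropped {dropped} overlapping example(s); {len(clean_train)} remain.")
```

Exact matching after normalization only catches verbatim leaks. Near-duplicates such as paraphrases would need a fuzzier comparison, which this sketch does not attempt.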
AI Review & Modification
- AI Review: Use high-capability models to systematically check your dataset for accuracy and consistency.
- Broad Modification: Perform bulk updates using Regex Keyword Replacement to align thousands of responses with new requirements or brand guidelines in a single pass.
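
A regex-based bulk update of this kind can be approximated with Python's standard `re` module. This is a minimal sketch, not the product's own implementation: the `bulk_replace` helper, the `response` field, and the sample brand names are all assumptions made for illustration.

```python
import re


def bulk_replace(examples, pattern, replacement, field="response"):
    """Apply one regex substitution to `field` in every example.

    Returns (updated_examples, number_of_examples_changed). The input list
    and its dicts are left unmodified.
    """
    compiled = re.compile(pattern, flags=re.IGNORECASE)
    updated = []
    changed = 0
    for ex in examples:
        new_text, count = compiled.subn(replacement, ex[field])
        if count:
            changed += 1
        updated.append({**ex, field: new_text})
    return updated, changed


# Hypothetical sample data, for illustration only.
dataset = [
    {"prompt": "Who makes this?", "response": "It is made by acme corp."},
    {"prompt": "Is support free?", "response": "Yes, support is free."},
]

renamed, changed = bulk_replace(dataset, r"\bacme corp\b", "Acme Inc")
print(changed, renamed[0]["response"])
```

The word boundaries (`\b`) keep the pattern from matching inside longer words. Reviewing the changed count, or a sample of changed rows, before saving is a reasonable safeguard against an overly broad pattern.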
Augmentations
- Generate from Examples: Use your best examples as seeds to generate dozens of similar but unique training cases.
- Variation Injection: Automatically create variations in tone and phrasing to make your model more robust to different user styles.
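
To make the idea of variation injection concrete, here is a deliberately simple rule-based stand-in. Model-driven generation would produce richer variety in tone and phrasing; this sketch only wraps a seed prompt in a few fixed openers and closers. The `OPENERS` and `CLOSERS` lists and the `inject_variations` helper are hypothetical.

```python
import itertools

# Hypothetical phrasing fragments; an empty string keeps the original wording.
OPENERS = ["", "Hi! ", "Quick question. "]
CLOSERS = ["", " Thanks!"]


def inject_variations(example, field="prompt"):
    """Return one variant of `example` per opener/closer pair.

    The response is copied unchanged, so every variant maps to the same target.
    """
    variants = []
    for opener, closer in itertools.product(OPENERS, CLOSERS):
        variants.append({**example, field: opener + example[field] + closer})
    return variants


# Hypothetical seed example, for illustration only.
seed = {
    "prompt": "How do I reset my password?",
    "response": "Use the reset link on the login page.",
}

variants = inject_variations(seed)
for v in variants:
    print(v["prompt"])
```

Because the first opener and closer are both empty, the first variant is the original seed, so the output can replace the seed in the dataset rather than sit alongside it.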