Preparation is where you turn raw data into a high-quality training set. The quality of your model depends heavily on the quality of this data.

Data Hygiene

  • Avoid Overlap: Ensure your Fine-tune Dataset does not overlap with your Test Set. Train on one set of data and validate on a completely different set; otherwise test scores can reflect memorized examples rather than how well the model generalizes.
  • Curate for Quality: Remove incorrect, ambiguous, or low-value examples.
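
As an illustration of the overlap check, here is a minimal Python sketch. It assumes each example is a dict with `prompt` and `response` fields, which is a hypothetical layout and not the product's own format, and it treats two prompts as duplicates when they match after lowercasing and collapsing whitespace. The helper names `normalize` and `split_overlap` are invented for this example.

```python
import re

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", text.strip().lower())

def split_overlap(train_rows, test_rows, key="prompt"):
    """Split train_rows into (clean, overlapping) by comparing normalized keys
    against the test set. The 'prompt' field name is an assumption."""
    test_keys = {normalize(row[key]) for row in test_rows}
    clean, overlapping = [], []
    for row in train_rows:
        if normalize(row[key]) in test_keys:
            overlapping.append(row)
        else:
            clean.append(row)
    return clean, overlapping

train = [
    {"prompt": "How do I reset my password?", "response": "Open Settings, then Security."},
    {"prompt": "What is your refund policy?", "response": "Refunds are available within 30 days."},
]
test = [
    {"prompt": "  how do I reset my   password? ", "response": "Open Settings, then Security."},
]

clean_train, leaked = split_overlap(train, test)
print(f"{len(clean_train)} clean, {len(leaked)} overlapping")
```

Exact matching after normalization only catches near-verbatim copies. Paraphrased duplicates would need a fuzzier comparison, which is outside this sketch.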

AI Review & Modification

  • AI Review: Use high-capability models to systematically check your dataset for accuracy and consistency.
  • Broad Modification: Use Regex Keyword Replacement to apply bulk updates, aligning thousands of responses with new requirements or brand guidelines in a single pass.
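
To make the idea of regex-based bulk replacement concrete, the following Python sketch applies one pattern to every response in a small dataset. It is not the product's implementation: the `bulk_replace` helper, the `response` field name, and the sample brand names are all assumptions made for illustration.

```python
import re

def bulk_replace(rows, pattern, replacement, field="response"):
    """Apply a case-insensitive regex replacement to one field of every row.
    Returns new rows plus the number of rows that changed."""
    compiled = re.compile(pattern, flags=re.IGNORECASE)
    updated, changed = [], 0
    for row in rows:
        new_text, hits = compiled.subn(replacement, row[field])
        if hits:
            changed += 1
        # Copy the row so the original dataset is left untouched.
        updated.append({**row, field: new_text})
    return updated, changed

rows = [
    {"prompt": "Who makes this?", "response": "It is made by Acme Corp."},
    {"prompt": "Who do I contact?", "response": "Contact ACME corp support."},
    {"prompt": "What are your hours?", "response": "We are open 9 to 5."},
]

# Hypothetical rebrand: every spelling of "Acme Corp" becomes "Acme Inc".
updated_rows, changed_count = bulk_replace(rows, r"\bacme\s+corp\b", "Acme Inc")
print(changed_count, "rows changed")
```

Returning the count of changed rows makes it easy to preview the impact of a pattern before committing the update, and the word boundaries (`\b`) keep the pattern from matching inside longer words.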

Augmentations

  • Generate from Examples: Use your best examples as seeds to generate dozens of similar but unique training cases.
  • Variation Injection: Automatically create variations in tone and phrasing to make your model more robust to different user styles.
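
The sketch below shows the shape of variation injection using simple rule-based rewrites in Python. This is a deliberately simplified stand-in: the page does not describe how variations are produced, and in practice they are likely model-generated. The `inject_variations` helper, the field names, and the specific phrasings are assumptions for illustration only.

```python
def inject_variations(seed):
    """Create surface-level prompt variants from one seed example.
    Each variant keeps the seed's response unchanged."""
    prompt = seed["prompt"]
    core = prompt.rstrip("?.! ")
    if not core:
        # Nothing to vary; return the seed as the only example.
        return [dict(seed)]
    candidates = [
        prompt,                                   # original phrasing
        prompt.lower(),                           # casual, all lowercase
        core,                                     # no trailing punctuation
        f"Hi, quick question: {prompt}",          # chatty opener
        f"Could you please tell me: {core[0].lower() + core[1:]}?",  # polite, formal
    ]
    seen, variants = set(), []
    for text in candidates:
        if text not in seen:  # drop variants that collapse to the same string
            seen.add(text)
            variants.append({"prompt": text, "response": seed["response"]})
    return variants

seed = {"prompt": "How do I cancel my order?", "response": "Go to Orders and select Cancel."}
variants = inject_variations(seed)
for v in variants:
    print(v["prompt"])
```

Generated variants should go through the same hygiene checks as the rest of the dataset, including the test-set overlap check, since a variation can accidentally reproduce a held-out example.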
