Now supporting Claude Sonnet 4.6

Synthetic Training Data for LLMs

Generate high-quality, schema-validated datasets with full provenance tracking. Perfect for fine-tuning, RLHF, and evaluation.

Schema Validated · Cross-Order Deduplication · Full Provenance · Safety Filtered

Three Data Schemas

Purpose-built schemas for supervised fine-tuning, preference training, and evaluation.

instruction_v1

Instruction-response pairs for supervised fine-tuning. Each record includes system context, user instruction, and ideal response.
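As a rough illustration, an instruction_v1 record could be modeled as below. The page names only the three components (system context, user instruction, ideal response); the field names `system`, `instruction`, and `response` and the helper `is_valid_instruction_record` are assumptions made for this sketch, not the documented schema.

```python
# Hypothetical field names: the page only states that each record carries
# a system context, a user instruction, and an ideal response.
REQUIRED_FIELDS = ("system", "instruction", "response")


def is_valid_instruction_record(record: dict) -> bool:
    """Return True if every assumed field is present as a non-empty string."""
    return all(
        isinstance(record.get(field), str) and bool(record[field].strip())
        for field in REQUIRED_FIELDS
    )


example_record = {
    "system": "You are a support agent for an online store.",
    "instruction": "How do I reset my password?",
    "response": "Open Settings, choose Security, then select Reset Password.",
}
```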

preference_v1

Paired responses with preference labels for RLHF. Includes chosen/rejected responses with quality scores and reasoning.
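A preference_v1 record might be sketched as follows. The class names, field names, and the 0.0 to 1.0 score scale are all assumptions for illustration; the page states only that records hold chosen and rejected responses with quality scores and reasoning.

```python
from dataclasses import dataclass


@dataclass
class ScoredResponse:
    text: str
    quality_score: float  # assumed scale: 0.0 (worst) to 1.0 (best)


@dataclass
class PreferenceRecord:
    prompt: str
    chosen: ScoredResponse
    rejected: ScoredResponse
    reasoning: str  # why the chosen response was preferred

    def is_consistent(self) -> bool:
        """The chosen response should not score below the rejected one."""
        return self.chosen.quality_score >= self.rejected.quality_score


pair = PreferenceRecord(
    prompt="Explain what an API key is.",
    chosen=ScoredResponse("An API key is a secret token that identifies a client.", 0.9),
    rejected=ScoredResponse("It is a key.", 0.2),
    reasoning="The chosen response is specific and complete.",
)
```

A consistency check like `is_consistent` is one plausible sanity test to run before training on preference data; whether the service applies such a check is not stated.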

eval_v1

Evaluation datasets for benchmarking. Input-output pairs with configurable metrics like exact_match and semantic_similarity.
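To make the exact_match metric concrete, here is a minimal sketch. The normalization (trimming whitespace and lowercasing) is an assumption; the service's own definition of exact_match may differ, and semantic_similarity is omitted because it would depend on an embedding model the page does not specify.

```python
def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 when the strings match after trimming and lowercasing."""
    return float(prediction.strip().lower() == reference.strip().lower())


def mean_exact_match(pairs) -> float:
    """Average exact_match over (prediction, reference) pairs."""
    scores = [exact_match(prediction, reference) for prediction, reference in pairs]
    return sum(scores) / len(scores) if scores else 0.0
```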

One API Call Away

Generate production-quality training data with a simple REST API. Full schema validation, quality scoring, and provenance tracking built in.

  • Schema-validated output guaranteed
  • Critic scoring rejects low-quality records
  • Deduplication across your dataset
  • Full provenance metadata per record
api-request.sh
curl -X POST https://api.stackai.app/v1/synthetic/generate \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "schema": "instruction_v1",
    "domain": "customer_support",
    "count": 100
  }'
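The same request can be built from Python with only the standard library. This sketch mirrors the URL, headers, and payload of the curl example; the helper name `build_generate_request` is hypothetical, and the response format is not documented on this page, so the send step is left commented out.

```python
import json
import os
import urllib.request

API_URL = "https://api.stackai.app/v1/synthetic/generate"  # taken from the curl example


def build_generate_request(schema: str, domain: str, count: int, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    payload = json.dumps({"schema": schema, "domain": domain, "count": count})
    return urllib.request.Request(
        API_URL,
        data=payload.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_generate_request(
    "instruction_v1", "customer_support", 100, os.environ.get("API_KEY", "")
)

# Sending is commented out so the sketch runs without network access:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```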

How It Works

Three steps from configuration to production-ready datasets.

1. Configure Your Job

Choose a schema, specify your domain, set constraints like language and difficulty, and pick your quality tier.

2. Generate with QC

Our pipeline generates data using your chosen LLMs, then applies critic scoring, heuristic validation, and safety filtering.
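The quality-control stage can be pictured as a chain of checks that each record must pass. Everything in this sketch is an assumption made for illustration: the `critic_score` and `response` fields, the 0.7 threshold, the minimum-length heuristic, and the blocked-term list. It shows the general filtering pattern, not the service's actual pipeline.

```python
from typing import Callable, Iterable, List

Record = dict
Check = Callable[[Record], bool]

MIN_CRITIC_SCORE = 0.7  # hypothetical threshold
BLOCKED_TERMS = ("ssn", "credit card number")  # placeholder safety list


def passes_critic(record: Record) -> bool:
    """Reject records whose (assumed) critic score is below the threshold."""
    return record.get("critic_score", 0.0) >= MIN_CRITIC_SCORE


def passes_heuristics(record: Record) -> bool:
    """Reject responses shorter than three words."""
    return len(record.get("response", "").split()) >= 3


def passes_safety(record: Record) -> bool:
    """Reject responses containing any blocked term."""
    text = record.get("response", "").lower()
    return not any(term in text for term in BLOCKED_TERMS)


def run_quality_control(
    records: Iterable[Record],
    checks: Iterable[Check] = (passes_critic, passes_heuristics, passes_safety),
) -> List[Record]:
    """Keep only the records that pass every check."""
    checks = tuple(checks)
    return [record for record in records if all(check(record) for check in checks)]


candidates = [
    {"response": "Open Settings and choose Reset Password.", "critic_score": 0.9},
    {"response": "Please send your credit card number.", "critic_score": 0.95},
    {"response": "Yes.", "critic_score": 0.8},
    {"response": "Try turning it off and on again.", "critic_score": 0.4},
]
kept = run_quality_control(candidates)
```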

3. Download Dataset

Get your validated dataset in NDJSON or JSON format with a full manifest and provenance metadata.
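NDJSON stores one JSON object per line, which makes large datasets easy to stream. The sketch below writes and reads that format with the standard library; the record fields are placeholders, and the manifest layout is not shown because the page does not describe it.

```python
import io
import json


def write_ndjson(records, stream) -> None:
    """Write each record as one JSON object per line."""
    for record in records:
        stream.write(json.dumps(record) + "\n")


def read_ndjson(stream) -> list:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in stream if line.strip()]


# Placeholder records; real downloads would follow the chosen schema.
records = [
    {"instruction": "How do I reset my password?", "response": "Use Settings."},
    {"instruction": "Where is my order?", "response": "Check the Orders page."},
]

buffer = io.StringIO()  # stands in for a downloaded file
write_ndjson(records, buffer)
buffer.seek(0)
loaded = read_ndjson(buffer)
```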

Simple, Transparent Pricing

Pay per record or save with a subscription. No hidden fees.

Economy: $0.50 per 1K records · GPT-4o Mini · Fast prototyping

Standard (Popular): $5 per 1K records · GPT-4o · Balanced quality

Premium: $25 per 1K records · Claude Sonnet 4.5 · Production quality
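For budgeting, the per-1K prices translate into a simple estimate. This sketch assumes pay-per-record billing is prorated linearly with no minimum charge, which the page does not confirm; the tier keys and the function name `estimate_cost` are hypothetical.

```python
from decimal import Decimal

# Prices per 1,000 records, as listed above.
PRICE_PER_1K = {
    "economy": Decimal("0.50"),
    "standard": Decimal("5"),
    "premium": Decimal("25"),
}


def estimate_cost(tier: str, record_count: int) -> Decimal:
    """Estimate the cost in dollars, assuming linear proration (an assumption)."""
    return PRICE_PER_1K[tier] * record_count / 1000


for tier in PRICE_PER_1K:
    print(f"{tier}: ${estimate_cost(tier, 10_000)} for 10,000 records")
```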

Ready to Generate?

Start creating high-quality training data in minutes. No credit card required.

Get Started Free