Documentation

Quick Start

Generate synthetic training data with a single API call:

terminal
curl -X POST https://api.stackai.app/v1/synthetic/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "schema": "instruction_v1",
    "domain": "customer_support",
    "count": 100,
    "models": {
      "generator": {"provider": "openai", "model": "gpt-4o"}
    }
  }'
Get API Key

Data Schemas

instruction_v1

Instruction-response pairs for supervised fine-tuning.

instruction_v1.json
{
  "system_prompt": "You are a helpful assistant...",
  "instruction": "How do I reset my password?",
  "response": "To reset your password, follow these steps...",
  "metadata": {
    "domain": "customer_support",
    "difficulty": "easy"
  }
}

preference_v1

Paired responses with preference labels for RLHF training.

preference_v1.json
{
  "prompt": "Explain quantum computing",
  "chosen": "Quantum computing uses quantum mechanics...",
  "rejected": "Quantum computing is very complicated...",
  "chosen_score": 8.5,
  "rejected_score": 4.2,
  "reasoning": "The chosen response provides clear explanation..."
}

eval_v1

Evaluation datasets for benchmarking model performance.

eval_v1.json
{
  "input": "What is the capital of France?",
  "ideal_output": "Paris",
  "metrics": ["exact_match", "factual_accuracy"]
}

Models & Quality Tiers

Choose a quality tier based on your needs. Each tier maps to a specific model:

TierProviderModelSchemasPrice/1K
EconomyOpenAIgpt-4o-miniinstruction, eval$0.50
StandardOpenAIgpt-4oinstruction, eval$5.00
PremiumAnthropicclaude-sonnet-4-6instruction, preference, eval$25.00

Generator vs Critic

All schemas require a generator model. The preference_v1 schema also uses a critic model to score and compare responses. Other schemas do not use a critic.

preference_v1 with critic
curl -X POST https://api.stackai.app/v1/synthetic/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "schema": "preference_v1",
    "domain": "code_review",
    "count": 50,
    "models": {
      "generator": {"provider": "anthropic", "model": "claude-sonnet-4-6"},
      "critic": {"provider": "anthropic", "model": "claude-sonnet-4-6"}
    }
  }'

API Reference

POST /v1/synthetic/generate

Create a new generation job

GET /v1/synthetic/jobs/:jobId

Get job status and statistics

GET /v1/synthetic/jobs/:jobId/results

Stream job results

GET /v1/synthetic/jobs/:jobId/results/url

Get presigned download URL

Constraints

Control generation with optional constraints:

ParameterTypeDescription
languagestringISO language code (e.g., "en", "es")
tonestringWriting tone (e.g., "formal", "casual")
difficultystringContent difficulty (e.g., "easy", "hard")
max_tokensnumberMaximum tokens per response (50-4000)