# Distil CLI

Train specialized small language models (SLMs) using the Distil Labs platform. The platform uses knowledge distillation to create models up to 70x smaller than large models while maintaining comparable accuracy. This skill covers the entire workflow from data preparation and model creation through training and local deployment with Ollama or vLLM.
What you can help with depends on the environment:
| Environment | Capabilities |
|-------------|--------------|
| Claude Code | Full end-to-end workflow: task selection, data preparation, running CLI commands, training, and deployment |
| Claude Browser | Task selection and data preparation only: help users choose the right task type and create `job_description.json`, `config.yaml`, `train.csv`, and `test.csv` files. The user runs CLI commands themselves. |
## Instructions

### Prerequisites
Install the CLI and authenticate:
```bash
# Install
curl -fsSL https://cli-assets.distillabs.ai/install.sh | sh

# Authenticate (if not already logged in)
distil login
```
Other auth commands: `distil signup` (create account), `distil whoami` (check user), `distil logout`
### Core Workflow

#### Step 1: Create a Model
Register a new model to track your experiment:
```bash
distil model create my-model-name
# Returns: Model ID (use this for all subsequent commands)
```
List all models with `distil model list`.
#### Step 2: Task Selection
Choosing the right task type is crucial. Help the user by asking what they need the model to do:
| If the user needs to... | Choose | Data Guide |
|-------------------------|--------|------------|
| Solve problems by returning text answers (QA or text transformations) | Question Answering | `data-question-answering.md` |
| Assign text to categories from a fixed set | Classification | `data-classification.md` |
| Generate structured tool/API calls from natural language | Tool Calling | `data-tool-calling.md` |
| Answer questions given context (requires existing knowledge database) | Open Book QA (RAG) | `data-qa-rag.md` |
| Answer questions from knowledge learned during training | Closed Book QA | `data-qa-closed.md` |
**Question Answering** — The most general task type. Solves problems by returning text answers. Use for question answering, text transformations, or any task that takes text input and produces text output.
- Examples: "What is the termination clause?" from contracts, "Summarize this document", "Extract the key dates from this email", "Reformat this data as JSON"
**Classification** — Assigns text to one category from a fixed set. Use when you need deterministic categorization, not open-ended generation.
- Examples: Intent detection, content moderation (toxic/safe), sentiment analysis, ticket triage by department
**Tool Calling** — Maps natural language to structured function calls with correct parameters. Use when routing user requests to backend APIs/services.
- Examples: Voice assistant commands → smart home APIs, chatbot intents → CRM operations, natural language → database queries
- Note: Only supports Llama3 family student models
**Open Book QA (RAG)** — Answers questions using provided context passages. Only use this if you already have a well-structured knowledge database with context chunks. The model expects retrieved context to be provided at inference time.
- Examples: Customer support from product docs, legal document analysis, technical documentation assistants
- When to pick: You already have a RAG pipeline with good retrieval, and want the model to answer strictly from provided context
- Not appropriate if: You don't have a knowledge database yet, or your context chunks are poorly structured
**Closed Book QA** — Answers questions from knowledge learned during training. The user provides a knowledge database and the model internalizes it during training; no context is needed at inference.
- Examples: FAQ bots, domain-specific knowledge assistants
- When to pick: You want knowledge "baked into" the model, users shouldn't need to provide context, or RAG retrieval is difficult for your use case
#### Step 3: Data Preparation
IMPORTANT: Before creating any files, you MUST read the data preparation guide for the selected task type. Each task has specific requirements for file formats and content.
| Task Type | Data Guide to Read First |
|-----------|--------------------------|
| Question Answering | `data-question-answering.md` |
| Classification | `data-classification.md` |
| Tool Calling | `data-tool-calling.md` |
| Open Book QA (RAG) | `data-qa-rag.md` |
| Closed Book QA | `data-qa-closed.md` |
After reading the appropriate guide, help the user prepare these files:
| File | Required | Description |
|------|----------|-------------|
| `job_description.json` | Yes | Task objectives and configuration |
| `train.csv` or `train.jsonl` | Yes | 20+ labeled (question, answer) pairs |
| `test.csv` or `test.jsonl` | Yes | Held-out evaluation set |
| `config.yaml` | Yes | Task type, student model, and teacher model (see config.md for options) |
| `unstructured.csv` | No | Domain text for synthetic data generation |
Note on `config.yaml`: Always ask the user which student model they want to train and which teacher model to use. See config.md for the full list of available models. If the user is unsure, recommend `Llama-3.2-1B-Instruct` as the student and `openai.gpt-oss-120b` as the teacher.
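For that default choice, a minimal `config.yaml` sketch might look like this (the `task` and `student_model_name` keys mirror the examples later in this skill; the teacher-model key name is an assumption, so verify the exact fields in config.md):

```yaml
task: classification                        # task type; see config.md for valid values
student_model_name: Llama-3.2-1B-Instruct   # the SLM to train
teacher_model_name: openai.gpt-oss-120b     # assumed key name; verify in config.md
```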
#### Step 4: Upload Data

```bash
distil model upload-data <model-id> --data ./my-data-folder
```
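The folder should contain the files prepared in Step 3, for example:

```
my-data-folder/
├── job_description.json
├── config.yaml
├── train.csv          # or train.jsonl
├── test.csv           # or test.jsonl
└── unstructured.csv   # optional
```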
#### Step 5: Teacher Evaluation
Before training, validate whether a large language model can solve your task. This serves as:
- Feasibility check: if the teacher can solve the task, the student model can usually learn it effectively
- Performance benchmark: Teacher accuracy predicts expected SLM performance
```bash
distil model run-teacher-evaluation <model-id>
distil model teacher-evaluation <model-id>   # Check status/results
```
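Evaluation runs asynchronously. To block until it finishes from a script, a rough sketch (this assumes `--output json` is supported for this subcommand and that the payload exposes a top-level `status` field using the `JOB_*` values documented under Step 6; inspect your actual output first):

```bash
# Poll teacher-evaluation status every 5 minutes until a terminal state.
# The ".status" field and its values are assumptions; adjust after
# inspecting the real JSON output.
while true; do
  status=$(distil model teacher-evaluation <model-id> --output json | jq -r '.status')
  echo "teacher evaluation: ${status}"
  case "${status}" in
    JOB_SUCCEEDED|JOB_FAILED) break ;;
  esac
  sleep 300
done
```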
Interpreting results:
- High accuracy → Task is well-defined, proceed to training
- Low accuracy → Revise task description, improve data quality, or check for inconsistencies
For details on evaluation metrics (LLM-as-a-Judge, Exact-Match, ROUGE-L, tool_call_equivalence, etc.), see metrics.md.
#### Step 6: Model Training
Train your SLM using knowledge distillation:
1. Teacher model generates synthetic training data from your examples
2. Synthetic data is validated for diversity and quality
3. Student model learns from the synthetic data with task-specific optimization
```bash
distil model run-training <model-id>
distil model training <model-id>   # Check status
```

Training takes several hours. Statuses: `JOB_PENDING`, `JOB_RUNNING`, `JOB_SUCCEEDED`, `JOB_FAILED`
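For long runs, one convenient way to keep an eye on progress from a shell is the standard `watch` utility:

```bash
# Re-run the status check every 10 minutes; stop with Ctrl-C once you
# see JOB_SUCCEEDED or JOB_FAILED
watch -n 600 distil model training <model-id>
```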
If SLM performance is below expectations:
- Increase the number of training examples
- Make the task description more specific
- Modify config parameters (e.g., increase epochs; see the hypothetical sketch after this list)
- Try a larger student model
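As a purely hypothetical illustration of the config and larger-student bullets (the real parameter names and model identifiers are listed in config.md):

```yaml
# Hypothetical config.yaml tweaks; confirm key names against config.md
student_model_name: Llama-3.2-3B-Instruct   # assumed identifier for the larger 3B student
num_epochs: 5                               # assumed key for more passes over the synthetic data
```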
When training completes, compare SLM metrics against teacher metrics. For help interpreting results, see metrics.md.
#### Step 7: Download and Deploy

```bash
distil model download <model-id>
```
For local deployment with Ollama or vLLM, read deployment.md.
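As a quick sketch, and assuming the download includes a `Modelfile` as in Example 2 below:

```bash
# Register and run the downloaded model with Ollama
ollama create my-model -f model/Modelfile
ollama run my-model
```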
## CLI Reference

```bash
# List all models
distil model list

# Show specific model details
distil model show <model-id>

# Download uploaded data files
distil model download-data <model-id>

# JSON output for scripting
distil model list --output json
```
Command aliases: `distil model` = `distil models` = `distil m`
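The JSON output is handy for scripting with tools like `jq`. A hypothetical example (the exact JSON shape is not documented here, so the `.id` field is an assumption):

```bash
# Print the ID of every model; adjust the field name to the real payload
distil model list --output json | jq -r '.[].id'
```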
## Supported Models

**Student models** (what you train): Llama 3.2 (1B, 3B), Llama 3.1 8B, SmolLM2 (135M, 1.7B), Gemma 3 (270M, 1B, 4B), Qwen3 (0.6B, 1.7B, 4B, 8B), IBM Granite 3.1/3.3 8B

**Teacher models** (used for distillation): DeepSeek (R1, V3.1), Qwen3 (235B, 480B), Llama (3.1 405B, 3.3 70B), GPT OSS (20B, 120B)
## Troubleshooting

Check model status:

```bash
distil model show <model-id>
```
Training failed:
- Check teacher evaluation results first
- Verify data format matches task type requirements
- Ensure sufficient training examples (20+ minimum)
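A quick local sanity check for the minimum-example requirement:

```bash
# train.csv needs at least 21 lines: one header row plus 20+ examples
wc -l < ./my-data-folder/train.csv
```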
Authentication issues:

```bash
distil logout
distil login
```
## Platform Support
- Linux (x86_64): Yes
- macOS (Intel): Yes
- macOS (Apple Silicon): Yes
- Windows: Use WSL or REST API
## Examples

### Claude Code (End-to-End Workflow)
In Claude Code, you can run CLI commands directly.
#### Example 1: Train a classification model
User: Help me train a model to classify customer support tickets into categories
Claude: I'll help you train a classification model. Let me start by creating a model:
> distil model create support-ticket-classifier
Now let's prepare your data. I'll read `data-classification.md` for the format...
[Claude creates job_description.json, config.yaml, and helps format train.csv/test.csv]
Data files ready. Uploading:
> distil model upload-data abc123 --data ./support-classifier-data
Let's validate with teacher evaluation first:
> distil model run-teacher-evaluation abc123
[Checks status, then proceeds to training when ready]
> distil model run-training abc123
#### Example 2: Check training status and download
User: What's the status of my model abc123?
Claude: Let me check:
> distil model show abc123
> distil model training abc123
Your training completed successfully! Downloading the model:
> distil model download abc123
Model downloaded. To run it locally with Ollama:
> ollama create support-classifier -f model/Modelfile
> ollama run support-classifier
#### Example 3: Debug a failed training
User: My training failed, can you help?
Claude: Let me investigate:
> distil model show abc123
> distil model training abc123
I see the issue: teacher evaluation accuracy was low. Let me check your data:
> distil model download-data abc123
[Reviews data files and suggests improvements]
### Claude Browser (Data Preparation Only)
In the browser, Claude cannot run CLI commands but can help prepare all data files. The user then runs commands themselves.
#### Example 1: Prepare classification data
User: I want to build a sentiment classifier for product reviews
Claude: I'll help you prepare the data files. Based on `data-classification.md`, you need:
1. **job_description.json:**

   ```json
   {
     "task_description": "Classify product reviews by sentiment",
     "classes_description": {
       "positive": "Reviews expressing satisfaction, praise, or recommendation",
       "negative": "Reviews expressing dissatisfaction, complaints, or warnings",
       "neutral": "Reviews that are balanced or purely factual"
     }
   }
   ```
2. **config.yaml:**

   ```yaml
   task: classification
   student_model_name: Llama-3.2-1B-Instruct
   ...
   ```
3. **train.csv format:**

   ```csv
   question,answer
   "This product exceeded my expectations!",positive
   "Broke after two days, waste of money",negative
   ...
   ```
Once you have 20+ examples in train.csv and test.csv, run:
```bash
$ distil model create sentiment-classifier
$ distil model upload-data <model-id> --data ./my-data
$ distil model run-teacher-evaluation <model-id>
```
#### Example 2: Prepare RAG/QA data
User: I need a model that answers questions about our product documentation
Claude: For RAG-style QA, I'll prepare files per `data-qa-rag.md`:
1. **job_description.json:**

   ```json
   {
     "task_description": "Answer customer questions using provided documentation excerpts"
   }
   ```
2. **config.yaml:**

   ```yaml
   task: question-answering-open-book
   ...
   ```
3. **train.csv format** (note the context column):

   ```csv
   question,context,answer
   "How do I reset my password?","Password Reset: Click forgot password...","Click 'Forgot Password' on the login page..."
   ...
   ```
[Provides complete file templates]
Save these files and run the CLI commands to train.
#### Example 3: Prepare tool calling data
User: I want a model that can call our internal APIs based on user requests
Claude: For tool calling, I'll prepare files per `data-tool-calling.md`:
1. **job_description.json** with your tool schemas:

   ```json
   {
     "task_description": "Select the correct API endpoint based on user request",
     "tools": [
       {
         "type": "function",
         "function": {
           "name": "get_order_status",
           "description": "Look up order status by order ID",
           "parameters": {...}
         }
       },
       ...
     ]
   }
   ```
Note: Tool calling requires Llama3 family models.
2. **train.csv:**

   ```csv
   question,answer
   "Where is my order #12345?","{""name"": ""get_order_status"", ""parameters"": {""order_id"": ""12345""}}"
   ...
   ```
[Provides complete templates]