Refinery

Iterative prompt improvement powered by BRAID.

Refinery takes a draft prompt and automatically improves it through repeated A/B evaluation cycles. It keeps refining until the improved version beats the original a set number of times in a row (3 by default) or the iteration limit is reached, with no manual tuning required.

Use Refinery when you want to:

Improve a prompt without manually rewriting it
Find the best-performing variation of a prompt through automated iteration
Apply BRAID-structured reasoning to any prompt

What Is BRAID?

BRAID (Bounded Reasoning for Autonomous Inference and Decisions) is a structured reasoning framework that adds decision flowcharts to prompts. It helps LLMs follow complex logic more reliably by breaking reasoning into explicit steps.

If your draft doesn't already use BRAID, Refinery generates a BRAID flowchart for it automatically before starting the improvement loop.

How It Works

Select a draft — Pick the prompt draft you want to improve.
Configure the model — Choose the execution model, reasoning effort, and text verbosity for prompt generation.
Generate variables — Raison auto-generates sample variable values from the prompt template. You can also enter values manually.
Set output schema (optional) — Define a JSON schema for structured output. Raison can auto-generate one from the prompt content.
Start the refinement — Refinery runs an automated loop:

The Improvement Loop

┌─────────────────────────────────────────────────┐
│  1. Generate BRAID flowchart (if needed)        │
│  2. Evaluate original prompt (baseline score)   │
│  3. Generate improved BRAID version             │
│  4. A/B evaluate: improved vs. original         │
│                    │                            │
│              ┌─────┴─────┐                      │
│              ▼           ▼                      │
│           BRAID       Original                  │
│           wins        wins                      │
│              │           │                      │
│              ▼           ▼                      │
│          Streak      Improver rewrites          │
│            +1        the BRAID prompt           │
│              │           │                      │
│              ▼           ▼                      │
│          3 wins      Loop continues             │
│          in a        with new version           │
│          row?                                   │
│              │                                  │
│              ▼                                  │
│           Done ✓                                │
└─────────────────────────────────────────────────┘

The original prompt is evaluated once and the output is cached as a fixed baseline.
Each iteration runs the current BRAID version and evaluates its output against that cached baseline.
If the BRAID version wins, the consecutive win streak increments.
If the BRAID version loses, an improver model rewrites the BRAID prompt and the streak resets.
The loop stops when the BRAID version reaches the required number of consecutive wins (3 by default, configurable from 1 to 10) or hits the maximum iteration limit (30 by default).
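
The rules above can be sketched as a short Python loop. This is a minimal illustration under stated assumptions, not Refinery's actual implementation: `refine`, `LoopResult`, and the three callables (`execute`, `braid_wins`, `improve`) are hypothetical names standing in for the execution model, the judge, and the improver.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LoopResult:
    status: str        # "completed" or "max_iterations" (hypothetical labels)
    iterations: int    # number of A/B iterations that ran
    final_prompt: str  # latest BRAID prompt when the loop stopped


def refine(
    original: str,
    braid: str,
    execute: Callable[[str], str],           # stand-in for the execution model
    braid_wins: Callable[[str, str], bool],  # stand-in for the judge
    improve: Callable[[str], str],           # stand-in for the improver
    wins_required: int = 3,
    max_iterations: int = 30,
) -> LoopResult:
    # The original prompt is run once; its output is the fixed baseline.
    baseline_output = execute(original)
    streak = 0
    for iteration in range(1, max_iterations + 1):
        candidate_output = execute(braid)
        if braid_wins(candidate_output, baseline_output):
            streak += 1
            if streak >= wins_required:
                return LoopResult("completed", iteration, braid)
        else:
            # A loss resets the streak and triggers a rewrite.
            streak = 0
            braid = improve(braid)
    return LoopResult("max_iterations", max_iterations, braid)
```

Because a loss resets the streak to zero, the wins that end the loop are always produced by the same BRAID version, which is what makes the final result a consistent winner rather than a lucky one.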

Monitoring Progress

The detail view shows:

Current prompts — The original draft and the latest BRAID version side by side.
Charts — Scores, cost, tokens, and duration across iterations.
Iteration list — Navigate to any iteration to see outputs, scores, judge feedback, and the BRAID prompt used.
Win streak — Current consecutive wins toward the target.

You can pause a running refinement and resume it later.

Configuration

Setting	Default	Range	Description
Consecutive wins required	3	1–10	How many consecutive wins before the refinement completes
Max iterations	30	1–100	Upper limit on iteration count
Execution model	gpt-5-nano	gpt-5-nano, gpt-5-mini, gpt-5, gpt-5.2	Model used to execute prompts
Reasoning effort	Medium	Varies by model	How much the model reasons before responding
Text verbosity	Medium	Low, Medium, High	Controls response length

The judge and improver models are fixed at gpt-5.2 with high reasoning effort to ensure accurate scoring and high-quality rewrites.
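
For readers who track these settings in their own tooling, the defaults and ranges from the table can be captured in a small validated structure. The sketch below is illustrative only: `RefineryConfig` and its field names are hypothetical and do not correspond to a documented Refinery API. Reasoning effort is not validated because its allowed values vary by model.

```python
from dataclasses import dataclass

# Allowed values as listed in the settings table.
EXECUTION_MODELS = ("gpt-5-nano", "gpt-5-mini", "gpt-5", "gpt-5.2")
VERBOSITY_LEVELS = ("low", "medium", "high")


@dataclass(frozen=True)
class RefineryConfig:
    """Hypothetical container mirroring the documented defaults and ranges."""

    consecutive_wins_required: int = 3
    max_iterations: int = 30
    execution_model: str = "gpt-5-nano"
    reasoning_effort: str = "medium"  # valid values vary by model; not checked
    text_verbosity: str = "medium"

    def __post_init__(self) -> None:
        if not 1 <= self.consecutive_wins_required <= 10:
            raise ValueError("consecutive_wins_required must be between 1 and 10")
        if not 1 <= self.max_iterations <= 100:
            raise ValueError("max_iterations must be between 1 and 100")
        if self.execution_model not in EXECUTION_MODELS:
            raise ValueError(f"unknown execution model: {self.execution_model}")
        if self.text_verbosity not in VERBOSITY_LEVELS:
            raise ValueError(f"unknown text verbosity: {self.text_verbosity}")


# Defaults match the table; out-of-range values raise ValueError.
default_config = RefineryConfig()
strict_config = RefineryConfig(consecutive_wins_required=5, max_iterations=50)
```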

Statuses

Status	Description
Pending	Refinement created, waiting to start
Generating BRAID	Building the BRAID flowchart for the prompt
Evaluating Original	Scoring the original prompt to establish a baseline
Running	Iterating: generating improvements and evaluating
Paused	Manually paused; can be resumed
Completed	Improvement loop finished (consistent winner found)
Failed	An error occurred during refinement
Max Iterations	Hit the iteration limit without a consistent winner
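
If you record these statuses in your own scripts or dashboards, an enumeration keeps the labels consistent. The following is a sketch, not part of Refinery: `RefinementStatus` and `is_finished` are hypothetical names, and treating Completed, Failed, and Max Iterations as end states is an assumption drawn from the descriptions above.

```python
from enum import Enum


class RefinementStatus(Enum):
    PENDING = "Pending"
    GENERATING_BRAID = "Generating BRAID"
    EVALUATING_ORIGINAL = "Evaluating Original"
    RUNNING = "Running"
    PAUSED = "Paused"
    COMPLETED = "Completed"
    FAILED = "Failed"
    MAX_ITERATIONS = "Max Iterations"


# Assumption: these three statuses mean the refinement will not iterate further.
FINISHED_STATUSES = frozenset(
    {
        RefinementStatus.COMPLETED,
        RefinementStatus.FAILED,
        RefinementStatus.MAX_ITERATIONS,
    }
)


def is_finished(status: RefinementStatus) -> bool:
    """Return True when the status is one of the assumed end states."""
    return status in FINISHED_STATUSES
```

Only Completed indicates that a consistent winner was found; Max Iterations ends the run without one.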

Billing

Refinery iterations consume AI Prompt Builder (BRAID) credits. Each iteration counts toward your BRAID message quota for the billing period.

Plan	BRAID messages / seat
Free	Not available
Team	100
Team Plus	1,000
Enterprise	Unlimited

When you reach your BRAID message limit, Refinery iterations are blocked until the next billing period starts or you upgrade your plan.
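
To get a rough sense of how far a quota stretches, the arithmetic below estimates how many refinements one seat could run if every refinement used the full iteration limit. It assumes each iteration consumes exactly one BRAID message and that nothing else draws on the quota; neither is stated above, so treat the result as a rough lower bound. `worst_case_runs_per_seat` is a hypothetical helper.

```python
from typing import Optional

# Per-seat quotas from the table; None stands for "Unlimited".
BRAID_MESSAGES_PER_SEAT = {
    "Team": 100,
    "Team Plus": 1000,
    "Enterprise": None,
}


def worst_case_runs_per_seat(plan: str, max_iterations: int = 30) -> Optional[int]:
    """Full-length refinements one seat can afford per billing period.

    Assumes one BRAID message per iteration (an assumption, not a documented rate).
    Returns None when the plan has no message limit.
    """
    if plan not in BRAID_MESSAGES_PER_SEAT:
        raise ValueError(f"Refinery is not available on plan: {plan}")
    quota = BRAID_MESSAGES_PER_SEAT[plan]
    if quota is None:
        return None
    return quota // max_iterations


team_runs = worst_case_runs_per_seat("Team")            # 100 // 30
team_plus_runs = worst_case_runs_per_seat("Team Plus")  # 1000 // 30
```

Under these assumptions, lowering the maximum iteration count is the most direct way to fit more refinements into a billing period.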

Access

Refinery is available on Team, Team Plus, and Enterprise plans.
