BDD for the AI Era: Why we are evolving from Gherkin to Contracts

At Scramble, we are deep believers in Behavior-Driven Development (BDD). We love Gherkin. The ability to define software behaviour in plain English using Given, When, Then is one of the most successful bridges ever built between business intent and code.

But as we began moving from human-led development to AI-agent orchestration, we hit a snag. Gherkin was designed for humans who have common sense and shared context. AI agents don’t have that. They operate under strict token constraints and rely entirely on what is in their current context window.

We realized that to build reliable software with LLMs, we needed to evolve the "Story" into a Machine-Validatable Contract.

The Problem: The Token Tax and Implicit Rules

When you feed an AI a Gherkin story, you often pay a high token tax, because each scenario repeats its background and setup. Based on our analysis, we conservatively estimate that a comparable Gherkin spec costs between 1,500 and 2,500 tokens per task.

More importantly, global rules, things like "all error paths must use typed errors" or "never log PII," are often left implicit in BDD because repeating them in every single file is exhausting for humans. For an AI, "implicit" usually results in hallucinations.

Introducing CDS: Contract-Driven Specification

CDS is a framework we developed at Scramble to provide a constraint envelope that an AI agent can actually execute with precision. It doesn't replace the spirit of BDD; it just provides the formal rigour required for autonomous agents.

1. The Three-Tier Hierarchy

CDS eliminates redundancy through a structured inheritance chain:

Project Contracts: You define your global Invariants (Security, Style, Tech Stack) once for the entire repository.

Plan Contracts: These define the feature scope and the Task Graph (the DAG).

Task Contracts: This is the atomic unit of work for an individual agent.

Our resolver walks this chain and resolves all inherited rules into a single, flat, token-efficient prompt. In our initial orchestrated runs, this resolution step, which we call hydration, reduced prompts to an average of just 613 tokens.
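To make the inheritance chain concrete, here is a minimal Python sketch of how a resolver of this kind might flatten the three tiers into one prompt. It is an illustration under stated assumptions, not the CDS implementation: the `Contract` class, its field names, the `resolve` and `render_prompt` functions, the example rules, and the "nearest tier wins" merge policy are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Contract:
    """One tier in the chain. All field names here are hypothetical."""
    name: str
    rules: dict[str, str] = field(default_factory=dict)
    parent: "Contract | None" = None


def resolve(task: Contract) -> dict[str, str]:
    """Walk task -> plan -> project and merge the rules of every tier."""
    chain = []
    node = task
    while node is not None:
        chain.append(node)
        node = node.parent
    resolved: dict[str, str] = {}
    # Apply from the root (project) downwards. Assumption: a closer tier
    # overrides a farther one when both define the same key.
    for contract in reversed(chain):
        resolved.update(contract.rules)
    return resolved


def render_prompt(task: Contract) -> str:
    """Flatten the resolved rules into one prompt, one clause per line."""
    rules = resolve(task)
    lines = [f"TASK: {task.name}"]
    lines += [f"- {key}: {value}" for key, value in sorted(rules.items())]
    return "\n".join(lines)


# Hypothetical example data, reusing the global rules mentioned earlier.
project = Contract(
    "project",
    {"errors": "all error paths must use typed errors", "logging": "never log PII"},
)
plan = Contract("checkout-plan", {"scope": "checkout feature only"}, parent=project)
task = Contract(
    "add-coupon-field",
    {"acceptance": "coupon code is validated server-side"},
    parent=plan,
)

print(render_prompt(task))
```

The point of the sketch is that the global rules are written once at the project tier, yet every task prompt still states them explicitly, so nothing is left implicit for the agent.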

2. From Test Failures to Breach Reports

When an agent fails to meet a requirement, CDS produces a Breach Report. Instead of a vague stack trace, the report identifies the exact clause violated, its severity, and a remediation hint. This allows the AI to fix its code based on a logical violation rather than a guess.
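As a rough illustration of the three pieces of information described above (clause, severity, remediation hint), a Breach Report could be modelled along these lines. The `BreachReport` and `Severity` types, the clause identifier `project.errors.typed`, and the toy `check_typed_errors` validator are assumptions made for this sketch; the actual CDS report format may differ.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    WARNING = "warning"
    ERROR = "error"


@dataclass(frozen=True)
class BreachReport:
    """Hypothetical shape of a report: clause, severity, detail, hint."""
    clause_id: str
    severity: Severity
    detail: str
    remediation_hint: str

    def to_feedback(self) -> str:
        """Render the report as text that can be fed back to the agent."""
        return (
            f"[{self.severity.value.upper()}] clause {self.clause_id}: {self.detail}\n"
            f"Hint: {self.remediation_hint}"
        )


def check_typed_errors(source: str) -> list[BreachReport]:
    """Toy validator for a 'typed errors' rule: flags bare `except:` lines."""
    reports = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if line.strip() == "except:":
            reports.append(
                BreachReport(
                    clause_id="project.errors.typed",  # hypothetical clause id
                    severity=Severity.ERROR,
                    detail=f"bare except on line {lineno}",
                    remediation_hint="catch a specific exception type instead",
                )
            )
    return reports


# Hypothetical agent output that violates the rule.
agent_output = "try:\n    charge(card)\nexcept:\n    pass\n"
for report in check_typed_errors(agent_output):
    print(report.to_feedback())
```

Compared with a stack trace, each report names the clause that was broken, which gives the agent a specific rule to re-read and satisfy on its next attempt.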

3. Deterministic Orchestration

CDS plans are Directed Acyclic Graphs (DAGs). Using Kahn’s algorithm, our orchestrator derives a valid execution order and identifies which tasks have no unmet dependencies and can therefore run in parallel. If a run halts, the system creates a Checkpoint with SHA-256 stale-detection, so you can resume exactly where you left off.
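The sketch below shows one way Kahn's algorithm can be used to group a task graph into waves of tasks that may run in parallel, plus a simple SHA-256 staleness check. It is not the CDS orchestrator: the function names, the task names, and in particular the choice to hash the contract text for stale detection are assumptions, since the post does not say what the checkpoint hashes.

```python
import hashlib
from collections import defaultdict


def parallel_batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """Kahn's algorithm, grouped into waves of tasks that can run together.

    `deps` maps each task to the set of tasks it depends on. Every task,
    including those without dependencies, must appear as a key.
    """
    indegree = {task: len(needs) for task, needs in deps.items()}
    dependents = defaultdict(list)
    for task, needs in deps.items():
        for need in needs:
            dependents[need].append(task)

    # Sorting each wave keeps the output deterministic across runs.
    ready = sorted(task for task, degree in indegree.items() if degree == 0)
    batches = []
    done = 0
    while ready:
        batches.append(ready)
        done += len(ready)
        next_ready = []
        for task in ready:
            for child in dependents[task]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    next_ready.append(child)
        ready = sorted(next_ready)
    if done != len(deps):
        raise ValueError("cycle or unknown dependency: the plan is not a valid DAG")
    return batches


def fingerprint(contract_text: str) -> str:
    """SHA-256 digest of a contract, stored alongside a checkpoint."""
    return hashlib.sha256(contract_text.encode("utf-8")).hexdigest()


def is_stale(checkpoint_digest: str, current_text: str) -> bool:
    """A checkpoint is stale if the contract changed since it was written."""
    return fingerprint(current_text) != checkpoint_digest


# Hypothetical plan: two tasks depend on the schema, one on both of them.
plan = {"schema": set(), "api": {"schema"}, "ui": {"schema"}, "e2e": {"api", "ui"}}
print(parallel_batches(plan))

saved = fingerprint("task: add-coupon-field v1")
print(is_stale(saved, "task: add-coupon-field v2"))
```

Each inner list is a wave whose tasks have all of their dependencies satisfied by earlier waves, so the tasks within a wave can be handed to separate agents at the same time.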

Our Methodology: Putting it to the test

We are currently evaluating CDS against our traditional BDD workflows at Scramble. We aren't looking to "beat" Gherkin; we are looking to bridge the gap between human requirements and machine execution. We are tracking how well CDS acceptance criteria match Gherkin quality and using each task as a data point for the framework's evolution.

Help us peer-review the Whitepaper

We are releasing the CDS CLI, the Spec, and the MCP Server as Open Source once our testing is complete.

I have authored a technical whitepaper detailing the resolution logic, the agent isolation principles, and the work we've done across 95 task contracts. Want to see the methodology we’re testing before the public release?

👇 Comment "METHOD" below or send me a DM, and I’ll send you the PDF.

Let’s move BDD into the AI era together.
