A few months ago I wrote about specification-driven development and why I doubled down on it as AI tools got more powerful.

At the time, it felt like the antidote to chaos. Write clear specs first, then let AI generate code against them. Less back-and-forth, fewer misunderstandings, faster delivery.

It worked… mostly. In the beginning.

But as I started handing bigger chunks to AI agents (not just single functions but whole features and workflows), the limitations became obvious. The AI would “follow” the spec in the most literal, sometimes brain-dead way, or it would creatively reinterpret the ambiguous parts and go off the rails.

I needed to improve my specification-driven development approach: specs that are actually machine-readable, enforceable, and built around AI’s strengths and weaknesses.

Here’s what we changed and what’s working now at Code Of Us.

Why my approach wasn’t enough anymore

Classic specs (even the good ones) are written for humans. They assume shared context, common sense and the ability to ask clarifying questions.

AI has none of that reliably. It has:

  • Massive pattern matching

  • Zero real understanding

  • Tendency to fill gaps with the most common patterns it saw in training

  • Overconfidence when it should say “I need more info”

So, when the specification says “handle errors gracefully,” the AI might add a generic try/catch that logs to the console and continues, which is technically “handling” the error.

When we say “make it performant,” it might memoize everything, blowing up memory.

We needed specs that remove as much ambiguity as possible while still letting humans stay in control.

The structure that works for AI agents

I now write specs in clear sections. This format has dramatically improved how well AI follows them.

  1. Context & Goals (high level, for humans and AI)

    • Business problem

    • Success metrics (what does “done” actually mean, performance numbers, user flows, error rates)

    • Non-goals (what we explicitly are NOT doing in this piece)

  2. Requirements, broken into MUST, SHOULD and MAY sections (RFC 2119 style)

  3. Data models & invariants

    • Exact types or schemas (I paste JSON Schema or TypeScript interfaces when possible)

    • Business rules and invariants that must never be violated

  4. Flows & edge cases

    • Happy path

    • All known edge cases and errors, listed exhaustively

    • Expected behavior for each

  5. Technical constraints & architecture rules

    • Allowed technologies/libraries

    • Forbidden patterns

    • How it must integrate with the existing system (naming conventions, folder structure, logging, observability, etc.)

    • Performance budgets

  6. Testing requirements

    • What tests must be generated

    • Coverage expectations

    • Specific scenarios that need property-based or snapshot tests

  7. Acceptance criteria written so a non-technical stakeholder can verify

We keep the whole spec in a single Markdown file (or Notion page) that lives with the code. It becomes the source of truth.
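As an illustration of section 3, this is the kind of snippet that can go straight into the spec: a data model with its invariants stated both as comments and as a predicate the quality gate can execute. `Order` and `LineItem` here are hypothetical names, not from a real project:

```typescript
interface LineItem {
  sku: string;
  quantity: number;       // invariant: integer >= 1
  unitPriceCents: number; // invariant: >= 0
}

interface Order {
  id: string;
  items: LineItem[];  // invariant: at least one item
  totalCents: number; // invariant: equals sum of quantity * unitPriceCents
}

// The business rules above, encoded as a predicate that can be run
// against any Order the generated code produces.
function orderInvariantsHold(order: Order): boolean {
  if (order.items.length === 0) return false;
  const sum = order.items.reduce(
    (acc, it) => acc + it.quantity * it.unitPriceCents,
    0
  );
  return (
    order.items.every(
      (it) =>
        Number.isInteger(it.quantity) &&
        it.quantity >= 1 &&
        it.unitPriceCents >= 0
    ) && order.totalCents === sum
  );
}
```

Because the invariant is executable, “must never be violated” becomes something a test can enforce rather than a sentence the AI can skim past.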

Making specs machine friendly

A few tricks I picked up:

  • Use structured formats where possible. For backend work, include OpenAPI snippets or exact function signatures the AI should implement.

  • Repeat critical rules in multiple sections. AI sometimes “forgets” details if they appear only once.

  • Include examples. Positive and negative. AI loves concrete examples.

  • Version the spec. When I change it, I note why and what changed. This helps when reviewing AI output later.

  • Add a “Common failure modes” section based on past mistakes. Things like “Do not assume UTC unless specified, respect user timezone settings.”
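The “include examples” trick works best when the examples are executable. A sketch, with a hypothetical `slugify` function as the target: the spec’s positive and edge-case examples live in a table that doubles as an acceptance test for whatever the AI generates:

```typescript
// Examples pasted into the spec — each row is both documentation
// and a test case. The slugify target is illustrative.
const examples: Array<{ input: string; expected: string }> = [
  { input: "Hello World", expected: "hello-world" },   // positive case
  { input: "  spaces  ", expected: "spaces" },         // edge: trim
  { input: "Crème brûlée", expected: "creme-brulee" }, // edge: accents
];

// Reference behavior the examples pin down. In practice the AI would
// implement this against the table, not the other way around.
function slugify(input: string): string {
  return input
    .normalize("NFD")                // split accents from base letters
    .replace(/[\u0300-\u036f]/g, "") // drop the combining accent marks
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")     // collapse non-alphanumeric runs
    .replace(/^-+|-+$/g, "");        // no leading/trailing dashes
}
```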

One of my best recent additions: a short “AI instructions” block at the top that says things like:

  • “Think step by step before coding.”

  • “Think like a senior systems architect.”

  • “If anything is ambiguous, list your assumptions explicitly before proceeding.”

  • “Prioritize readability and maintainability over clever code.”

It actually helps.

The workflow in practice

New feature or change:

  1. Product/PM and the tech lead write the initial spec (usually 1-3 hours for medium features)

  2. We review it internally for clarity and completeness

  3. Feed the spec to the AI (Claude for planning, Cursor for implementation)

  4. AI generates code + tests

  5. Run the quality gate

  6. Human review focuses only on deviations from the spec and things the spec might have missed

  7. Update the spec if we discover new edge cases during implementation

This loop is tight. Most medium tickets now close with far less rework.
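One piece of a quality gate like step 5 can be sketched in a few lines, assuming a convention (ours is only one possibility) where MUST requirements carry IDs like `MUST-3` and generated tests mention the IDs they cover:

```typescript
// Returns the MUST requirement IDs found in the spec that no test
// mentions. The MUST-<n> ID convention is an assumption for this sketch.
function findUncoveredMusts(
  specMarkdown: string,
  testSource: string
): string[] {
  const ids = specMarkdown.match(/MUST-\d+/g) ?? [];
  const unique = Array.from(new Set(ids));
  return unique.filter((id) => !testSource.includes(id));
}
```

Anything this returns blocks the merge until either a test covers the requirement or the spec is amended, which keeps step 6’s human review focused on real deviations.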

For bigger epics, we break them into specified sub-tasks that agents can tackle somewhat independently, then integrate carefully.

What changed in my prompting

I stopped writing long, rambling prompts. Now the prompt is usually: “Here is the complete specification: <pasted full spec>. Implement this exactly. Generate code and tests. Think step by step and list any assumptions.”

Then we iterate with follow-ups only when needed.

The spec does 80% of the work, to be honest.

Real results we’re seeing

  • Fewer “it works on my machine but not in staging” surprises

  • Onboarding new team members is faster because the spec explains why things are built a certain way

  • AI agents produce more consistent output across different models

  • Refactoring is less scary because the spec tells us what the original intent was

One recent project: a complex dashboard with multiple data sources. Under the old approach, it would have taken weeks of back-and-forth. With Spec 2.0 plus the quality gate, the AI handled the bulk of it in days, we caught two important missing invariants in review, and the client signed off with minimal changes.

Challenges that remain

AI still struggles with truly novel problems or deep domain knowledge that isn’t well represented online. Specs can’t fix that, and you still need experienced humans for the hard thinking.

Maintaining the specs takes discipline. When pressure mounts, it’s tempting to skip details. We fight that by making the spec part of the definition of done.

Also, overly rigid specs can stifle good ideas. We leave room for the AI (and humans) to suggest improvements, but they must be called out and approved, not snuck in.

Where this is heading

I believe the winning teams in 2026-2027 won’t treat specs as documentation. They’ll treat them as executable contracts between humans and AI.

Some teams are already experimenting with turning specs into formal languages or feeding them directly into agent toolkits. We’re watching that space but staying practical for now.

For most agencies and product teams, a well-structured Markdown spec + strong quality gate is already massive leverage.

If you’re still vibe coding features or giving vague instructions to AI and hoping for the best, you’re leaving a ton of reliability (and money) on the table.

Write the spec first. Make it detailed enough that even a very capable but extremely literal-minded pattern-matcher can follow it without wrecking your architecture.

Then enforce it.

It’s not flashy, but it works. In this new world where anyone can generate code, the ability to specify clearly and verify ruthlessly is quickly becoming the real competitive advantage.
