How I ship production code with AI without writing any code.

I built a system to 10x my development velocity as a solo founder. It's a structured, gate-enforced model that makes AI-driven software delivery reliable, auditable, and self-improving. This is how.

AI coding tools are getting frighteningly good at writing code. Just do a search on LinkedIn and you will find senior engineers in the trenches talking about how they are moving on from writing code to orchestrating systems.

There's the now-infamous post by Boris Cherny proclaiming that Claude Code writes 100% of his code, and Linus Torvalds' use of Google Antigravity to write code. It's like people came back from vacation at the end of 2025, or perhaps used their down time to explore more deeply, and collectively discovered that writing code by hand is effectively dead.

A massive acceleration in software development, and likely in most knowledge work, is upon us.

I have been leaning into spec-driven development to build OnePerfectSlice, and a lot of this rings true. In my experience, AI is frighteningly good at writing production-grade code. The job of a software engineer has shifted from writing code, to AI-assisted coding, to having AI do it for you.

Underneath this shift, though, is a well-designed system. It takes a lot more than a few prompts and some "vibes."

This post describes the system I built to 10x my development velocity as a solo founder: a structured, gate-enforced development model that makes AI-driven software delivery reliable, auditable, and self-improving.

It's my take on Spec-Driven Development (SDD). At its core, it treats software development as a state machine with explicit stages, hard gates, adversarial review, and irreversible transitions.
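To make "development as a state machine" concrete, here is a minimal sketch. The stage names, artifact keys, and gate predicates are illustrative assumptions, not my exact implementation; the point is that a transition either satisfies its gate or simply does not happen, and there is no path backward.

```python
from enum import Enum, auto

class Stage(Enum):
    SCOPED = auto()
    SPECCED = auto()
    IMPLEMENTED = auto()
    MERGED = auto()

# Each transition is guarded by a gate: a predicate over the artifacts
# produced so far. (Keys here are hypothetical examples.)
GATES = {
    (Stage.SCOPED, Stage.SPECCED): lambda art: art.get("spec_approved", False),
    (Stage.SPECCED, Stage.IMPLEMENTED): lambda art: art.get("tests_pass", False),
    (Stage.IMPLEMENTED, Stage.MERGED): lambda art: art.get("redteam_passed", False),
}

def advance(current: Stage, target: Stage, artifacts: dict) -> Stage:
    """Advance only if the gate for this exact transition is satisfied."""
    gate = GATES.get((current, target))
    if gate is None or not gate(artifacts):
        raise ValueError(f"Gate {current.name} -> {target.name} not satisfied")
    return target  # transitions are one-way; there is no demote path
```

The useful property is that "no exceptions" stops being a policy you remember to follow and becomes a `ValueError` you cannot ignore.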

This is how.


The Core Philosophy: Development as a State Machine

After catching myself repeating the same set of steps and making the same set of errors, I thought: this is still too slow, and there has to be a better way.

My background isn’t engineering. It’s marketing, or more broadly go-to-market. And one thing every competent GTM leader knows is that execution does not scale without structure. You need a sales process. You need playbooks that define what happens at each stage. And you need mechanisms to verify that both the process and the playbooks are actually being followed.

Once I started looking at it through that lens, the shift was obvious: the workflow itself was the product I needed to iterate on.

So I wrote it down. I codified it in markdown files for Claude Code. I added slash commands to make it repeatable. And instead of trying to get better at prompting, I focused on building a system that could enforce correctness when execution was delegated to AI.

What emerged was a system with a set of core principles.

The Principles

  • An enforced process with binding gates: Work moves through defined stages. Each transition has explicit entry and exit criteria. If those criteria aren’t met, the work does not advance. No exceptions.
  • Artifacts as source of truth: Each stage produces concrete outputs that prove the work occurred, can be audited, and meets a defined quality bar. State is inferred from evidence, not intent.
  • Adversarial reviews: The agent that produces work cannot approve it. Every transition is evaluated by an independent reviewer with an explicit “prove this is wrong” mandate. I added this after repeatedly watching Opus 4.5 and Codex miss edge cases when left to self-review.
  • Layered documentation: I separated documents by intent to approximate how a senior engineer might work.
    • Process defines what's valid (stages, gates, transitions)
    • Conventions define how things are done (formats, patterns, gotchas)
    • Principles encode judgment (tradeoffs, when to break rules, how to think)
  • Recursive self-improvement: Whenever the system misses something, I log it. If it happens three times, I update Conventions, Principles, or both, so the same mistake shouldn't happen again.
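The "state is inferred from evidence, not intent" principle is easy to sketch: a project's stage is derived from which artifacts actually exist on disk, never from a status field someone set by hand. The file names below are illustrative assumptions, not the exact ones in my repo.

```python
from pathlib import Path

# Ordered evidence: the latest stage whose artifact exists wins.
# (These file names are hypothetical examples.)
EVIDENCE = [
    ("scoped", "requirements.md"),
    ("specced", "spec.md"),
    ("implemented", "pr.md"),
]

def infer_stage(project_dir: Path) -> str:
    """Walk the evidence chain; a gap means later artifacts don't count."""
    stage = "new"
    for name, artifact in EVIDENCE:
        if (project_dir / artifact).exists():
            stage = name
        else:
            break  # missing evidence halts the chain
    return stage
```

Note the early `break`: a `pr.md` without a `spec.md` is not "implemented," it's evidence that the process was skipped.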

The Undiscovered Country

While iterating on this system, I noticed something unexpected: I started spending more time improving the workflow than writing code. I had somehow boiled the entire workflow down to a small set of slash commands in Claude Code.

Instead of writing code, development started to feel like operating a machine. I wasn't thinking about syntax or files. I was discussing specs with Claude Code, inspecting outputs, and deciding whether the system was allowed to advance.

The job had shifted from writing to orchestrating.


The System In Practice

My entire workflow now runs through four slash commands in Claude Code:

/scope — Define a new project. Takes a problem statement and produces requirements, deliverables, and success criteria. Output: requirements.md and deliverables.md in a new projects folder that lives inside the repo.

/spec D# — Write a specification for a deliverable. This is where most of the thinking happens. The spec includes the full implementation code — not pseudocode, not descriptions, but actual copy-pasteable code that's been reviewed before a single line hits the codebase. Output: a spec file with proposed implementation, reviewed and approved.

/code — Implement a spec. At this point, the agent isn't improvising. It's applying reviewed code from the spec, running tests, and producing a PR. The creative work already happened in /spec.

/redteam — Adversarial review. A separate agent with one job: prove the work is wrong. Runs at every gate — after spec, after code, before merge.
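For context, a Claude Code slash command is just a markdown file checked into the repo (by convention under `.claude/commands/`, so `redteam.md` becomes `/redteam`). A minimal, illustrative version of something like `/redteam` might look like the sketch below; the wording is my own, not the exact prompt I use.

```markdown
---
description: Adversarial review of the current deliverable
---

You are an independent reviewer. You did not write this work, and your
only mandate is to prove it is wrong.

Review the spec or diff for deliverable $ARGUMENTS. Hunt for missed edge
cases, drift between spec and code, and violations of our Conventions.
End with a verdict, PASS or FAIL, with evidence for every FAIL.
```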

I can write specs in parallel. They can be implemented in parallel. Everything follows a highly standard process and playbook. The bottleneck is literally just my oh-so-human ability to co-author specs, accept risks, and decide what to build next.


Things I Learned

Three lessons from this exercise stand out above the rest.

Adversarial review matters more than I expected. One review wasn't enough. I ended up with three. Each review is run by a separate agent with the explicit mandate: prove this is wrong. The first two happen during /spec. The third happens during /code. By the time something is ready to merge, it's been adversarially reviewed three times.

  1. After the initial spec: catches architectural issues before any code is written
  2. After writing the code in the spec: catches implementation issues while changes are still cheap to make, and while the full scope of changes is easy to read through
  3. Before a PR: catches drift between spec and reality

Scale comes from the standards. The system only works because everything is uniform. Same spec format. Same gate criteria. Same review process. Same folder structure.

This felt constraining until I realized it enabled massive parallelism. I can have multiple specs in flight because they all follow the same structure. I can review them quickly because I know exactly where to look. Multiple Claude Code sessions working in different git worktrees can implement them reliably because there's no ambiguity about what "done" means.

Spec size is a first-class constraint. Larger specs tend to produce more downstream errors that propagate, and a bigger headache later. In this system, we check each spec for: (a) the number of files created, (b) the number of files modified, (c) the number of acceptance criteria generated, and (d) the number of requirements written in EARS notation.
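As a hedged sketch, that size check can be as simple as counting those four signals in the spec's markdown and flagging anything over budget. The line formats, EARS keyword pattern, and thresholds below are assumptions for illustration, not my exact values.

```python
import re

# Illustrative budgets; real thresholds would be tuned per project.
BUDGETS = {"files_created": 3, "files_modified": 5, "acceptance_criteria": 10}

def check_spec_size(spec_md: str) -> list[str]:
    """Return a list of budget violations for a spec in markdown."""
    counts = {
        "files_created": len(re.findall(r"^\s*[-*]\s*CREATE\s+\S+", spec_md, re.M)),
        "files_modified": len(re.findall(r"^\s*[-*]\s*MODIFY\s+\S+", spec_md, re.M)),
        "acceptance_criteria": len(re.findall(r"^\s*[-*]\s*AC\d+:", spec_md, re.M)),
    }
    # EARS-style requirements lead with a trigger keyword and use "shall".
    ears = len(re.findall(
        r"^\s*[-*]\s*(?:When|While|Where|If)\b.*\bshall\b", spec_md, re.M))
    violations = [f"{k}: {v} > {BUDGETS[k]}"
                  for k, v in counts.items() if v > BUDGETS[k]]
    if ears == 0:
        violations.append("no EARS-style requirements found")
    return violations
```

A spec that fails this check gets split before it ever reaches `/code`, which is much cheaper than untangling a bloated implementation afterward.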

Ironically, my own context switching dropped dramatically as a result. The more I could comfortably abstract away, the more work I could personally move through.