6 Practices that turned AI from prototyper to workhorse (106 PRs in 14 days)
Summary
1. <i>Specs and plans are source code</i>: Specs and plans live in git alongside the code, not in chat history. A new agent reads arch.md for the big picture, then its own spec, so you always know why something was built.<p>2. <i>Three models review every phase</i>: Claude, Gemini, and Codex catch almost entirely different bugs. No single model found more than 55% of the issues, so if you only review with the model that wrote the code, you miss about half the bugs. The reviews caught 20 bugs before shipping: Claude Code found 5, and Gemini and Codex caught the other 15, including a severe security issue Claude missed.<p>3. <i>Enforce the process, don't suggest it</i>: A state machine forces Spec → Plan → Implement → Review → PR. The AI can't skip steps, and tests must pass before it advances. AIs don't stick to the plan on their own; you need rails.<p>4. <i>Annotate, don't edit</i>: Most of the work is writing specs and reviews that guide the code, not hacking at files in an open-ended chat.<p>5. <i>Agents coordinate agents</i>: An architect agent spawns builder agents into isolated git worktrees. You direct the architect; it directs the builders. They message each other asynchronously.<p>6. <i>Manage the whole lifecycle</i>: Most AI tools help you write code faster, which is maybe 30% of the job. The other 70% is planning, reviewing, integrating, writing deployment scripts, and managing staging vs. prod. Have AI run the whole pipeline from spec to PR and beyond.<p><i>Overall result</i>: One engineer can produce what a team of 3-4 usually would. Code quality measured 1.2 points higher on a 10-point scale compared with Claude Code. Downsides: it takes a lot longer and uses many more tokens, but the cost is still reasonable at $1.60 per PR.<p>We open sourced it: https://github.com/cluesmith/codev<p>More details and raw results: https://cluesmith.com/blog/a-tour-of-codevos/
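To make practices 2 and 3 concrete, here is a minimal Python sketch of a gated state machine over the Spec → Plan → Implement → Review → PR flow. It is not taken from the codev repository: the class and method names (Workflow, record_tests, record_approval, advance) are hypothetical, and where exactly the gates sit (tests before Review, all three reviewers before PR) is an assumption made for illustration.

```python
from enum import Enum


class Phase(Enum):
    SPEC = 1
    PLAN = 2
    IMPLEMENT = 3
    REVIEW = 4
    PR = 5


# Assumption: one approval is required from each of the three models named in the post.
REQUIRED_REVIEWERS = frozenset({"claude", "gemini", "codex"})


class GateError(Exception):
    """Raised when a transition is attempted before its gate is satisfied."""


class Workflow:
    """Hypothetical workflow that only ever moves one phase forward."""

    def __init__(self) -> None:
        self.phase = Phase.SPEC
        self.tests_passed = False
        self.approvals: set[str] = set()

    def record_tests(self, passed: bool) -> None:
        self.tests_passed = passed

    def record_approval(self, reviewer: str) -> None:
        if reviewer not in REQUIRED_REVIEWERS:
            raise ValueError(f"unknown reviewer: {reviewer}")
        self.approvals.add(reviewer)

    def advance(self) -> Phase:
        """Move to the next phase, or raise GateError if a gate is not met."""
        if self.phase is Phase.PR:
            raise GateError("already at the final phase")
        # Gate 1 (assumed placement): tests must pass before leaving Implement.
        if self.phase is Phase.IMPLEMENT and not self.tests_passed:
            raise GateError("tests must pass before review")
        # Gate 2 (assumed placement): every reviewer must approve before the PR.
        if self.phase is Phase.REVIEW:
            missing = REQUIRED_REVIEWERS - self.approvals
            if missing:
                raise GateError(f"missing reviews: {sorted(missing)}")
        # There is no way to jump ahead: the only transition is phase + 1.
        self.phase = Phase(self.phase.value + 1)
        return self.phase


if __name__ == "__main__":
    wf = Workflow()
    wf.advance()  # Spec -> Plan
    wf.advance()  # Plan -> Implement
    try:
        wf.advance()  # blocked: tests have not been recorded as passing
    except GateError as err:
        print("blocked:", err)
```

The point of the design is that the agent never chooses the next phase; it can only ask to advance, and the state machine decides whether the gate is met.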
Metadata
- Article ID: #224
- Source: HackerNews
- Scraped At: 3/2/2026, 7:10:06 AM
- URL Hash: dfd885e197f6a4dd…