Rethinking Code Review in the Age of Agentic AI

As software development evolves toward agentic AI practices, traditional notions of code review must evolve with it. Conducting a line-by-line human review of agent-generated code is increasingly analogous to manually inspecting the machine code output of a compiler. Just as we do not review compiled binaries when validating programs written in a third-generation language, we should reconsider both the necessity and the practicality of human review for AI-generated source code.

A Brief Evolution in Agent-Assisted Coding

To understand what humans should write and review, it helps to look briefly at how AI agents have entered the development process.

We began with code assist and completion tools that suggested single lines, blocks, or functions within the source file a developer was editing. The human remained in control — deciding when and how to incorporate the AI’s recommendations. Later, we gave agents the ability to generate entire modules or systems, with humans reviewing the output afterward. This created enormous productivity potential, but also a new problem: agents could now generate code faster than humans could review it. Whether agents produced vast amounts of good code or vast amounts of bad code increasingly depended on the clarity and precision of our instructions to the AI. Our “prompts” became more deliberate and sophisticated.

In prompt engineering, the human directs the AI to generate code, reviews that output, and iteratively refines prompts until the result is correct. The code itself remains the primary deliverable to review and test in the CI/CD lifecycle. Productivity improves, but only incrementally. The main challenge becomes providing the agent with enough context to succeed.

With context engineering, humans supply broader system understanding: API specifications, database schemas, architecture documents, and other related source code. Equipped with this richer context, agents can generate larger and more reliable components with less human intervention — and in far less time. Yet, the code remains the deliverable to review. Even as the quality improves, human reviewers become the bottleneck, tasked with verifying vast volumes of generated output.
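
To make the idea concrete, context engineering can be as mechanical as bundling the relevant artifacts into whatever request is sent to the agent. The sketch below is only illustrative: the file names and the task are invented, and the final hand-off to an agent framework is left as a comment because that step varies by tool.

    from pathlib import Path

    def build_context(paths: list[str]) -> str:
        """Concatenate the artifacts that give the agent system-level understanding."""
        sections = []
        for path in paths:
            sections.append(f"--- {path} ---\n{Path(path).read_text()}")
        return "\n\n".join(sections)

    # Invented file names; in practice, whatever artifacts describe the system.
    context = build_context([
        "docs/architecture.md",     # architecture document
        "api/openapi.yaml",         # API specification
        "db/schema.sql",            # database schema
        "src/billing/invoices.py",  # closely related source code
    ])

    prompt = (
        "Task: add proration support to the invoicing module.\n\n"
        "Use the following system context:\n\n" + context
    )
    # The assembled prompt is then handed to whichever agent framework the team uses.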

The Shift to Spec-Driven Development

A more advanced stage of this evolution moves beyond prompts and context to focus on specifications — artifacts that define what to build and how to validate it. This approach, known as Spec-Driven Development (SDD), repositions the specification as the central artifact of the system. As Birgitta Böckeler describes in this MartinFowler.com article, “the spec becomes the source of truth for the human and the AI.”

The article further defines the SDD approaches currently in practice, which can be understood as three stages:

  • Spec-first: the specification is written first, then code is generated or hand-written to meet it.
  • Spec-anchored: the specification persists and evolves alongside the implementation.
  • Spec-as-source: the specification is the source; code is generated automatically and may never need manual editing.

Tools such as Kiro, Spec-Kit, and Tessl support this evolution by making the specification a living artifact — not just documentation or a contract, but the executable driver of generation, validation, and testing.
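
Spec formats differ across these tools, so the following is only a sketch of the general idea: a spec-as-source artifact that captures behaviour and acceptance criteria in a structured, machine-readable form. The field names and the example feature are invented for illustration.

    from dataclasses import dataclass, field

    @dataclass
    class AcceptanceCriterion:
        given: str
        when: str
        then: str

    @dataclass
    class Spec:
        feature: str
        behaviour: str
        criteria: list[AcceptanceCriterion] = field(default_factory=list)

    invoice_proration = Spec(
        feature="Invoice proration",
        behaviour="A mid-cycle plan change produces a prorated line item on the next invoice.",
        criteria=[
            AcceptanceCriterion(
                given="a customer upgrades 10 days into a 30-day billing cycle",
                when="the next invoice is generated",
                then="it contains a prorated charge covering the remaining 20 days",
            ),
        ],
    )

Because the criteria are structured data rather than prose, an agent can derive both the implementation and the acceptance tests from the same artifact, which is what makes careful review of the spec more valuable than review of the generated code.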

What and How We Validate

In this paradigm, the role of human engineers shifts from line-by-line debugging to defining and validating precise specifications, constraints, and transformation logic that guide AI generation. The transformation logic includes the models, guidelines, and operational constraints that shape how the agents produce and verify code — along with their own internal quality assurance mechanisms such as independent AI code reviews, implementation-plan validation, and automated test-plan generation.
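
What that transformation logic looks like in practice depends entirely on the tooling, but conceptually it is configuration to be validated rather than code to be read line by line. A hypothetical shape, with invented keys and values, might be:

    # Hypothetical shape of the transformation logic that governs generation;
    # the keys and values are illustrative, not taken from any particular tool.
    generation_config = {
        "model": "codegen-large",              # which model generates the code
        "guidelines": [
            "docs/coding-standards.md",
            "docs/security-checklist.md",
        ],
        "constraints": {
            "max_function_length": 60,
            "disallowed_dependencies": ["legacy-billing-client"],
        },
        "quality_gates": [
            "independent_ai_code_review",      # a second agent reviews the diff
            "implementation_plan_validation",  # the plan is checked against the spec
            "automated_test_plan_generation",  # tests are derived from acceptance criteria
        ],
    }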

The central question becomes: how do humans best validate the outcomes of this process? If humans are still required to manually review every line of generated code, the productivity benefits of agentic development evaporate. Instead, the focus of assurance must move upstream.

In SDD, the specification itself contains the verification needed to ensure correctness and completeness through its acceptance criteria. The spec — validated line by line by humans and agents alike — becomes the primary object of rigorous review. Once validated, it drives downstream automated validation: linters, analyzers, independent code-review agents, and continuous testing within the CI/CD pipeline. The human effort ensures the sources of generation are correct; the automated systems ensure the generated artifacts conform.
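
A minimal sketch of that downstream automation, assuming a Python project and commonly used tools (ruff, mypy, and pytest here are examples, not a prescription), might be a CI step like this:

    import subprocess
    import sys

    # Example downstream gates; the tools and paths are illustrative choices.
    GATES = [
        ["ruff", "check", "src/"],        # lint the generated code
        ["mypy", "src/"],                 # static analysis
        ["pytest", "tests/acceptance/"],  # tests derived from the spec's criteria
    ]

    def run_gates() -> int:
        for command in GATES:
            result = subprocess.run(command)
            if result.returncode != 0:
                print(f"Gate failed: {' '.join(command)}", file=sys.stderr)
                return result.returncode
        return 0

    if __name__ == "__main__":
        sys.exit(run_gates())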

The Path Forward

The future of software assurance lies not in inspecting the code that AI agents produce, but in governing the process that produces it. By shifting our focus from output review to input validation — ensuring that specifications, transformation logic, and acceptance criteria are sound — we can maintain both quality and scalability in an era of automated software generation.
