Reviewing AI-written code: a checklist for what to scrutinise

Code generated by an AI agent looks like code, runs like code, and passes the obvious tests like code, but it has its own characteristic failure modes. Reviewing AI output well isn't about distrusting it; it's about knowing which categories of mistake are most likely and looking for them deliberately. The review skill compounds quickly once you know where to focus.

Quick reference

  • Five categories: wrong API usage, dropped requirements, over-engineering, sycophantic patches, wrong reasoning.
  • Review against the prompt: walk the diff against your task spec, not the agent's description.
  • Run the code: try edge cases the agent didn't test, and don't trust the description.
  • Read the test assertions, not just the count: passing tests mean little if the assertions are wrong.
  • New dependencies: every new import or package is a decision worth confirming.
  • Codebase hygiene: strict types, fast tests and pre-commit lint all pay off in AI-assisted work.

The five categories of mistake

AI-written code goes wrong in characteristic ways. Five categories cover most of the real problems:

1. Plausible-but-wrong API usage. The agent calls a method that doesn't exist on the type, uses an old version of an API, or invents a parameter name that looks reasonable. Type-checked languages catch a lot of this; loosely typed code does not.

2. Silently dropped requirements. You asked for X and Y, the agent built X, the tests for X pass, and Y is just... missing. The output looks complete because it ran. Skim the diff against your requirements list, not against your assumptions.

3. Over-engineering. The agent adds abstractions, configuration knobs, and "flexibility" you didn't ask for. The code works but is harder to maintain. Watch for new files, new exports, new options.

4. Sycophantic patches. When the agent is corrected, it sometimes patches just enough to make you go away — adding a special case rather than fixing the underlying issue. Look for narrow conditionals that exist only to handle the specific case you flagged.

5. Confidently-wrong reasoning. The agent's explanation of what the code does is wrong, and the code matches the wrong explanation. You read the explanation, agree with it, and never check the actual behaviour. Run the code; don't take the agent's word.
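
To make category 4 concrete, here is a hypothetical TypeScript sketch. Assume the agent wrote a price parser, a reviewer reported that "$1,299.50" comes back as NaN, and the agent answered with a patch. The function names and the price format are illustrative assumptions, not taken from any real codebase.

```typescript
// Hypothetical example: parsing price strings such as "$1,299.50".

// Original agent output: removes "$" but not the thousands separator,
// so "$1,299.50" becomes Number("1,299.50"), which is NaN.
function parsePriceOriginal(input: string): number {
  return Number(input.replace("$", ""));
}

// Sycophantic patch: special-cases exactly the input the reviewer flagged.
// Any other price containing a comma is still NaN.
function parsePricePatched(input: string): number {
  if (input === "$1,299.50") return 1299.5;
  return Number(input.replace("$", ""));
}

// Fix for the underlying issue: strip every "$" and "," before converting.
function parsePriceFixed(input: string): number {
  return Number(input.replace(/[$,]/g, ""));
}
```

The review move is to try a sibling of the reported input: parsePricePatched("$2,000") still returns NaN, while parsePriceFixed("$2,000") returns 2000.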

A 10-minute review checklist

A workable review pattern, in order:

1. Read the diff against requirements, not the agent's description. Open your original prompt or task spec. Walk through the diff item by item. Anything missing? Anything extra?

2. Run the code. Don't just look at it. Actually run it on a real input you didn't test in the prompt. Edge cases, empty inputs, malformed inputs. AI code passes happy paths reliably and breaks on edges that weren't in the test suite.

3. Check the tests. Are the tests testing the actual behaviour, or are they testing what the agent assumed? Tests written by the agent often pass because the agent designed them to. Read the assertions, not just the test count.

4. Look for new dependencies. New imports, new packages, new external calls. Each one is a decision worth confirming.

5. Spot-check the explanation. Pick one section of the agent's description and verify it matches the code. If the explanation says "we cache the result," check that the code actually caches the result.

6. Read it once more, fresh. A second pass with a clear mind catches things the first pass missed. AI code reads convincingly the first time, less so the second.
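
The third check is easiest to see with a hypothetical example. Assume the spec says "take 10% off, but never more than 50", and the agent's implementation forgets the cap. The agent's own checks pass because none of them reaches the cap; a check derived from the spec does not. The function, the spec and the numbers are all invented for illustration.

```typescript
// Hypothetical spec: take 10% off the price, but never more than 50.
// Hypothetical agent implementation: the cap is missing.
function applyDiscount(price: number): number {
  return price - price / 10;
}

// Agent-written checks: both pass, yet neither can catch the missing cap.
const agentChecks: boolean[] = [
  typeof applyDiscount(100) === "number", // asserts the type, not the value
  applyDiscount(100) === 90, // 10% of 100 is 10, below the cap anyway
];

// Reviewer's check, derived from the spec rather than from the code:
// 10% of 1000 is 100, the cap limits it to 50, so the result should be 950.
const specCheck: boolean = applyDiscount(1000) === 950;

// Implementation that follows the spec.
function applyDiscountCapped(price: number): number {
  return price - Math.min(price / 10, 50);
}
```

The pattern to look for is assertions that could not fail for the bug in question: type checks, or inputs where the buggy and the correct behaviour happen to agree.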
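
For the fourth check, a new dependency is easy to miss in a long diff. One way to surface it is to compare the dependency sections of package.json before and after the change. The sketch below is a hypothetical helper, assuming both manifests have already been parsed into objects (for example with JSON.parse); it is not part of npm or any existing tool.

```typescript
// Only the fields this sketch needs from a parsed package.json.
type DependencyMap = { [name: string]: string };

interface Manifest {
  dependencies?: DependencyMap;
  devDependencies?: DependencyMap;
}

// Hypothetical helper: report packages that were added or whose version range changed.
// Removed packages are deliberately ignored here.
function dependencyChanges(before: Manifest, after: Manifest): string[] {
  const changes: string[] = [];
  const sections: Array<keyof Manifest> = ["dependencies", "devDependencies"];
  for (const section of sections) {
    const oldDeps: DependencyMap = before[section] ?? {};
    const newDeps: DependencyMap = after[section] ?? {};
    for (const name of Object.keys(newDeps)) {
      if (!Object.prototype.hasOwnProperty.call(oldDeps, name)) {
        changes.push(`added to ${section}: ${name}@${newDeps[name]}`);
      } else if (oldDeps[name] !== newDeps[name]) {
        changes.push(`changed in ${section}: ${name} ${oldDeps[name]} -> ${newDeps[name]}`);
      }
    }
  }
  return changes;
}
```

Each line of output is a decision to confirm with the author, human or agent, before merging.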

This review should take 5–15 minutes for a typical change. If you find yourself reading for an hour, the change is too big — break it down.
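
The fifth check can often be made mechanical. Suppose the agent's description says "we cache the result". A hypothetical sketch: count how often the expensive work runs, call the supposedly cached function twice with the same key, and look at the count. The broken version mirrors a mistake that reads plausibly: the cache is created inside the function, so it is empty on every call. All names here are illustrative.

```typescript
// Counts how many times the expensive work really runs.
let lookupCount = 0;

// Stand-in for an expensive call (network, disk, heavy computation).
function expensiveLookup(key: string): string {
  lookupCount += 1;
  return key.toUpperCase();
}

// "We cache the result", as written: the Map is created inside the function,
// so it starts empty on every call and nothing is ever reused.
function cachedLookupBroken(key: string): string {
  const cache = new Map<string, string>();
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const value = expensiveLookup(key);
  cache.set(key, value);
  return value;
}

// A version that does cache: the Map lives outside the function and outlives each call.
const sharedCache = new Map<string, string>();
function cachedLookup(key: string): string {
  const hit = sharedCache.get(key);
  if (hit !== undefined) return hit;
  const value = expensiveLookup(key);
  sharedCache.set(key, value);
  return value;
}

// Call the function twice with the same key and report how many real lookups happened.
function lookupsForRepeatedKey(fn: (key: string) => string): number {
  lookupCount = 0;
  fn("same-key");
  fn("same-key");
  return lookupCount;
}
```

On a fresh cache, lookupsForRepeatedKey(cachedLookupBroken) reports 2 and lookupsForRepeatedKey(cachedLookup) reports 1; the explanation and the code only agree for the second.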

Set up the codebase to make review easier

A few investments make AI-written code dramatically easier to review:

Strict type checking. TypeScript in full strict mode (which includes strict null checks), with no explicit any types, catches a large share of plausible-but-wrong API usage automatically. The same goes for Rust, Kotlin and other strongly typed languages. Languages that allow loose typing pay a tax on AI-assisted code review.
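
A small TypeScript sketch of why avoiding `any` matters, with hypothetical function names. Annotating a value as `any` switches type checking off for it, so an invented method compiles and only fails when the line runs; with a real type, the compiler rejects the same call. Strict mode's noImplicitAny only flags implicit `any`; banning the explicit kind usually takes a lint rule such as typescript-eslint's no-explicit-any.

```typescript
// With `any`, the compiler accepts a method that does not exist on strings.
// toTitleCase is an invented, plausible-sounding method name.
function titleFromAny(value: any): string {
  return value.toTitleCase(); // compiles; throws a TypeError at runtime
}

// With a real type, the same call is a compile-time error:
//   function titleFromString(value: string): string {
//     return value.toTitleCase(); // error: property does not exist on type 'string'
//   }

// Strict null checks catch a related mistake: Array.prototype.find may return undefined.
interface User {
  id: number;
  name: string;
}

function userName(users: User[], id: number): string {
  const user = users.find((u) => u.id === id); // type: User | undefined
  // Writing `return user.name;` here is rejected under strictNullChecks.
  return user !== undefined ? user.name : "(unknown user)";
}
```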

A test suite that runs in seconds. If running the tests takes a minute, you'll skip them. If they run in five seconds, you'll run them constantly. Fast tests are an AI-coding multiplier.

A pre-commit hook that runs lint and types. Errors caught at commit time never make it into a PR. The agent often produces lint errors that a quick npm run lint would catch — automate this.

A short AGENTS.md or similar. A one-page document about your codebase's conventions — naming, file structure, error-handling patterns — helps the agent produce code that doesn't need stylistic review every time.

Focused PRs. Small PRs are easier to review whether the author is human or AI. Encourage one-concept PRs.

None of this is unique to AI-assisted work. AI assistance just rewards the codebase hygiene that you should have anyway, and punishes the absence of it more sharply.

Common questions

How long should a review take?

5–15 minutes for a focused PR. If it's taking longer, the change is too big — push back on the scope rather than spending an hour scrutinising every line.

Should I have the agent review its own code?

Useful as a sanity check but not as a substitute for human review. The agent will catch some issues (lint, typos, obvious bugs) and miss others (over-engineering, requirement drift). Treat it as a free first pass, not a replacement.

How do I review code in a language I don't know well?

Run it. Read the tests. Ask the agent to explain specific lines. Be cautious about merging anything you can't fundamentally understand. The agent is good at producing syntactically correct, idiomatic-looking code in languages it has more practice with than you do, and that is its own failure mode: you can't easily tell when plausible-looking code is wrong.

What's the single highest-leverage check?

Running the code on inputs you didn't mention. Almost everything else can be automated; this one requires a thinking human and catches the most embarrassing bugs.
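
A minimal sketch of that habit, in TypeScript: a hypothetical initials helper of the kind an agent might write, which works on the happy path, plus a small probe that runs inputs the prompt never mentioned and records what happens. The helper, the probe and the inputs are all illustrative.

```typescript
// Hypothetical agent-written helper: fine for "Ada Lovelace", fragile elsewhere.
function initials(name: string): string {
  return name
    .split(" ")
    .map((word) => word[0].toUpperCase()) // word[0] is undefined for "" segments
    .join("");
}

// Reviewer's version: split on runs of whitespace and skip empty segments.
function initialsFixed(name: string): string {
  return name
    .split(/\s+/)
    .filter((word) => word.length > 0)
    .map((word) => word[0].toUpperCase())
    .join("");
}

// Inputs the prompt never mentioned: empty, leading spaces, double spaces, lowercase.
const unexpectedInputs: string[] = ["Ada Lovelace", "", "  Ada", "Ada  Lovelace", "ada"];

// Run every input and record either the result or the kind of error thrown.
function probe(fn: (input: string) => string, inputs: string[]): string[] {
  return inputs.map((input) => {
    try {
      return `${JSON.stringify(input)} -> ${JSON.stringify(fn(input))}`;
    } catch (err) {
      const kind = err instanceof Error ? err.name : "unknown error";
      return `${JSON.stringify(input)} -> threw ${kind}`;
    }
  });
}
```

For the original helper, three of the five inputs throw; for the fixed one, none do. The value is in the inputs, which come from the reviewer rather than from the prompt.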

Bottom line

AI-written code fails in characteristic ways: wrong API usage, dropped requirements, over-engineering, sycophantic patches, and confidently wrong reasoning. A 10-minute review checklist (read the diff against the prompt, run the code, check the tests, scrutinise dependencies, spot-check the explanation, then read it once more) catches most of these problems. Set up the codebase to make this review fast.

Next steps

  • On your next AI-assisted PR, walk the diff against your original prompt and note anything dropped or extra.
  • Add a "run on an unexpected input" step to your review habit. Time how long it takes.
  • Audit your codebase for the four hygiene fundamentals (strict types, fast tests, pre-commit lint, short conventions doc). Fix the weakest one this week.
