Reviewing AI-written code: a checklist for what to scrutinise

Written by Lars Nyman • May 9, 2026 • Updated June 4, 2026

Code generated by an AI agent looks like code, runs like code, and passes the obvious tests like code, but it has its own characteristic failure modes. Reviewing AI output well isn't about distrusting it; it's about knowing which categories of mistake are most likely and looking for them deliberately. The review skill compounds quickly once you know where to focus.

What you'll learn

The five categories of mistake AI agents make most often

A review checklist that catches them in under 10 minutes

How to set up the codebase to make AI-written code easier to review

The five categories of mistake

AI-written code goes wrong in characteristic ways. Five categories cover most of the real problems:

1. Plausible-but-wrong API usage. The agent calls a method that doesn't exist on the type, uses an outdated version of an API, or invents a parameter name that looks reasonable. Type-checked languages catch a lot of this; loosely typed code does not.

2. Silently dropped requirements. You asked for X and Y, the agent built X, the tests for X pass, and Y is just... missing. The output looks complete because it ran. Check the diff against your requirements list, not against your assumptions.

3. Over-engineering. The agent adds abstractions, configuration knobs, and "flexibility" you didn't ask for. The code works but is harder to maintain. Watch for new files, new exports, new options.

4. Sycophantic patches. When the agent is corrected, it sometimes patches just enough to make you go away, adding a special case rather than fixing the underlying issue. Look for narrow conditionals that exist only to handle the specific case you flagged.

5. Confidently-wrong reasoning. The agent's explanation of what the code does is wrong, and the code matches the wrong explanation. You read the explanation, agree with it, and never check the actual behaviour. Run the code; don't take the agent's word.
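
As a hypothetical illustration of category 4, suppose you told the agent that a price parser returned 1 for "1,299.00". The function names and the bug below are invented for this sketch; the thing to notice is a conditional that matches only the input you reported.

```typescript
// Hypothetical example: both functions and the reported bug are invented for illustration.

// Sycophantic patch: handles only the exact input the reviewer complained about.
function parsePricePatched(input: string): number {
  if (input === "1,299.00") {
    return 1299; // narrow special case; every other comma-separated price is still wrong
  }
  return parseFloat(input); // parseFloat stops reading at the first comma
}

// Root-cause fix: strip thousands separators before parsing, for every input.
function parsePriceFixed(input: string): number {
  return parseFloat(input.replace(/,/g, ""));
}
```

With the patched version, "2,499.00" still parses as 2, which is exactly the kind of result that running one input beyond the one you flagged would reveal.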

A 10-minute review checklist

A workable review pattern, in order:

1. Read the diff against requirements, not the agent's description. Open your original prompt or task spec. Walk through the diff item by item. Anything missing? Anything extra?

2. Run the code. Don't just read it: run it on real inputs you didn't mention in the prompt, including edge cases, empty inputs, and malformed inputs. AI code passes happy paths reliably and breaks on edges that weren't in the test suite.

3. Check the tests. Are the tests testing the actual behaviour, or are they testing what the agent assumed? Tests written by the agent often pass because the agent designed them to. Read the assertions, not just the test count.

4. Look for new dependencies. New imports, new packages, new external calls. Each one is a decision worth confirming.

5. Spot-check the explanation. Pick one section of the agent's description and verify it matches the code. If the explanation says "we cache the result," check that the code actually caches the result.

6. Read it once more, fresh. A second pass with a clear mind catches things the first pass missed. AI code reads convincingly the first time, less so the second.
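
Check 5 can often be turned into a small experiment. The sketch below assumes a hypothetical memoize helper that the agent described as caching results; none of these names come from a real library. Counting calls to the wrapped function shows whether the claim holds.

```typescript
// Hypothetical code under review: the agent's description says "we cache the result".
function memoize<A, R>(fn: (arg: A) => R): (arg: A) => R {
  const cache = new Map<A, R>();
  return (arg: A): R => {
    if (cache.has(arg)) {
      return cache.get(arg) as R;
    }
    const result = fn(arg);
    cache.set(arg, result);
    return result;
  };
}

// A wrapper that would match the same description but never caches anything.
function passThrough<A, R>(fn: (arg: A) => R): (arg: A) => R {
  return (arg: A): R => fn(arg);
}

type Wrapper = (fn: (n: number) => number) => (n: number) => number;

// Spot-check: count how often the underlying function really runs.
function countUnderlyingCalls(wrap: Wrapper): number {
  let calls = 0;
  const wrapped = wrap((n: number) => {
    calls += 1;
    return n * 2;
  });
  wrapped(21);
  wrapped(21); // same argument: a real cache should not call through again
  return calls;
}
```

Here countUnderlyingCalls(memoize) gives 1 and countUnderlyingCalls(passThrough) gives 2, so the check separates code that caches from code that only says it does.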

This review should take 5-15 minutes for a typical change. If you find yourself reading for an hour, the change is too big; break it down.
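
Check 4 is easy to partly automate. As a rough sketch, the hypothetical helper below (findAddedImports is an invented name, not part of any tool) scans a unified diff for module names on added lines. It is a heuristic that assumes the common import and require forms, and it does not replace reading the package manifest.

```typescript
// Minimal sketch: list module specifiers introduced by added lines in a unified diff.
// Assumes only `import ... from "x"`, bare `import "x"`, and `require("x")` forms.
function findAddedImports(diff: string): string[] {
  const importPattern =
    /\bfrom\s+["']([^"']+)["']|\bimport\s+["']([^"']+)["']|\brequire\(\s*["']([^"']+)["']\s*\)/;
  const found = new Set<string>();
  for (const line of diff.split("\n")) {
    // Added lines start with "+"; "+++" is the file header, not content.
    if (!line.startsWith("+") || line.startsWith("+++")) {
      continue;
    }
    const match = importPattern.exec(line);
    if (match) {
      const specifier = match[1] ?? match[2] ?? match[3];
      if (specifier !== undefined) {
        found.add(specifier);
      }
    }
  }
  return Array.from(found).sort();
}
```

Feeding it the text of a diff returns a sorted list of module names to confirm one by one. Context lines and removed lines are ignored, so imports that were already in the file do not show up.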

Set up the codebase to make review easier

A few investments make AI-written code dramatically easier to review:

Strict type checking. TypeScript with no `any`, full strict mode, and strict null checks catches a huge fraction of plausible-but-wrong API usage automatically. The same holds for Rust, Kotlin, and other strictly typed languages. Languages that allow loose typing pay a tax on AI-assisted code review.

A test suite that runs in seconds. If running the tests takes a minute, you'll skip them. If they run in five seconds, you'll run them constantly. Fast tests are an AI-coding multiplier.

A pre-commit hook that runs lint and types. Errors caught at commit time never make it into a PR. The agent often produces lint errors that a quick `npm run lint` would catch; automate this.

A short AGENTS.md or similar. A one-page document about your codebase's conventions (naming, file structure, error-handling patterns) helps the agent produce code that doesn't need stylistic review every time.

Focused PRs. Small PRs are easier to review whether the author is human or AI. Encourage one-concept PRs.

None of this is unique to AI-assisted work. AI assistance just rewards the codebase hygiene that you should have anyway, and punishes the absence of it more sharply.
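
To make the strict type checking point concrete, here is a minimal sketch that assumes "strict": true in tsconfig.json. The User type and the data are hypothetical. The commented-out lines show two things an agent might plausibly write that the compiler rejects before a reviewer ever sees them.

```typescript
// Sketch assuming "strict": true in tsconfig.json. The User type and data are hypothetical.
interface User {
  id: number;
  email: string;
}

const users: User[] = [
  { id: 1, email: "ada@example.com" },
  { id: 2, email: "lin@example.com" },
];

// Plausible-but-wrong code, left commented out because it does not compile:
//
//   users.findFirst((u) => u.id === 1);           // arrays have no findFirst method
//   users.find((u) => u.id === 1).email;          // find may return undefined
//
// With strict null checks, the missing-user case has to be handled explicitly.
function emailFor(id: number): string | undefined {
  const user = users.find((u) => u.id === id);
  return user?.email;
}
```

In loosely typed code both commented lines would run until the day an unknown id arrived, which is the tax the paragraph on strict typing describes.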

Quick reference

Five categories

Wrong API, dropped requirements, over-engineering, sycophantic patches, wrong reasoning.

Review against the prompt

Walk the diff against your task spec, not the agent's description.

Run the code

Edge cases the agent didn't test. Don't trust the description.

Read the test assertions

Not just the count. Tests passing means little if assertions are wrong.

New deps

Every new import or package is a decision worth confirming.

Codebase hygiene

Strict types, fast tests, pre-commit lint. AI work rewards them.

Common questions

How long should a review take?

5-15 minutes for a focused PR. If it's taking longer, the change is too big; push back on the scope rather than spending an hour scrutinising every line.

Should I have the agent review its own code?

Useful as a sanity check but not as a substitute for human review. The agent will catch some issues (lint, typos, obvious bugs) and miss others (over-engineering, requirement drift). Treat it as a free first pass, not a replacement.

How do I review code in a language I don't know well?

Run it. Read the tests. Ask the agent to explain specific lines. Be cautious about merging anything you can't fundamentally understand; the agent is great at producing syntactically correct code in languages it has more practice with than you do, which is its own failure mode.

What's the single highest-leverage check?

Running the code on inputs you didn't mention. Almost everything else can be automated; this one requires a thinking human and catches the most embarrassing bugs.
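
One way to make that habit cheap is a tiny probe that runs a function over inputs nobody mentioned and records what comes back. The average function below is a hypothetical stand-in for code under review, and probe is an invented helper, not a library API.

```typescript
// Hypothetical function under review: the prompt only mentioned non-empty arrays.
function average(values: number[]): number {
  const total = values.reduce((sum, v) => sum + v, 0);
  return total / values.length;
}

interface ProbeResult {
  input: string;
  outcome: string;
}

// Run a function over unmentioned inputs and record the result of each,
// including thrown errors, so surprises are visible side by side.
function probe<T, R>(fn: (input: T) => R, inputs: T[]): ProbeResult[] {
  return inputs.map((input) => {
    try {
      return { input: JSON.stringify(input), outcome: String(fn(input)) };
    } catch (error) {
      return { input: JSON.stringify(input), outcome: `threw: ${String(error)}` };
    }
  });
}

// A happy-path input, an empty input, and an overflow case.
const results = probe(average, [[2, 4], [], [Number.MAX_VALUE, Number.MAX_VALUE]]);
```

The three outcomes here are "3", "NaN", and "Infinity". Nothing throws, so a test suite built only around the happy path would pass while two of the three inputs produce values the caller almost certainly did not want.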

Bottom line

AI-written code fails in characteristic ways: wrong API usage, dropped requirements, over-engineering, sycophantic patches, and confidently wrong reasoning. A 10-minute review checklist (read the diff against the prompt, run the code, check the tests, scrutinise dependencies, spot-check the explanation) catches the great majority of these problems. Set up the codebase to make this review fast.

Next steps

On your next AI-assisted PR, walk the diff against your original prompt and note anything dropped or extra.
Add a "run on an unexpected input" step to your review habit. Time how long it takes.
Audit your codebase for the four hygiene fundamentals (strict types, fast tests, pre-commit lint, short conventions doc). Fix the weakest one this week.
