Code Generation
Generate, refactor, and review code with Claude.
Code is the language Claude is unusually good at: partly because there's enormous training data, partly because syntax errors give a sharp signal. This module teaches you to lean into that strength without losing the engineering discipline that keeps software working.
By the end of this module you'll have
- A reusable spec → code → tests loop you can apply to any function
- A code review prompt that catches real issues, not nitpicks
- The instinct to ask Claude for failing tests first when fixing a bug
Time: about 1 hour for the basics, ~5 hours with all three notebooks.
Prerequisites: Modules 6 (advanced prompting), 7 (building apps), 10 (conversations).
Three loops worth knowing
| Loop | When to use it |
|---|---|
| Spec → code | Greenfield: you know what you want, don't want to type it |
| Bug → failing test → fix | Debugging: pin down the bug before asking for a fix |
| Code → review | Existing code: you suspect issues but haven't read carefully |
All three follow the same shape: tell Claude what good looks like, then ask for the change.
Loop 1 · Spec → code with tests
Don't ask "write a function that does X". Ask for the function and its tests in one go — and read the tests first.
```python
from anthropic import Anthropic
from dotenv import load_dotenv

load_dotenv()
client = Anthropic()

spec = """\
Function: parse_iso_duration(s: str) -> int
- Input is an ISO-8601 duration like "PT1H30M" or "P1DT2H".
- Returns total seconds as an int.
- Supports days (D), hours (H), minutes (M), seconds (S). Ignore years/months.
- Raises ValueError on bad input.
"""

prompt = f"""\
{spec}
Write:
1. The Python implementation of the function (no external deps, stdlib only).
2. A pytest module testing edge cases: valid, missing-T, all-zero, malformed input.
Reply with two fenced code blocks: implementation, then tests. No prose between them.
"""

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1500,
    messages=[{"role": "user", "content": prompt}],
)
print(response.content[0].text)
```
When you read the result, read the tests first. They tell you what the model thinks the spec means. If the tests cover what you expected, the implementation usually does too.
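It also helps to know what a correct answer looks like before you read Claude's. Here is one possible stdlib implementation of the spec above; this is a sketch of one reading of the spec, not the canonical answer, and Claude's version may reasonably differ:

```python
import re

# One possible reading of the spec: optional days before "T",
# then optional H/M/S components after it.
_DURATION_RE = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)

def parse_iso_duration(s: str) -> int:
    """Return total seconds for a duration like 'PT1H30M' or 'P1DT2H'."""
    m = _DURATION_RE.match(s)
    # Reject non-matches, and also strings like "P" that match the
    # regex but carry no components at all.
    if not m or not any(m.groupdict().values()):
        raise ValueError(f"not a supported ISO-8601 duration: {s!r}")
    parts = {k: int(v or 0) for k, v in m.groupdict().items()}
    return (
        parts["days"] * 86400
        + parts["hours"] * 3600
        + parts["minutes"] * 60
        + parts["seconds"]
    )
```

If Claude's tests disagree with this sketch on an input like `"P"` or `"PT"`, that's the ambiguity in the spec showing itself, which is exactly what reading the tests first is for.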
Loop 2 · Bug → failing test → fix
The fastest way to fix an intermittent bug is to make it deterministic. Ask Claude for a failing test before asking for a fix.
```python
bug_report = """\
Our `slugify(s)` function crashes on strings containing the em dash "—" (U+2014).
Expected: emit '-' and continue. Actual: UnicodeEncodeError.
Code is in slug.py: ...
"""

prompt_one = f"{bug_report}\n\nWrite a single failing pytest that reproduces this bug. No commentary."
prompt_two = "Now propose the smallest change to slug.py that makes that test pass. Show a unified diff."
```
Two calls; two prompts. Don't merge them — the test acts as a contract that the patch must satisfy.
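For a concrete sense of the contract, here is what the two artifacts might look like together. The `slugify` body is a hypothetical stand-in for the real `slug.py`, and the marked line plays the role of the minimal fix:

```python
import re
import unicodedata

# Hypothetical slug.py after the patch. The marked line is the
# "smallest change": map dash punctuation (including the em dash)
# to '-' before the ASCII round-trip drops it.
def slugify(s: str) -> str:
    s = unicodedata.normalize("NFKD", s)
    s = re.sub(r"[\u2012-\u2015\u2212]", "-", s)  # <-- the fix
    s = s.encode("ascii", "ignore").decode("ascii")
    s = re.sub(r"[^a-zA-Z0-9]+", "-", s).strip("-")
    return s.lower()

# The failing test from call one; it passes once the fix lands.
def test_slugify_em_dash():
    assert slugify("rock—paper") == "rock-paper"
```

The test is the contract: if Claude's proposed diff doesn't make it pass, the loop isn't done.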
Loop 3 · Code review
Hand Claude code with explicit review criteria and you'll get useful feedback. Hand it code with no rubric and you'll get a list of cosmetic suggestions.
```python
from pathlib import Path

code = Path("payments.py").read_text()

review_prompt = f"""\
Review the code below as a senior engineer. Focus on, in this order:
1. Correctness — bugs, race conditions, edge cases
2. Security — injection, privilege escalation, secrets handling
3. Failure modes — what happens on partial success, network errors
4. Tests that are missing
Do NOT comment on style, naming, or formatting unless it causes a real problem.
For each issue, output: SEVERITY (high|medium|low), the file/line, the issue, the fix.
CODE:
{code}
"""

response = client.messages.create(
    model="claude-opus-4-7",  # use Opus for high-stakes review
    max_tokens=2000,
    messages=[{"role": "user", "content": review_prompt}],
)
print(response.content[0].text)
```
Three things make this work:
- Ranked focus areas. "In this order" tells Claude what matters.
- Explicit anti-instruction. "Do NOT comment on style" suppresses noise.
- Structured output. Severity + location + issue + fix is reviewable in seconds.
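The structured output has a second payoff: if you also pin down a delimiter in the prompt (say, pipes between the four fields), the findings become machine-readable. A sketch of a parser under that assumed `SEVERITY | location | issue | fix` line format, which the prompt above does not yet specify:

```python
import re

# Assumes one finding per line, e.g.:
#   HIGH | payments.py:42 | race on balance update | take a row lock
FINDING_RE = re.compile(
    r"^(?P<severity>high|medium|low)\s*\|\s*(?P<location>[^|]+?)\s*\|"
    r"\s*(?P<issue>[^|]+?)\s*\|\s*(?P<fix>.+)$",
    re.IGNORECASE,
)

def parse_findings(review_text: str) -> list[dict]:
    findings = []
    for line in review_text.splitlines():
        m = FINDING_RE.match(line.strip())
        if m:
            d = m.groupdict()
            d["severity"] = d["severity"].lower()
            findings.append(d)
    # Highest severity first, so the scary stuff is on top.
    order = {"high": 0, "medium": 1, "low": 2}
    return sorted(findings, key=lambda f: order[f["severity"]])
```

Real model output will occasionally drift from the format, so treat non-matching lines as prose rather than failing hard, as the sketch does.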
What Claude is unusually good at
- Boilerplate: SQL migrations, OpenAPI definitions, dataclass scaffolds, test fixtures.
- Translations between languages: Python ↔ TypeScript, SQL dialects, regex flavours.
- Reading unfamiliar code and explaining what it does — fast onboarding.
- Writing tests for code you describe — often catches edge cases you missed.
- Refactors with a clear target shape — "extract this into a helper" is reliable; "make this nicer" is not.
Where you stay in charge
- Choices the spec doesn't make. API design, naming conventions, dependency choices.
- Anything touching production data without a code-review gate.
- Security-sensitive paths — auth, crypto, input validation. Always review.
- Performance work — Claude can rewrite to "be faster" but you need benchmarks before merging.
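On the performance point, the benchmark gate can be tiny: confirm the rewrite returns the same output as the original, then time both. A sketch with `timeit`, where the two `join_*` functions are stand-ins for your old code and Claude's rewrite:

```python
import timeit

# Stand-in for the original implementation.
def join_old(items):
    out = ""
    for x in items:
        out += str(x) + ","
    return out.rstrip(",")

# Stand-in for Claude's "faster" rewrite.
def join_new(items):
    return ",".join(str(x) for x in items)

data = list(range(10_000))

# Correctness gate first: a faster wrong answer is not a win.
assert join_old(data) == join_new(data)

for fn in (join_old, join_new):
    t = timeit.timeit(lambda: fn(data), number=50)
    print(f"{fn.__name__}: {t:.3f}s for 50 runs")
```

If the numbers don't clearly favour the rewrite on your real data, don't merge it.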
Try changing one thing
- Add "Use only the standard library." to the spec. Notice how dependency-free the code becomes.
- Run the same review prompt on `claude-haiku-4-5-20251001`. Compare findings: Haiku misses subtleties; sometimes that's fine and the cost saving is real.
- Ask for tests before implementation. Claude will sometimes catch ambiguity in the spec just by trying to write the test.
- Wrap the review loop in a script that runs over every file in `git diff origin/main...HEAD`. You've just built a code-review bot; Module 14 turns that into something productionish.
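That last wrapper can be a few lines. A sketch, assuming you only want Python files reviewed and that the Loop 3 review prompt gets plugged in where the comment sits:

```python
import subprocess

def changed_files(diff_output: str) -> list[str]:
    """Filter `git diff --name-only` output down to Python files."""
    return [
        line.strip()
        for line in diff_output.splitlines()
        if line.strip().endswith(".py")
    ]

def review_branch() -> None:
    # Everything this branch touches, relative to origin/main.
    out = subprocess.run(
        ["git", "diff", "--name-only", "origin/main...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    for path in changed_files(out):
        with open(path) as f:
            code = f.read()
        # Plug `code` into the Loop 3 review_prompt and send it here.
        print(f"would review {path} ({len(code)} chars)")
```

Keeping `changed_files` as a pure function makes the interesting part testable without a git repo.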
Going deeper: open the notebooks
- `notebooks/01_introduction.ipynb` — spec-driven generation, review loops on small files (~1.5–2h)
- `notebooks/02_intermediate.ipynb` — Claude as a refactoring partner, language translations (~2–3h)
- `notebooks/03_advanced.ipynb` — security-minded review, large-codebase strategies (~1.5–2.5h)
Module checklist
- [ ] You've generated a function plus its tests in one round trip
- [ ] You've reproduced a bug as a test before asking for a fix
- [ ] You've run a code review with explicit criteria and ignored stylistic noise
- [ ] You can name two things you would never let Claude touch without review
Next module
Module 13 · Content Creation — applying the same "spec + iterate" discipline to long-form prose.