Advanced Reasoning
Tackle hard problems with planning and self-critique.
Most prompts are one-shot: question in, answer out. The hardest problems aren't shaped like that. They need a plan, a few sub-results, a check, and only then an answer. This module is the small set of patterns that work when one prompt isn't enough.
By the end of this module you'll have
- A decompose → solve → combine loop for problems too big for one call
- A plan-then-execute pattern that makes multi-step reasoning visible and reviewable
- A verifier prompt that catches the model's own mistakes before they reach a user
Time: about 1.5 hours for the basics, ~8 hours with all three notebooks.
Prerequisites: Modules 6 (advanced prompting), 8 (tool use), 12 (code generation).
When to reach for this module
Use these patterns when:
- The task has clear sub-tasks (research, plan, draft, review).
- The answer needs to combine many small lookups (e.g. compare 12 vendors).
- A wrong answer is expensive (auth flows, money movement, legal text).
- One prompt keeps "drifting" — losing track of constraints partway through.
Don't use them when a one-shot prompt with a structured-output spec already works. These patterns trade latency and cost for reliability; they're not free.
Pattern 1 · Decompose, solve, combine
Hard problems shrink when you break them up. The classic shape:
```python
from anthropic import Anthropic
from dotenv import load_dotenv

load_dotenv()
client = Anthropic()

def call(messages, *, model="claude-sonnet-4-6", max_tokens=600):
    return client.messages.create(
        model=model, max_tokens=max_tokens, messages=messages
    ).content[0].text

big_question = (
    "We have $50k to spend on dev tooling next year. "
    "We're a 12-person Python/TS team. What should we buy?"
)

sub_questions = [
    "What does a 12-person Python/TS team typically spend on observability annually?",
    "What does the same team typically spend on CI/CD and dev environments?",
    "What does the same team typically spend on AI coding assistants and review tools?",
]

partials = []
for q in sub_questions:
    partials.append(call([{"role": "user", "content": q}]))

combine_prompt = (
    f"ORIGINAL QUESTION:\n{big_question}\n\n"
    + "\n\n".join(f"SUB-ANSWER {i+1}:\n{p}" for i, p in enumerate(partials))
    + "\n\nUsing only the sub-answers above, recommend a budget breakdown. "
    "Total must equal $50k. Show line items with rationale."
)

print(call([{"role": "user", "content": combine_prompt}], model="claude-opus-4-7"))
```
Key things going on:
- Sub-questions can run in parallel — they don't depend on each other. Use `asyncio.gather` or a thread pool to drop latency.
- The combine step uses a stronger model — synthesis is the harder cognitive task, even if each sub-question is easy.
- You control the boundary. Each sub-call has a smaller, cleaner context than the original prompt would have.
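The parallel version of the loop above can be sketched with a thread pool. This is a sketch, not the module's canonical code: `call` here is a local stub standing in for the API helper defined earlier, so the snippet runs without a network connection.

```python
from concurrent.futures import ThreadPoolExecutor

# Stub standing in for the real `call` helper, which would hit the API.
def call(messages):
    return f"answer to: {messages[0]['content']}"

sub_questions = [
    "What does observability cost annually?",
    "What does CI/CD cost annually?",
    "What do AI coding assistants cost annually?",
]

# The sub-questions are independent, so the calls can run concurrently.
# pool.map preserves input order, so partials[i] matches sub_questions[i].
with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(
        pool.map(lambda q: call([{"role": "user", "content": q}]), sub_questions)
    )

print(len(partials))  # 3
```

Swap the stub for the real helper and the structure stays the same; latency drops to roughly the slowest single sub-call.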
For decomposition that isn't known in advance, ask Claude to generate the sub-questions first ("List the 3–5 sub-questions you'd answer to solve X. Reply as a JSON list."), then run the loop above.
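Model replies to that JSON request don't always come back as bare JSON, so it's worth parsing defensively. A minimal sketch — `parse_subquestions` is a hypothetical helper, and the reply below is stubbed rather than fetched from the API:

```python
import json

def parse_subquestions(reply: str) -> list[str]:
    # Models sometimes wrap JSON in markdown fences; strip them before parsing.
    cleaned = (
        reply.strip()
        .removeprefix("```json")
        .removeprefix("```")
        .removesuffix("```")
        .strip()
    )
    subs = json.loads(cleaned)
    if not (isinstance(subs, list) and all(isinstance(s, str) for s in subs)):
        raise ValueError("expected a JSON list of strings")
    return subs

# Stubbed model reply, standing in for a real API response:
reply = '```json\n["What does observability cost?", "What does CI/CD cost?"]\n```'
print(parse_subquestions(reply))
```

Once parsed, feed the list straight into the `for q in sub_questions` loop from the example above.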
Pattern 2 · Plan, then execute
Auditable reasoning beats hidden reasoning when stakes are high. Force the plan into its own message so you can read it.
```python
plan_prompt = """\
You are reviewing a database migration plan. Before answering, output a numbered plan
of how you will reason. Do NOT execute the plan yet. Format:
PLAN:
1. ...
2. ...

Migration:
ALTER TABLE users ADD COLUMN tier VARCHAR(16) NOT NULL;
"""

plan = call([{"role": "user", "content": plan_prompt}], model="claude-opus-4-7")
print("PLAN:\n", plan, "\n")

# A human (or a CI bot) can review the plan here.

# The API is stateless, so the approved plan must be re-sent in the next call.
execute_prompt = (
    f"{plan_prompt}\n\nAPPROVED PLAN:\n{plan}\n\n"
    "That plan is approved. Now execute it and give the final review."
)
result = call(
    [{"role": "user", "content": execute_prompt}],
    model="claude-opus-4-7",
    max_tokens=1500,
)
print("REVIEW:\n", result)
```
You separated deciding what to do from doing it. Now you can reject a bad plan in seconds without reading a 1,500-word answer.
Pairs well with tool use. Plan in step 1; let Claude call tools to gather facts in step 2. The plan keeps the tool-using loop on rails.
Pattern 3 · Verifier
Self-critique (Module 6) was the same model checking its own work. A verifier is a separate prompt — sometimes a different model — explicitly trying to disprove the answer.
```python
def verify(question: str, answer: str) -> tuple[bool, str]:
    response = call([{
        "role": "user",
        "content": (
            "You are a strict adversarial reviewer. Try to find one concrete reason the "
            "answer below is wrong. If you find one, output:\n"
            "WRONG: <one-sentence reason>\n"
            "Otherwise, output:\nLOOKS GOOD\n\n"
            f"QUESTION:\n{question}\n\nANSWER:\n{answer}"
        ),
    }], model="claude-opus-4-7", max_tokens=200).strip()
    return response.startswith("LOOKS GOOD"), response

ok, verdict = verify(big_question, "Buy GitHub Copilot for everyone, $20k. Done.")
print(verdict)
```
Verifiers are most powerful when they have something to check against — a tool result, a database lookup, the original spec. A verifier with no ground truth is just second-opinion theatre.
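For the budget example above, one such ground truth is plain arithmetic: before (or instead of) asking a model, check that the line items actually sum to the stated budget. `budget_totals_ok` is a hypothetical helper for this module, not a library function:

```python
import re

def budget_totals_ok(answer: str, budget: int = 50_000) -> bool:
    # Deterministic ground-truth check: pull every "$N" line item out of the
    # answer and confirm the amounts sum to the stated budget.
    amounts = [int(m.replace(",", "")) for m in re.findall(r"\$([\d,]+)", answer)]
    return sum(amounts) == budget

answer = "Observability: $18,000. CI/CD: $12,000. AI assistants: $20,000."
print(budget_totals_ok(answer))  # True
```

A cheap deterministic check like this can run first; the model-based verifier then only has to judge the parts arithmetic can't.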
Choosing a pattern
| Problem shape | Pattern |
|---|---|
| Big question with clear sub-questions | Decompose, solve, combine |
| High-stakes reasoning where you need an audit trail | Plan, then execute |
| Answer needs a quality gate before reaching the user | Verifier |
| Long sequence of tool calls | Plan-then-execute plus tool use |
| Many candidates need ranking | Generate N candidates, score with verifier, return the best |
Try changing one thing
- Run the decomposition example with the same model for the sub-questions and the combine step. Then try Sonnet for sub-questions, Opus for combine. Compare.
- Make the verifier vote: run it three times with `temperature=0.7` (you'll need to add a `temperature` parameter to the `call` helper). If 2/3 say WRONG, escalate to a human.
- Add a "stop" condition to the plan-then-execute pattern: if the plan has more than 8 steps, refuse and ask for a tighter plan first.
- Implement best-of-N: generate 4 answers in parallel, run the verifier on each, return the one that survives.
Going deeper: open the notebooks
- `notebooks/01_introduction.ipynb` — decomposition, planning, verification on real tasks (~1.5–2h)
- `notebooks/02_intermediate.ipynb` — best-of-N, voting, hybrid pipelines (~2–3h)
- `notebooks/03_advanced.ipynb` — formal "definition of done", legal/compliance reasoning (~1.5–2.5h)
Module checklist
- [ ] You've decomposed at least one problem you couldn't solve in one prompt
- [ ] You've watched a model write a plan you then approved before letting it execute
- [ ] You've caught a wrong answer with a verifier and not let it ship
- [ ] You can name a situation where each of the three patterns is the right call
Stage 03 complete
That's the Builder stage. You can now apply Claude to actual analysis, code, content, and reasoning workloads — and ship them where they survive contact with users. The Pro stage takes those workloads and makes them fast, cheap, and provably good.