Advanced Reasoning
Tackle hard problems with planning and self-critique.
Most prompts are one-shot: question in, answer out. The hardest problems aren't shaped like that. They need a plan, a few sub-results, a check, and only then an answer. This module is the small set of patterns that work when one prompt isn't enough.
By the end of this module you'll have
- A decompose → solve → combine loop for problems too big for one call
- A plan-then-execute pattern that makes multi-step reasoning visible and reviewable
- A verifier prompt that catches the model's own mistakes before they reach a user
Time: about 1.5 hours for the basics, ~8 hours with all three notebooks.
Prerequisites: Modules 6 (advanced prompting), 8 (tool use), 12 (code generation).
When to reach for this module
Use these patterns when:
- The task has clear sub-tasks (research, plan, draft, review).
- The answer needs to combine many small lookups (e.g. compare 12 vendors).
- A wrong answer is expensive (auth flows, money movement, legal text).
- One prompt keeps "drifting" — losing track of constraints partway through.
Don't use them when a one-shot prompt with a structured-output spec already works. These patterns trade latency and cost for reliability; they're not free.
Pattern 1 · Decompose, solve, combine
Hard problems shrink when you break them up. The classic shape:
```python
from anthropic import Anthropic
from dotenv import load_dotenv

load_dotenv()
client = Anthropic()

def call(messages, *, model="claude-sonnet-4-6", max_tokens=600):
    return client.messages.create(
        model=model, max_tokens=max_tokens, messages=messages
    ).content[0].text

big_question = (
    "We have $50k to spend on dev tooling next year. "
    "We're a 12-person Python/TS team. What should we buy?"
)

sub_questions = [
    "What does a 12-person Python/TS team typically spend on observability annually?",
    "What does the same team typically spend on CI/CD and dev environments?",
    "What does the same team typically spend on AI coding assistants and review tools?",
]

partials = []
for q in sub_questions:
    partials.append(call([{"role": "user", "content": q}]))

combine_prompt = (
    f"ORIGINAL QUESTION:\n{big_question}\n\n"
    + "\n\n".join(f"SUB-ANSWER {i+1}:\n{p}" for i, p in enumerate(partials))
    + "\n\nUsing only the sub-answers above, recommend a budget breakdown. "
    "Total must equal $50k. Show line items with rationale."
)

print(call([{"role": "user", "content": combine_prompt}], model="claude-opus-4-7"))
```
Key things going on:
- Sub-questions can run in parallel — they don't depend on each other. Use `asyncio.gather` or a thread pool to drop latency.
- The combine step uses a stronger model — synthesis is the harder cognitive task, even if each sub-question is easy.
- You control the boundary. Each sub-call has a smaller, cleaner context than the original prompt would have.
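The parallel version of the loop above can be sketched with a thread pool. This is a sketch, not the module's canonical code: `call` here is a local stub standing in for the API helper defined earlier, so the snippet runs without a network connection.

```python
from concurrent.futures import ThreadPoolExecutor

# Stub standing in for the real `call` helper, which would hit the API.
def call(messages):
    return f"answer to: {messages[0]['content']}"

sub_questions = [
    "What does observability cost annually?",
    "What does CI/CD cost annually?",
    "What do AI coding assistants cost annually?",
]

# The sub-questions are independent, so the calls can run concurrently.
# pool.map preserves input order, so partials[i] matches sub_questions[i].
with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(
        pool.map(lambda q: call([{"role": "user", "content": q}]), sub_questions)
    )

print(len(partials))  # 3
```

Swap the stub for the real helper and the structure stays the same; latency drops to roughly the slowest single sub-call.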
For decomposition that isn't known in advance, ask Claude to generate the sub-questions first ("List the 3–5 sub-questions you'd answer to solve X. Reply as a JSON list."), then run the loop above.
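Model replies to that JSON request don't always come back as bare JSON, so it's worth parsing defensively. A minimal sketch — `parse_subquestions` is a hypothetical helper, and the reply below is stubbed rather than fetched from the API:

```python
import json

def parse_subquestions(reply: str) -> list[str]:
    # Models sometimes wrap JSON in markdown fences; strip them before parsing.
    cleaned = (
        reply.strip()
        .removeprefix("```json")
        .removeprefix("```")
        .removesuffix("```")
        .strip()
    )
    subs = json.loads(cleaned)
    if not (isinstance(subs, list) and all(isinstance(s, str) for s in subs)):
        raise ValueError("expected a JSON list of strings")
    return subs

# Stubbed model reply, standing in for a real API response:
reply = '```json\n["What does observability cost?", "What does CI/CD cost?"]\n```'
print(parse_subquestions(reply))
```

Once parsed, feed the list straight into the `for q in sub_questions` loop from the example above.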
Pattern 2 · Plan, then execute
Auditable reasoning beats hidden reasoning when stakes are high. Force the plan into its own message so you can read it.
```python
plan_prompt = """\
You are reviewing a database migration plan. Before answering, output a numbered plan
of how you will reason. Do NOT execute the plan yet. Format:
PLAN:
1. ...
2. ...

Migration:
ALTER TABLE users ADD COLUMN tier VARCHAR(16) NOT NULL;
"""

plan = call([{"role": "user", "content": plan_prompt}], model="claude-opus-4-7")
print("PLAN:\n", plan, "\n")

# A human (or a CI bot) can review the plan here.

# The API is stateless, so the approved plan must be re-sent in the next call.
execute_prompt = (
    f"{plan_prompt}\n\nAPPROVED PLAN:\n{plan}\n\n"
    "That plan is approved. Now execute it and give the final review."
)
result = call(
    [{"role": "user", "content": execute_prompt}],
    model="claude-opus-4-7",
    max_tokens=1500,
)
print("REVIEW:\n", result)
```
You separated deciding what to do from doing it. Now you can reject a bad plan in seconds without reading a 1,500-word answer.
Pairs well with tool use. Plan in step 1; let Claude call tools to gather facts in step 2. The plan keeps the tool-using loop on rails.
Pattern 3 · Verifier
Self-critique (Module 6) was the same model checking its own work. A verifier is a separate prompt — sometimes a different model — explicitly trying to disprove the answer.
```python
def verify(question: str, answer: str) -> tuple[bool, str]:
    response = call([{
        "role": "user",
        "content": (
            "You are a strict adversarial reviewer. Try to find one concrete reason the "
            "answer below is wrong. If you find one, output:\n"
            "WRONG: <one-sentence reason>\n"
            "Otherwise, output:\nLOOKS GOOD\n\n"
            f"QUESTION:\n{question}\n\nANSWER:\n{answer}"
        ),
    }], model="claude-opus-4-7", max_tokens=200).strip()
    return response.startswith("LOOKS GOOD"), response

ok, verdict = verify(big_question, "Buy GitHub Copilot for everyone, $20k. Done.")
print(verdict)
```
Verifiers are most powerful when they have something to check against — a tool result, a database lookup, the original spec. A verifier with no ground truth is just second-opinion theatre.
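For the budget example above, one such ground truth is plain arithmetic: before (or instead of) asking a model, check that the line items actually sum to the stated budget. `budget_totals_ok` is a hypothetical helper for this module, not a library function:

```python
import re

def budget_totals_ok(answer: str, budget: int = 50_000) -> bool:
    # Deterministic ground-truth check: pull every "$N" line item out of the
    # answer and confirm the amounts sum to the stated budget.
    amounts = [int(m.replace(",", "")) for m in re.findall(r"\$([\d,]+)", answer)]
    return sum(amounts) == budget

answer = "Observability: $18,000. CI/CD: $12,000. AI assistants: $20,000."
print(budget_totals_ok(answer))  # True
```

A cheap deterministic check like this can run first; the model-based verifier then only has to judge the parts arithmetic can't.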
Choosing a pattern
| Problem shape | Pattern |
|---|---|
| Big question with clear sub-questions | Decompose, solve, combine |
| High-stakes reasoning where you need an audit trail | Plan, then execute |
| Answer needs a quality gate before reaching the user | Verifier |
| Long sequence of tool calls | Plan-then-execute plus tool use |
| Many candidates need ranking | Generate N candidates, score with verifier, return the best |
Try changing one thing
- Run the decomposition example with the same model for the sub-questions and the combine step. Then try Sonnet for sub-questions, Opus for combine. Compare.
- Make the verifier vote: run it three times with `temperature=0.7` (you'll need to add a `temperature` parameter to the `call` helper). If 2/3 say WRONG, escalate to a human.
- Add a "stop" condition to the plan-then-execute pattern: if the plan has more than 8 steps, refuse and ask for a tighter plan first.
- Implement best-of-N: generate 4 answers in parallel, run the verifier on each, return the one that survives.
Going deeper: open the notebooks
- `notebooks/01_introduction.ipynb` — decomposition, planning, verification on real tasks (~1.5–2h)
- `notebooks/02_intermediate.ipynb` — best-of-N, voting, hybrid pipelines (~2–3h)
- `notebooks/03_advanced.ipynb` — formal "definition of done", legal/compliance reasoning (~1.5–2.5h)
Module checklist
- [ ] You've decomposed at least one problem you couldn't solve in one prompt
- [ ] You've watched a model write a plan you then approved before letting it execute
- [ ] You've caught a wrong answer with a verifier and not let it ship
- [ ] You can name a situation where each of the three patterns is the right call
Stage 03 complete
That's the Builder stage. You can now apply Claude to actual analysis, code, content, and reasoning workloads — and ship them where they survive contact with users. The Pro stage takes those workloads and makes them fast, cheap, and provably good.