Stage 03 · Builder · Module 15 of 26 · ~8h

Advanced Reasoning

Tackle hard problems with planning and self-critique.


Most prompts are one-shot: question in, answer out. The hardest problems aren't shaped like that. They need a plan, a few sub-results, a check, and only then an answer. This module is the small set of patterns that work when one prompt isn't enough.

By the end of this module you'll have three reusable patterns (decompose/solve/combine, plan-then-execute, and verifier) and a feel for when each is worth its extra latency and cost.

Time: about 1.5 hours for the basics, ~8 hours with all three notebooks.

Prerequisites: Modules 6 (advanced prompting), 8 (tool use), 12 (code generation).


When to reach for this module

Use these patterns when:

- the problem splits into sub-questions you can answer separately
- the reasoning is high-stakes enough that you want to review a plan before it runs
- a wrong answer is costly enough to justify an adversarial second pass

Don't use them when a one-shot prompt with a structured-output spec already works. These patterns trade latency and cost for reliability; they're not free.


Pattern 1 · Decompose, solve, combine

Hard problems shrink when you break them up. The classic shape:

from anthropic import Anthropic
from dotenv import load_dotenv

load_dotenv()
client = Anthropic()

def call(messages, *, model="claude-sonnet-4-6", max_tokens=600):
    return client.messages.create(model=model, max_tokens=max_tokens, messages=messages).content[0].text

big_question = (
    "We have $50k to spend on dev tooling next year. "
    "We're a 12-person Python/TS team. What should we buy?"
)

sub_questions = [
    "What does a 12-person Python/TS team typically spend on observability annually?",
    "What does the same team typically spend on CI/CD and dev environments?",
    "What does the same team typically spend on AI coding assistants and review tools?",
]

partials = []
for q in sub_questions:
    partials.append(call([{"role": "user", "content": q}]))

combine_prompt = (
    f"ORIGINAL QUESTION:\n{big_question}\n\n"
    + "\n\n".join(f"SUB-ANSWER {i+1}:\n{p}" for i, p in enumerate(partials))
    + "\n\nUsing only the sub-answers above, recommend a budget breakdown. "
      "Total must equal $50k. Show line items with rationale."
)
print(call([{"role": "user", "content": combine_prompt}], model="claude-opus-4-7"))

Key things going on:

- Each sub-question is answered independently with the cheaper, faster model; only the combine step pays for the stronger model.
- The combine prompt says "Using only the sub-answers above," which grounds the final recommendation in the partials instead of fresh speculation.
- The hard constraint (total must equal $50k) lives in the combine step, the only place it can be enforced.

For decomposition that isn't known in advance, ask Claude to generate the sub-questions first ("List the 3–5 sub-questions you'd answer to solve X. Reply as a JSON list."), then run the loop above.
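The JSON step is worth a little defense, since models sometimes wrap JSON in prose or a code fence. A minimal parsing sketch (`parse_subquestions` is a hypothetical helper, not part of the course code):

```python
import json
import re

def parse_subquestions(raw: str) -> list[str]:
    # Grab the outermost [...] so prose or a ```json fence around it is ignored.
    match = re.search(r"\[.*\]", raw, re.DOTALL)
    if match is None:
        raise ValueError("no JSON list in model output")
    return json.loads(match.group(0))
```

Then `sub_questions = parse_subquestions(call([...]))` feeds straight into the loop above.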


Pattern 2 · Plan, then execute

Auditable reasoning beats hidden reasoning when stakes are high. Force the plan into its own message so you can read it.

plan_prompt = """\
You are reviewing a database migration plan. Before answering, output a numbered plan
of how you will reason. Do NOT execute the plan yet. Format:

PLAN:
1. ...
2. ...

Migration:
ALTER TABLE users ADD COLUMN tier VARCHAR(16) NOT NULL;
"""

plan = call([{"role": "user", "content": plan_prompt}], model="claude-opus-4-7")
print("PLAN:\n", plan, "\n")

# A human (or a CI bot) can review the plan here.

# Include the generated plan in the execute prompt; without it, the model
# never sees the plan it is supposed to follow.
execute_prompt = (
    f"{plan_prompt}\n\nPLAN (approved):\n{plan}\n\n"
    "That plan is approved. Now execute it and give the final review."
)
result = call([{"role": "user", "content": execute_prompt}], model="claude-opus-4-7", max_tokens=1500)
print("REVIEW:\n", result)

You separated deciding what to do from doing it. Now you can reject a bad plan in seconds without reading a 1,500-token answer.

Pairs well with tool use. Plan in step 1; let Claude call tools to gather facts in step 2. The plan keeps the tool-using loop on rails.
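One way to keep plan and execution in the same conversation is to replay the plan as an assistant turn. A sketch (`plan_then_execute_messages` is a hypothetical helper; you'd pass `tools=[...]` on the second API call):

```python
def plan_then_execute_messages(plan_prompt: str, plan: str) -> list[dict]:
    # The model's own plan comes back as an assistant turn, so the execute
    # call continues the conversation instead of restarting it from scratch.
    return [
        {"role": "user", "content": plan_prompt},
        {"role": "assistant", "content": plan},
        {"role": "user", "content": "That plan is approved. Execute it, calling tools as needed."},
    ]
```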


Pattern 3 · Verifier

Self-critique (Module 6) was the same model checking its own work. A verifier is a separate prompt — sometimes a different model — explicitly trying to disprove the answer.

def verify(question: str, answer: str) -> tuple[bool, str]:
    response = call([{
        "role": "user",
        "content": (
            "You are a strict adversarial reviewer. Try to find one concrete reason the "
            "answer below is wrong. If you find one, output:\n"
            "WRONG: <one-sentence reason>\n"
            "Otherwise, output:\nLOOKS GOOD\n\n"
            f"QUESTION:\n{question}\n\nANSWER:\n{answer}"
        ),
    }], model="claude-opus-4-7", max_tokens=200).strip()
    return response.startswith("LOOKS GOOD"), response

ok, verdict = verify(big_question, "Buy GitHub Copilot for everyone, $20k. Done.")
print(verdict)

Verifiers are most powerful when they have something to check against — a tool result, a database lookup, the original spec. A verifier with no ground truth is just second opinion theatre.
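A deterministic check is the strongest ground truth of all. For the budget example, you can enforce the $50k constraint in plain code before (or instead of) the LLM verifier. A sketch, assuming the combine step's output has already been parsed into a dict of line items:

```python
def check_budget(line_items: dict[str, int], total: int = 50_000) -> tuple[bool, str]:
    # Deterministic ground truth: the line items must sum exactly to the budget.
    spent = sum(line_items.values())
    if spent != total:
        return False, f"WRONG: line items sum to ${spent:,}, not ${total:,}"
    return True, "LOOKS GOOD"
```

Cheap checks like this run first; the adversarial model reviews only answers that pass.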


Choosing a pattern

| Problem shape | Pattern |
| --- | --- |
| Big question with clear sub-questions | Decompose, solve, combine |
| High-stakes reasoning where you need an audit trail | Plan, then execute |
| Answer needs a quality gate before reaching the user | Verifier |
| Long sequence of tool calls | Plan-then-execute plus tool use |
| Many candidates need ranking | Generate N candidates, score with verifier, return the best |

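The last row can be sketched as a tiny loop. Here `generate` and `score` are stand-ins: in practice `generate` could wrap `call` from Pattern 1 and `score` could wrap the verifier from Pattern 3.

```python
def best_of_n(question: str, generate, score, n: int = 3) -> str:
    # generate(question) -> candidate answer; score(question, answer) -> number.
    # Sample n candidates and return the one the scorer likes best.
    candidates = [generate(question) for _ in range(n)]
    return max(candidates, key=lambda c: score(question, c))
```

With temperature above zero, each `generate` call produces a different candidate, which is what makes the ranking worthwhile.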
Try changing one thing


Going deeper: open the notebooks


Module checklist


Stage 03 complete

That's the Builder stage. You can now apply Claude to actual analysis, code, content, and reasoning workloads — and ship them where they survive contact with users. The Pro stage takes those workloads and makes them fast, cheap, and provably good.

Module 16 · Optimization