Stage 04 · Pro · Module 18 of 26 · ~10h

Multi-Agent Systems

Coordinate multiple Claude agents on complex tasks.


Sometimes the right answer comes from one prompt. Sometimes it comes from a small team of specialised prompts that pass work between them. This module shows you the simplest useful version, and the failure modes you'll meet the moment you scale beyond it.

By the end of this module you'll have built a working planner → workers → synthesiser pipeline in about 60 lines of Python, and learned when a single prompt is the better choice.

Time: about 2 hours for the basics, ~10 hours with all three notebooks.

Prerequisites: Modules 8 (tool use), 10 (conversations), 14 (production patterns), 15 (advanced reasoning).


What "agent" really means

Strip the marketing away and an agent is a prompt with a defined role, optional tools, and a defined output. A multi-agent system is a few of those passing structured messages around.

This is freeing: you don't need a framework. A dictionary of role-prompts and a Python loop is a multi-agent system. We'll build one in 60 lines.
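A minimal sketch of that claim, with a stubbed `call_model` standing in for the real API (the stub, the role names, and `run` are illustrative, not the module's code):

```python
# A multi-agent "system" with no framework: a dict of role prompts and a loop.
def call_model(system_prompt: str, user_message: str) -> str:
    # Stub for a real LLM call; returns a traceable fake answer.
    return f"[{system_prompt.split('.')[0]}] response to: {user_message}"

ROLES = {
    "planner": "You are a planner. Break the task into steps.",
    "worker": "You are a worker. Execute the steps.",
    "reviewer": "You are a reviewer. Check the work.",
}

def run(task: str) -> str:
    message = task
    for prompt in ROLES.values():  # each role's output becomes the next role's input
        message = call_model(prompt, message)
    return message

print(run("Summarise our Q3 incident reports"))
```

Swap the stub for a real client call and you have the skeleton the rest of this module fleshes out.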


The simplest useful pattern: planner → workers → synthesiser

Save as agents.py:

from concurrent.futures import ThreadPoolExecutor
from anthropic import Anthropic
from dotenv import load_dotenv

load_dotenv()
client = Anthropic()

def call(role_prompt: str, user: str, *, model="claude-sonnet-4-6", max_tokens=600) -> str:
    return client.messages.create(
        model=model, max_tokens=max_tokens, system=role_prompt,
        messages=[{"role": "user", "content": user}],
    ).content[0].text

PLANNER = (
    "You are a planner. Given a research task, output 3–5 sub-questions whose answers "
    "would together solve it. Reply with a JSON list of strings only."
)

WORKER = (
    "You are a focused researcher. Answer the single sub-question concisely (≤120 words). "
    "If you don't know, say so explicitly. Cite no sources you cannot name."
)

SYNTHESISER = (
    "You are an editor. Given an original task and a set of researcher answers, "
    "produce a single coherent answer to the original task. Note any contradictions "
    "between researchers and resolve them or flag them."
)

import json

def solve(task: str) -> str:
    raw = call(PLANNER, task, model="claude-haiku-4-5-20251001", max_tokens=300)
    sub_questions = json.loads(raw[raw.find("[") : raw.rfind("]") + 1])

    with ThreadPoolExecutor(max_workers=5) as pool:
        answers = list(pool.map(lambda q: call(WORKER, q), sub_questions))

    bundle = f"TASK:\n{task}\n\n" + "\n\n".join(
        f"SUB-Q: {q}\nA: {a}" for q, a in zip(sub_questions, answers)
    )
    return call(SYNTHESISER, bundle, model="claude-opus-4-7", max_tokens=900)

if __name__ == "__main__":
    print(solve("How should a 12-person engineering team think about adopting Claude in 2026?"))

What you should notice running this:


Why this beats "one giant prompt"

A single prompt that asks "research this and write me an answer" has problems multi-agent neatly avoids:

Problem                    How agents fix it
Context window pressure    Each worker only sees its sub-question
Drift mid-answer           Each role has a tight, single-purpose prompt
No parallelism             Workers fan out concurrently
Hard to audit              You can read the plan, the workers, and the synthesis separately
Hard to swap a piece       Change one role-prompt without touching the others

It's the Unix philosophy applied to LLMs: small programs, doing one thing, piped together.
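The analogy can be made literal with a hypothetical `pipe` helper (not part of agents.py) that composes text-to-text stages the way a shell pipes programs:

```python
from functools import reduce
from typing import Callable

Stage = Callable[[str], str]

def pipe(*stages: Stage) -> Stage:
    """Compose stages left to right, like `plan | research | edit` in a shell."""
    return lambda text: reduce(lambda acc, stage: stage(acc), stages, text)

# Stub stages standing in for real agent calls.
plan = lambda task: f"plan({task})"
research = lambda task: f"research({task})"
edit = lambda task: f"edit({task})"

answer = pipe(plan, research, edit)
print(answer("task"))  # → edit(research(plan(task)))
```

Each stage stays small and testable on its own; the composition is where the system lives.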


Where multi-agent goes wrong

Real failure modes you'll meet:

  1. Cost balloons. Every agent is a separate API call, so five agents means roughly five times the spend. Always measure before scaling up the number of roles.
  2. Information leaks back to the planner. If the synthesis is poor, you may be tempted to feed it back to the planner — now you have a loop with no termination guarantee. Cap iterations.
  3. Coordination errors. Agents disagree, contradict each other, or one's output doesn't match the next one's expected input. Strict structured outputs (JSON schema, Module 6) eliminate most of these mismatches.
  4. Sequential when parallel was possible. If your three workers don't depend on each other, run them concurrently (ThreadPoolExecutor.map, as above, or asyncio.gather), not one after another.
  5. Pretending architecture is intelligence. Five mediocre agents do not equal one excellent answer. If the planner is bad, no number of workers will save you.
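Failure mode 2 has a mechanical fix: a hard iteration cap. A sketch under stated assumptions: `plan_fn`, `solve_fn`, and `good_enough` are hypothetical stand-ins for real model calls and a real quality check:

```python
MAX_ROUNDS = 3  # hard cap: the loop terminates even if quality never improves

def refine(task, plan_fn, solve_fn, good_enough):
    feedback, answer = "", ""
    for _ in range(MAX_ROUNDS):
        plan = plan_fn(task + feedback)
        answer = solve_fn(plan)
        if good_enough(answer):
            return answer
        feedback = f"\nPrevious attempt was weak: {answer[:100]}"
    return answer  # best effort after the cap, never an infinite loop

# Stubs: the checker accepts the second attempt.
calls = {"n": 0}
def fake_plan(text): return f"plan<{text}>"
def fake_solve(plan): calls["n"] += 1; return f"answer{calls['n']}"
def fake_check(answer): return answer == "answer2"

print(refine("task", fake_plan, fake_solve, fake_check))  # → answer2
```

The cap is the termination guarantee; everything else is ordinary feedback plumbing.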

When not to reach for multi-agent

Symptom                                            Better answer
Single-prompt accuracy is already 92%              Stop. You're done.
The task is small but the prompt is long           Tighter prompt. (Module 3.)
The task is "answer factually about our docs"      RAG. (Module 9.)
The task is "decide whether to call our API"       One prompt + one tool. (Module 8.)
You want auditable reasoning on a single problem   Plan-then-execute. (Module 15.)

A well-tuned single-prompt pipeline beats a sloppy multi-agent system every time.


Patterns worth knowing

Each one is just a different message-passing topology over the same primitive: a prompt with a role and a defined output.
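As a sketch of that idea, with stubbed calls and hypothetical names, here is the same `agent` primitive wired two ways; only the topology changes:

```python
from concurrent.futures import ThreadPoolExecutor

def agent(role: str):
    # Stub for a real model call: role prompt in, text out.
    return lambda msg: f"{role}:{msg}"

def chain(roles, msg):
    # Pipeline topology: each agent's output feeds the next agent.
    for role in roles:
        msg = agent(role)(msg)
    return msg

def fan_out(roles, msg):
    # Fan-out topology: one message, many agents, answers gathered.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda r: agent(r)(msg), roles))

print(chain(["a", "b"], "x"))    # → b:a:x
print(fan_out(["a", "b"], "x"))  # → ['a:x', 'b:x']
```

Every pattern in this section is one of these wirings, or a combination of them, over the same primitive.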


Try changing one thing


Going deeper: open the notebooks


Module checklist


Next module

Module 19 · Enterprise Scale — taking these patterns and surviving thousands of users.