Multi-Agent Systems
Coordinate multiple Claude agents on complex tasks.
Sometimes the right answer comes from one prompt. Sometimes it comes from a small team of specialised prompts that pass work between them. This module shows you the simplest useful version, and the failure modes you'll meet the moment you scale beyond it.
By the end of this module you'll have
- A working planner → workers → synthesiser pipeline solving a real task
- A clear sense of what "agent" actually means here (spoiler: it's a prompt with a job)
- The instinct to ask "would one well-shaped prompt have done this cheaper?" before adding another agent
Time: about 2 hours for the basics, ~10 hours with all three notebooks.
Prerequisites: Modules 8 (tool use), 10 (conversations), 15 (advanced reasoning), 14 (production patterns).
What "agent" really means
Strip the marketing away and an agent is a prompt with a defined role, optional tools, and a defined output. A multi-agent system is a few of those passing structured messages around.
This is freeing: you don't need a framework. A dictionary of role-prompts and a Python loop is a multi-agent system. We'll build one in 60 lines.
The simplest useful pattern: planner → workers → synthesiser
Save as `agents.py`:

```python
import json
from concurrent.futures import ThreadPoolExecutor

from anthropic import Anthropic
from dotenv import load_dotenv

load_dotenv()
client = Anthropic()


def call(role_prompt: str, user: str, *, model="claude-sonnet-4-6", max_tokens=600) -> str:
    return client.messages.create(
        model=model, max_tokens=max_tokens, system=role_prompt,
        messages=[{"role": "user", "content": user}],
    ).content[0].text


PLANNER = (
    "You are a planner. Given a research task, output 3–5 sub-questions whose answers "
    "would together solve it. Reply with a JSON list of strings only."
)
WORKER = (
    "You are a focused researcher. Answer the single sub-question concisely (≤120 words). "
    "If you don't know, say so explicitly. Cite no sources you cannot name."
)
SYNTHESISER = (
    "You are an editor. Given an original task and a set of researcher answers, "
    "produce a single coherent answer to the original task. Note any contradictions "
    "between researchers and resolve them or flag them."
)


def solve(task: str) -> str:
    # Plan with a cheap model, then pull the JSON list out of the reply.
    raw = call(PLANNER, task, model="claude-haiku-4-5-20251001", max_tokens=300)
    sub_questions = json.loads(raw[raw.find("[") : raw.rfind("]") + 1])
    # Fan the workers out in parallel.
    with ThreadPoolExecutor(max_workers=5) as pool:
        answers = list(pool.map(lambda q: call(WORKER, q), sub_questions))
    bundle = f"TASK:\n{task}\n\n" + "\n\n".join(
        f"SUB-Q: {q}\nA: {a}" for q, a in zip(sub_questions, answers)
    )
    return call(SYNTHESISER, bundle, model="claude-opus-4-7", max_tokens=900)


if __name__ == "__main__":
    print(solve("How should a 12-person engineering team think about adopting Claude in 2026?"))
```
What you should notice running this:
- The planner is cheap (Haiku) — it's just structuring work.
- The workers run in parallel — that's the whole latency win.
- The synthesiser is your strongest model — synthesis is the real cognitive task.
- Total cost is roughly the sum of the individual calls' costs; the win is quality and parallel latency, not cost.
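If you want that cost sum as a number rather than a feeling, the SDK's response objects report token usage (`response.usage.input_tokens` / `output_tokens`), and total cost is literally a sum. A minimal sketch — the role names and the per-million-token prices below are placeholders, not real rates:

```python
# Rough pipeline cost accounting. PRICES holds HYPOTHETICAL dollars per
# million tokens — substitute your models' current published rates.
PRICES = {
    "planner": (1.0, 5.0),       # (input $/Mtok, output $/Mtok)
    "worker": (3.0, 15.0),
    "synthesiser": (15.0, 75.0),
}

def call_cost(role: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one call, given the token counts from response.usage."""
    p_in, p_out = PRICES[role]
    return (input_tokens * p_in + output_tokens * p_out) / 1_000_000

def pipeline_cost(calls: list[tuple[str, int, int]]) -> float:
    """Total cost is just the sum over every call the pipeline made."""
    return sum(call_cost(role, i, o) for role, i, o in calls)
```

Log one `(role, input_tokens, output_tokens)` tuple per call and you can compare single-prompt and multi-agent runs on the same task directly.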
Why this beats "one giant prompt"
A single prompt that asks "research this and write me an answer" has problems multi-agent neatly avoids:
| Problem | How agents fix it |
|---|---|
| Context window pressure | Each worker only sees its sub-question |
| Drift mid-answer | Each role has a tight, single-purpose prompt |
| No parallelism | Workers fan out concurrently |
| Hard to audit | You can read the plan, the workers, the synthesis separately |
| Hard to swap a piece | Change one role-prompt without touching the others |
It's the Unix philosophy applied to LLMs: small programs, doing one thing, piped together.
Where multi-agent goes wrong
Real failure modes you'll meet:
- Cost balloons. Every agent is a call. Five agents is five times the spend. Always measure before scaling up the number of roles.
- Information leaks back to the planner. If the synthesis is poor, you may be tempted to feed it back to the planner — now you have a loop with no termination guarantee. Cap iterations.
- Coordination errors. Agents disagree, contradict each other, or one's output doesn't match the next one's expected input. Strict structured outputs (JSON schema, Module 6) reduce this 10x.
- Sequential when parallel was possible. If your three workers don't depend on each other, they should be fanned out with `ThreadPoolExecutor` or `asyncio.gather`, not awaited in series.
- Pretending architecture is intelligence. Five mediocre agents do not equal one excellent answer. If the planner is bad, no number of workers will save you.
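The structured-output point is worth a few lines of code. A minimal guard for the planner's reply, assuming the JSON-list contract from `agents.py` (the function name and limits here are illustrative):

```python
import json

def parse_plan(raw: str, min_q: int = 3, max_q: int = 5) -> list[str]:
    """Extract and validate the planner's JSON list of sub-questions.

    Raises ValueError instead of passing a malformed plan downstream —
    a bad plan caught here is far cheaper than five confused workers.
    """
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end == -1:
        raise ValueError("planner reply contains no JSON list")
    plan = json.loads(raw[start : end + 1])
    if not isinstance(plan, list) or not all(isinstance(q, str) and q.strip() for q in plan):
        raise ValueError("plan must be a list of non-empty strings")
    if not (min_q <= len(plan) <= max_q):
        raise ValueError(f"expected {min_q}-{max_q} sub-questions, got {len(plan)}")
    return plan
```

Swapping this in for the bare `json.loads` in `solve` turns a silent coordination error into a loud, retryable one.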
When not to reach for multi-agent
| Symptom | Better answer |
|---|---|
| Single-prompt accuracy is already 92% | Stop. You're done. |
| The task is small but the prompt is long | Tighter prompt. (Module 3.) |
| The task is "answer factually about our docs" | RAG. (Module 9.) |
| The task is "decide whether to call our API" | One prompt + one tool. (Module 8.) |
| You want auditable reasoning on a single problem | Plan-then-execute. (Module 15.) |
A well-tuned single-prompt pipeline beats a sloppy multi-agent system every time.
Patterns worth knowing
- Debate: Two workers argue opposing positions; a judge agent picks. Useful for hard, contested calls.
- Critic: A worker generates; a critic agent refuses or accepts; the loop runs until accept (capped). Quality up, cost up.
- Router: A small Haiku agent decides which of several specialist prompts handles the request. Cheap, often dramatic quality win.
- Tool-using agent: A loop that lets one prompt call tools repeatedly until it has a final answer. Module 8 + Module 15 combined.
Each one is just a different message-passing topology over the same primitive: a prompt with a role and a defined output.
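To make the topology point concrete, here is a sketch of the router pattern. The classifier prompt and labels are invented for illustration, and the model-calling function is injected so the plumbing is testable without an API key:

```python
# A hypothetical router prompt — labels and wording are illustrative.
ROUTER = (
    "Classify the user request. Reply with exactly one word: "
    "RESEARCH (needs multi-step investigation) or QUICK (answerable directly)."
)

def route(task: str, call_fn, research_fn, quick_fn) -> str:
    """call_fn(system_prompt, user) -> str is your model call (e.g. Haiku,
    since classification is cheap). Anything the router can't classify
    falls through to the cheap path — fail open, not expensive.
    """
    label = call_fn(ROUTER, task).strip().upper()
    return research_fn(task) if label.startswith("RESEARCH") else quick_fn(task)
```

In production, `research_fn` would be `solve` from `agents.py` and `quick_fn` a single direct call; in tests, all three can be stubs.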
Try changing one thing
- Add a router in front of `solve` that picks between "research mode" (planner→workers→synthesiser) and "quick-answer mode" (one prompt). Most queries take the cheap path.
- Make the planner output sub-questions with dependencies (a JSON DAG). Run independent ones in parallel, dependent ones in series.
- Add a critic agent after the synthesiser that returns "ACCEPT" or "REVISE: <notes>". Loop up to twice.
- Replace one worker with a tool-using agent that searches your docs (RAG inside the worker). Now your team has a librarian.
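The DAG exercise can be sketched without any model calls. Assume the planner is prompted to emit questions with dependency lists (a format invented here for illustration); the scheduler then runs dependency-free "waves" in parallel and waves in sequence:

```python
from concurrent.futures import ThreadPoolExecutor

def run_dag(questions: dict[str, list[str]], answer_fn) -> dict[str, str]:
    """questions maps question-id -> ids it depends on.
    answer_fn(qid, dep_answers) -> str does the real work (e.g. a worker call,
    with its dependencies' answers folded into the prompt).
    """
    answers: dict[str, str] = {}
    remaining = dict(questions)
    while remaining:
        # Everything whose dependencies are all answered can run now.
        ready = [q for q, deps in remaining.items() if all(d in answers for d in deps)]
        if not ready:
            raise ValueError("dependency cycle in plan")
        with ThreadPoolExecutor(max_workers=len(ready)) as pool:
            results = list(pool.map(
                lambda q: answer_fn(q, {d: answers[d] for d in questions[q]}), ready))
        for q, a in zip(ready, results):
            answers[q] = a
            remaining.pop(q)
    return answers
```

The cycle check doubles as validation of the planner's output: a malformed DAG fails fast instead of deadlocking.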
Going deeper: open the notebooks
- `notebooks/01_introduction.ipynb` — planner/workers/synthesiser, debate, critic loops (~1.5–2h)
- `notebooks/02_intermediate.ipynb` — DAG-shaped pipelines, observability across agents (~2–3h)
- `notebooks/03_advanced.ipynb` — agents with persistent memory, complex coordination (~1.5–2.5h)
Module checklist
- [ ] You've run a 3-agent pipeline and read all four artefacts (plan, worker outputs, synthesis)
- [ ] You can describe one situation where multi-agent is the right call and one where it absolutely isn't
- [ ] You've measured the cost difference between single-prompt and multi-agent for the same task
- [ ] You have a hard cap on iterations in any feedback loop you build
Next module
Module 19 · Enterprise Scale — taking these patterns and surviving thousands of users.