Stage 04 · Pro · Module 18 of 26 · ~10h

Multi-Agent Systems

Coordinate multiple Claude agents on complex tasks.


Sometimes the right answer comes from one prompt. Sometimes it comes from a small team of specialised prompts that pass work between them. This module shows you the simplest useful version, and the failure modes you'll meet the moment you scale beyond it.

By the end of this module you'll have built a working planner → workers → synthesiser pipeline in about 60 lines of Python, and learned when a single prompt is the better choice.

Time: about 2 hours for the basics, ~10 hours with all three notebooks.

Prerequisites: Modules 8 (tool use), 10 (conversations), 14 (production patterns), 15 (advanced reasoning).


What "agent" really means

Strip the marketing away and an agent is a prompt with a defined role, optional tools, and a defined output. A multi-agent system is a few of those passing structured messages around.

This is freeing: you don't need a framework. A dictionary of role-prompts and a Python loop is a multi-agent system. We'll build one in 60 lines.
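A minimal sketch of that claim, with a stubbed `call_model` standing in for the real API (the stub, the role names, and `run` are illustrative, not the module's code):

```python
# A multi-agent "system" with no framework: a dict of role prompts and a loop.
def call_model(system_prompt: str, user_message: str) -> str:
    # Stub for a real LLM call; returns a traceable fake answer.
    return f"[{system_prompt.split('.')[0]}] response to: {user_message}"

ROLES = {
    "planner": "You are a planner. Break the task into steps.",
    "worker": "You are a worker. Execute the steps.",
    "reviewer": "You are a reviewer. Check the work.",
}

def run(task: str) -> str:
    message = task
    for prompt in ROLES.values():  # each role's output becomes the next role's input
        message = call_model(prompt, message)
    return message

print(run("Summarise our Q3 incident reports"))
```

Swap the stub for a real client call and you have the skeleton the rest of this module fleshes out.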


The simplest useful pattern: planner → workers → synthesiser

Save as agents.py:

from concurrent.futures import ThreadPoolExecutor
from anthropic import Anthropic
from dotenv import load_dotenv

load_dotenv()
client = Anthropic()

def call(role_prompt: str, user: str, *, model="claude-sonnet-4-6", max_tokens=600) -> str:
    return client.messages.create(
        model=model, max_tokens=max_tokens, system=role_prompt,
        messages=[{"role": "user", "content": user}],
    ).content[0].text

PLANNER = (
    "You are a planner. Given a research task, output 3–5 sub-questions whose answers "
    "would together solve it. Reply with a JSON list of strings only."
)

WORKER = (
    "You are a focused researcher. Answer the single sub-question concisely (≤120 words). "
    "If you don't know, say so explicitly. Cite no sources you cannot name."
)

SYNTHESISER = (
    "You are an editor. Given an original task and a set of researcher answers, "
    "produce a single coherent answer to the original task. Note any contradictions "
    "between researchers and resolve them or flag them."
)

import json

def solve(task: str) -> str:
    raw = call(PLANNER, task, model="claude-haiku-4-5-20251001", max_tokens=300)
    sub_questions = json.loads(raw[raw.find("[") : raw.rfind("]") + 1])

    with ThreadPoolExecutor(max_workers=5) as pool:
        answers = list(pool.map(lambda q: call(WORKER, q), sub_questions))

    bundle = f"TASK:\n{task}\n\n" + "\n\n".join(
        f"SUB-Q: {q}\nA: {a}" for q, a in zip(sub_questions, answers)
    )
    return call(SYNTHESISER, bundle, model="claude-opus-4-7", max_tokens=900)

if __name__ == "__main__":
    print(solve("How should a 12-person engineering team think about adopting Claude in 2026?"))

What you should notice running this:


Why this beats "one giant prompt"

A single prompt that asks "research this and write me an answer" has problems multi-agent neatly avoids:

Problem                    How agents fix it
Context window pressure    Each worker only sees its sub-question
Drift mid-answer           Each role has a tight, single-purpose prompt
No parallelism             Workers fan out concurrently
Hard to audit              You can read the plan, the workers, and the synthesis separately
Hard to swap a piece       Change one role-prompt without touching the others

It's the Unix philosophy applied to LLMs: small programs, doing one thing, piped together.
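The analogy can be made literal with a hypothetical `pipe` helper (not part of agents.py) that composes text-to-text stages the way a shell pipes programs:

```python
from functools import reduce
from typing import Callable

Stage = Callable[[str], str]

def pipe(*stages: Stage) -> Stage:
    """Compose stages left to right, like `plan | research | edit` in a shell."""
    return lambda text: reduce(lambda acc, stage: stage(acc), stages, text)

# Stub stages standing in for real agent calls.
plan = lambda task: f"plan({task})"
research = lambda task: f"research({task})"
edit = lambda task: f"edit({task})"

answer = pipe(plan, research, edit)
print(answer("task"))  # → edit(research(plan(task)))
```

Each stage stays small and testable on its own; the composition is where the system lives.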


Where multi-agent goes wrong

Real failure modes you'll meet:

  1. Cost balloons. Every agent is a separate API call, so five agents means roughly five times the spend. Always measure before scaling up the number of roles.
  2. Information leaks back to the planner. If the synthesis is poor, you may be tempted to feed it back to the planner — now you have a loop with no termination guarantee. Cap iterations.
  3. Coordination errors. Agents disagree, contradict each other, or one's output doesn't match the next one's expected input. Strict structured outputs (JSON schema, Module 6) eliminate most of these mismatches.
  4. Sequential when parallel was possible. If your three workers don't depend on each other, run them concurrently (ThreadPoolExecutor.map, as above, or asyncio.gather), not one after another.
  5. Pretending architecture is intelligence. Five mediocre agents do not equal one excellent answer. If the planner is bad, no number of workers will save you.
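Failure mode 2 has a mechanical fix: a hard iteration cap. A sketch under stated assumptions: `plan_fn`, `solve_fn`, and `good_enough` are hypothetical stand-ins for real model calls and a real quality check:

```python
MAX_ROUNDS = 3  # hard cap: the loop terminates even if quality never improves

def refine(task, plan_fn, solve_fn, good_enough):
    feedback, answer = "", ""
    for _ in range(MAX_ROUNDS):
        plan = plan_fn(task + feedback)
        answer = solve_fn(plan)
        if good_enough(answer):
            return answer
        feedback = f"\nPrevious attempt was weak: {answer[:100]}"
    return answer  # best effort after the cap, never an infinite loop

# Stubs: the checker accepts the second attempt.
calls = {"n": 0}
def fake_plan(text): return f"plan<{text}>"
def fake_solve(plan): calls["n"] += 1; return f"answer{calls['n']}"
def fake_check(answer): return answer == "answer2"

print(refine("task", fake_plan, fake_solve, fake_check))  # → answer2
```

The cap is the termination guarantee; everything else is ordinary feedback plumbing.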

When not to reach for multi-agent

Symptom                                            Better answer
Single-prompt accuracy is already 92%              Stop. You're done.
The task is small but the prompt is long           Tighter prompt. (Module 3.)
The task is "answer factually about our docs"      RAG. (Module 9.)
The task is "decide whether to call our API"       One prompt + one tool. (Module 8.)
You want auditable reasoning on a single problem   Plan-then-execute. (Module 15.)

A well-tuned single-prompt pipeline beats a sloppy multi-agent system every time.


Patterns worth knowing

Each one is just a different message-passing topology over the same primitive: a prompt with a role and a defined output.
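As a sketch of that idea, with stubbed calls and hypothetical names, here is the same `agent` primitive wired two ways; only the topology changes:

```python
from concurrent.futures import ThreadPoolExecutor

def agent(role: str):
    # Stub for a real model call: role prompt in, text out.
    return lambda msg: f"{role}:{msg}"

def chain(roles, msg):
    # Pipeline topology: each agent's output feeds the next agent.
    for role in roles:
        msg = agent(role)(msg)
    return msg

def fan_out(roles, msg):
    # Fan-out topology: one message, many agents, answers gathered.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda r: agent(r)(msg), roles))

print(chain(["a", "b"], "x"))    # → b:a:x
print(fan_out(["a", "b"], "x"))  # → ['a:x', 'b:x']
```

Every pattern in this section is one of these wirings, or a combination of them, over the same primitive.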


Try changing one thing


Going deeper: open the notebooks


Module checklist


Next module

Module 19 · Enterprise Scale — taking these patterns and surviving thousands of users.