Conversations
Build chat that remembers context and stays on topic.
Claude has no memory between API calls. Every turn, you hand it the entire history. That sounds tedious until you realise it's the cleanest possible model: nothing is hidden, nothing magic, and you decide exactly what the model "remembers."
By the end of this module you'll have
- A working multi-turn chat loop with a stable persona
- A clear understanding of why message lists grow, and how to keep them under control
- A tiny rolling-window strategy so chats never blow your context budget
Time: about 1 hour for the basics, ~6 hours with all three notebooks.
Prerequisites: Modules 4 (API basics), 5 (tokens), 7 (building apps).
A multi-turn chat in 30 lines
Save as `chat_loop.py`:
```python
from anthropic import Anthropic
from dotenv import load_dotenv

load_dotenv()
client = Anthropic()

SYSTEM = (
    "You are a friendly study buddy for someone learning to code. "
    "Keep replies under 4 sentences. Ask one follow-up question when useful."
)

history: list[dict] = []  # the whole conversation lives here

def turn(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    response = client.messages.create(
        model="claude-sonnet-4-6", max_tokens=400, system=SYSTEM, messages=history,
    )
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply

if __name__ == "__main__":
    print("Chat with Claude. Ctrl-C to quit.")
    while True:
        try:
            user_text = input("\nyou> ").strip()
            if not user_text:
                continue
            print(f"claude> {turn(user_text)}")
        except (KeyboardInterrupt, EOFError):
            print()
            break
```
Run it. Have a conversation. Notice that Claude does remember what you said three turns ago: `history` keeps growing, and you keep sending the whole thing back.
What just happened?
Three things make this work, and they're the things that make it expensive at scale:
- You replay history every turn. The model is stateless; the conversation is a list of `{role, content}` pairs you maintain yourself.
- The `system` prompt is constant. It's not part of `messages`; it sits beside them. Stable persona, stable rules.
- Each turn pays for the whole history. Turn 10 sends turns 1–9 again. That's why long chats get expensive, and why you eventually need to trim. (See the probe below.)
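You can watch that last point directly. A minimal probe, assuming you add one line inside `turn()` right after the request; every Messages API response carries a `usage` block:

```python
# After client.messages.create(...) in turn():
# input_tokens climbs every turn because the whole history is resent.
print(f"[usage] in={response.usage.input_tokens} out={response.usage.output_tokens}")
```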
Tool use fits cleanly here. Module 8's tool-use loop is just a special case of this conversation pattern: you append `assistant` content with tool calls, then a `user` message with `tool_result` blocks. Same shape, more block types.
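For reference, a sketch of what those two extra messages look like in `history`. The block shapes follow the Messages API tool-use format; the tool name, input, and id here are made up:

```python
# The model asked to call a tool...
history.append({
    "role": "assistant",
    "content": [
        {"type": "tool_use", "id": "toolu_01", "name": "get_weather",
         "input": {"city": "Lisbon"}},
    ],
})
# ...and you report the result back as a user message.
history.append({
    "role": "user",
    "content": [
        {"type": "tool_result", "tool_use_id": "toolu_01", "content": "18°C, clear"},
    ],
})
```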
Keeping the history under control
Naive conversations grow until they hit the model's context window or your budget — whichever comes first. Three strategies, in increasing order of effort:
Strategy 1: Rolling window (cheap and effective)
Keep only the most recent N turns:
```python
MAX_TURNS_KEPT = 12  # a turn = one user message + one assistant reply

def trimmed() -> list[dict]:
    # Keep the last N user/assistant pairs (2 messages per turn).
    window = history[-MAX_TURNS_KEPT * 2 :]
    # The Messages API expects the list to start with a user message,
    # so drop a leading assistant reply if the slice cut a pair in half.
    while window and window[0]["role"] == "assistant":
        window = window[1:]
    return window

# In turn():
response = client.messages.create(
    model="claude-sonnet-4-6", max_tokens=400, system=SYSTEM, messages=trimmed(),
)
```
This is the right default for almost every chat product. Users rarely care about turn 47 from yesterday.
Strategy 2: Summarise older turns (when context matters)
When the conversation has been running long enough that important context is in older turns, summarise instead of truncate:
```python
def summarise_old_turns() -> None:
    if len(history) > 30:
        old = history[:-12]
        summary_prompt = (
            "Summarise the conversation below in under 200 words. Preserve any "
            "decisions made, names, numbers, or open questions.\n\n"
            + "\n".join(f"{m['role']}: {m['content']}" for m in old)
        )
        response = client.messages.create(
            model="claude-haiku-4-5-20251001", max_tokens=300,
            messages=[{"role": "user", "content": summary_prompt}],
        )
        # Replace everything older than the last 12 messages with one summary.
        history[:-12] = [{
            "role": "user",
            "content": "[Earlier conversation summary] " + response.content[0].text,
        }]
```
Sonnet to chat, Haiku to summarise. The summary lives at the start of the trimmed history as context.
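One way to wire it in, as a sketch: call it at the top of `turn()` before the main request, so old turns are compacted before you pay to resend them:

```python
def turn(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    summarise_old_turns()  # compact anything older than the last 12 messages
    response = client.messages.create(
        model="claude-sonnet-4-6", max_tokens=400, system=SYSTEM, messages=history,
    )
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply
```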
Strategy 3: Structured memory (when it really matters)
Pull facts out of the chat and store them separately ("user prefers metric units", "user is debugging Django"). Inject the relevant ones into `system` next time. This is its own design problem; Module 18 (Multi-Agent) covers patterns.
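As a rough illustration only (the store and helper here are hypothetical, not the Module 18 pattern), facts can live in a plain dict that gets folded into the system prompt on the next turn:

```python
# Hypothetical fact store: filled by your own extraction step.
facts: dict[str, str] = {}  # e.g. {"units": "metric", "framework": "Django"}

def system_with_memory() -> str:
    if not facts:
        return SYSTEM
    known = "\n".join(f"- {key}: {value}" for key, value in facts.items())
    return SYSTEM + "\n\nKnown about this user:\n" + known
```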
Persistence (when you need it)
For a CLI demo, in-memory is fine. For anything user-facing:
| Need | Cheap solution |
|---|---|
| Per-session memory | A dict keyed by `session_id`. Fine until you scale beyond one process. |
| Survives restarts | SQLite or any KV store. Serialise the message list as JSON. |
| Multi-device continuity | Same store, but keyed by `user_id`, not `session_id`. |
| Compliance (GDPR delete-my-data) | Make sure the conversation key is the only identifier and is deletable. |
Don't reach for Redis or DynamoDB on day one. SQLite handles surprising scale.
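A minimal sketch of the "survives restarts" row, assuming a single-process app; the table and helper names are illustrative:

```python
import json
import sqlite3

# One table: JSON-serialised message lists keyed by session id.
db = sqlite3.connect("chats.db")
db.execute(
    "CREATE TABLE IF NOT EXISTS conversations "
    "(session_id TEXT PRIMARY KEY, messages TEXT)"
)

def save(session_id: str, messages: list[dict]) -> None:
    db.execute(
        "INSERT INTO conversations (session_id, messages) VALUES (?, ?) "
        "ON CONFLICT(session_id) DO UPDATE SET messages = excluded.messages",
        (session_id, json.dumps(messages)),
    )
    db.commit()

def load(session_id: str) -> list[dict]:
    row = db.execute(
        "SELECT messages FROM conversations WHERE session_id = ?", (session_id,)
    ).fetchone()
    return json.loads(row[0]) if row else []
```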
Try changing one thing
- Drop the `system` prompt mid-conversation. Watch the persona drift across turns.
- Set `MAX_TURNS_KEPT = 2`. Have a conversation that needs memory; watch Claude lose track. That's your motivation for summarisation.
- Pass `temperature=0.9` for a chattier, more random feel; use `0.0` for terse and consistent.
- Inject `print(sum(len(m["content"]) for m in history))` after each turn to feel the cost grow.
Going deeper: open the notebooks
- `notebooks/01_introduction.ipynb`: message pruning vs summarisation, session storage, what to keep from tool outputs (~1.5–2h)
- `notebooks/02_intermediate.ipynb`: episodic vs working memory patterns, cross-device continuity (~2–3h)
- `notebooks/03_advanced.ipynb`: coherence/helpfulness/safety evals, moderation hooks, load testing chat (~1.5–2.5h)
Module checklist
- [ ] You've held a multi-turn conversation through your script
- [ ] You can explain why turn 10 is more expensive than turn 1
- [ ] You've trimmed a history with a rolling window and noticed the trade-off
- [ ] You know one situation where summarisation is worth the extra Haiku call
Stage 02 complete
That's the Practitioner stage. You can now build apps that prompt well, call tools, ground answers in your data, and hold real conversations. The next stage takes these primitives and applies them to actual work.