Stage 02 · Practitioner · Module 10 of 26 · ~6h

Conversations

Build chat that remembers context and stays on topic.


Claude has no memory between API calls. Every turn, you hand it the entire history. That sounds tedious until you realise it's the cleanest possible model: nothing is hidden, nothing magic, and you decide exactly what the model "remembers."

By the end of this module you'll have a working multi-turn chat loop, plus strategies for trimming, summarising, and persisting conversation history.

Time: about 1 hour for the basics, ~6 hours with all three notebooks.

Prerequisites: Modules 4 (API basics), 5 (tokens), 7 (building apps).


A multi-turn chat in 30 lines

Save as chat_loop.py:

from anthropic import Anthropic
from dotenv import load_dotenv

load_dotenv()
client = Anthropic()

SYSTEM = (
    "You are a friendly study buddy for someone learning to code. "
    "Keep replies under 4 sentences. Ask one follow-up question when useful."
)

history: list[dict] = []     # the whole conversation lives here

def turn(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    response = client.messages.create(
        model="claude-sonnet-4-6", max_tokens=400, system=SYSTEM, messages=history,
    )
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply

if __name__ == "__main__":
    print("Chat with Claude. Ctrl-C to quit.")
    while True:
        try:
            user_text = input("\nyou> ").strip()
            if not user_text:
                continue
            print(f"claude> {turn(user_text)}")
        except (KeyboardInterrupt, EOFError):
            print()
            break

Run it. Have a conversation. Notice that Claude does remember what you said three turns ago — because history keeps growing, and you keep sending the whole thing back.


What just happened?

Three things make this work, and they're the things that make it expensive at scale:

  1. You replay history every turn. The model is stateless; the conversation is a list of {role, content} pairs you maintain yourself.
  2. The system prompt is constant. It's not part of messages — it sits beside them. Stable persona, stable rules.
  3. Each turn pays for the whole history. Turn 10 sends turns 1–9 again. That's why long chats get expensive — and why you eventually need to trim.
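To see how point 3 compounds, here is a back-of-the-envelope sketch. The tokens-per-message figure is made up purely for illustration:

```python
# Back-of-the-envelope: with a fixed average message size, replaying
# the history means total input tokens grow with the square of the
# turn count, not linearly.
TOKENS_PER_MESSAGE = 150  # hypothetical average, for illustration only

def tokens_sent_over(turns: int) -> int:
    # Turn t sends 2 * (t - 1) earlier messages plus the new user message.
    return sum((2 * (t - 1) + 1) * TOKENS_PER_MESSAGE for t in range(1, turns + 1))

print(tokens_sent_over(10))   # 15000 tokens across 10 turns
print(tokens_sent_over(100))  # 1500000 -- 10x the turns, 100x the tokens
```

That quadratic curve is exactly why the trimming strategies below exist.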

Tool use fits cleanly here. Module 8's tool-use loop is just a special case of this conversation pattern: you append assistant content with tool calls, then a user message with tool_result blocks. Same shape, more block types.
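Schematically, a tool-use exchange appended to the same history list might look like this. The tool name, input, and IDs here are invented for illustration; the block types (tool_use, tool_result) are the ones the Messages API uses:

```python
# Hypothetical tool-use turn living in the same history list as plain chat.
history = [
    {"role": "user", "content": "What's 37.2 km in miles?"},
    {
        "role": "assistant",
        "content": [
            {"type": "tool_use", "id": "toolu_01", "name": "convert_units",
             "input": {"value": 37.2, "from": "km", "to": "mi"}},
        ],
    },
    {
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": "toolu_01", "content": "23.1"},
        ],
    },
    {"role": "assistant", "content": "37.2 km is about 23.1 miles."},
]

# Roles still alternate user/assistant; only the content blocks get richer.
assert [m["role"] for m in history] == ["user", "assistant", "user", "assistant"]
```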


Keeping the history under control

Naive conversations grow until they hit the model's context window or your budget — whichever comes first. Three strategies, in increasing order of effort:

Strategy 1: Rolling window (cheap and effective)

Keep only the most recent N turns:

MAX_TURNS_KEPT = 12   # exchanges, i.e. 12 user + 12 assistant messages

def trimmed():
    # Each exchange is a user message plus the assistant's reply,
    # so keep the last MAX_TURNS_KEPT * 2 messages.
    return history[-MAX_TURNS_KEPT * 2 :]

# In turn():
response = client.messages.create(
    model="claude-sonnet-4-6", max_tokens=400, system=SYSTEM, messages=trimmed(),
)

This is the right default for almost every chat product. Users rarely care about turn 47 from yesterday.
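If you'd rather trim by size than by turn count, here is a crude character-budget variant. Character length is only a rough proxy for tokens (exact counts need the API's token-counting endpoint), and the budget below is an arbitrary example:

```python
MAX_CHARS = 8_000  # rough stand-in for a token budget, illustration only

def trimmed_by_budget(history: list[dict]) -> list[dict]:
    # Walk backwards, keeping the most recent messages that fit the budget.
    kept, used = [], 0
    for message in reversed(history):
        used += len(str(message["content"]))
        if used > MAX_CHARS and kept:
            break
        kept.append(message)
    kept.reverse()
    # Drop any leading assistant message so the list still opens with a user turn.
    while kept and kept[0]["role"] != "user":
        kept.pop(0)
    return kept
```

The trailing cleanup matters: a trim that leaves the history starting mid-exchange confuses the model less than it confuses your debugging.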

Strategy 2: Summarise older turns (when context matters)

When the conversation has been running long enough that important context is in older turns, summarise instead of truncate:

KEEP_RECENT = 12   # messages to keep verbatim

def summarise_old_turns():
    if len(history) > 30:
        old = history[:-KEEP_RECENT]
        summary_prompt = (
            "Summarise the conversation below in under 200 words. Preserve any decisions "
            "made, names, numbers, or open questions.\n\n"
            + "\n".join(f"{m['role']}: {m['content']}" for m in old)
        )
        response = client.messages.create(
            model="claude-haiku-4-5-20251001", max_tokens=300,
            messages=[{"role": "user", "content": summary_prompt}],
        )
        history[:-KEEP_RECENT] = [{
            "role": "user",
            "content": "[Earlier conversation summary] " + response.content[0].text,
        }]

Use Sonnet for the chat, Haiku for the summary. The summary replaces the older turns at the start of the history, so later calls carry the gist of the conversation without its bulk.

Strategy 3: Structured memory (when it really matters)

Pull facts out of the chat and store them separately ("user prefers metric units", "user is debugging Django"). Inject the relevant ones into system next time. This is its own design problem — Module 18 (Multi-Agent) covers patterns.
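A minimal sketch of the idea. The memory store and injection helper here are invented for illustration, and the facts are hard-coded; in practice the extraction step would itself be a model call:

```python
# Hypothetical structured-memory store: facts extracted from past chats,
# injected into the system prompt at the start of the next session.
memory: dict[str, str] = {
    "units": "user prefers metric units",
    "project": "user is debugging a Django app",
}

BASE_SYSTEM = "You are a friendly study buddy for someone learning to code."

def system_with_memory() -> str:
    if not memory:
        return BASE_SYSTEM
    facts = "; ".join(memory.values())
    return f"{BASE_SYSTEM}\nKnown about this user: {facts}."
```

The payoff is that memory survives even aggressive history trimming, because it no longer lives in the transcript at all.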


Persistence (when you need it)

For a CLI demo, in-memory is fine. For anything user-facing:

Need → cheap solution:

Per-session memory: a dict keyed by session_id. Fine until you scale beyond one process.
Survives restarts: SQLite or any KV store. Serialize the message list as JSON.
Multi-device continuity: same store, but keyed by a user_id rather than a session_id.
Compliance (GDPR delete-my-data): make sure the conversation key is the only identifier and is deletable.

Don't reach for Redis or DynamoDB on day one. SQLite handles surprising scale.
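A minimal sketch of the SQLite approach, assuming one row per conversation with the message list stored as a JSON blob (table name and schema are illustrative):

```python
import json
import sqlite3

# One row per conversation; the whole message list is one JSON blob.
db = sqlite3.connect("conversations.db")
db.execute(
    "CREATE TABLE IF NOT EXISTS conversations "
    "(session_id TEXT PRIMARY KEY, messages TEXT NOT NULL)"
)

def save(session_id: str, history: list[dict]) -> None:
    # Upsert: overwrite the blob for an existing session.
    db.execute(
        "INSERT INTO conversations (session_id, messages) VALUES (?, ?) "
        "ON CONFLICT(session_id) DO UPDATE SET messages = excluded.messages",
        (session_id, json.dumps(history)),
    )
    db.commit()

def load(session_id: str) -> list[dict]:
    row = db.execute(
        "SELECT messages WHERE_CLAUSE", ()
    ) if False else db.execute(
        "SELECT messages FROM conversations WHERE session_id = ?", (session_id,)
    ).fetchone()
    return json.loads(row[0]) if row else []
```

Deleting the row deletes the conversation, which also keeps the GDPR story simple.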


Stage 02 complete

That's the Practitioner stage. You can now build apps that prompt well, call tools, ground answers in your data, and hold real conversations. The next stage takes these primitives and applies them to actual work.

Module 11 · Data Analysis with Claude