Conversations
Build chat that remembers context and stays on topic.
Claude has no memory between API calls. Every turn, you hand it the entire history. That sounds tedious until you realise it's the cleanest possible model: nothing is hidden, nothing magic, and you decide exactly what the model "remembers."
By the end of this module you'll have
- A working multi-turn chat loop with a stable persona
- A clear understanding of why message lists grow, and how to keep them under control
- A tiny rolling-window strategy so chats never blow your context budget
Time: about 1 hour for the basics, ~6 hours with all three notebooks.
Prerequisites: Modules 4 (API basics), 5 (tokens), 7 (building apps).
A multi-turn chat in 30 lines
Save as `chat_loop.py`:
```python
from anthropic import Anthropic
from dotenv import load_dotenv

load_dotenv()
client = Anthropic()

SYSTEM = (
    "You are a friendly study buddy for someone learning to code. "
    "Keep replies under 4 sentences. Ask one follow-up question when useful."
)

history: list[dict] = []  # the whole conversation lives here

def turn(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    response = client.messages.create(
        model="claude-sonnet-4-6", max_tokens=400, system=SYSTEM, messages=history,
    )
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply

if __name__ == "__main__":
    print("Chat with Claude. Ctrl-C to quit.")
    while True:
        try:
            user_text = input("\nyou> ").strip()
            if not user_text:
                continue
            print(f"claude> {turn(user_text)}")
        except (KeyboardInterrupt, EOFError):
            print()
            break
```
Run it. Have a conversation. Notice that Claude does remember what you said three turns ago: `history` keeps growing, and you keep sending the whole thing back.
What just happened?
Three things make this work, and they're the things that make it expensive at scale:
- You replay history every turn. The model is stateless; the conversation is a list of `{role, content}` pairs you maintain yourself.
- The `system` prompt is constant. It's not part of `messages`; it sits beside them. Stable persona, stable rules.
- Each turn pays for the whole history. Turn 10 sends turns 1–9 again. That's why long chats get expensive, and why you eventually need to trim. (See the probe below.)
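You can watch that last point directly. A minimal probe, assuming you add one line inside `turn()` right after the request; every Messages API response carries a `usage` block:

```python
# After client.messages.create(...) in turn():
# input_tokens climbs every turn because the whole history is resent.
print(f"[usage] in={response.usage.input_tokens} out={response.usage.output_tokens}")
```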
Tool use fits cleanly here. Module 8's tool-use loop is just a special case of this conversation pattern: you append `assistant` content with tool calls, then a `user` message with `tool_result` blocks. Same shape, more block types.
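For reference, a sketch of what those two extra messages look like in `history`. The block shapes follow the Messages API tool-use format; the tool name, input, and id here are made up:

```python
# The model asked to call a tool...
history.append({
    "role": "assistant",
    "content": [
        {"type": "tool_use", "id": "toolu_01", "name": "get_weather",
         "input": {"city": "Lisbon"}},
    ],
})
# ...and you report the result back as a user message.
history.append({
    "role": "user",
    "content": [
        {"type": "tool_result", "tool_use_id": "toolu_01", "content": "18°C, clear"},
    ],
})
```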
Keeping the history under control
Naive conversations grow until they hit the model's context window or your budget — whichever comes first. Three strategies, in increasing order of effort:
Strategy 1: Rolling window (cheap and effective)
Keep only the most recent N turns:
```python
MAX_TURNS_KEPT = 12  # a turn = one user message + one assistant reply

def trimmed() -> list[dict]:
    # Keep the last N user/assistant pairs (2 messages per turn).
    window = history[-MAX_TURNS_KEPT * 2 :]
    # The Messages API expects the list to start with a user message,
    # so drop a leading assistant reply if the slice cut a pair in half.
    while window and window[0]["role"] == "assistant":
        window = window[1:]
    return window

# In turn():
response = client.messages.create(
    model="claude-sonnet-4-6", max_tokens=400, system=SYSTEM, messages=trimmed(),
)
```
This is the right default for almost every chat product. Users rarely care about turn 47 from yesterday.
Strategy 2: Summarise older turns (when context matters)
When the conversation has been running long enough that important context is in older turns, summarise instead of truncate:
```python
def summarise_old_turns() -> None:
    if len(history) > 30:
        old = history[:-12]
        summary_prompt = (
            "Summarise the conversation below in under 200 words. Preserve any "
            "decisions made, names, numbers, or open questions.\n\n"
            + "\n".join(f"{m['role']}: {m['content']}" for m in old)
        )
        response = client.messages.create(
            model="claude-haiku-4-5-20251001", max_tokens=300,
            messages=[{"role": "user", "content": summary_prompt}],
        )
        # Replace everything older than the last 12 messages with one summary.
        history[:-12] = [{
            "role": "user",
            "content": "[Earlier conversation summary] " + response.content[0].text,
        }]
```
Sonnet to chat, Haiku to summarise. The summary lives at the start of the trimmed history as context.
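One way to wire it in, as a sketch: call it at the top of `turn()` before the main request, so old turns are compacted before you pay to resend them:

```python
def turn(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    summarise_old_turns()  # compact anything older than the last 12 messages
    response = client.messages.create(
        model="claude-sonnet-4-6", max_tokens=400, system=SYSTEM, messages=history,
    )
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply
```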
Strategy 3: Structured memory (when it really matters)
Pull facts out of the chat and store them separately ("user prefers metric units", "user is debugging Django"). Inject the relevant ones into `system` next time. This is its own design problem; Module 18 (Multi-Agent) covers patterns.
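As a rough illustration only (the store and helper here are hypothetical, not the Module 18 pattern), facts can live in a plain dict that gets folded into the system prompt on the next turn:

```python
# Hypothetical fact store: filled by your own extraction step.
facts: dict[str, str] = {}  # e.g. {"units": "metric", "framework": "Django"}

def system_with_memory() -> str:
    if not facts:
        return SYSTEM
    known = "\n".join(f"- {key}: {value}" for key, value in facts.items())
    return SYSTEM + "\n\nKnown about this user:\n" + known
```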
Persistence (when you need it)
For a CLI demo, in-memory is fine. For anything user-facing:
| Need | Cheap solution |
|---|---|
| Per-session memory | A dict keyed by `session_id`. Fine until you scale beyond one process. |
| Survives restarts | SQLite or any KV store. Serialise the message list as JSON. |
| Multi-device continuity | Same store, but keyed by `user_id`, not `session_id`. |
| Compliance (GDPR delete-my-data) | Make sure the conversation key is the only identifier and is deletable. |
Don't reach for Redis or DynamoDB on day one. SQLite handles surprising scale.
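A minimal sketch of the "survives restarts" row, assuming a single-process app; the table and helper names are illustrative:

```python
import json
import sqlite3

# One table: JSON-serialised message lists keyed by session id.
db = sqlite3.connect("chats.db")
db.execute(
    "CREATE TABLE IF NOT EXISTS conversations "
    "(session_id TEXT PRIMARY KEY, messages TEXT)"
)

def save(session_id: str, messages: list[dict]) -> None:
    db.execute(
        "INSERT INTO conversations (session_id, messages) VALUES (?, ?) "
        "ON CONFLICT(session_id) DO UPDATE SET messages = excluded.messages",
        (session_id, json.dumps(messages)),
    )
    db.commit()

def load(session_id: str) -> list[dict]:
    row = db.execute(
        "SELECT messages FROM conversations WHERE session_id = ?", (session_id,)
    ).fetchone()
    return json.loads(row[0]) if row else []
```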
Try changing one thing
- Drop the `system` prompt mid-conversation. Watch the persona drift across turns.
- Set `MAX_TURNS_KEPT = 2`. Have a conversation that needs memory; watch Claude lose track. That's your motivation for summarisation.
- Pass `temperature=0.9` for a chattier, more random feel; use `0.0` for terse and consistent.
- Inject `print(sum(len(m["content"]) for m in history))` after each turn to feel the cost grow.
Going deeper: open the notebooks
- `notebooks/01_introduction.ipynb`: message pruning vs summarisation, session storage, what to keep from tool outputs (~1.5–2h)
- `notebooks/02_intermediate.ipynb`: episodic vs working memory patterns, cross-device continuity (~2–3h)
- `notebooks/03_advanced.ipynb`: coherence/helpfulness/safety evals, moderation hooks, load testing chat (~1.5–2.5h)
Module checklist
- [ ] You've held a multi-turn conversation through your script
- [ ] You can explain why turn 10 is more expensive than turn 1
- [ ] You've trimmed a history with a rolling window and noticed the trade-off
- [ ] You know one situation where summarisation is worth the extra Haiku call
Stage 02 complete
That's the Practitioner stage. You can now build apps that prompt well, call tools, ground answers in your data, and hold real conversations. The next stage takes these primitives and applies them to actual work.