Production Patterns
Logs, retries, safety — ship without surprises.
The day after you ship a Claude feature, three things go wrong: a model returns nonsense and you can't reproduce it, a slow response leaves a user staring at a timed-out page, or a single bad input cascades into a thousand downstream failures. This module gives you the small set of patterns that make those things visible, recoverable, and survivable.
By the end of this module you'll have
- Structured logs for every model call: enough to debug tomorrow, nothing that endangers users
- A timeout + retry + fallback pattern that fails gracefully instead of catastrophically
- A simple online eval that catches quality regressions before users do
Time: about 2 hours for the basics, ~8 hours with all three notebooks.
Prerequisites: Modules 4 (API basics), 7 (building apps). Familiarity with at least one production system you've operated.
Pattern 1 · Log every call (carefully)
Every model call should write a structured log line. You'll thank yourself the first time a user reports a bad answer.
import json, time, uuid, logging

from anthropic import Anthropic
from dotenv import load_dotenv

load_dotenv()
client = Anthropic()
log = logging.getLogger("claude")

def truncated_prefix(text: str, n: int = 80) -> str:
    """Truncated, *not* hashed — but never log full bodies in production."""
    return text[:n].replace("\n", "\\n") + ("…" if len(text) > n else "")

def call(*, prompt: str, model: str, **kw):
    request_id = str(uuid.uuid4())
    started = time.perf_counter()
    try:
        r = client.messages.create(
            model=model, max_tokens=kw.pop("max_tokens", 600),
            messages=[{"role": "user", "content": prompt}], **kw,
        )
        elapsed_ms = (time.perf_counter() - started) * 1000
        log.info(json.dumps({
            "event": "claude.call",
            "request_id": request_id,
            "model": model,
            "latency_ms": round(elapsed_ms),
            "in_tokens": r.usage.input_tokens,
            "out_tokens": r.usage.output_tokens,
            "stop": r.stop_reason,
            "prompt": truncated_prefix(prompt),  # truncated only
        }))
        return r
    except Exception as exc:
        log.exception(json.dumps({
            "event": "claude.error",
            "request_id": request_id,
            "model": model,
            "error": type(exc).__name__,
        }))
        raise
What this gets you, day one: searchable logs, latency percentiles, cost-per-feature dashboards (just sum tokens grouped by feature), and a request_id you can quote when a user complains. What it deliberately keeps out: PII in your logs.
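As a back-of-the-envelope example of that cost roll-up, here is one way to sum yesterday's log lines; the per-token rates and the optional `feature` tag are placeholders rather than part of the pattern above, so substitute your own pricing and tags.

```python
import json
from collections import defaultdict

# Placeholder per-token rates in USD; substitute your models' current pricing.
RATES = {"claude-sonnet-4-6": {"in": 3e-6, "out": 15e-6}}

def cost_per_feature(log_lines):
    """Group token spend by a 'feature' tag, falling back to 'untagged'."""
    totals = defaultdict(float)
    for line in log_lines:
        rec = json.loads(line)
        if rec.get("event") != "claude.call" or rec["model"] not in RATES:
            continue
        rate = RATES[rec["model"]]
        totals[rec.get("feature", "untagged")] += (
            rec["in_tokens"] * rate["in"] + rec["out_tokens"] * rate["out"]
        )
    return dict(totals)
```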
Pattern 2 · Timeout + retry + fallback
Three layers, each handling a different class of problem:
import time, random

from anthropic import Anthropic, RateLimitError, APIConnectionError, APITimeoutError

client = Anthropic()
TRANSIENT = (RateLimitError, APIConnectionError, APITimeoutError)

def call_with_recovery(
    prompt: str,
    *,
    primary: str = "claude-sonnet-4-6",
    fallback: str = "claude-haiku-4-5-20251001",
):
    for attempt in range(4):
        try:
            return client.messages.create(
                model=primary, max_tokens=600, timeout=20.0,  # layer 1: timeout
                messages=[{"role": "user", "content": prompt}],
            )
        except TRANSIENT:
            if attempt == 3:
                break
            time.sleep((2 ** attempt) + random.random())  # layer 2: retry with backoff
    # layer 3: fall back to a cheaper, often more available model
    return client.messages.create(
        model=fallback, max_tokens=600, timeout=20.0,
        messages=[{"role": "user", "content": prompt}],
    )
Three rules to keep this honest:
- Set a real `timeout`. No timeout means your user is staring at a spinner forever.
- Cap retries. Four attempts is plenty. Anything more masks systemic problems.
- Have a fallback you'd actually ship. "Sorry, try again" is fine — that's still a fallback. Don't pretend Haiku is a perfect substitute for Sonnet; just decide what degraded UX looks like (see the sketch after this list).
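If your degraded mode is a message rather than a cheaper model, the wrapper can be this small. A minimal sketch, assuming `call_with_recovery` from above and a placeholder apology string:

```python
def answer_or_apologize(prompt: str) -> str:
    """Return model text, or a canned degraded-mode message if recovery also fails."""
    try:
        response = call_with_recovery(prompt)
        return response.content[0].text
    except Exception:
        # The fallback model failed too; ship the degraded UX you decided on.
        return "Sorry, we couldn't generate an answer right now. Please try again in a minute."
```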
Pattern 3 · Online evals (catch regressions in flight)
You can't run the full eval suite on every request, but you can run a small classifier on the response itself and alert when quality drops.
def passes_smell_test(prompt: str, response_text: str) -> bool:
    """Cheap, coarse signal. Misses subtle regressions but catches obvious ones."""
    judge = client.messages.create(
        model="claude-haiku-4-5-20251001", max_tokens=20,
        system=(
            "You are a quality gate. Reply with one word: PASS or FAIL.\n"
            "FAIL if the response is empty, refuses a benign request, or contradicts itself."
        ),
        messages=[{
            "role": "user",
            "content": f"PROMPT:\n{prompt}\n\nRESPONSE:\n{response_text}\n\nVerdict:",
        }],
    )
    return judge.content[0].text.strip().upper().startswith("PASS")
Wire it in on a sample of traffic (say 1%), log the failures with the full request and response, and build a dashboard. Module 20 turns this into a real eval framework.
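A minimal sketch of that wiring, assuming a 1% sample rate and reusing `passes_smell_test` plus the `log` object from Pattern 1; logging full bodies on sampled failures is a deliberate, documented exception to the truncation rule.

```python
import json
import logging
import random

log = logging.getLogger("claude")
SAMPLE_RATE = 0.01  # assumed: judge 1% of traffic

def maybe_run_smell_test(prompt: str, response_text: str) -> None:
    """Sample a fraction of responses through the judge and log failures."""
    if random.random() >= SAMPLE_RATE:
        return
    if not passes_smell_test(prompt, response_text):
        # Sampled failures keep full bodies so you can debug the regression;
        # make sure this path is covered by your privacy review.
        log.warning(json.dumps({
            "event": "claude.smell_test_fail",
            "prompt": prompt,
            "response": response_text,
        }))
```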
What to log and what to never log
| Log | Don't log |
|---|---|
| Model id, latency, in/out tokens, request_id, error type | Full user prompts that contain PII or secrets |
| Stop reason (end_turn, max_tokens, tool_use) | API keys, even hashed |
| Truncated prompt prefix (≤ 100 chars) | Raw response bodies that may contain user data |
| Whether the smell test passed | Anything you couldn't justify in a privacy review |
| Cost-per-call (out_tokens × out_rate + in_tokens × in_rate) | "Just for now" log-everything fields. They never come out. |
A useful instinct: imagine an auditor reading your logs. Could they reconstruct who asked what? If yes, change what you log.
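If a truncated prefix isn't enough context for debugging, one option is a light redaction pass before anything reaches the logs. The two regexes below are illustrative only and will not catch every form of PII; treat this as a sketch, not a compliance tool.

```python
import re

# Illustrative patterns only; real PII handling needs a proper review, not two regexes.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact(text: str) -> str:
    """Mask obvious emails and phone numbers before a string is logged."""
    return PHONE.sub("[PHONE]", EMAIL.sub("[EMAIL]", text))
```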
A small operational checklist
Before a Claude-backed feature goes to real users:
- [ ] Every call logs a structured line with model, latency, tokens, request_id
- [ ] There's a timeout on every request
- [ ] Retries are capped and backoff is exponential with jitter
- [ ] There's a defined fallback (degraded model, cached answer, or graceful "try again")
- [ ] PII never reaches the logs
- [ ] You can compute cost-per-feature from yesterday's logs
- [ ] You have a smell-test or sampled human review running on responses
- [ ] You know how to roll the model id back if the next release misbehaves
If any of those is unchecked, you'll find out in production. Better to find out in staging.
Try changing one thing
- Add a `feature` field to the log line and tag every call site. Now your dashboard groups by feature.
- Set `timeout=0.5` deliberately and watch the retry/fallback fire. Helps you trust the recovery path.
- Run the smell test on 100 historical responses. Set a baseline pass rate; alert when it drops 5 points.
- Add `cache_key = hash((model, prompt))` and skip the call when you have a recent cached response (see the sketch after this list). Module 16 dives deeper.
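A minimal sketch of that cache idea, assuming a plain in-memory dict, a fixed TTL, and the `call` helper from Pattern 1; Module 16 covers real caching.

```python
import time

CACHE: dict[int, tuple[float, object]] = {}
TTL_SECONDS = 300  # assumed freshness window

def cached_call(prompt: str, model: str = "claude-sonnet-4-6"):
    """Reuse a recent response for an identical (model, prompt) pair."""
    cache_key = hash((model, prompt))
    hit = CACHE.get(cache_key)
    if hit and time.time() - hit[0] < TTL_SECONDS:
        return hit[1]
    response = call(prompt=prompt, model=model)  # the logged call from Pattern 1
    CACHE[cache_key] = (time.time(), response)
    return response
```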
Going deeper: open the notebooks
- `notebooks/01_introduction.ipynb` — observability, structured logs, error budgets (~1.5–2h)
- `notebooks/02_intermediate.ipynb` — circuit breakers, idempotency, cost tracking per user (~2–3h)
- `notebooks/03_advanced.ipynb` — disaster recovery, multi-region, incident playbooks (~1.5–2.5h)
Module checklist
- [ ] You've added structured logging around at least one Claude call
- [ ] You've tested your retry/fallback path by deliberately failing the primary call
- [ ] You can name the items on the pre-launch operational checklist from memory
- [ ] You've decided what your feature's degraded mode looks like
Next module
Module 15 · Advanced Reasoning — patterns for the hard problems where one prompt isn't enough.