Models & Capabilities
Pick the right Claude model for the job at hand.
There are several Claude models. They cost different amounts, run at different speeds, and handle different levels of difficulty. This module teaches you to pick the right one by measuring, not guessing.
By the end of this module you'll have
- A clear mental model of the Haiku / Sonnet / Opus families and when each pays off
- A small benchmark script that compares two models on your task in seconds
- A reusable rule of thumb for escalating from cheaper to more capable models
Time: about 45 minutes for the basics, ~4 hours with all three notebooks.
Prerequisites: Module 1 finished and your environment working.
The shape of the family (today)
| Family | Latest ID | Best for | Rough cost | Rough speed |
|---|---|---|---|---|
| Haiku | `claude-haiku-4-5-20251001` | Classification, routing, high-volume cheap work, drafts | $ | Very fast |
| Sonnet | `claude-sonnet-4-6` | Most production work, the sensible default | $$ | Fast |
| Opus | `claude-opus-4-7` | Hard reasoning, long multi-step plans, premium quality | $$$ | Slower |
Pricing and speed change over time; always confirm with the official docs. What stays true: smaller models are faster and cheaper, larger models reason better.
Rule of thumb. Start on Sonnet. Drop to Haiku once you've proven Sonnet works and you have a quality bar to test against. Escalate to Opus only when Sonnet visibly fails on tasks you actually care about.
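One way to encode that escalation is a cheapest-first fallback. The sketch below is illustrative, not a prescribed pattern: `looks_good` is a placeholder for whatever quality bar your task actually needs, and the model IDs are the ones from the table above.

```python
from anthropic import Anthropic

client = Anthropic()

# Cheapest first; escalate only when the output fails your quality check.
ESCALATION = ["claude-haiku-4-5-20251001", "claude-sonnet-4-6", "claude-opus-4-7"]

def looks_good(text: str) -> bool:
    # Placeholder quality bar. Replace with a real check for your task,
    # e.g. "parses as JSON" or "contains exactly three bullet points".
    return len(text.strip()) > 0

def run_with_escalation(task: str) -> tuple[str, str]:
    """Try each model in order; return (model_id, reply) for the first that passes."""
    for model in ESCALATION:
        response = client.messages.create(
            model=model,
            max_tokens=400,
            messages=[{"role": "user", "content": task}],
        )
        text = response.content[0].text
        if looks_good(text):
            return model, text
    return model, text  # nothing passed: fall back to the strongest model's answer
```

The check is the whole game here: without a quality bar you can test against, "drop to Haiku" is just guessing again.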
Compare two models on your task (5 minutes)
Save as `compare_models.py` and adjust the task string to something you care about:

```python
import time

from anthropic import Anthropic
from dotenv import load_dotenv

load_dotenv()  # reads ANTHROPIC_API_KEY from your .env file
client = Anthropic()

task = "Summarize the plot of Hamlet in three bullet points, neutral tone."

models_to_compare = [
    "claude-haiku-4-5-20251001",
    "claude-sonnet-4-6",
]

for model in models_to_compare:
    # Time the full round trip, request to complete response.
    start = time.perf_counter()
    response = client.messages.create(
        model=model,
        max_tokens=400,
        messages=[{"role": "user", "content": task}],
    )
    elapsed_ms = (time.perf_counter() - start) * 1000

    text = response.content[0].text
    print(f"\n=== {model} ===")
    print(f"latency: {elapsed_ms:.0f} ms")
    print(f"in/out: {response.usage.input_tokens} / {response.usage.output_tokens} tokens")
    print(f"output: {text}")
```
Look at three things in the output:
- Latency — how long the user waits.
- Tokens out — what you'll be billed for, and a rough proxy for verbosity.
- The reply itself — read it. Does the cheaper model's answer hold up against the more expensive one for this task?
If the answers look equally good, you've just saved real money. If the bigger model is clearly better, you've earned the right to spend more.
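To turn those token counts into dollars, multiply by the per-million-token rates. A minimal helper; the prices below are placeholders for this sketch, so take current rates from the official pricing page before relying on it:

```python
# Placeholder per-million-token prices in USD. Illustrative only;
# check the official pricing page for current numbers.
PRICES = {
    "claude-haiku-4-5-20251001": {"in": 1.00, "out": 5.00},
    "claude-sonnet-4-6": {"in": 3.00, "out": 15.00},
}

def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    p = PRICES[model]
    return (input_tokens * p["in"] + output_tokens * p["out"]) / 1_000_000
```

Drop a call to this into the loop in `compare_models.py`, next to the usage print, and the "equally good" comparison becomes a concrete price difference.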
How to choose, in practice
Start with the task, not the model. Three questions decide most of it:
- What's the failure cost? Mis-routed support ticket = low. Wrong code shipped to prod = high. Higher cost → bigger model.
- How long is the chain of reasoning? A single classification = small. A multi-step plan with branching = larger.
- How long is the input? Long context windows are supported across the family, but longer prompts on a bigger model multiply latency and cost.
Then measure. A 10-line benchmark on 20 real examples beats a one-liner argument every time. Module 20 (Testing & Evaluation) shows how to turn that into a regression suite.
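If you want those three questions as a starting point in code, here is a toy heuristic. It is purely illustrative: the `Task` fields and thresholds are invented for this sketch and should come from your own measurements, not from this page.

```python
from dataclasses import dataclass

@dataclass
class Task:
    failure_cost: str      # "low" | "medium" | "high"
    reasoning_steps: int   # rough count of dependent steps
    input_tokens: int      # approximate prompt size

def pick_starting_model(task: Task) -> str:
    """Toy heuristic: start cheap, move up as stakes and reasoning depth grow."""
    if task.failure_cost == "high" or task.reasoning_steps > 5:
        return "claude-opus-4-7"
    if task.failure_cost == "medium" or task.reasoning_steps > 1:
        return "claude-sonnet-4-6"
    return "claude-haiku-4-5-20251001"
```

Treat the output as the model you benchmark first, not the model you ship.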
Try changing one thing
- Add `claude-opus-4-7` to `models_to_compare`. Notice how much slower it is, and whether the answer is meaningfully better.
- Lower `max_tokens` to 80. Different models truncate differently: some still finish a thought, others stop mid-sentence.
- Change the task to something domain-specific (e.g., "extract email addresses from this paragraph"). Cheaper models often win on simple, well-scoped tasks.
- Add `system="Reply in JSON only"`. See which model holds the format more reliably (a sketch of the call follows this list).
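For that last experiment, note that the system prompt is a top-level parameter on `messages.create`, not another message. The change to `compare_models.py` looks like this:

```python
response = client.messages.create(
    model=model,
    max_tokens=400,
    system="Reply in JSON only.",  # system prompt is its own parameter
    messages=[{"role": "user", "content": task}],
)
```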
Going deeper: open the notebooks
- `notebooks/01_introduction.ipynb`: capability map, reading model cards like a PM + engineer (~1.5–2h)
- `notebooks/02_intermediate.ipynb`: A/B tests across models, regression suites (~2–3h)
- `notebooks/03_advanced.ipynb`: latency budgets, multi-lingual work, failure analysis from logs (~1.5–2.5h)
Module checklist
- [ ] You've benchmarked the same task on at least two models
- [ ] You can name two situations where Haiku is the right call, and two where Opus is
- [ ] You have a starting model in mind for the project you'll build later in this curriculum
Next module
Module 3 · Prompt Engineering Basics — the patterns that make any model behave more reliably.