AI Coach

What this lesson is about

Every conversation you have with Claude Code has a cost attached to it — and without understanding how that cost is calculated, it is easy to run up a surprisingly large bill without realising why. This lesson explains exactly how Claude charges for its work, which habits quietly inflate your costs, and which simple techniques can reduce your spending by as much as 90% without any loss in quality.

Core concept: The taxi meter

Picture yourself getting into a taxi. The moment the driver pulls away from the kerb, the meter starts running. It does not matter whether you are making polite small talk or giving detailed turn-by-turn directions — the meter runs the same way for every word exchanged. A five-minute chat about the weather costs you just as much per minute as a detailed discussion about the fastest route.

Claude Code works in exactly the same way, except instead of measuring distance and time, it measures tokens — small units that represent words, parts of words, and punctuation. Every token that goes into Claude (your instructions, your files, your context) costs money. Every token that comes out of Claude (its responses, its generated code, its explanations) also costs money.

The meter runs whether you are asking a quick question or processing an entire codebase. Understanding this is the first step to spending wisely.

What tokens are

A token is the basic unit Claude uses to read and write text. It is not exactly one word — it is more like one syllable or short word fragment. As a rough guide:

1 token ≈ 4 characters of text
100 tokens ≈ 75 words
A typical work email ≈ 200–400 tokens
A 10-page business document ≈ 4,000–6,000 tokens

You do not need to count tokens manually. What matters is developing an intuition for what makes conversations expensive versus cheap.

Input tokens vs output tokens

Not all tokens cost the same. Claude distinguishes between two types:

Token type	What it includes	Relative cost
Input tokens	Everything you send to Claude: your message, uploaded files, the entire conversation history, your `CLAUDE.md` file	Lower cost per token
Output tokens	Everything Claude writes back: responses, generated code, explanations, analysis	Higher cost per token

What this means in practice:

Asking Claude to read a large file is relatively inexpensive. Asking Claude to rewrite that entire file from scratch is more expensive, because every word it generates is an output token.

This is why it is worth being precise with your instructions. Asking Claude to “fix the broken login function” will cost less than asking it to “rewrite the entire authentication system from scratch” — even if the end result would have been similar.

How to check your spend mid-session

At any point during a Claude Code session, you can type the following command directly into the conversation:

/cost

Claude will respond with a summary of how many tokens have been used and what that has cost so far in the current session. This is the equivalent of glancing at the taxi meter before you reach your destination — it lets you make informed decisions about whether to continue or wrap up.

Make it a habit to check /cost at natural pauses in your work: after completing a task, before starting a new large request, or any time a session has been running for more than 30 minutes.

How to estimate cost before running a large task

Before you ask Claude to do something substantial — process a large folder of files, analyse a lengthy document, or generate a detailed report — it is worth taking a moment to think about what that task will involve in token terms.

Ask yourself:

How much am I sending in? If you are about to paste in a large file or ask Claude to read many documents at once, that is a significant number of input tokens.
How much will Claude need to write back? A request for a one-sentence summary costs far less than a request for a full 20-page report.
Is this task genuinely necessary right now? Sometimes a large expensive task can be broken into smaller, cheaper steps that give you the same outcome.

You can also simply ask Claude directly before committing: “How expensive do you think this task will be? Can we do it in a cheaper way?” Claude will give you an honest assessment.

Prompt caching: the photocopier principle

What prompt caching is

Imagine you need to make 50 copies of a 10-page document. The photocopier has to scan the original pages once — that takes time and uses a small amount of resources. But once the original is scanned, producing copies two through fifty is fast and costs almost nothing, because the machine is just reprinting something it already has stored.

Claude uses the same principle with prompt caching. The first time Claude reads a large piece of context — your CLAUDE.md file, a lengthy system prompt, or a big block of background information — it processes it fully and that costs full price. But if the same content appears at the start of your next message, Claude recognises it from the previous scan and charges you at a dramatically reduced rate for that repeated portion.

The result: repeated context can cost up to 90% less than it would without caching. For people who work on long projects with Claude over multiple sessions, this is the single most powerful cost-saving technique available.

How to enable prompt caching on your CLAUDE.md file

Your CLAUDE.md file (a special file that Claude reads automatically at the start of every session to understand your project and preferences) is the perfect candidate for prompt caching, because it is sent to Claude every single time you open a session.

To enable caching for your CLAUDE.md, add the following line at the very top of the file:

<!-- cache_control: ephemeral -->

Your CLAUDE.md might then look like this:

<!-- cache_control: ephemeral -->

# My Project Context

## About this project
This is a small business website for Nkosi & Associates, a Durban-based
law firm. The site is built with plain HTML and CSS — no complex frameworks.

## My preferences
- Always explain what you are doing before you do it
- Use plain English, not technical jargon
- Ask before deleting anything

That single comment at the top tells Claude to cache the file’s contents after the first read. Every subsequent session that loads this file will cost a fraction of what it would otherwise.

How to set a spending limit

To prevent an unexpectedly large bill — perhaps from an accidental loop, a very long session you forgot to close, or a runaway automated task — you can set a hard spending limit in Claude Code’s configuration.

Open or create your .claude/settings.json file (refer back to Lesson 5 if you need a reminder of where this lives) and add a costLimit value:

{
  "costLimit": {
    "monthly": 50.00,
    "currency": "USD"
  }
}

What this does: Once Claude Code estimates your spending has reached $50 for the month, it will stop and alert you rather than continuing to accumulate charges. You can adjust the number to whatever suits your budget.

This is the equivalent of setting a data cap on your phone plan — a simple safety net that ensures a distracted afternoon cannot turn into an unpleasant surprise at the end of the month.

The most common causes of unexpectedly high bills

Reading large files unnecessarily

Every time Claude reads a file, those file contents are added to the input token count. If you ask Claude to “look through all my project files and tell me what needs updating,” and your project contains dozens of large files, you are paying for all of them — even the ones that turn out to be irrelevant.

Better approach: Be specific about which files you want Claude to look at. “Please read only invoice-template.html and suggest improvements” is far cheaper than “please read all my files.”

Very long sessions without `/compact`

Claude Code keeps the entire conversation history in what is called the context window — think of it as Claude’s short-term memory. As a session grows longer, that history grows larger, and every new message you send includes the full weight of everything said before it.

The /compact command (covered in depth in Lesson 4) summarises and compresses that history, dramatically reducing the token overhead on subsequent messages. Without it, a long working session can become progressively more expensive with each exchange, even if you are asking simple questions.

Habit to build: Run /compact after completing each distinct task, before starting the next one.

Using Opus for tasks that Sonnet handles just as well

Claude comes in several versions, or models, each with different capabilities and price points:

Model	Best for	Relative cost
Claude Sonnet	Everyday tasks: writing, editing, coding, analysis, Q&A	Standard
Claude Opus	Genuinely complex reasoning: multi-step strategy, intricate logic, difficult problems	Significantly higher

Many people default to using Opus for everything, assuming that the most powerful model always produces the best results. In practice, Sonnet handles the vast majority of everyday business tasks just as well as Opus — and at a fraction of the cost.

Rule of thumb: Start with Sonnet. Only switch to Opus if Sonnet’s output genuinely falls short for a specific task.

Practical daily habits that keep costs low

Adopting these habits will keep your Claude Code spending predictable and reasonable without any sacrifice in the quality of work you get back:

Habit	What to do	Why it helps
Start focused	Open new sessions for new tasks rather than letting one session grow indefinitely	Keeps context lean from the beginning
Be specific	Name the exact files or sections you want Claude to work on	Avoids unnecessary file reads
Use `/compact` regularly	Run it after each completed task	Compresses history, reduces ongoing token overhead
Check `/cost` often	Glance at it at natural pause points	Keeps you aware before spending escalates
Cache your `CLAUDE.md`	Add the cache control comment at the top	Cuts the cost of repeated context by up to 90%
Default to Sonnet	Only switch to Opus when you genuinely need it	Avoids paying a premium for everyday tasks
Summarise, do not paste	Instead of pasting entire documents, summarise the relevant parts in your message	Reduces input tokens significantly

A real cost comparison

Here is a side-by-side illustration of the same working afternoon, handled two different ways.

Session with poor habits

What happened	Token impact
Started Claude, pasted in 8 large project files “just in case”	Very high input cost upfront
Used Claude Opus throughout	Premium rate on every exchange
Session ran for 3 hours without `/compact`	History grew heavier with each message
Never checked `/cost`	No awareness of escalating spend
`CLAUDE.md` loaded fresh every message, no caching	Full cost on repeated context each time
Estimated session total	~$18–24

Session with good habits

What happened	Token impact
Opened session, loaded only the two files actually needed	Lean input from the start
Used Claude Sonnet for all tasks	Standard rate throughout
Ran `/compact` after completing each task	History stayed compressed
Checked `/cost` twice during the session	Stayed aware and adjusted approach once
`CLAUDE.md` cached from the first message onward	Up to 90% saving on repeated context
Estimated session total	~$2–4

The work produced in both sessions was equivalent. The difference was entirely in approach.

Practical Exercise

a. Open your terminal, navigate to your claude-practice folder (type cd claude-practice and press Enter), and start a Claude Code session by typing claude and pressing Enter. Ask Claude a few questions about anything — keep it conversational for a minute or two. Then type /cost and press Enter. Note what you see. This is your baseline: a sense of how quickly tokens accumulate in ordinary conversation.

b. Create a simple CLAUDE.md file in your practice folder. Open a text editor, paste in the following content, and save it as CLAUDE.md inside claude-practice:

<!-- cache_control: ephemeral -->

# Practice Session Context

This is my Claude Code practice folder. I am learning how Claude Code works.
Please keep all explanations simple and avoid technical jargon.
Always tell me what you are about to do before you do it.

Close your current session by typing /exit, then restart Claude Code by typing claude again. Claude will now automatically read and cache your CLAUDE.md on every session — you do not need to do anything further.

c. Open .claude/settings.json in your text editor (create it if it does not exist — see Lesson 5 for the location). Add a monthly cost limit that feels sensible for your learning phase:

{
  "costLimit": {
    "monthly": 10.00,
    "currency": "USD"
  }
}

Save the file. You now have a safety net in place. Start a new Claude Code session and ask it to confirm it can see your CLAUDE.md file. Then run /cost again to see the starting token cost with your cached context in place.

Common problems and how to fix them

`/cost` shows a much higher number than I expected

What is happening: The most common culprits are a very long session history, large files that were read earlier in the session, or the use of the Opus model.

How to fix it: Type /compact to compress the session history immediately. Check whether you asked Claude to read any large files you did not actually need. If you are using Opus, consider whether Sonnet would have handled the task equally well. Going forward, check /cost more frequently so you catch escalation earlier.

Prompt caching does not seem to be working — costs are still high

What is happening: Either the cache control comment is not at the very top of your CLAUDE.md file (even a blank line above it can prevent it from being recognised), or the file is being modified between sessions, which invalidates the cached version.

How to fix it: Open your CLAUDE.md and confirm that  is the absolute first line — no spaces, no blank lines above it. Save the file and start a fresh session. If you frequently edit your CLAUDE.md, the cache will reset each time you change it, which is expected behaviour.

I cannot find `.claude/settings.json` to set a spending limit

What is happening: The .claude folder is hidden (its name starts with a dot, which hides it on Mac and Linux by default), and the settings.json file may not exist yet if you have not created it.

How to fix it: On a Mac, press Command + Shift + . in Finder to reveal hidden files. On Windows, enable “Hidden items” in File Explorer’s View menu. If the file does not exist yet, create it manually: inside your project folder, create a folder named .claude, then inside that folder create a file named settings.json and paste in the JSON structure shown earlier in this lesson.

Claude keeps using Opus even though I want to use Sonnet

What is happening: The model selection may be set in your configuration, or you may have specified Opus in a previous instruction that is still in the session history.

How to fix it: Check your .claude/settings.json for a "model" entry and change it to "claude-sonnet-4-20250514" (or whichever the current Sonnet version is). You can also tell Claude directly during a session: “Please switch to Sonnet for the rest of this conversation.”

What you have learned in this lesson

Tokens are the units Claude uses to measure and charge for text — both what you send in and what Claude writes back
Input tokens (what you send) cost less than output tokens (what Claude generates)
The /cost command shows your spend at any point during a session — check it regularly
Prompt caching works like a photocopier: the first read costs full price, but repeated identical context can cost up to 90% less
Adding  to the top of your CLAUDE.md enables caching for your project context
A spending limit in .claude/settings.json acts as a safety net against runaway costs
The three biggest causes of unexpectedly high bills are: unnecessary large file reads, long sessions without /compact, and using Opus where Sonnet would do just as well
Simple daily habits — being specific, using /compact, caching context, defaulting to Sonnet — can reduce a typical session’s cost by 80–90% with no loss in quality

Course Index

Table of Contents

What this lesson is about

Core concept: The taxi meter

What tokens are

Input tokens vs output tokens

How to check your spend mid-session

How to estimate cost before running a large task

Prompt caching: the photocopier principle

What prompt caching is

How to enable prompt caching on your CLAUDE.md file

How to set a spending limit

The most common causes of unexpectedly high bills

Reading large files unnecessarily

Very long sessions without `/compact`

Using Opus for tasks that Sonnet handles just as well

Practical daily habits that keep costs low

A real cost comparison

Session with poor habits

Session with good habits

Practical Exercise

Common problems and how to fix them

`/cost` shows a much higher number than I expected

Prompt caching does not seem to be working — costs are still high

I cannot find `.claude/settings.json` to set a spending limit

Claude keeps using Opus even though I want to use Sonnet

What you have learned in this lesson

You missed

Claude: 33 Obsidian Rules To Cut Your Costs By 80%

Claude Cowork: A Comprehensive Guide for Business Owners in 2026

How to Start an AI Consulting Business in 2026: The Complete Guide

Top 10 Emerging AI Technologies Shaping 2026

Lesson 6: Cost & Token Management

Table of Contents

What this lesson is about

Core concept: The taxi meter

What tokens are

Input tokens vs output tokens

How to check your spend mid-session

How to estimate cost before running a large task

Prompt caching: the photocopier principle

What prompt caching is

How to enable prompt caching on your CLAUDE.md file

How to set a spending limit

The most common causes of unexpectedly high bills

Reading large files unnecessarily

Very long sessions without /compact

Using Opus for tasks that Sonnet handles just as well

Practical daily habits that keep costs low

A real cost comparison

Session with poor habits

Session with good habits

Practical Exercise

Common problems and how to fix them

/cost shows a much higher number than I expected

Prompt caching does not seem to be working — costs are still high

I cannot find .claude/settings.json to set a spending limit

Claude keeps using Opus even though I want to use Sonnet

What you have learned in this lesson

You missed

Claude: 33 Obsidian Rules To Cut Your Costs By 80%

Claude Cowork: A Comprehensive Guide for Business Owners in 2026

How to Start an AI Consulting Business in 2026: The Complete Guide

Top 10 Emerging AI Technologies Shaping 2026

Very long sessions without `/compact`

`/cost` shows a much higher number than I expected

I cannot find `.claude/settings.json` to set a spending limit