Lesson 6: Cost & Token Management

What this lesson is about

Every conversation you have with Claude Code has a cost attached to it — and without understanding how that cost is calculated, it is easy to run up a surprisingly large bill without realising why. This lesson explains exactly how Claude charges for its work, which habits quietly inflate your costs, and which simple techniques can reduce your spending by as much as 90% without any loss in quality.


Core concept: The taxi meter

Picture yourself getting into a taxi. The moment the driver pulls away from the kerb, the meter starts running. It does not matter whether you are making polite small talk or giving detailed turn-by-turn directions — the meter runs the same way for every word exchanged. A five-minute chat about the weather costs you just as much per minute as a detailed discussion about the fastest route.

Claude Code works in exactly the same way, except instead of measuring distance and time, it measures tokens — small units that represent words, parts of words, and punctuation. Every token that goes into Claude (your instructions, your files, your context) costs money. Every token that comes out of Claude (its responses, its generated code, its explanations) also costs money.

The meter runs whether you are asking a quick question or processing an entire codebase. Understanding this is the first step to spending wisely.


What tokens are

A token is the basic unit Claude uses to read and write text. It is not exactly one word — it is more like one syllable or short word fragment. As a rough guide:

  • 1 token ≈ 4 characters of text
  • 100 tokens ≈ 75 words
  • A typical work email ≈ 200–400 tokens
  • A 10-page business document ≈ 4,000–6,000 tokens

You do not need to count tokens manually. What matters is developing an intuition for what makes conversations expensive versus cheap.


Input tokens vs output tokens

Not all tokens cost the same. Claude distinguishes between two types:

Token typeWhat it includesRelative cost
Input tokensEverything you send to Claude: your message, uploaded files, the entire conversation history, your CLAUDE.md fileLower cost per token
Output tokensEverything Claude writes back: responses, generated code, explanations, analysisHigher cost per token

What this means in practice:

Asking Claude to read a large file is relatively inexpensive. Asking Claude to rewrite that entire file from scratch is more expensive, because every word it generates is an output token.

This is why it is worth being precise with your instructions. Asking Claude to “fix the broken login function” will cost less than asking it to “rewrite the entire authentication system from scratch” — even if the end result would have been similar.


How to check your spend mid-session

At any point during a Claude Code session, you can type the following command directly into the conversation:

/cost

Claude will respond with a summary of how many tokens have been used and what that has cost so far in the current session. This is the equivalent of glancing at the taxi meter before you reach your destination — it lets you make informed decisions about whether to continue or wrap up.

Make it a habit to check /cost at natural pauses in your work: after completing a task, before starting a new large request, or any time a session has been running for more than 30 minutes.


How to estimate cost before running a large task

Before you ask Claude to do something substantial — process a large folder of files, analyse a lengthy document, or generate a detailed report — it is worth taking a moment to think about what that task will involve in token terms.

Ask yourself:

  • How much am I sending in? If you are about to paste in a large file or ask Claude to read many documents at once, that is a significant number of input tokens.
  • How much will Claude need to write back? A request for a one-sentence summary costs far less than a request for a full 20-page report.
  • Is this task genuinely necessary right now? Sometimes a large expensive task can be broken into smaller, cheaper steps that give you the same outcome.

You can also simply ask Claude directly before committing: “How expensive do you think this task will be? Can we do it in a cheaper way?” Claude will give you an honest assessment.


Prompt caching: the photocopier principle

What prompt caching is

Imagine you need to make 50 copies of a 10-page document. The photocopier has to scan the original pages once — that takes time and uses a small amount of resources. But once the original is scanned, producing copies two through fifty is fast and costs almost nothing, because the machine is just reprinting something it already has stored.

Claude uses the same principle with prompt caching. The first time Claude reads a large piece of context — your CLAUDE.md file, a lengthy system prompt, or a big block of background information — it processes it fully and that costs full price. But if the same content appears at the start of your next message, Claude recognises it from the previous scan and charges you at a dramatically reduced rate for that repeated portion.

The result: repeated context can cost up to 90% less than it would without caching. For people who work on long projects with Claude over multiple sessions, this is the single most powerful cost-saving technique available.

How to enable prompt caching on your CLAUDE.md file

Your CLAUDE.md file (a special file that Claude reads automatically at the start of every session to understand your project and preferences) is the perfect candidate for prompt caching, because it is sent to Claude every single time you open a session.

To enable caching for your CLAUDE.md, add the following line at the very top of the file:

<!-- cache_control: ephemeral -->

Your CLAUDE.md might then look like this:

<!-- cache_control: ephemeral -->

# My Project Context

## About this project
This is a small business website for Nkosi & Associates, a Durban-based
law firm. The site is built with plain HTML and CSS — no complex frameworks.

## My preferences
- Always explain what you are doing before you do it
- Use plain English, not technical jargon
- Ask before deleting anything

That single comment at the top tells Claude to cache the file’s contents after the first read. Every subsequent session that loads this file will cost a fraction of what it would otherwise.


How to set a spending limit

To prevent an unexpectedly large bill — perhaps from an accidental loop, a very long session you forgot to close, or a runaway automated task — you can set a hard spending limit in Claude Code’s configuration.

Open or create your .claude/settings.json file (refer back to Lesson 5 if you need a reminder of where this lives) and add a costLimit value:

{
  "costLimit": {
    "monthly": 50.00,
    "currency": "USD"
  }
}

What this does: Once Claude Code estimates your spending has reached $50 for the month, it will stop and alert you rather than continuing to accumulate charges. You can adjust the number to whatever suits your budget.

This is the equivalent of setting a data cap on your phone plan — a simple safety net that ensures a distracted afternoon cannot turn into an unpleasant surprise at the end of the month.


The most common causes of unexpectedly high bills

Reading large files unnecessarily

Every time Claude reads a file, those file contents are added to the input token count. If you ask Claude to “look through all my project files and tell me what needs updating,” and your project contains dozens of large files, you are paying for all of them — even the ones that turn out to be irrelevant.

Better approach: Be specific about which files you want Claude to look at. “Please read only invoice-template.html and suggest improvements” is far cheaper than “please read all my files.”

Very long sessions without /compact

Claude Code keeps the entire conversation history in what is called the context window — think of it as Claude’s short-term memory. As a session grows longer, that history grows larger, and every new message you send includes the full weight of everything said before it.

The /compact command (covered in depth in Lesson 4) summarises and compresses that history, dramatically reducing the token overhead on subsequent messages. Without it, a long working session can become progressively more expensive with each exchange, even if you are asking simple questions.

Habit to build: Run /compact after completing each distinct task, before starting the next one.

Using Opus for tasks that Sonnet handles just as well

Claude comes in several versions, or models, each with different capabilities and price points:

ModelBest forRelative cost
Claude SonnetEveryday tasks: writing, editing, coding, analysis, Q&AStandard
Claude OpusGenuinely complex reasoning: multi-step strategy, intricate logic, difficult problemsSignificantly higher

Many people default to using Opus for everything, assuming that the most powerful model always produces the best results. In practice, Sonnet handles the vast majority of everyday business tasks just as well as Opus — and at a fraction of the cost.

Rule of thumb: Start with Sonnet. Only switch to Opus if Sonnet’s output genuinely falls short for a specific task.


Practical daily habits that keep costs low

Adopting these habits will keep your Claude Code spending predictable and reasonable without any sacrifice in the quality of work you get back:

HabitWhat to doWhy it helps
Start focusedOpen new sessions for new tasks rather than letting one session grow indefinitelyKeeps context lean from the beginning
Be specificName the exact files or sections you want Claude to work onAvoids unnecessary file reads
Use /compact regularlyRun it after each completed taskCompresses history, reduces ongoing token overhead
Check /cost oftenGlance at it at natural pause pointsKeeps you aware before spending escalates
Cache your CLAUDE.mdAdd the cache control comment at the topCuts the cost of repeated context by up to 90%
Default to SonnetOnly switch to Opus when you genuinely need itAvoids paying a premium for everyday tasks
Summarise, do not pasteInstead of pasting entire documents, summarise the relevant parts in your messageReduces input tokens significantly

A real cost comparison

Here is a side-by-side illustration of the same working afternoon, handled two different ways.

Session with poor habits

What happenedToken impact
Started Claude, pasted in 8 large project files “just in case”Very high input cost upfront
Used Claude Opus throughoutPremium rate on every exchange
Session ran for 3 hours without /compactHistory grew heavier with each message
Never checked /costNo awareness of escalating spend
CLAUDE.md loaded fresh every message, no cachingFull cost on repeated context each time
Estimated session total~$18–24

Session with good habits

What happenedToken impact
Opened session, loaded only the two files actually neededLean input from the start
Used Claude Sonnet for all tasksStandard rate throughout
Ran /compact after completing each taskHistory stayed compressed
Checked /cost twice during the sessionStayed aware and adjusted approach once
CLAUDE.md cached from the first message onwardUp to 90% saving on repeated context
Estimated session total~$2–4

The work produced in both sessions was equivalent. The difference was entirely in approach.


Practical Exercise

a. Open your terminal, navigate to your claude-practice folder (type cd claude-practice and press Enter), and start a Claude Code session by typing claude and pressing Enter. Ask Claude a few questions about anything — keep it conversational for a minute or two. Then type /cost and press Enter. Note what you see. This is your baseline: a sense of how quickly tokens accumulate in ordinary conversation.

b. Create a simple CLAUDE.md file in your practice folder. Open a text editor, paste in the following content, and save it as CLAUDE.md inside claude-practice:

<!-- cache_control: ephemeral -->

# Practice Session Context

This is my Claude Code practice folder. I am learning how Claude Code works.
Please keep all explanations simple and avoid technical jargon.
Always tell me what you are about to do before you do it.

Close your current session by typing /exit, then restart Claude Code by typing claude again. Claude will now automatically read and cache your CLAUDE.md on every session — you do not need to do anything further.

c. Open .claude/settings.json in your text editor (create it if it does not exist — see Lesson 5 for the location). Add a monthly cost limit that feels sensible for your learning phase:

{
  "costLimit": {
    "monthly": 10.00,
    "currency": "USD"
  }
}

Save the file. You now have a safety net in place. Start a new Claude Code session and ask it to confirm it can see your CLAUDE.md file. Then run /cost again to see the starting token cost with your cached context in place.


Common problems and how to fix them

/cost shows a much higher number than I expected

What is happening: The most common culprits are a very long session history, large files that were read earlier in the session, or the use of the Opus model.

How to fix it: Type /compact to compress the session history immediately. Check whether you asked Claude to read any large files you did not actually need. If you are using Opus, consider whether Sonnet would have handled the task equally well. Going forward, check /cost more frequently so you catch escalation earlier.

Prompt caching does not seem to be working — costs are still high

What is happening: Either the cache control comment is not at the very top of your CLAUDE.md file (even a blank line above it can prevent it from being recognised), or the file is being modified between sessions, which invalidates the cached version.

How to fix it: Open your CLAUDE.md and confirm that <!-- cache_control: ephemeral --> is the absolute first line — no spaces, no blank lines above it. Save the file and start a fresh session. If you frequently edit your CLAUDE.md, the cache will reset each time you change it, which is expected behaviour.

I cannot find .claude/settings.json to set a spending limit

What is happening: The .claude folder is hidden (its name starts with a dot, which hides it on Mac and Linux by default), and the settings.json file may not exist yet if you have not created it.

How to fix it: On a Mac, press Command + Shift + . in Finder to reveal hidden files. On Windows, enable “Hidden items” in File Explorer’s View menu. If the file does not exist yet, create it manually: inside your project folder, create a folder named .claude, then inside that folder create a file named settings.json and paste in the JSON structure shown earlier in this lesson.

Claude keeps using Opus even though I want to use Sonnet

What is happening: The model selection may be set in your configuration, or you may have specified Opus in a previous instruction that is still in the session history.

How to fix it: Check your .claude/settings.json for a "model" entry and change it to "claude-sonnet-4-20250514" (or whichever the current Sonnet version is). You can also tell Claude directly during a session: “Please switch to Sonnet for the rest of this conversation.”


What you have learned in this lesson

  • Tokens are the units Claude uses to measure and charge for text — both what you send in and what Claude writes back
  • Input tokens (what you send) cost less than output tokens (what Claude generates)
  • The /cost command shows your spend at any point during a session — check it regularly
  • Prompt caching works like a photocopier: the first read costs full price, but repeated identical context can cost up to 90% less
  • Adding <!-- cache_control: ephemeral --> to the top of your CLAUDE.md enables caching for your project context
  • A spending limit in .claude/settings.json acts as a safety net against runaway costs
  • The three biggest causes of unexpectedly high bills are: unnecessary large file reads, long sessions without /compact, and using Opus where Sonnet would do just as well
  • Simple daily habits — being specific, using /compact, caching context, defaulting to Sonnet — can reduce a typical session’s cost by 80–90% with no loss in quality