What this lesson is about
This lesson explains prompt caching — a feature that dramatically reduces the cost of using Claude when your sessions involve large, repeated content like your CLAUDE.md file. You will learn how caching works, how to enable it, and how to structure your prompts to take full advantage of it. The savings are real and measurable, and the setup takes less than five minutes.
The photocopier — understanding what caching is
Imagine you need to make 50 copies of a 10-page document. The photocopier has to scan the original once — that scan takes time and uses resources. But once the scan is stored in the machine’s memory, every additional copy is quick and cheap because the machine already has everything it needs. You are not paying for 50 scans. You are paying for one scan and 49 prints.
Claude works the same way with content it has seen before at the start of a conversation. The first time Claude reads your CLAUDE.md file, your system instructions, or any large block of static content, it processes every token at full cost — that is the scan. If the same content appears at the start of your next conversation, and caching is enabled, Claude recognises it and retrieves it from its cache rather than re-processing it from scratch. That retrieval costs approximately 10% of the normal input price.
Token — a token is roughly equivalent to three-quarters of a word. When Claude reads or generates text, it counts tokens rather than words. Your CLAUDE.md file, your system prompts, the files you load into a session — all of these are measured and charged in tokens.
Cache — a cache (pronounced “cash”) is a stored copy of something that has already been processed, kept ready for fast, cheap reuse. You have encountered caches before: your web browser caches images from websites you visit often so pages load faster on return visits. Claude’s prompt cache works on the same principle.
What caching costs — and what it saves
Cached vs non-cached token pricing
Anthropic prices cached input tokens at roughly 10% of the standard input token rate. The exact figures vary by model, but the ratio is consistent: caching reduces your input processing cost by approximately 90% for content that hits the cache.
Here is what that means in practical terms, expressed in ZAR. These figures use approximate exchange rates and Anthropic’s published pricing as a reference — check the current pricing page for the exact USD rate and convert at the prevailing rate.
| Token type | Approximate cost per 1 000 tokens (ZAR) | Notes |
|---|---|---|
| Standard input tokens | R0.28 | Full processing cost, every time |
| Cached input tokens | R0.03 | ~10% of standard — paid on cache hits |
| Cache write tokens | R0.42 | Higher cost on the first call that creates the cache |
| Output tokens | R1.12 | Not affected by caching — outputs always cost full price |
The cache write cost is higher than standard input for the first call — think of it as the machine doing the initial scan. But from the second call onwards, every hit saves 90%.
The worked example — 50 sessions per day
Let us make this concrete. Suppose your CLAUDE.md file is 2 000 tokens long. You use Claude Code 50 times per day — each session loads CLAUDE.md automatically as part of the system context.
Without caching:
| Item | Calculation / value | Cost (ZAR) |
|---|---|---|
| Tokens per session | 2 000 | — |
| Sessions per day | 50 | — |
| Daily input tokens | 100 000 | — |
| Daily cost at R0.28 / 1 000 tokens | 100 000 ÷ 1 000 × R0.28 | R28.00 |
| Monthly cost (30 days) | R28.00 × 30 | R840.00 |
With caching enabled:
| Item | Calculation / value | Cost (ZAR) |
|---|---|---|
| Cache write (first session only) | 2 000 ÷ 1 000 × R0.42 | R0.84 |
| Cache hits (remaining 49 sessions/day) | 2 000 × 49 ÷ 1 000 × R0.03 | R2.94 |
| Daily cost | R0.84 + R2.94 | R3.78 |
| Monthly cost (30 days) | R3.78 × 30 | R113.40 |
Monthly saving with caching enabled: approximately R726.60
That is a reduction of over 86% on the cost of loading your CLAUDE.md alone — before any changes to prompt structure or other optimisations. At higher usage volumes, the saving scales proportionally.
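The arithmetic in the two tables above can be checked with a short script. This is simply the lesson's worked example expressed in code, using the approximate ZAR rates from the pricing table (assumed figures, not live prices):

```python
# Approximate ZAR rates per 1 000 tokens, taken from the pricing table above.
STANDARD_RATE = 0.28  # standard input tokens
CACHED_RATE = 0.03    # cache read tokens (~10% of standard)
WRITE_RATE = 0.42     # cache write tokens (first call that creates the cache)

def monthly_cost(tokens, sessions_per_day, days=30, cached=False):
    """Monthly cost in ZAR of loading `tokens` of static context each session."""
    if not cached:
        daily = tokens * sessions_per_day / 1000 * STANDARD_RATE
    else:
        # One cache write per day, then cache reads for the remaining sessions.
        write = tokens / 1000 * WRITE_RATE
        reads = tokens * (sessions_per_day - 1) / 1000 * CACHED_RATE
        daily = write + reads
    return daily * days

without = monthly_cost(2000, 50)
with_cache = monthly_cost(2000, 50, cached=True)
print(f"Without caching: R{without:.2f}")      # R840.00
print(f"With caching:    R{with_cache:.2f}")   # R113.40
print(f"Monthly saving:  R{without - with_cache:.2f}")
```

Swap in your own CLAUDE.md token count and session frequency to estimate your own saving.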
What gets cached and what does not
Not all content is eligible for caching. Claude caches content that is static — content that stays the same across multiple sessions. It does not cache content that changes with each call.
| Content type | Cached? | Why |
|---|---|---|
| CLAUDE.md file contents | Yes | Same content every session |
| System prompt / instructions | Yes | Set once, repeated every call |
| Large reference documents loaded at session start | Yes | Static, does not change |
| Your question or task for this session | No | Different every time |
| File contents that change between sessions | No | Cannot guarantee a match |
| Claude’s responses | No | Outputs are never cached |
The rule of thumb: static at the top, dynamic at the bottom. Content that stays the same goes first. Content that changes — your actual question, the specific file you are working on today, the variable inputs — goes last.
How to structure prompts for maximum cache hits
The problem with unstructured prompts
Many people write prompts the way they speak — context first, then background, then the actual question. That is natural conversation. It is poor cache strategy, because the dynamic content (the question) appears mixed in with or before the static content (the instructions), breaking the cache match.
Here is an example of a poorly structured prompt — the kind that will not benefit from caching:
Please help me with something. I run a food delivery business
with four brands on Mr D and UberEats.
[CLAUDE.md contents — 2 000 tokens of static context]
Today I need you to review this week's order report and
identify which brand had the highest growth.
The problem: the introductory sentence appears before the static content. Even a small change to that opening line breaks the cache match, because Claude compares the prompt from the very beginning.
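A cache hit depends on an exact match from the very first token onward, so one changed character at the start of the prompt wipes out the whole match. A toy illustration of that comparison (character-level rather than token-level, purely to show the principle):

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Length of the identical leading span shared by two prompts."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

static = "[CLAUDE.md contents -- 2 000 tokens of static context]"

# Static content first: the entire static block matches, session after session.
good_1 = static + "\nTask: review this week's order report."
good_2 = static + "\nTask: summarise last month's orders."
print(shared_prefix_len(good_1, good_2))  # at least the full static block

# Dynamic greeting first: the match breaks within the first few characters.
bad_1 = "Hi! Quick one today. " + static
bad_2 = "Hello, new request. " + static
print(shared_prefix_len(bad_1, bad_2))
```

The real cache compares token prefixes rather than characters, but the failure mode is the same: anything variable placed before the static block destroys the match.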
The correctly structured prompt
Here is the same request, restructured for caching:
[CLAUDE.md contents — 2 000 tokens of static context]
Task for this session:
Review this week's order report and identify which brand
had the highest growth. Compare against last week.
[Order report pasted here]
The static content — CLAUDE.md — appears first, unchanged, every single session. The dynamic content — today’s specific task and today’s data — appears at the end. Claude reads the opening block, recognises it from the cache, retrieves it cheaply, then processes only the new material at full cost.
The golden rule
Put everything that stays the same at the very top of every prompt.
Put everything that changes at the very bottom.
This single habit, applied consistently, is responsible for most of the savings in the worked example above.
How to enable prompt caching on your CLAUDE.md
Enabling caching for your CLAUDE.md file requires adding a single setting to your Claude Code configuration. Open your CLAUDE.md file and add the following line at the very top, before any other content:
---
cache: true
---
This block is called YAML front matter — a short configuration section wrapped in triple dashes that Claude Code reads before processing the rest of the file. Setting cache: true tells Claude to treat this file’s content as cacheable.
If you are using the Claude API directly in a script, caching is enabled by adding a cache_control parameter to the content block. Claude can write this for you — ask it to add prompt caching to any script you are working with.
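For reference, this is roughly what such a request body looks like with the Anthropic Messages API: the static system block carries a cache_control marker, and the dynamic user message does not. The model name and prompt text here are placeholders; check the API documentation for current details:

```python
# Sketch of a Messages API request with prompt caching enabled.
# The static system context carries the cache_control marker; the
# dynamic user message does not. Model name is a placeholder.
request = {
    "model": "claude-model-placeholder",
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "[CLAUDE.md contents -- the large static block]",
            "cache_control": {"type": "ephemeral"},  # marks the cacheable prefix
        }
    ],
    "messages": [
        {"role": "user", "content": "Task for this session: review this week's report."}
    ],
}

# With the official SDK this would be sent along the lines of:
#   client = anthropic.Anthropic()
#   response = client.messages.create(**request)
# The API's usage object then reports cache_creation_input_tokens and
# cache_read_input_tokens alongside the standard input and output counts.
```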
How to confirm caching is working
Once caching is enabled, you can verify it is functioning by checking your session cost output. In the Claude Code terminal, run:
/cost
This command displays a breakdown of token usage and cost for the current session. When caching is working correctly, you will see separate line items for cache write tokens and cache read tokens alongside the standard input and output figures.
What to look for:
| What you see | What it means |
|---|---|
| cache_write_tokens: 2 000 | Claude scanned and stored this content — you paid the write rate |
| cache_read_tokens: 2 000 | Claude retrieved from cache — you paid ~10% of standard rate |
| Only input_tokens with no cache lines | Caching is not enabled or the cache did not match |
If you see only input tokens and no cache lines after your second or third session, check that the cache: true front matter is correctly placed at the top of CLAUDE.md, and that the file has not been modified between sessions (any change invalidates the cache for that file).
Other optimisation habits
Caching is the highest-impact single change you can make. These habits compound the saving further:
Keep your CLAUDE.md under 500 words
A leaner CLAUDE.md means fewer tokens on every call — cached or not. Review yours regularly and remove anything that is out of date, redundant, or rarely relevant. Aim for 400–500 words of dense, precise context rather than 1 500 words of loosely written background.
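A quick way to check where you stand, using the rough three-quarters-of-a-word-per-token rule from earlier in this lesson (the token figure is an estimate, not an exact count):

```python
from pathlib import Path

def claude_md_stats(path="CLAUDE.md"):
    """Word count and rough token estimate for a CLAUDE.md file."""
    text = Path(path).read_text(encoding="utf-8")
    words = len(text.split())
    est_tokens = round(words / 0.75)  # ~1 token per 0.75 words, per the lesson
    return words, est_tokens

# Demonstration with an inline sample instead of a real file:
sample = "You are assisting a food delivery business with four brands."
words = len(sample.split())
print(words, round(words / 0.75))  # 10 words, ~13 tokens
```

If the word count comes back well over 500, that is the signal to start trimming.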
Write shorter prompts
Every word in your prompt costs tokens. Before sending a long prompt, ask yourself: what is the minimum context Claude needs to answer this well? Cut the rest. You are not being rude — you are being efficient. Claude does not need preamble or social niceties. It needs clear, specific instructions.
Load fewer files per session
Every file you load into a session adds to your input token count. Load only the files that are directly relevant to today’s task. If you find yourself habitually loading the same five files every session, consider consolidating the most important content into CLAUDE.md directly.
Run /compact regularly
As a session extends, the conversation history grows and accumulates tokens. The /compact command compresses that history into a concise summary, discarding the detailed back-and-forth while preserving the important context. Run it whenever a session starts to feel slow or you notice your /cost output climbing. You covered this in Module 2 — the habit becomes even more valuable once you are tracking costs closely.
The single most impactful optimisation
If you do only one thing after reading this lesson, do this:
Enable caching on your CLAUDE.md file and keep it under 500 words.
Everything else — shorter prompts, fewer file loads, regular /compact — adds marginal improvements on top of a well-configured foundation. But a cached CLAUDE.md reduces the cost of your single most repeated token expense by 90%, from the very next session onwards.
The return on the five minutes it takes to add cache: true and trim your file is measurable in rands every day.
Practical Exercise
a. Open your CLAUDE.md file and count the words. If you do not have an easy way to count them, paste the content into a Claude conversation and ask: “How many words and tokens is this?” Note both figures. If the file is over 500 words, identify which sections could be shortened or removed without losing important context.
b. Add the YAML front matter block to the very top of your CLAUDE.md:
---
cache: true
---
Save the file. Open a new Claude Code session, run a typical task, then type /cost and review the output. Look for cache_write_tokens in the breakdown — this confirms the file was scanned and stored.
c. Open a third Claude Code session, run another task, and type /cost again. This time, look for cache_read_tokens instead of cache_write_tokens. If you see it, caching is working. Calculate your estimated monthly saving using the formula from the worked example: multiply your cache_read_tokens per session by your typical number of daily sessions, then by 30, and compare the cost at the standard rate versus the cached rate.
Common problems and how to fix them
The /cost output shows no cache lines after enabling caching
The most common cause is that the YAML front matter is not positioned correctly. It must be the absolute first thing in the file — no blank lines, no text, nothing before the opening ---. Open CLAUDE.md, check that the block looks exactly like this at the top, and save again:
---
cache: true
---
If the formatting is correct but cache lines still do not appear, try closing the session completely and starting a fresh one.
The cache worked once but stopped appearing
The cache is invalidated whenever the cached content changes. If you edited CLAUDE.md between sessions — even a single word — Claude will treat it as new content and write a fresh cache entry. This is not a problem; the first session after any edit will show cache_write_tokens, and subsequent sessions will show cache_read_tokens again. If you are seeing this pattern after edits you did not intend to make, check whether any automated tool or sync process is modifying your file.
The file is cached but costs are still higher than expected
Caching reduces input token cost — it does not affect output token cost. If your sessions involve long responses from Claude, output tokens will remain your primary expense. The optimisation for output cost is different: write tighter task briefs that require less explanation, and use structured output formats (tables, numbered lists) that Claude can produce efficiently.
YAML front matter is breaking the formatting of my CLAUDE.md
Some markdown editors display YAML front matter as a visible table rather than treating it as metadata. This is a display issue only — Claude Code reads it correctly regardless of how your editor renders it. If it bothers you, switch to a renderer or editor that supports front matter natively. Obsidian, for example, displays front matter cleanly as a collapsed metadata block.
Caching is not available in my current plan or setup
Prompt caching is available on Claude’s API and in Claude Code. If you are using the basic claude.ai web interface without a Pro plan or API access, some features may not be accessible. Check your current plan at claude.ai/settings and refer to Anthropic’s current pricing page for feature availability by tier.
What you have learned in this lesson
- Prompt caching stores static content after its first use so that subsequent sessions retrieve it cheaply — like a photocopier scanning once and printing many times
- Cached input tokens cost approximately 10% of standard input tokens — a 90% reduction on repeated static content
- A 2 000-token CLAUDE.md used 50 times per day costs approximately R840 per month without caching and approximately R113 per month with caching — a saving of around R727 per month
- Static content (your CLAUDE.md, system instructions, large reference files) is eligible for caching; dynamic content (your task, variable inputs, session-specific data) is not
- The golden rule of cache-optimised prompts: static content at the top, dynamic content at the bottom
- Caching is enabled by adding cache: true YAML front matter to the top of your CLAUDE.md
- You can verify caching is working by running /cost and looking for cache_write_tokens on the first session and cache_read_tokens on subsequent ones
- Supporting optimisations — shorter prompts, fewer file loads, regular /compact — compound the savings from caching
- The single most impactful action is to enable caching on CLAUDE.md and keep the file under 500 words