What this lesson is about
This lesson explains prompt caching — a feature that dramatically reduces the cost of using Claude when your sessions involve large, repeated content like your CLAUDE.md file. You will learn how caching works, how to enable it, and how to structure your prompts to take full advantage of it. The savings are real and measurable, and the setup takes less than five minutes.
The photocopier — understanding what caching is
Imagine you need to make 50 copies of a 10-page document. The photocopier has to scan the original once — that scan takes time and uses resources. But once the scan is stored in the machine’s memory, every additional copy is quick and cheap because the machine already has everything it needs. You are not paying for 50 scans. You are paying for one scan and 49 prints.
Claude works the same way with content it has seen before at the start of a conversation. The first time Claude reads your CLAUDE.md file, your system instructions, or any large block of static content, it processes every token at full cost — that is the scan. If the same content appears at the start of your next conversation, and caching is enabled, Claude recognises it and retrieves it from its cache rather than re-processing it from scratch. That retrieval costs approximately 10% of the normal input price.
Token — a token is roughly equivalent to three-quarters of a word. When Claude reads or generates text, it counts tokens rather than words. Your CLAUDE.md file, your system prompts, the files you load into a session — all of these are measured and charged in tokens.
Cache — a cache (pronounced “cash”) is a stored copy of something that has already been processed, kept ready for fast, cheap reuse. You have encountered caches before: your web browser caches images from websites you visit often so pages load faster on return visits. Claude’s prompt cache works on the same principle.
What caching costs — and what it saves
Cached vs non-cached token pricing
Anthropic prices cached input tokens at roughly 10% of the standard input token rate. The exact figures vary by model, but the ratio is consistent: caching reduces your input processing cost by approximately 90% for content that hits the cache.
Here is what that means in practical terms, expressed in ZAR. These figures use approximate exchange rates and Anthropic’s published pricing as a reference — check the current pricing page for the exact USD rate and convert at the prevailing rate.
| Token type | Approximate cost per 1 000 tokens (ZAR) | Notes |
|---|---|---|
| Standard input tokens | R0.28 | Full processing cost, every time |
| Cached input tokens | R0.03 | ~10% of standard — paid on cache hits |
| Cache write tokens | R0.42 | Higher cost on the first call that creates the cache |
| Output tokens | R1.12 | Not affected by caching — outputs always cost full price |
The cache write cost is higher than standard input for the first call — think of it as the machine doing the initial scan. But from the second call onwards, every hit saves 90%.
The worked example — 50 sessions per day
Let us make this concrete. Suppose your CLAUDE.md file is 2 000 tokens long. You use Claude Code 50 times per day — each session loads CLAUDE.md automatically as part of the system context.
Without caching:
| Item | Calculation / value | Cost (ZAR) |
|---|---|---|
| Tokens per session | 2 000 | — |
| Sessions per day | 50 | — |
| Daily input tokens | 100 000 | — |
| Daily cost at R0.28 / 1 000 tokens | 100 000 ÷ 1 000 × R0.28 | R28.00 |
| Monthly cost (30 days) | R28.00 × 30 | R840.00 |
With caching enabled:
| Item | Calculation / value | Cost (ZAR) |
|---|---|---|
| Cache write (first session only) | 2 000 ÷ 1 000 × R0.42 | R0.84 |
| Cache hits (remaining 49 sessions/day) | 2 000 × 49 ÷ 1 000 × R0.03 | R2.94 |
| Daily cost | R0.84 + R2.94 | R3.78 |
| Monthly cost (30 days) | R3.78 × 30 | R113.40 |
Monthly saving with caching enabled: approximately R726.60
That is a reduction of over 86% on the cost of loading your CLAUDE.md alone — before any changes to prompt structure or other optimisations. At higher usage volumes, the saving scales proportionally.
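The arithmetic in the two tables above can be checked with a short script. This is simply the lesson's worked example expressed in code, using the approximate ZAR rates from the pricing table (assumed figures, not live prices):

```python
# Approximate ZAR rates per 1 000 tokens, taken from the pricing table above.
STANDARD_RATE = 0.28  # standard input tokens
CACHED_RATE = 0.03    # cache read tokens (~10% of standard)
WRITE_RATE = 0.42     # cache write tokens (first call that creates the cache)

def monthly_cost(tokens, sessions_per_day, days=30, cached=False):
    """Monthly cost in ZAR of loading `tokens` of static context each session."""
    if not cached:
        daily = tokens * sessions_per_day / 1000 * STANDARD_RATE
    else:
        # One cache write per day, then cache reads for the remaining sessions.
        write = tokens / 1000 * WRITE_RATE
        reads = tokens * (sessions_per_day - 1) / 1000 * CACHED_RATE
        daily = write + reads
    return daily * days

without = monthly_cost(2000, 50)
with_cache = monthly_cost(2000, 50, cached=True)
print(f"Without caching: R{without:.2f}")      # R840.00
print(f"With caching:    R{with_cache:.2f}")   # R113.40
print(f"Monthly saving:  R{without - with_cache:.2f}")
```

Swap in your own CLAUDE.md token count and session frequency to estimate your own saving.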
What gets cached and what does not
Not all content is eligible for caching. Claude caches content that is static — content that stays the same across multiple sessions. It does not cache content that changes with each call.
| Content type | Cached? | Why |
|---|---|---|
| CLAUDE.md file contents | Yes | Same content every session |
| System prompt / instructions | Yes | Set once, repeated every call |
| Large reference documents loaded at session start | Yes | Static, does not change |
| Your question or task for this session | No | Different every time |
| File contents that change between sessions | No | Cannot guarantee a match |
| Claude’s responses | No | Outputs are never cached |
The rule of thumb: static at the top, dynamic at the bottom. Content that stays the same goes first. Content that changes — your actual question, the specific file you are working on today, the variable inputs — goes last.
How to structure prompts for maximum cache hits
The problem with unstructured prompts
Many people write prompts the way they speak — context first, then background, then the actual question. That is natural conversation. It is poor cache strategy, because the dynamic content (the question) appears mixed in with or before the static content (the instructions), breaking the cache match.
Here is an example of a poorly structured prompt — the kind that will not benefit from caching:
Please help me with something. I run a food delivery business
with four brands on Mr D and UberEats.
[CLAUDE.md contents — 2 000 tokens of static context]
Today I need you to review this week's order report and
identify which brand had the highest growth.
The problem: the introductory sentence appears before the static content. Even a small change to that opening line breaks the cache match, because Claude compares the prompt from the very beginning.
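A cache hit depends on an exact match from the very first token onward, so one changed character at the start of the prompt wipes out the whole match. A toy illustration of that comparison (character-level rather than token-level, purely to show the principle):

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Length of the identical leading span shared by two prompts."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

static = "[CLAUDE.md contents -- 2 000 tokens of static context]"

# Static content first: the entire static block matches, session after session.
good_1 = static + "\nTask: review this week's order report."
good_2 = static + "\nTask: summarise last month's orders."
print(shared_prefix_len(good_1, good_2))  # at least the full static block

# Dynamic greeting first: the match breaks within the first few characters.
bad_1 = "Hi! Quick one today. " + static
bad_2 = "Hello, new request. " + static
print(shared_prefix_len(bad_1, bad_2))
```

The real cache compares token prefixes rather than characters, but the failure mode is the same: anything variable placed before the static block destroys the match.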
The correctly structured prompt
Here is the same request, restructured for caching:
[CLAUDE.md contents — 2 000 tokens of static context]
Task for this session:
Review this week's order report and identify which brand
had the highest growth. Compare against last week.
[Order report pasted here]
The static content — CLAUDE.md — appears first, unchanged, every single session. The dynamic content — today’s specific task and today’s data — appears at the end. Claude reads the opening block, recognises it from the cache, retrieves it cheaply, then processes only the new material at full cost.
The golden rule
Put everything that stays the same at the very top of every prompt.
Put everything that changes at the very bottom.
This single habit, applied consistently, is responsible for most of the savings in the worked example above.
How to enable prompt caching on your CLAUDE.md
Enabling caching for your CLAUDE.md file requires adding a single setting to your Claude Code configuration. Open your CLAUDE.md file and add the following line at the very top, before any other content:
---
cache: true
---
This block is called YAML front matter — a short configuration section wrapped in triple dashes that Claude Code reads before processing the rest of the file. Setting cache: true tells Claude to treat this file’s content as cacheable.
If you are using the Claude API directly in a script, caching is enabled by adding a cache_control parameter to the content block. Claude can write this for you — ask it to add prompt caching to any script you are working with.
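For reference, this is roughly what such a request body looks like with the Anthropic Messages API: the static system block carries a cache_control marker, and the dynamic user message does not. The model name and prompt text here are placeholders; check the API documentation for current details:

```python
# Sketch of a Messages API request with prompt caching enabled.
# The static system context carries the cache_control marker; the
# dynamic user message does not. Model name is a placeholder.
request = {
    "model": "claude-model-placeholder",
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "[CLAUDE.md contents -- the large static block]",
            "cache_control": {"type": "ephemeral"},  # marks the cacheable prefix
        }
    ],
    "messages": [
        {"role": "user", "content": "Task for this session: review this week's report."}
    ],
}

# With the official SDK this would be sent along the lines of:
#   client = anthropic.Anthropic()
#   response = client.messages.create(**request)
# The API's usage object then reports cache_creation_input_tokens and
# cache_read_input_tokens alongside the standard input and output counts.
```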
How to confirm caching is working
Once caching is enabled, you can verify it is functioning by checking your session cost output. In the Claude Code terminal, run:
/cost
This command displays a breakdown of token usage and cost for the current session. When caching is working correctly, you will see separate line items for cache write tokens and cache read tokens alongside the standard input and output figures.
What to look for:
| What you see | What it means |
|---|---|
| cache_write_tokens: 2 000 | Claude scanned and stored this content — you paid the write rate |
| cache_read_tokens: 2 000 | Claude retrieved from cache — you paid ~10% of standard rate |
| Only input_tokens with no cache lines | Caching is not enabled or the cache did not match |
If you see only input tokens and no cache lines after your second or third session, check that the cache: true front matter is correctly placed at the top of CLAUDE.md, and that the file has not been modified between sessions (any change invalidates the cache for that file).
Other optimisation habits
Caching is the highest-impact single change you can make. These habits compound the saving further:
Keep your CLAUDE.md under 500 words
A leaner CLAUDE.md means fewer tokens on every call — cached or not. Review yours regularly and remove anything that is out of date, redundant, or rarely relevant. Aim for 400–500 words of dense, precise context rather than 1 500 words of loosely written background.
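A quick way to check where you stand, using the rough three-quarters-of-a-word-per-token rule from earlier in this lesson (the token figure is an estimate, not an exact count):

```python
from pathlib import Path

def claude_md_stats(path="CLAUDE.md"):
    """Word count and rough token estimate for a CLAUDE.md file."""
    text = Path(path).read_text(encoding="utf-8")
    words = len(text.split())
    est_tokens = round(words / 0.75)  # ~1 token per 0.75 words, per the lesson
    return words, est_tokens

# Demonstration with an inline sample instead of a real file:
sample = "You are assisting a food delivery business with four brands."
words = len(sample.split())
print(words, round(words / 0.75))  # 10 words, ~13 tokens
```

If the word count comes back well over 500, that is the signal to start trimming.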
Write shorter prompts
Every word in your prompt costs tokens. Before sending a long prompt, ask yourself: what is the minimum context Claude needs to answer this well? Cut the rest. You are not being rude — you are being efficient. Claude does not need preamble or social niceties. It needs clear, specific instructions.
Load fewer files per session
Every file you load into a session adds to your input token count. Load only the files that are directly relevant to today’s task. If you find yourself habitually loading the same five files every session, consider consolidating the most important content into CLAUDE.md directly.
Run /compact regularly
As a session extends, the conversation history grows and accumulates tokens. The /compact command compresses that history into a concise summary, discarding the detailed back-and-forth while preserving the important context. Run it whenever a session starts to feel slow or you notice your /cost output climbing. You covered this in Module 2 — the habit becomes even more valuable once you are tracking costs closely.
The single most impactful optimisation
If you do only one thing after reading this lesson, do this:
Enable caching on your CLAUDE.md file and keep it under 500 words.
Everything else — shorter prompts, fewer file loads, regular /compact — adds marginal improvements on top of a well-configured foundation. But a cached CLAUDE.md reduces the cost of your single most repeated token expense by 90%, from the very next session onwards.
The return on the five minutes it takes to add cache: true and trim your file is measurable in rands every day.
Practical Exercise
a. Open your CLAUDE.md file and count the words. If you do not have an easy way to count them, paste the content into a Claude conversation and ask: “How many words and tokens is this?” Note both figures. If the file is over 500 words, identify which sections could be shortened or removed without losing important context.
b. Add the YAML front matter block to the very top of your CLAUDE.md:
---
cache: true
---
Save the file. Open a new Claude Code session, run a typical task, then type /cost and review the output. Look for cache_write_tokens in the breakdown — this confirms the file was scanned and stored.
c. Open a third Claude Code session, run another task, and type /cost again. This time, look for cache_read_tokens instead of cache_write_tokens. If you see it, caching is working. Calculate your estimated monthly saving using the formula from the worked example: multiply your cache_read_tokens per session by your typical number of daily sessions, then by 30, and compare the cost at the standard rate versus the cached rate.
Common problems and how to fix them
The /cost output shows no cache lines after enabling caching
The most common cause is that the YAML front matter is not positioned correctly. It must be the absolute first thing in the file — no blank lines, no text, nothing before the opening ---. Open CLAUDE.md, check that the block looks exactly like this at the top, and save again:
---
cache: true
---
If the formatting is correct but cache lines still do not appear, try closing the session completely and starting a fresh one.
The cache worked once but stopped appearing
The cache is invalidated whenever the cached content changes. If you edited CLAUDE.md between sessions — even a single word — Claude will treat it as new content and write a fresh cache entry. This is not a problem; the first session after any edit will show cache_write_tokens, and subsequent sessions will show cache_read_tokens again. If you are seeing this pattern after edits you did not intend to make, check whether any automated tool or sync process is modifying your file.
The file is cached but costs are still higher than expected
Caching reduces input token cost — it does not affect output token cost. If your sessions involve long responses from Claude, output tokens will remain your primary expense. The optimisation for output cost is different: write tighter task briefs that require less explanation, and use structured output formats (tables, numbered lists) that Claude can produce efficiently.
YAML front matter is breaking the formatting of my CLAUDE.md
Some markdown editors display YAML front matter as a visible table rather than treating it as metadata. This is a display issue only — Claude Code reads it correctly regardless of how your editor renders it. If it bothers you, switch to a renderer or editor that supports front matter natively. Obsidian, for example, displays front matter cleanly as a collapsed metadata block.
Caching is not available in my current plan or setup
Prompt caching is available on Claude’s API and in Claude Code. If you are using the basic claude.ai web interface without a Pro plan or API access, some features may not be accessible. Check your current plan at claude.ai/settings and refer to Anthropic’s current pricing page for feature availability by tier.
What you have learned in this lesson
- Prompt caching stores static content after its first use so that subsequent sessions retrieve it cheaply — like a photocopier scanning once and printing many times
- Cached input tokens cost approximately 10% of standard input tokens — a 90% reduction on repeated static content
- A 2 000-token CLAUDE.md used 50 times per day costs approximately R840 per month without caching and approximately R113 per month with caching — a saving of around R727 per month
- Static content (your CLAUDE.md, system instructions, large reference files) is eligible for caching; dynamic content (your task, variable inputs, session-specific data) is not
- The golden rule of cache-optimised prompts: static content at the top, dynamic content at the bottom
- Caching is enabled by adding cache: true YAML front matter to the top of your CLAUDE.md
- You can verify caching is working by running /cost and looking for cache_write_tokens on the first session and cache_read_tokens on subsequent ones
- Supporting optimisations — shorter prompts, fewer file loads, regular /compact — compound the savings from caching
- The single most impactful action is to enable caching on CLAUDE.md and keep the file under 500 words