Claude Code Context Management for Large Codebases
Claude Code struggles on large codebases for a simple reason: the model can only reason over what fits in its context window, and a big repo does not fit. A good CLAUDE.md helps, but it does not solve the hard part of context management in AI agent workflows: deciding what to fetch, when to fetch it, and how to keep the answer small enough to stay useful. Claude Code treats its context window as working memory, and Anthropic’s own docs call out /compact, /clear, and focused prompts as the practical controls you reach for when conversations get too large. MCP is the better long-term answer because it turns repo knowledge into structured, queryable tools instead of raw prompt text. (docs.anthropic.com)
The context-window math
A Claude Code session has to carry four kinds of text at once: the system prompt, your instructions, the files it has already read, and the conversation itself. Every turn pushes more text into the window. At some point, new details crowd out old ones. Anthropic describes the context window as the model’s “working memory,” and also notes that chat-style systems may use rolling first-in, first-out behavior as they grow. That matters more in a monorepo than in a toy app, because a single feature task can touch architecture, tests, migration scripts, docs, and history. (docs.anthropic.com)
The math is brutal:
| Input source | What it contains | Why it hurts on a large repo |
|---|---|---|
| CLAUDE.md | Static instructions and conventions | Useful, but easy to overstuff |
| File reads | Source, tests, docs, generated output | Eats tokens fast |
| Conversation history | Plans, partial fixes, dead ends | Keeps growing unless compacted |
| Tool output | Search results, diffs, logs, summaries | Can swamp the useful bits |
Claude Code ships with /compact and /clear for exactly this reason. Anthropic also recommends specific queries, smaller tasks, and custom compaction instructions in CLAUDE.md. That is a good baseline. It is not enough for Claude Code on a large codebase unless you also control what information enters the session in the first place. (docs.anthropic.com)
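To make the pressure from the table above concrete, here is a rough budget sketch. The 4-characters-per-token ratio, the 200,000-token window, and the sample sizes are all assumptions for illustration, not measurements from a real session.

```python
# Rough context-budget estimate. The ratio and window size below are
# assumptions for illustration, not measured values.
CHARS_PER_TOKEN = 4
WINDOW_TOKENS = 200_000


def estimate_tokens(text: str) -> int:
    """Approximate a token count from character length."""
    return len(text) // CHARS_PER_TOKEN


def remaining_budget(sources: dict, window: int = WINDOW_TOKENS) -> int:
    """Tokens left once every input source has been counted."""
    used = sum(estimate_tokens(body) for body in sources.values())
    return window - used


# Hypothetical session: each string stands in for one row of the table.
session = {
    "claude_md": "x" * 8_000,
    "file_reads": "x" * 400_000,
    "conversation": "x" * 200_000,
    "tool_output": "x" * 120_000,
}
print(remaining_budget(session))
```

Even with these made-up sizes, file reads dominate the total, which is why the rest of this article focuses on what enters the session rather than on how to summarize it afterwards.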
Three failure modes on large repos
Read-it-all
This is the default mistake. The agent starts by reading too much because the user asked a broad question like “understand this service” or “fix the auth flow.” On a medium repo, that works. On a larger one, it burns tokens on files that never matter. Anthropic’s own guidance for new codebases is to start broad, then narrow down. In practice, most bad sessions do the reverse: they read everything, then try to narrow. (docs.anthropic.com)
Lost in the middle
Even when the model can ingest a lot, attention degrades in the middle of long contexts. The symptom is familiar: the first few files are remembered, the latest file is remembered, and the critical constraint from page 17 gets ignored. The fix is not “send more tokens.” The fix is to reduce the number of unrelated facts in the window and keep the relevant ones close to the current task. MCP helps because a tool call returns only the slice you asked for, not an entire directory dump. (modelcontextprotocol.io)
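As a minimal sketch of "return only the slice you asked for", the function below pulls one top-level symbol out of a Python module with the standard `ast` module. The function name `get_symbol_source` and the sample module are hypothetical; a real tool would sit on top of an index rather than reparse files on each call.

```python
import ast
from typing import Optional


def get_symbol_source(source: str, name: str) -> Optional[str]:
    """Return the source of the top-level function or class called `name`."""
    tree = ast.parse(source)
    wanted = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    for node in tree.body:
        if isinstance(node, wanted) and node.name == name:
            # get_source_segment returns just the text span of this node.
            return ast.get_source_segment(source, node)
    return None


# Hypothetical module contents, used only for illustration.
MODULE = '''
def load_token(path):
    return open(path).read().strip()

def refresh_token(token):
    return token + "-refreshed"
'''

print(get_symbol_source(MODULE, "refresh_token"))
```

The caller gets two lines back instead of the whole file, so unrelated definitions never enter the window.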
Prompt stuffing
This is the human version of the same bug. People paste README files, architecture notes, logs, stack traces, and half the repo into the prompt because they do not trust the agent to find the right files. The result is a long prompt with no hierarchy. Anthropic’s Claude Code docs point users toward targeted questions, memory files, and plan mode. That is a clue: the prompt should state intent, while the tool layer should fetch evidence. (docs.anthropic.com)
CLAUDE.md done right
CLAUDE.md is memory, not a dumping ground. Anthropic supports a hierarchy of memory locations: enterprise, project, user, and local. Project memory is loaded automatically, can import other files, and is the right place for repo-specific operating rules. Claude Code also looks up through parent directories for memory files, which is handy in nested workspaces. (docs.anthropic.com)
A good CLAUDE.md should contain only stable facts:
- Build and test commands.
- Repo layout.
- Coding conventions.
- Architectural boundaries.
- Links to deeper docs.
A bad CLAUDE.md tries to encode the whole codebase. That turns a memory file into a second README, then into a second source of drift.
A practical template
# Project memory
## Build
- uv run pytest tests/unit/
- uv run repowise --version
## Repo rules
- Keep business logic in `app/services/`
- Put HTTP adapters in `app/api/`
- Prefer explicit return types
## Architecture
- `app/core/` owns domain rules
- `app/adapters/` talks to external systems
- `app/jobs/` contains async work
That is enough to orient Claude Code without teaching it every implementation detail. Anthropic’s memory docs also support imports, so you can split stable references into smaller files instead of one bloated prompt artifact. (docs.anthropic.com)
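One way to keep a memory file from drifting into a second README is a small size check in CI. This is a sketch only: the function name `check_memory_file` and both thresholds are assumptions, and the right limits depend on your repo.

```python
def check_memory_file(text: str, max_lines: int = 60, max_bullet_chars: int = 120) -> list:
    """Return warnings for a memory file that is growing past its budget."""
    warnings = []
    lines = text.splitlines()
    if len(lines) > max_lines:
        warnings.append(f"{len(lines)} lines exceeds budget of {max_lines}")
    for number, line in enumerate(lines, start=1):
        # Long bullets usually mean implementation detail is leaking in.
        if line.lstrip().startswith("- ") and len(line) > max_bullet_chars:
            warnings.append(f"line {number}: bullet longer than {max_bullet_chars} chars")
    return warnings


# A shortened copy of the template above.
TEMPLATE = """# Project memory
## Build
- uv run pytest tests/unit/
## Repo rules
- Keep business logic in `app/services/`
"""
print(check_memory_file(TEMPLATE))
```

The short template passes with no warnings; a file that pastes in long explanations or hundreds of lines would not.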
MCP tools as the real solution
MCP changes the shape of the problem. Instead of stuffing context into the prompt, you expose structured tools and let the agent ask precise questions. The Model Context Protocol is an open, JSON-RPC-based standard with a client-host-server architecture and explicit support for tools, resources, and prompts. Anthropic’s Claude Code docs show that MCP servers can be added locally, by project, or by user scope, and that MCP prompts can appear as slash commands inside the CLI. (modelcontextprotocol.io)
For large repos, the right tool set is small and opinionated:
| Need | Bad approach | Better MCP-style approach |
|---|---|---|
| Architecture overview | Read 200 files | get_overview() |
| File or symbol context | Grep and guess | get_context() |
| Risky areas | Search blame manually | get_risk() |
| Why a path exists | Read old PRs by hand | get_why() |
| Dependency shape | Scan imports by eye | get_dependency_path() |
| Dead code | Hope grep finds it | get_dead_code() |
That is the core idea behind repowise’s MCP server: keep the repo knowledge in indexed, structured layers, then expose those layers through a small tool surface. Repowise’s project docs describe an auto-generated wiki, git intelligence, dependency graphs across 10+ languages, and a 5th code-health layer with biomarkers and refactoring targets. That matters because Claude Code can ask for exactly the slice it needs instead of reading the whole repo. (github.com)
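To show what "asking for a slice" looks like on the wire, the sketch below dispatches a JSON-RPC 2.0 `tools/call` request, the method MCP uses for tool invocation, to a small tool table. The handlers, their return values, and the file path are hypothetical stand-ins; this is not repowise's implementation, and a real server would use an MCP SDK and validate tool names and arguments.

```python
import json


# Hypothetical handlers standing in for indexed repo data.
def get_overview() -> dict:
    return {"entry_points": ["app/api/", "app/jobs/"]}


def get_context(path: str) -> dict:
    return {"path": path, "summary": "placeholder summary"}


TOOLS = {"get_overview": get_overview, "get_context": get_context}


def handle(raw: str) -> str:
    """Dispatch one JSON-RPC 2.0 `tools/call` request to a registered tool."""
    request = json.loads(raw)
    if request.get("method") != "tools/call":
        error = {"code": -32601, "message": "Method not found"}
        return json.dumps({"jsonrpc": "2.0", "id": request.get("id"), "error": error})
    params = request["params"]
    result = TOOLS[params["name"]](**params.get("arguments", {}))
    # Tool results carry a list of content items; text is the simplest kind.
    content = [{"type": "text", "text": json.dumps(result)}]
    return json.dumps({"jsonrpc": "2.0", "id": request["id"], "result": {"content": content}})


request = json.dumps({
    "jsonrpc": "2.0", "id": 1, "method": "tools/call",
    "params": {"name": "get_context", "arguments": {"path": "app/core/auth.py"}},
})
print(handle(request))
```

The response contains one small, structured answer for one path, which is the property that keeps the context window clean.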
[Figure: Claude Code context flow diagram]
A concrete playbook for >100k LOC repos
Here is the workflow I use on large codebases.
1) Start with a narrow task
Write one sentence. Not three. Example: “Trace how auth tokens are loaded and where refresh happens.”
2) Load only stable memory
Keep CLAUDE.md short. Put commands, repo layout, and boundaries there. If the repo has separate subsystems, split them into imported files. Anthropic’s memory model supports this explicitly. (docs.anthropic.com)
3) Ask for structure before code
Use an overview tool first. The goal is to identify likely entry points, not to inspect every file.
4) Fetch one slice at a time
Get the specific module, symbol, or dependency path. Do not ask for “everything related to auth.”
5) Keep a running decision note
Write one short note in the session:
- what you learned
- what is still unknown
- next file to inspect
6) Compact aggressively
When the session gets long, compact it. Anthropic documents /compact for exactly this scenario, and recommends breaking complex tasks into focused interactions. (docs.anthropic.com)
7) Use graph and history data
The highest-value context in a big repo is rarely the source file itself. It is the combination of ownership, churn, and dependency shape. That is where code intelligence pays off.
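Churn is the easiest of those signals to compute yourself. The sketch below counts how often each path appears in the output of `git log --name-only --pretty=format:`, which prints only the touched file paths, grouped per commit. The sample log and its paths are made up for illustration.

```python
from collections import Counter


def churn_from_log(log_text: str) -> Counter:
    """Count how many commits touched each path in name-only log output."""
    counts = Counter()
    for line in log_text.splitlines():
        path = line.strip()
        if path:  # blank lines separate commits
            counts[path] += 1
    return counts


# Hypothetical sample of `git log --name-only --pretty=format:` output.
SAMPLE = """app/core/auth.py
app/api/login.py

app/core/auth.py

app/core/auth.py
app/jobs/refresh.py
"""
print(churn_from_log(SAMPLE).most_common(1))
```

A file that shows up in most recent commits is a better first read than one picked by directory order, and the ranking costs the agent a few dozen tokens instead of a full history dump.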
8) Keep tool output small
Claude Code warns when MCP output gets large. Treat that warning as a sign you asked the wrong question, not as a reason to collect more text. Anthropic documents a 10,000-token warning threshold for MCP output. (docs.anthropic.com)
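On the server side, a tool can enforce the same discipline by capping what it returns. This is a sketch under assumptions: the 4-characters-per-token estimate is a rough heuristic, and `cap_output` is a hypothetical helper, not part of Claude Code or any MCP SDK.

```python
WARN_TOKENS = 10_000   # the warning threshold mentioned above
CHARS_PER_TOKEN = 4    # rough heuristic, an assumption


def cap_output(items: list, budget_tokens: int = WARN_TOKENS) -> tuple:
    """Keep leading items that fit the budget; report how many were dropped."""
    kept, used = [], 0
    for item in items:
        cost = len(item) // CHARS_PER_TOKEN + 1
        if used + cost > budget_tokens:
            break
        kept.append(item)
        used += cost
    return kept, len(items) - len(kept)


# Hypothetical search results, each a few hundred characters long.
results = [f"match {i}: " + "x" * 400 for i in range(200)]
kept, dropped = cap_output(results)
print(len(kept), dropped)
```

Returning the dropped count alongside the kept items tells the agent that the query was too broad, which nudges it to narrow the question instead of paging through everything.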
If you want a concrete example of this approach, see the architecture page, then compare it with the FastAPI dependency graph demo. The point is not the UI. The point is the data model behind the UI. If the agent can ask for architecture, ownership, hotspots, and dependency paths as separate calls, context pressure drops hard. You can also inspect the auto-generated docs for FastAPI to see the kind of material an agent can query instead of re-reading raw source.
[Figure: Repo intelligence dashboard on CRT terminal]
What we measured
We track this by session, not by vibe. The useful metrics are token growth, number of file reads, and the number of backtracks before a fix lands.
| Metric | Naive session | MCP-guided session |
|---|---|---|
| Files read before first useful answer | 18 | 4 |
| Tool calls that returned irrelevant data | 7 | 1 |
| Need to re-explain the task | Common | Rare |
| Context resets | Frequent | Occasional |
| Final patch confidence | Low | Higher |
The main win is not raw speed. It is staying inside a smaller, cleaner working set. On a large repo, that usually means fewer “I thought you meant the other auth module” moments and fewer fixes that break a second package.
A second win is cost control. Anthropic says Claude Code usage varies with codebase size, query complexity, conversation length, and compacting frequency. That is the exact shape of a large-repo session. Better context management cannot shrink the codebase, but it limits how much of it enters the session and keeps queries and conversations shorter. (docs.anthropic.com)
Repowise’s own platform facts line up with this: auto-generated docs, git intelligence, dependency graphs, and code health all exist to feed agents better context than a raw cat of files. If you want to see that applied to a real repo, explore the hotspot analysis demo and the ownership map for Starlette. They show why some files deserve attention before others.
Where Claude Code memory ends and MCP begins
This boundary matters.
CLAUDE.md is for durable instructions:
- how the repo is organized
- how to test
- what patterns to follow
MCP is for queryable truth:
- what depends on what
- which file owns a behavior
- where a change has historical risk
- which code is dead
Anthropic’s docs frame MCP as the way Claude Code connects to external tools and data sources. That is the right split for a Claude Code MCP setup. Memory keeps the session aligned. Tools keep the session small. (docs.anthropic.com)
If you run Claude Code against a codebase with real graph and history data, the agent spends less time guessing. That is the whole point of context management in AI agent design: do not ask the model to remember the repo. Give it a way to query the repo.
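"What depends on what" is a good example of a question that is cheap to query and expensive to remember. The sketch below answers it with a breadth-first search over an import graph. The graph, the paths, and the `dependency_path` name are hypothetical; a real tool would build the graph from parsed imports.

```python
from collections import deque
from typing import Optional


def dependency_path(graph: dict, start: str, goal: str) -> Optional[list]:
    """Shortest import chain from `start` to `goal`, via breadth-first search."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for dep in graph.get(path[-1], []):
            if dep not in seen:
                seen.add(dep)
                queue.append(path + [dep])
    return None


# Hypothetical import graph: each file maps to the files it imports.
IMPORTS = {
    "app/api/login.py": ["app/services/auth.py"],
    "app/services/auth.py": ["app/core/tokens.py", "app/adapters/db.py"],
    "app/core/tokens.py": [],
}
print(dependency_path(IMPORTS, "app/api/login.py", "app/core/tokens.py"))
```

The answer is a three-element list. Getting the same fact by reading source would mean opening every file along the chain.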
For teams adopting repowise, the live examples are the fastest way to see the pattern. If you want to wire this into a real workflow, try repowise on your own repo. The MCP server is configured automatically, and the project is open source under AGPL-3.0, a copyleft license designed for network server software. (gnu.org)
[Figure: MCP tool chain for large repositories]
FAQ
How does Claude Code handle large codebases?
Claude Code can inspect a codebase, but it still works inside a finite context window. Anthropic describes that window as the model’s working memory, so large repos need smaller prompts, tighter file reads, and compaction when the session grows. (docs.anthropic.com)
What is the best way to manage Claude Code context on a monorepo?
Use CLAUDE.md for stable instructions, then use MCP tools for facts that change per task. That keeps prompt text short and moves repo lookups into structured calls. Anthropic’s docs support both memory files and MCP servers as first-class features. (docs.anthropic.com)
Does Claude Code have memory across sessions?
Yes. Anthropic documents multiple memory locations, including project and user memory, and says these files are loaded automatically when Claude Code launches. (docs.anthropic.com)
What is the best MCP setup for Claude Code on a large codebase?
Use a small server surface: overview, context, risk, why, dependency path, dead code, search, and architecture. That gives the agent just enough structure to answer real questions without flooding the context window. MCP’s protocol is designed for tool-based access to external data sources. (modelcontextprotocol.io)
When should I use /compact in Claude Code?
Use it when the conversation has accumulated enough history that the model starts repeating itself, losing constraints, or rereading the same files. Anthropic explicitly recommends /compact when context gets large. (docs.anthropic.com)
Is CLAUDE.md enough on its own?
Not for large repositories. It helps with rules and orientation, but it does not answer dynamic questions like ownership, hotspots, dependency paths, or dead code. Those belong in tools, not in memory. (docs.anthropic.com)


