Generate claude.md from a real repo, not a blank page
A new hire opens Claude Code, points it at a repo, and gets the usual performance art: file re-reads, confident guesses about ownership, and a lot of context that is technically relevant and operationally useless. If you want to generate claude.md that changes that behavior on day one, start from the repo itself, not from a template. The file should tell Claude Code what this system is, who tends to touch it, and what recently changed enough to make old habits dangerous.
Day one claude.md should answer three questions: what this repo is, who owns it, and what changed recently
The blank-start onboarding problem is simple enough to describe and annoying enough to repeat: a new hire lands in Claude Code with no repo memory, so the model re-reads the same files, misses obvious ownership cues, and treats stale patterns as current truth.
That is why a useful claude.md is not a generic onboarding note. It is a repo-specific starter brief assembled from architecture, ownership, and recent decisions so Claude Code stops guessing on day one.
The minimum viable version should answer six things, in this order:
- repo purpose
- entry points
- key modules
- ownership cues
- recent changes
- decision notes
That is enough to orient an agent without turning the file into a second codebase. claude.md is a working brief for Claude Code, not a full architecture doc or a dump of every folder. If it starts to look like a wiki export, it has already gone too far.
A good mental model is “what would I want a pair-programming partner to know before editing anything?” Not “what could I possibly say about this repository.”
BLANK START VS REPO BRIEF
Start with get_overview, then use get_context to pin down the modules, entry points, and owners
If you are using Claude Code with MCP, the first unfamiliar-repo call should be get_overview. Not because it is polite, but because it gives you the map before you start zooming into streets.
get_overview should pull the architecture summary, module map, entry points, git health, and communities. That is the shortest path to answering “where does work begin?” and “what clusters of code move together?”
From there, use get_context on the files and symbols that matter most. The point is not to inspect everything; it is to identify the parts of the codebase that deserve mention in claude.md.
Use get_context with options like ownership, freshness, callers, callees, and community. That combination tells you more than a file path ever will. A module with high freshness and many callers is probably worth naming. A leaf utility with no ownership signal probably is not.
This is where graph-aware repo map matters. A graph-aware repo map is not trivia; it is the difference between “the API layer” and “the three files that actually orchestrate requests.” Repowise’s graph intelligence gives you file and symbol structure, call resolution, communities, and execution-flow tracing, which is exactly the kind of repo memory Claude Code does not have on its own.
A practical rule: if get_overview cannot explain the repo in one screen, do not write claude.md yet. Expand the map first. If get_context keeps surfacing the same modules from different angles, those are the ones that belong in the brief.
OVERVIEW TO CONTEXT FLOW
Use recent git history and decision records to explain why the repo looks the way it does
Claude Code is especially bad at one thing: assuming the current shape of the code is the natural shape of the code.
That is why recent git history belongs in claude.md. Not as a changelog, and not as a timeline of every merge. Use recent commits, co-change patterns, and significant commit messages as evidence for current behavior. If a module has been rewritten twice in the last month, the brief should say so. If two files keep changing together, that is an ownership and coupling signal, not a coincidence.
This is where ownership and co-change signals earns its keep. Repowise’s git intelligence turns commit history into hotspots, ownership percentages, co-change pairs, bus factor, and meaningful commit messages. That is much closer to how a human maintainer thinks than “here are the last 500 commits.”
Then use get_why to surface decisions tied to paths or modules. This is the part most onboarding docs miss. A file can look odd for perfectly good reasons. A refactor, an auth boundary, a protocol change, or a deleted helper may all be encoded in decisions that are invisible if you only read the code.
A short decision note is often enough:
- “This module was split during the auth refactor; do not reintroduce the old shared helper.”
- “The new request path exists because the old one violated the provider contract.”
- “This directory is intentionally duplicated until the migration finishes.”
That is repo-aware context synthesis, not a chronological changelog. The goal is to explain why the repo looks the way it does so Claude Code does not confidently reconstruct a pattern that the team already rejected.
decision history tied to code is the right shape here: path-based decisions, no-arg health dashboards, and a direct line from architectural intent to the files the agent is about to touch.
The MCP call sequence that produces a usable claude.md in one pass
Here is the order I would actually use for generate claude.md in a new repo.
-
get_overview- Write down the architecture summary, module map, entry points, and communities.
- If the repo has obvious top-level boundaries, those become the first bullets in claude.md.
-
get_contexton top modules and entry points- Ask for ownership, freshness, callers, callees, and community.
- Pull only the files and symbols that are central to how work starts.
-
get_risk- Use this when you need hotspots, reviewers, blast radius, or test gaps.
- The output belongs in the “be careful here” part of the brief, not in the architecture section.
-
get_why- Pull decisions tied to the files or paths you already identified.
- Use it to explain refactors, API shifts, and removed helpers.
-
search_codebaseorget_answerwhen confidence is low- If the wiki answer is weak, search more.
- If confidence is high enough, use the cited answer and stop.
The stop condition matters. Do not keep expanding scope just because more data exists. Start writing the brief when you can answer: what the repo is, where work starts, who tends to own the moving parts, and which recent decisions changed the rules.
Tie each call to a specific output in claude.md:
| MCP call | Signal extracted | claude.md line to write |
|---|---|---|
| get_overview | entry points, communities, architecture summary | “Start in app/server.py and workers/*; the API layer owns request orchestration.” |
| get_context | ownership, freshness, callers, callees | “auth/ and billing/ are the most touched modules; ask the platform team before changing them.” |
| get_risk | hotspots, reviewers, blast radius | “payments/ is a hotspot with broad dependents and thin tests.” |
| get_why | decisions tied to paths | “This split exists because the auth refactor removed the shared helper.” |
| get_answer | cited repo summary | “The docs say this service owns webhook ingestion, not outbound retries.” |
A short sample claude.md starter brief might read like this:
- Repo purpose: API service for inbound requests and async workers.
- Start here:
app/server.py,workers/, and the main request router. - Likely owners: platform for transport, product team for business logic, infra for deployment paths.
- Recent changes: auth handling was split from the shared helper; old assumptions about request shape are stale.
- Risky areas: payment and webhook paths have broad dependents and should be changed with care.
- Decision notes: the team intentionally kept the worker boundary because the synchronous path was causing timeouts.
- Ask first: if you are unsure, check the module community and recent decision notes before editing.
That is enough for Claude Code to behave like it has read the room.
Worked example: turning a real repo scan into a claude.md starter brief
Here is the transformation I want: raw MCP output on the left, short onboarding language on the right.
| MCP call | Signal extracted | claude.md line to write |
|---|---|---|
| get_overview | API entry point, worker queue, two major communities | “Primary flow starts in the API layer and fans out to workers; most changes land in those two communities.” |
| get_context | app/server.py has high caller count and strong freshness | “Start in app/server.py; it is the request orchestration boundary.” |
| get_context | payments/processor.py has ownership and many callees | “Payments logic is concentrated in payments/processor.py; check ownership before changing it.” |
| get_risk | hotspot score high, reviewers concentrated, blast radius broad | “Payments is a hotspot with broad blast radius; expect review from the same small group.” |
| get_why | auth refactor split the shared helper | “Do not reintroduce the old shared auth helper; the split was deliberate.” |
| search_codebase | wiki says webhook retries are owned by infra | “Webhook retries are infrastructure-owned, not app-owned.” |
The point is not to preserve raw data. The point is to compress it into what a new hire needs when they are about to ask Claude Code for help.
If you want to see how this looks in a larger multi-repo setting, the same pattern shows up in workspace-level CLAUDE.md: one workspace brief, then repo-specific briefs underneath it. Repowise’s workspace mode is useful because many onboarding mistakes come from not knowing which repo owns which boundary in the first place.
RAW SIGNAL TO BRIEF
Regenerate claude.md after merges, refactors, or ownership shifts — not on every commit
We got this wrong initially by treating claude.md like a file that should always be fresh in the same way a build artifact is fresh. That was a mistake. It created churn, and churn made the brief less trustworthy.
Use PostToolUse or a post-commit change as the trigger for stale context detection. That gives you a cheap way to know when the brief might be lying.
Regenerate after the events that actually change how Claude Code should think:
- merge
- refactor
- API contract change
- ownership change
Those are the moments when old guidance becomes actively misleading. A small patch in a leaf file usually does not justify rewriting the brief. A merge that re-routes a request path probably does.
The difference between incremental refresh and full rewrite is straightforward:
- incremental refresh: update the changed module, decision note, or ownership cue
- full rewrite: rebuild the top-level structure when boundaries, entry points, or owners shift
This is also where automatic context injection belongs conceptually. Repowise’s PreToolUse hook intercepts Grep and Glob, then injects related files and symbols before the agent goes off in the wrong direction. That is not the same thing as claude.md, but it is the same design instinct: give the model context before it has invented its own.
The practical rule is simple. If the repo changed in a way that would alter Claude Code’s first move, refresh the brief. If not, leave it alone.
Leave out trivia, implementation noise, and speculative guidance so Claude does not hallucinate
The hardest part of generate claude.md is not adding signal. It is refusing to add the wrong kind.
Exclude low-signal file lists, exhaustive dependency dumps, and stale guesses about ownership. Those look thorough and behave like noise. If a person would skip past the line on first read, Claude Code probably will too.
Do not copy raw wiki pages into the brief. Do not paste long commit logs. Do not turn the file into a second source of truth for facts that already live elsewhere. The brief should point to the important shape of the repo, not reprint it.
The principle I use is blunt: if a detail will change the next time Claude Code touches the repo, it probably does not belong in claude.md unless it explains a current constraint.
That is especially true for ownership. A stale ownership guess is worse than no ownership note at all, because it sounds actionable. If the signal is weak, say so. If it changes often, say how to verify it.
When the repo is large enough that a single brief is not enough, keep the same standard and move up a level. A workspace brief can hold the cross-repo shape, while repo-level claude.md files stay short and current. That is where workspace-level CLAUDE.md and a repo-specific brief work together instead of competing.
And if you want a sanity check that the brief is doing real work, compare it to the token waste you avoid by not making Claude Code rediscover the repo on every task. Repowise’s token-efficiency benchmark on pallets/flask showed 36% lower cost, 49% fewer tool calls, 19% faster wall time, and 27× fewer pooled tokens versus the naive approach, with parity answer quality. That is not a claude.md benchmark, but it is the same anti-pattern: too much reading, not enough useful context.
FAQ
How do I generate claude.md from a real repo with Claude Code?
Start with MCP, not a template. Use get_overview first, then get_context for the modules and entry points that matter, then get_risk and get_why for hotspots and decisions. Write the brief only after you can explain the repo’s shape, the likely owners, and the recent changes that changed behavior.
What should go into claude.md for a new codebase?
Keep it short and opinionated: repo purpose, entry points, key modules, ownership cues, recent changes, and decision notes. If it reads like an architecture wiki or a file inventory, it is too big.
Which MCP calls should I run first to build a repo brief?
get_overview first, always. Then get_context on top modules and entry points, followed by get_risk for hotspots and reviewers, and get_why for decision history tied to paths. Use search_codebase or get_answer only when the signal is still fuzzy.
When should I regenerate claude.md after a refactor or merge?
Regenerate after merges, refactors, API contract changes, or ownership shifts. Use PostToolUse or a post-commit trigger to detect stale context, then choose incremental refresh or a full rewrite based on whether the repo’s boundaries or owners changed.
What should not go into claude.md?
Leave out exhaustive file lists, dependency dumps, raw wiki pages, long commit logs, and speculative ownership guesses. If the detail is likely to change before the next edit, it probably belongs elsewhere unless it explains a current constraint.


