How to Audit an Unfamiliar Codebase in One Afternoon
Audit an unfamiliar codebase fast by treating the first afternoon like an incident review, not a refactor. Your goal is not mastery. Your goal is a defensible map: what the system does, how it is wired, where changes will hurt, and what signals tell you the code is aging badly. That means a narrow codebase audit workflow, a tech audit checklist you can repeat, and a legacy code review that produces notes someone else can act on. If you want a concrete model for the tooling side, repowise’s auto-generated wiki, dependency graph, and git intelligence sit on top of the same basic ideas this post uses. See repowise's architecture for the shape of that system, and live examples if you want to compare outputs against a real repo.
What "good enough" looks like in 4 hours
You are done with an afternoon audit when you can answer six questions without guessing:
- What is the product boundary?
- Where does control enter the system?
- What are the main modules or services?
- Which parts are risky to change?
- Which parts are already unhealthy?
- What should I verify before I ship a change?
That is enough to explore a new codebase fast. It is not enough to design a rewrite, merge a large migration, or sign off on architecture quality. For that, you need more history and more runtime data.
A good four-hour pass produces three artifacts:
- a one-page architecture summary,
- a ranked risk list,
- a short set of follow-up questions.
If you have those, you can move with discipline instead of vibes.
The core rule: do not read files in order. Read by lens. Start broad, then follow evidence. That is the only way a legacy code review stays bounded.
The four lenses
I use four lenses every time I audit an unfamiliar codebase: Surface, Shape, Risk, and Health. Each lens answers a different question. Read them in that order.
| Lens | Question | Output |
|---|---|---|
| Surface | What is this thing? | Product summary, entry points, runtime model |
| Shape | How is it organized? | Module map, dependency edges, key boundaries |
| Risk | What will hurt to touch? | Hotspots, owners, co-change clusters, fan-in/fan-out |
| Health | What is rotting? | Dead code, churn, test gaps, low-confidence areas |
Repowise’s MCP server exposes this separation directly through tools like get_overview(), get_dependency_path(), get_risk(), and get_dead_code(). That is the right mental model even if you never use repowise. The tools matter less than the questions they answer. If you want to see the outputs rather than the theory, the FastAPI dependency graph demo and hotspot analysis demo show the same workflow on a real codebase.
Surface — what is this thing
Start with the repo root. Read only these files first:
- README.md
- pyproject.toml, package.json, go.mod, or the primary build file
- docker-compose.yml, Dockerfile, Helm charts, or deployment manifests
- one obvious entrypoint, such as main.py, app.ts, cmd/server/main.go, or src/index.ts
You are trying to answer:
- Is this a library, service, CLI, worker, monorepo, or plugin?
- What is the runtime?
- What external systems does it touch?
- What is the happy path?
Make a tiny note for each file: purpose, entrypoint status, external dependencies, and whether it is safe to ignore for now.
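The per-file note can be kept as a small record so every file gets the same four fields. The sketch below is one way to seed those notes from a list of paths. The lookup table, the entrypoint names, and the default for "safe to ignore" are assumptions for illustration, not a standard; correct them by hand as you read.

```python
from dataclasses import dataclass, field

# Hypothetical lookup table; extend it for the ecosystems you meet.
ROOT_HINTS = {
    "README.md": "project overview",
    "pyproject.toml": "Python build file",
    "package.json": "Node.js build file",
    "go.mod": "Go build file",
    "Dockerfile": "container image definition",
    "docker-compose.yml": "local service topology",
}
# Hypothetical set of file names that usually mark a process entry.
ENTRYPOINT_NAMES = {"main.py", "app.ts", "main.go", "index.ts"}


@dataclass
class FileNote:
    path: str
    purpose: str
    is_entrypoint: bool
    external_deps: list = field(default_factory=list)  # filled in by hand
    safe_to_ignore: bool = False  # provisional; revise while reading


def starter_notes(paths):
    """Create one provisional note per path, sorted by path."""
    notes = []
    for path in sorted(paths):
        name = path.rsplit("/", 1)[-1]
        is_entry = name in ENTRYPOINT_NAMES
        purpose = ROOT_HINTS.get(name, "entrypoint" if is_entry else "unknown")
        # Unknown files default to "ignore for now"; that is only a guess.
        notes.append(
            FileNote(path, purpose, is_entry, safe_to_ignore=(purpose == "unknown"))
        )
    return notes


for note in starter_notes(["README.md", "cmd/server/main.go", "LICENSE"]):
    print(note.path, "|", note.purpose, "| entrypoint:", note.is_entrypoint)
```

The point is the fixed shape of the note, not the heuristics. If a field stays empty after hour one, that is itself a finding.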
If the repo has generated docs, architecture notes, or package-level docs, read those before code. They compress the surface area. That is why documentation-first code intelligence is useful. Repowise’s auto-generated wiki is built around that idea: file, module, and symbol docs with freshness and confidence scoring. See auto-generated docs for FastAPI for a concrete example of what that looks like in practice.
Shape — how is it organized
Now build the module map. Do not chase every folder. Chase boundaries.
Look for:
- top-level packages or services,
- shared libraries,
- adapters to databases, queues, third-party APIs, or UI frameworks,
- obvious domain modules,
- tests that mirror production structure.
Your goal is to identify the “spine” of the codebase. As a rule of thumb in unfamiliar systems, about 20% of the files explain about 80% of the behavior.
A fast shape pass usually includes:
- list top-level directories,
- identify entrypoints and bootstrap code,
- follow imports from entrypoint to core modules,
- note where business logic lives,
- mark infrastructure edges separately.
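For a Python repository, the "follow imports" step in the list above can be partly automated with the standard library. This is a minimal sketch, assuming you already know the names of the top-level internal packages (the `internal_roots` argument is my own parameter, not something the repo gives you). It skips relative imports, so treat the result as a lower bound.

```python
import ast


def internal_imports(source, internal_roots):
    """Return the dotted names of internal modules imported by one file.

    source: the text of a Python module.
    internal_roots: set of top-level package names that belong to the repo.
    Relative imports (from . import x) are skipped in this sketch.
    """
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            if name.split(".")[0] in internal_roots:
                found.add(name)
    return found


example = (
    "import os\n"
    "from app.core import service\n"
    "import app.db.session\n"
    "from . import sibling\n"
)
print(sorted(internal_imports(example, {"app"})))
```

Run it on the entrypoint first, then on each module it returns. Two or three hops are usually enough to see where business logic starts and infrastructure ends.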
If the codebase has a dependency graph, use it. Sourcegraph’s documentation calls out code navigation and cross-repository search as first-class features, and its Code Insights can track codebase state over time across many repositories. (sourcegraph.com)
Risk — what will hurt to touch
Risk is not the same as size. The files that hurt most are usually the ones with the most churn, the most incoming edges, or the most repeated edits across adjacent commits.
In a legacy code review, I rank risk by four signals:
- high churn,
- high complexity,
- many dependents,
- repeated co-changes.
That is the fastest path to a real tech audit checklist. If a file is touched often, sits near critical paths, and changes with many unrelated files, it deserves attention even if it looks small.
CodeScene defines hotspots around the same idea: code with frequent change and structural complexity deserves more scrutiny. (community.codescene.com)
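One way to combine the four signals into a single ranking is to scale each signal by its maximum across the repo and average the results. The sketch below does that. The equal weighting and the field names are my assumptions, and this is not how CodeScene or any other tool computes hotspots.

```python
SIGNALS = ("churn", "complexity", "dependents", "cochanges")


def rank_risk(files):
    """Rank files by the mean of four max-normalized signals.

    files: dict mapping path -> dict with a number for each name in SIGNALS.
    Returns a list of (path, score) pairs, highest score first, score in [0, 1].
    """
    # Use 1 as the divisor when a signal is zero everywhere.
    maxima = {
        s: max((metrics[s] for metrics in files.values()), default=0) or 1
        for s in SIGNALS
    }
    scores = {
        path: sum(metrics[s] / maxima[s] for s in SIGNALS) / len(SIGNALS)
        for path, metrics in files.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up numbers, for illustration only.
sample = {
    "payments/ledger.py": {"churn": 40, "complexity": 30, "dependents": 12, "cochanges": 9},
    "utils/strings.py": {"churn": 2, "complexity": 5, "dependents": 20, "cochanges": 1},
    "big/generated.py": {"churn": 1, "complexity": 60, "dependents": 0, "cochanges": 0},
}
for path, score in rank_risk(sample):
    print(f"{score:.2f}  {path}")
```

In the made-up sample, the large but quiet file lands at the bottom, which matches the claim that risk is not the same as size.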
A good risk note looks like this:
- payments/ledger.py: high churn, many dependents, test coverage unknown
- auth/middleware.ts: central path, difficult to mock, likely production impact
- migrations/: low frequency, high blast radius, check deployment process
Do not confuse “I understand this file” with “this file is safe.”
Health — what is rotting
Health is the part most people skip because it feels less urgent. It is also the part that saves the most time later.
Health signals I look for:
- dead files and unused exports,
- modules with no tests,
- hotspots with low confidence or stale docs,
- large packages with no clear owner,
- circular dependencies,
- architecture comments that disagree with the code.
If you can identify dead code early, you avoid wasting your first afternoon on things that do not matter. If you can spot an untested hotspot, you know where a small change can become a long week.
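Circular dependencies are the easiest of these signals to check mechanically once you have a module map. Here is a minimal sketch using a depth-first search over a plain dict, where each key is a module and each value is the list of modules it imports. The graph format is an assumption; adapt it to whatever your tooling exports.

```python
def find_cycle(graph):
    """Return one dependency cycle as a list of modules, or None.

    graph: dict mapping module -> iterable of modules it depends on.
    Recursive, so very deep graphs may need an iterative rewrite.
    """
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {}
    stack = []

    def visit(node):
        state[node] = IN_PROGRESS
        stack.append(node)
        for dep in graph.get(node, ()):
            dep_state = state.get(dep, UNSEEN)
            if dep_state == IN_PROGRESS:
                # dep is on the current path, so the path loops back to it.
                return stack[stack.index(dep):] + [dep]
            if dep_state == UNSEEN:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        state[node] = DONE
        return None

    for node in list(graph):
        if state.get(node, UNSEEN) == UNSEEN:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


print(find_cycle({"api": ["domain"], "domain": ["api"]}))
```

It reports only the first cycle it finds. For an afternoon audit that is usually enough: one confirmed cycle goes on the health list, and the full inventory can wait.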
This is where code health layers matter. Repowise added a fifth intelligence layer focused on per-file health, module rollups, untested-hotspot detection, refactoring targets ranked by impact per effort, and declining-health trend alerts. That is exactly the kind of signal you want when you are trying to decide whether a repo needs maintenance or a rescue. If you want to see the output format, look at the ownership map for Starlette and the FastAPI hotspot analysis demo.
Hour-by-hour workflow
This is the workflow I use when I have one afternoon and a codebase I do not know.
Hour 1: orient
Start with the repo root and find the entrypoint.
Checklist:
- open the README,
- identify the build and run command,
- find the main process entry,
- note external services,
- sketch the runtime surface in plain English.
Write one sentence: “This repo appears to be a ___ that does ___ for ___.” If you cannot write that sentence, stop and keep reading the root files until you can.
Hour 2: map the spine
Now follow imports and calls from the entrypoint into the core.
Checklist:
- find the top 5 imported internal modules,
- trace the request or job path,
- mark where data is validated,
- mark where side effects happen,
- mark where errors are handled.
A dependency graph is faster than manual grep once the repo crosses a few dozen modules. Repowise’s architecture page explains how the dependency graph and wiki layers fit together, and the graph demo shows the result on FastAPI in a way you can inspect visually. That kind of graph is especially useful when a repo spans multiple languages or has shared packages reused across services.
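If you have the import edges in any form, tracing the path from the entrypoint to a module of interest is a breadth-first search. The sketch below assumes a plain dict of module to imported modules. It illustrates the idea only and is not the implementation behind any particular tool.

```python
from collections import deque


def dependency_path(graph, start, target):
    """Return the shortest import chain from start to target, or None.

    graph: dict mapping module -> iterable of modules it imports.
    """
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for dep in graph.get(node, ()):
            if dep not in parents:
                parents[dep] = node
                queue.append(dep)
    return None


# Made-up module names, for illustration only.
edges = {
    "main": ["api", "config"],
    "api": ["services"],
    "services": ["db"],
    "config": [],
}
print(" -> ".join(dependency_path(edges, "main", "db")))
```

The shortest import chain is not always the runtime path, but it tells you which files to open next and in what order.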
Hour 3: rank risk
Now switch from structure to change cost.
Checklist:
- inspect git history for frequently edited files,
- identify files with many dependents,
- find modules that appear in many co-change sets,
- look for low-test, high-churn intersections,
- mark one or two “do not touch casually” areas.
This is where git intelligence saves time. Churn alone is noisy. Churn plus fan-in plus repeated co-change is a better signal. That is the difference between “old code” and “code that will hurt me if I edit it.”
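Churn and co-change can both be pulled from plain git output. The sketch below parses the text produced by `git log --name-only --pretty=format:@@%h`, where the `@@` prefix is a marker I chose so commit lines are easy to tell apart from file paths. It counts how often each file changed and how often each pair of files changed in the same commit.

```python
from collections import Counter
from itertools import combinations


def parse_commits(log_text, marker="@@"):
    """Split `git log --name-only --pretty=format:@@%h` output into file sets."""
    commits = []
    current = None
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith(marker):
            current = set()
            commits.append(current)
        elif line and current is not None:
            current.add(line)
    return commits


def churn_and_cochange(commits):
    """Return (churn, pairs): per-file commit counts and per-pair co-change counts."""
    churn = Counter()
    pairs = Counter()
    for files in commits:
        churn.update(files)
        # Sort so each pair is always counted under the same key.
        pairs.update(combinations(sorted(files), 2))
    return churn, pairs


# Hand-written sample in the expected format, not real history.
sample_log = """@@a1
src/auth.py
src/db.py

@@b2
src/auth.py

@@c3
src/auth.py
src/db.py
README.md
"""
churn, pairs = churn_and_cochange(parse_commits(sample_log))
print(churn.most_common(2))
print(pairs.most_common(1))
```

To run it on a real repository, capture the command's output with `subprocess.run(..., capture_output=True, text=True)` and pass `stdout` to `parse_commits`. Very large commits, such as mass renames or formatting passes, inflate the pair counts, so consider skipping commits above some file-count threshold.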
Hour 4: check health and write the memo
End by writing the memo, not by reading more code.
Checklist:
- list dead or unreachable areas,
- note obvious duplication,
- record test gaps,
- record documentation gaps,
- call out health trends,
- write open questions.
At the end of hour four, your output should be short enough to fit in a PR description or a ticket comment.
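The test-gap item in the checklist above can get a rough first answer from file names alone. This sketch assumes the common pytest naming convention, where `foo.py` is covered by a file called `test_foo.py` somewhere in the tree. That convention is an assumption about the repo, and a matching file name says nothing about how much the test actually covers.

```python
from pathlib import PurePosixPath


def untested_modules(source_paths, test_paths):
    """Return source files that have no test_<name>.py counterpart."""
    tested = set()
    for test_path in test_paths:
        stem = PurePosixPath(test_path).stem
        if stem.startswith("test_"):
            tested.add(stem[len("test_"):])
    missing = []
    for source_path in source_paths:
        stem = PurePosixPath(source_path).stem
        # Package markers rarely need their own tests.
        if stem != "__init__" and stem not in tested:
            missing.append(source_path)
    return sorted(missing)


# Made-up paths, for illustration only.
sources = ["billing/invoice.py", "billing/tax.py", "auth/login.py", "billing/__init__.py"]
tests = ["tests/test_login.py", "tests/billing/test_invoice.py"]
print(untested_modules(sources, tests))
```

Cross the result with your hotspot list. A file that appears in both is the first line of the memo's health section.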
Tools — repowise, IDE, git
You can do this audit with a plain editor, git, and a terminal. That is enough. But the right tooling changes the shape of the work.
| Tool | Best use | What it gives you |
|---|---|---|
| IDE | Read code, jump to symbols, inspect references | Local context |
| git log, git blame, git diff | Churn, authorship, co-change | Change history |
| Dependency graph | Structural map | Boundaries and paths |
| Auto-generated docs | Fast comprehension | File and symbol summaries |
| Health/risk tools | Prioritization | What to fix first |
MCP matters here because it standardizes how tools and AI clients talk to a codebase. The official MCP docs describe it as an open-source standard for connecting AI applications to external systems, with a client-server model and versioned spec. (modelcontextprotocol.io) That is why repowise’s MCP server can expose the same repo through structured tools instead of ad hoc prompts. If you want the open-source angle, repowise is AGPL-3.0 licensed, which means it is designed around copyleft terms for network server software. (gnu.org)
For a practical comparison, Sourcegraph focuses on code search, navigation, deep search, and code insights across repositories, which makes it a useful reference point for this category of tooling. (sourcegraph.com)
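To make the MCP point concrete: the protocol carries tool invocations as JSON-RPC 2.0 messages with the method `tools/call`. The sketch below only builds such a message. The tool name get_risk comes from the list earlier in this post, but the `path` argument is a hypothetical parameter I made up for illustration; check the server's own tool listing for the real input schema.

```python
import json


def tool_call_request(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request that asks an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "path" is a hypothetical argument name, not a documented parameter.
message = tool_call_request(1, "get_risk", {"path": "payments/ledger.py"})
print(json.dumps(message, indent=2))
```

In practice an MCP client library handles transport, initialization, and this envelope for you. The value of seeing it raw is knowing that every tool call is a named function with structured arguments, which is what makes the output repeatable across audits.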
What to write down
An audit is useless if the notes are vague. I keep a strict template.
1. One-sentence system summary
Example:
This is a Python API service that ingests events, normalizes them, and writes results to Postgres and S3.
2. Entry points
List exact files and commands:
- src/main.py
- make dev
- uvicorn app:app
- cmd/worker/main.go
3. Critical paths
Write the three flows that matter most.
Example:
- request validation
- persistence
- billing or notification dispatch
4. Risk list
Rank by probable blast radius, not by emotion.
Example:
- auth middleware
- payment adapter
- migration runner
5. Health list
Track rot, not style.
Example:
- no tests under billing/
- dead exports in shared/
- cyclic dependency between api/ and domain/
6. Questions for the owner
This is the most valuable part.
Examples:
- Which module is safest to change first?
- Which path has the most production incidents?
- What is intentionally left out of tests?
- Which directories are stale and which are active?
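If you run audits often, it helps to keep the six-part template as a structure rather than a habit. Here is a minimal sketch of the memo as a dataclass that renders to plain text. The class and field names are my own, chosen to mirror the six headings above.

```python
from dataclasses import dataclass, field


@dataclass
class AuditMemo:
    summary: str
    entry_points: list = field(default_factory=list)
    critical_paths: list = field(default_factory=list)
    risks: list = field(default_factory=list)
    health: list = field(default_factory=list)
    questions: list = field(default_factory=list)

    def render(self):
        """Render the memo as plain text; empty sections are flagged, not hidden."""
        sections = [
            ("Entry points", self.entry_points),
            ("Critical paths", self.critical_paths),
            ("Risk list", self.risks),
            ("Health list", self.health),
            ("Questions for the owner", self.questions),
        ]
        lines = [self.summary]
        for title, items in sections:
            lines.append("")
            lines.append(title)
            lines.extend(f"- {item}" for item in (items or ["(none recorded)"]))
        return "\n".join(lines)


memo = AuditMemo(
    summary="This is a Python API service that ingests events, normalizes them, "
    "and writes results to Postgres and S3.",
    entry_points=["src/main.py", "make dev"],
    risks=["auth middleware", "payment adapter", "migration runner"],
)
print(memo.render())
```

Empty sections print "(none recorded)" on purpose. A blank risk list after four hours means the audit is incomplete, and the memo should say so.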
When to bail vs proceed
You should bail on a full audit and switch to targeted investigation when one of these is true:
- the repo has too many services to map in one afternoon,
- the entrypoint fans out into several subsystems immediately,
- the code has no tests and no docs,
- the business logic depends on external systems you cannot run locally,
- the repo is mostly generated code or vendored dependencies.
In those cases, do not pretend you have a complete view. Write a partial audit and ask for the missing artifact: logs, architecture doc, ownership list, or staging access.
Proceed with the same afternoon workflow when:
- the repo has one main product boundary,
- the runtime path is easy to trace,
- the dependency graph is shallow enough to sketch,
- the risky files are obvious after history review.
The decision is simple: if you can name the top three risk zones and the top three health issues, proceed. If you cannot, stop and narrow scope.
A minimal tech audit checklist
Use this as the shortest possible checklist for an unfamiliar repository:
- Identify the entrypoint.
- Read the README and build files.
- Map the module spine.
- Trace the happy path.
- Inspect history for churn.
- Rank hotspots by impact.
- Find dead code.
- Note test gaps.
- Write open questions.
- Decide whether the repo is safe to change.
That is enough to audit an unfamiliar codebase without drifting into endless reading.
FAQ
How do I audit an unfamiliar codebase without reading every file?
Start at the entrypoint, then follow imports, then inspect history. You want the shape of the system, not a catalog of every file. A dependency graph and a compact doc layer can cut the search space sharply. (sourcegraph.com)
What is the fastest way to explore a new codebase?
Read the README, build file, and entrypoint first. Then follow the hottest path through the code, not the alphabet. If the repo has code search or code navigation, use that to jump between references instead of opening files one by one. (sourcegraph.com)
What should be in a tech audit checklist for legacy code review?
Include entrypoints, runtime dependencies, churn, hotspots, dead code, test gaps, and ownership. The checklist should tell you where change is dangerous, not just where code exists.
How do I know which files are risky?
Look for churn, complexity, dependents, and co-change patterns together. A file that changes often and sits on a central path is a better risk candidate than a large file that rarely moves. Hotspot analysis is built around that idea. (community.codescene.com)
When should I stop and ask for help?
Stop when you cannot explain the system in one sentence, when the entrypoint is unclear, or when the likely blast radius is too large for a one-afternoon pass. Ask for architecture notes, test fixtures, or ownership data before you keep digging.
Can MCP help with codebase audits?
Yes. MCP is designed as a standard way for AI clients to connect to external tools and data sources, and the current spec defines the protocol, lifecycle, and authorization model. That makes it a good fit for exposing repo intelligence in a structured way. (modelcontextprotocol.io)


