How to Audit a Codebase You've Never Seen Before (in 30 Minutes)

repowise team · 9 min read

You’ve just been handed a repository with 250,000 lines of code across 1,400 files. You have a meeting with the CTO in an hour to discuss the risks of acquiring this company, or perhaps you’ve just started a new role and need to submit your first PR by tomorrow. The traditional approach—opening main.ts and following the imports—is a recipe for cognitive overload. To perform a high-quality codebase audit under pressure, you need a systematic framework that moves from the macro to the micro.

Understanding an unfamiliar codebase isn't about reading every line; it’s about identifying the "gravity centers" where complexity, risk, and value intersect. By leveraging automated intelligence, you can compress weeks of manual discovery into a 30-minute deep dive.

When You Need to Understand a Codebase Fast

The need to understand an unfamiliar codebase usually arises in high-stakes environments where time is the scarcest resource.

Due Diligence (Acquisitions, Investments)

In M&A, you aren't just buying features; you're buying technical debt. A rapid audit helps you identify whether the "proprietary AI engine" is actually a spaghetti mess of unmaintained Python scripts or a robust, well-architected system. You need to assess code quality to determine the true cost of integration.

Joining a New Team

The "onboarding period" is shrinking. Senior engineers are expected to contribute within days, not months. Fast-tracking your understanding of the module boundaries and internal conventions is the difference between a smooth start and a week of "where is this defined?" questions on Slack.

Consulting Engagement

As a consultant, your hourly rate is high, and your ramp-up time should be low. Being able to visualize a client’s architecture and point out their highest-risk modules within the first hour builds immediate trust and authority.

Open Source Evaluation

Before introducing a new dependency into your stack, you must evaluate its health. Is the "bus factor" concentrated in a single developer? Are there circular dependencies that will make debugging a nightmare? A quick audit ensures you aren't importing a liability.

The 30-Minute Codebase Audit Framework

This framework relies on the principle of "Progressive Disclosure." We start with the 10,000-foot view and only zoom in when the data suggests a point of interest. To execute this, we will use repowise, an open-source codebase intelligence platform designed to map and document repositories automatically.


Minute 0-5: Architecture Overview

The first five minutes are about orientation. You need to know what the codebase is before you can judge how it’s built.

Run repowise init

The first step is to index the repository. Repowise scans the file tree, parses imports across 10+ languages, and begins mining the git history.

# Initialize repowise in the target repository
npx repowise@latest init

During this phase, the engine builds a directed dependency graph and generates a "freshness score" for existing documentation. If the project lacks docs, it uses LLMs (via OpenAI, Anthropic, or local Ollama) to generate a comprehensive wiki for every module.

Read get_overview() Output

Using the Model Context Protocol (MCP), you can query the codebase intelligence directly. The get_overview() tool provides the architectural summary. You aren't looking for code here; you're looking for the Tech Stack, Entry Points, and Module Map.

Understand the Module Map

A healthy codebase has clear boundaries. Look for:

  • Core Logic: Where the business rules live.
  • Adapters/Interfaces: How the system talks to the outside world (APIs, DBs).
  • Infrastructure: Boilerplate and setup code.

If get_overview() shows a "Big Ball of Mud" where every folder imports from every other folder, you’ve already found your first red flag.

[Image: Codebase Architecture Map]


Minute 5-10: Tech Stack and Dependencies

Now that you have the map, look at the materials used to build the structure.

Language Breakdown

A codebase with five different languages for a simple CRUD app is a maintenance nightmare. Repowise provides a distribution map. See how this looks in practice with the FastAPI dependency graph demo.

Framework Choices

Are they using industry standards (React, NestJS, FastAPI) or "homegrown" frameworks? Homegrown frameworks often imply a high "learning tax" for new hires and a lack of community support for security patches.

Dependency Graph Shape

The "shape" of the dependency graph tells a story.

  • Layered: Dependencies flow in one direction (Good).
  • Circular: Module A depends on B, which depends on A (Bad).
  • God Modules: A single file that is imported by 90% of the codebase (Risk).

Repowise uses PageRank and Community Detection algorithms on the import graph to find the most influential files. If a "God Module" has low test coverage, you've identified a critical failure point.
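To make the PageRank idea concrete, here is a minimal, self-contained sketch run over an import graph. The graph shape and the dangling-node handling are assumptions for illustration; repowise's actual algorithm may differ:

```python
def pagerank(graph, damping=0.85, iterations=50):
    """graph: {module: [modules it imports]}. Rank flows from importers
    to the modules they import, so heavily-imported files score highest."""
    nodes = set(graph) | {d for deps in graph.values() for d in deps}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new = {node: (1 - damping) / n for node in nodes}
        for node in nodes:
            deps = graph.get(node, [])
            if deps:
                share = damping * rank[node] / len(deps)
                for d in deps:
                    new[d] += share
            else:
                # dangling node: redistribute its mass evenly
                for other in nodes:
                    new[other] += damping * rank[node] / n
        rank = new
    return rank

# "utils" is imported by everything, so it dominates the ranking
ranks = pagerank({"a": ["utils"], "b": ["utils"], "c": ["utils"], "utils": []})
```

The highest-ranked file is your candidate "God Module": cross-reference it against test coverage before touching anything downstream of it.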


Minute 10-15: Code Health Signals

This is where we move from "what is it?" to "how healthy is it?" We use Git Intelligence to mine the "social" life of the code.

Hotspot Analysis: Where's the Risk?

A "Hotspot" is a file with high Complexity (cyclomatic complexity) and high Churn (frequent changes). If a file is complex but never changes, it’s stable. If it changes often but is simple, it’s low risk. But a complex file that changes every week is where bugs are born. Check out the hotspot analysis demo to see how these are visualized.

Dead Code Volume: How Much Cruft?

Using the get_dead_code() tool, we identify unreachable files and unused exports. In a due diligence code review, finding 20% dead code suggests the team has stopped pruning the garden, which often correlates with high technical debt elsewhere.
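Conceptually, dead-file detection is a reachability walk over the import graph: anything not reachable from an entry point is a candidate for removal. This is a minimal sketch of that idea, not the `get_dead_code()` implementation (which also handles unused exports):

```python
def unreachable_files(imports, entry_points):
    """imports: {file: [files it imports]}. Anything not reachable
    from an entry point via imports is a dead-code candidate."""
    seen = set()
    stack = list(entry_points)
    while stack:
        f = stack.pop()
        if f in seen:
            continue
        seen.add(f)
        stack.extend(imports.get(f, []))  # follow outgoing imports
    return set(imports) - seen
```

Dynamic imports, reflection, and plugin registries all produce false positives in real codebases, which is why candidates still deserve a human glance before deletion.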

Bus Factor: Knowledge Distribution

The "Bus Factor" is the number of developers who would need to be hit by a bus before the project stalls. Repowise mines git ownership to see if 90% of the core logic was written by one person who left six months ago. You can view the ownership map for Starlette to see how knowledge distribution is mapped.


Minute 15-20: Documentation and Decisions

A codebase without context is just a pile of syntax. In this phase, we assess the "Why" behind the "What."

Documentation Freshness

Most wikis are lies because they are out of date. Repowise assigns a Freshness Score to documentation by comparing the last edit date of the doc to the last edit date of the code it describes. If the code has changed 50 times since the doc was updated, the doc is useless.
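The idea behind a freshness score can be sketched as a decay over the number of code commits that have landed since the doc was last touched. The decay formula here is a made-up illustration, not repowise's actual scoring:

```python
def freshness_score(doc_updated, code_commit_times):
    """doc_updated: timestamp of the doc's last edit.
    code_commit_times: timestamps of commits to the code it describes.
    Score decays toward 0 as the doc falls behind the code."""
    stale_commits = sum(1 for t in code_commit_times if t > doc_updated)
    return 1.0 / (1.0 + stale_commits)
```

A doc edited after every described commit scores 1.0; one that has missed fifty code changes scores roughly 0.02, i.e. effectively untrustworthy.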

Architectural Decision Records (ADRs)

If the team hasn't kept ADRs, we use get_why() to search for "decision patterns" in commit messages and comments. We want to know why they chose NoSQL over Postgres, or why they implemented custom auth logic.

Convention Consistency

Does the codebase follow a consistent pattern? Are they using functional components in one place and class components in another? Inconsistency is a leading indicator of a "feature factory" culture where speed was prioritized over quality.

[Image: Code Health & Risk Dashboard]


Minute 20-25: Deep Dive on Critical Paths

Now that we’ve identified the hotspots and the architecture, we zoom into specific files.

get_context() on Entry Points

Using the get_context() MCP tool, we pull the LLM-generated summary, ownership history, and recent decisions for the main entry points (e.g., app/main.py or src/index.ts). This gives us a functional understanding of how data enters the system.

get_risk() on Hottest Files

We run get_risk() on the top 3 hotspots identified in Minute 10. This tool provides a summary of:

  • Co-change partners: "When you change this file, you almost always have to change these other 4 files."
  • Dependents: How many modules will break if this file is refactored?
  • Historical bugs: Does the commit history mention "fix" or "bug" frequently for this file?
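The co-change signal in particular is cheap to derive from history alone. This sketch treats each commit as the set of files it touched (obtainable from `git log --name-only`) and counts how often other files ride along with the target; it illustrates the concept, not get_risk() itself:

```python
from collections import Counter

def co_change_partners(commits, target):
    """commits: list of file sets, one per commit. Returns files most
    often changed in the same commit as `target`, highest count first."""
    partners = Counter()
    for files in commits:
        if target in files:
            partners.update(f for f in files if f != target)
    return partners.most_common()
```

If a file's top partner co-changes in 80% of its commits, that pair is effectively one module wearing two filenames, and a refactor of one without the other will likely ship a bug.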

Dependency Path Analysis

If you’re worried about how a sensitive module (like payments or auth) interacts with the rest of the system, use get_dependency_path().

# How does the API talk to the Database?
repowise tool get_dependency_path --from "api/v1/user" --to "db/models/user"

This reveals the "hidden" coupling that isn't obvious from looking at a single folder. To understand how the platform facilitates this, you can learn about repowise's architecture.


Minute 25-30: Summary and Recommendations

The final five minutes are for synthesis. An audit is useless if it doesn't result in actionable insights.

The Audit Report Template

Your 30-minute audit should produce a brief report with the following sections:

  1. Architecture Score: (1-10) Is it modular or monolithic?
  2. Top 3 Risks: The specific files/modules that will cause the most trouble.
  3. Knowledge Gaps: Areas where the bus factor is dangerously low.
  4. Tech Debt Estimate: High/Medium/Low based on dead code and hotspots.

Red Flags to Look For

  • Circular Dependencies: Indicates a lack of clear abstraction.
  • High Churn in Complex Files: Indicates a "bug factory."
  • Low Freshness Docs: Indicates the team has abandoned maintenance.
  • Single-Contributor Core: High risk of project death if that person leaves.

Green Flags That Indicate Health

  • High PageRank for Utilities: Common logic is centralized and reused.
  • Low Dead Code: The team actively refactors.
  • Consistent Conventions: High discipline in the engineering culture.
  • Automated Docs: The team uses tools like Repowise to maintain visibility.

[Image: MCP Audit Tool Registry]

Key Takeaways

Performing a codebase audit on an unfamiliar repository doesn't have to be an exercise in frustration. By moving from a high-level architecture overview to targeted risk analysis, you can form a sophisticated opinion of a project's health in the time it takes to grab a coffee.

  1. Automate the boring stuff: Use repowise to generate the map and the docs so you can focus on analysis.
  2. Follow the heat: Use hotspot analysis to find where the bugs live.
  3. Check the "social" health: Bus factor and ownership are just as important as the code itself.
  4. Use MCP tools: Don't just read code; query it. Use get_risk() and get_context() to get immediate answers.

Whether you are performing due diligence code reviews or just trying to understand an unfamiliar codebase for a new job, having a repeatable framework ensures you never miss the forest for the trees.

To see these principles applied, you can watch all 8 MCP tools in action on a real-world repository.


FAQ

Q: Can I run repowise on private repositories? A: Yes. Repowise is open-source (AGPL-3.0) and designed to be self-hosted. Your code stays on your infrastructure.

Q: Which LLMs are supported? A: You can use OpenAI, Anthropic, Google Gemini, or local models via Ollama for a completely air-gapped audit.

Q: How many languages does it support? A: Currently, Repowise supports Python, TypeScript, JavaScript, Go, Rust, Java, C++, C, Ruby, and Kotlin.

Q: Does it work with AI coding assistants? A: Yes, the MCP server allows Claude Code, Cursor, and Cline to "see" the entire codebase through the lens of the Repowise intelligence engine.

Try repowise on your repo

One command indexes your codebase.