What Happens When You Run repowise init on React (847 Files)

repowise team · 10 min read

Cloning a repository like React is a rite of passage for many engineers. It is the gold standard of modern UI library architecture, but it is also a formidable beast. With 847 files (excluding tests and documentation) and a decade of history, the codebase is a dense forest of reconcilers, schedulers, and fiber nodes. For a new contributor or an architect trying to understand how a change in react-reconciler might ripple through the system, the initial "onboarding" phase can take weeks.

This is the exact problem we built repowise to solve. When you run repowise init on a project of this scale, you aren't just generating a config file; you are triggering a deep-tissue scan of the entire codebase's nervous system.

In this post, we’ll look at the technical telemetry and architectural insights generated when we pointed repowise at the React source code. We’ll explore how automated React codebase analysis can transform a massive directory of files into a searchable, agent-ready intelligence layer.

Running repowise on a Real-World Codebase

Why React? (847 Files, Complex Architecture)

React is the perfect stress test for codebase intelligence. It isn't just "large"; it's structurally complex. It uses a monorepo pattern with dozens of packages, complex build-time injections, and cross-package dependencies that defy simple grep-based searching.

Understanding React architecture requires more than reading the code; it requires understanding the intent behind it: why certain optimizations exist in the Scheduler, or how the Fiber reconciler manages state updates without blocking the main thread.

What We're Looking For

When we run repowise init, we are looking to extract four specific dimensions of intelligence:

  1. Structural Intelligence: How are the 847 files physically and logically grouped?
  2. Relational Intelligence: What is the "gravity" of each file? Which modules are the most critical?
  3. Temporal Intelligence: What does the Git history tell us about risk and stability?
  4. Semantic Intelligence: Can an LLM accurately describe the responsibility of a file like ReactFiberBeginWork.js without human intervention?

The Numbers: A High-Level Audit

The first thing repowise does is index the repository. For React, this involves parsing 10+ years of Git history and 847 active source files, which are JavaScript with Flow type annotations.

Metric                     Value
Total Files Indexed        847
Total Symbols Detected     12,402 (functions, classes, types)
Indexing Time (Local)      42 seconds (parsing and graph construction)
LLM Documentation Time     ~14 minutes (parallelized via Anthropic Claude 3.5 Sonnet)
Dependency Edges           3,104
Community Clusters         14 distinct architectural domains

The speed of the initial indexing is critical. By combining tree-sitter for parsing with LanceDB (an embedded vector database) for storage, we can map the entire symbol table of React in under a minute. You can see what repowise generates on real repos in our live examples and compare these metrics against other popular frameworks.

[Figure: React Indexing Telemetry]


Architecture Overview: What repowise Sees

Once the scan is complete, the get_overview() tool (one of our 8 MCP tools) provides a high-level map. Instead of a flat list of files, repowise identifies the "centers of gravity" in the codebase.

Module Structure

React is organized as a monorepo. Repowise correctly identifies the primary packages:

  • react: The top-level API.
  • react-reconciler: The engine that manages the diffing algorithm.
  • react-dom: The platform-specific glue.
  • scheduler: The priority-based task runner.

Entry Points

Repowise identifies entry points not just by looking at package.json, but by analyzing the incoming vs. outgoing edge ratio in the dependency graph. In React, it flags ReactFiberWorkLoop.js as a critical "hub" because of its massive PageRank score—it is the heartbeat of the entire library.
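
To make the edge-ratio idea concrete, here is a minimal sketch that counts incoming and outgoing imports per file and labels each one. The file names, the toy graph, and the thresholds in `classify` are all hypothetical; this is not repowise's actual heuristic, only an illustration of the kind of signal it describes.

```python
from collections import defaultdict

# Hypothetical toy import graph: (a, b) means "file a imports file b".
EDGES = [
    ("index.js", "client.js"),
    ("scheduler_entry.js", "work_loop.js"),
    ("client.js", "work_loop.js"),
    ("work_loop.js", "begin_work.js"),
    ("work_loop.js", "commit_work.js"),
    ("begin_work.js", "types.js"),
    ("commit_work.js", "types.js"),
]

def degree_counts(edges):
    """Return {file: (in_degree, out_degree)} for an import graph."""
    indeg = defaultdict(int)
    outdeg = defaultdict(int)
    for src, dst in edges:
        outdeg[src] += 1
        indeg[dst] += 1
    nodes = set(indeg) | set(outdeg)
    return {n: (indeg[n], outdeg[n]) for n in nodes}

def classify(in_deg, out_deg):
    """Label a file from its degrees. Thresholds are illustrative assumptions."""
    if in_deg == 0 and out_deg > 0:
        return "entry point"  # nothing imports it, but it imports others
    if in_deg >= 2 and out_deg >= 2:
        return "hub"          # heavily used and heavily dependent
    if out_deg == 0:
        return "leaf"         # imported, but imports nothing itself
    return "internal"

if __name__ == "__main__":
    for name, (i, o) in sorted(degree_counts(EDGES).items()):
        print(f"{name:20} in={i} out={o} -> {classify(i, o)}")
```

Raw degree counts are only a first pass; a file with few direct importers can still be critical if those importers are themselves important, which is what PageRank captures.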

Tech Stack Detection

The platform automatically detects the environment. For React, it flags the codebase as a high-complexity JavaScript project with Flow type annotations and custom Rollup-based build tooling. This context is automatically injected into the React code documentation generated for AI agents.


The Dependency Graph: Mapping the Web

The React dependency graph is where things get interesting. Most tools show you a "spaghetti" diagram. Repowise uses PageRank and community detection (the Louvain algorithm) to find the logical clusters.

Most Important Files (PageRank)

By calculating which files are most "depended upon" by other important files, we identified the top 3 most influential files in the React source:

  1. ReactInternalTypes.js: The shared type definitions that bind the reconciler.
  2. ReactFiberConfig.js: The abstraction layer that allows React to run on DOM, Canvas, or Native.
  3. ReactWorkTags.js: The constants that define what a "Fiber" actually is.
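
As a rough illustration of why shared type and constant files float to the top, the sketch below runs a plain power-iteration PageRank over a tiny import graph. The graph and file names are hypothetical, and the damping factor of 0.85 is the conventional default rather than a value taken from repowise.

```python
# Hypothetical toy import graph: (a, b) means "file a imports file b".
EDGES = [
    ("begin_work.js", "types.js"),
    ("commit_work.js", "types.js"),
    ("hooks.js", "types.js"),
    ("begin_work.js", "hooks.js"),
]

def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a directed edge list."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    count = len(nodes)
    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iterations):
        # Rank held by files with no outgoing imports is spread evenly,
        # so the total rank always sums to 1.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1 - damping) / count + damping * dangling / count
        new_rank = {n: base for n in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank

if __name__ == "__main__":
    ranks = pagerank(EDGES)
    for name in sorted(ranks, key=ranks.get, reverse=True):
        print(f"{name:16} {ranks[name]:.3f}")
```

In this toy graph the file that everything imports ends up with the highest score, even though it imports nothing itself.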

Community Clusters

Repowise detected 14 distinct communities. One cluster, for example, contains only the hook logic (ReactFiberHooks.js, ReactFiberDispatcher.js, etc.). Another isolates the Server Components logic. For details on how repowise handles cross-package imports when building these clusters, see our architecture page.
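
Louvain works by greedily moving nodes between communities to increase a score called modularity. A full Louvain implementation is too long for a blog post, so the sketch below only computes that score for a given partition, which is enough to see why a "good" clustering beats a flat one. The graph and community labels are hypothetical, and the code assumes a non-empty, undirected, unweighted edge list.

```python
from collections import defaultdict

# Hypothetical graph: two tightly knit groups joined by a single edge.
EDGES = [
    ("h1", "h2"), ("h2", "h3"), ("h1", "h3"),   # "hooks" triangle
    ("s1", "s2"), ("s2", "s3"), ("s1", "s3"),   # "server" triangle
    ("h3", "s1"),                               # one cross-cluster edge
]

TWO_CLUSTERS = {"h1": "hooks", "h2": "hooks", "h3": "hooks",
                "s1": "server", "s2": "server", "s3": "server"}
ONE_CLUSTER = {node: "all" for node in TWO_CLUSTERS}

def modularity(edges, community_of):
    """Modularity Q of a partition: sum over communities of
    (internal edges / m) - (total degree / 2m)^2, where m = edge count."""
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends in the community
    degree_sum = defaultdict(int)  # sum of node degrees per community
    for a, b in edges:
        ca, cb = community_of[a], community_of[b]
        degree_sum[ca] += 1
        degree_sum[cb] += 1
        if ca == cb:
            internal[ca] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

if __name__ == "__main__":
    print("two clusters:", round(modularity(EDGES, TWO_CLUSTERS), 4))
    print("one cluster: ", round(modularity(EDGES, ONE_CLUSTER), 4))
```

Splitting the graph along its natural seam scores higher than lumping everything together, and that difference is what a Louvain pass tries to maximize.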

[Figure: React Dependency Communities]


Hotspot Analysis: Where the Risk Lives

Not all files are created equal. Some are complex but stable. Others are simple but change every day. The dangerous ones are both complex and frequently changed. This is what we call a "Hotspot."

Using Git intelligence, repowise maps every file on a 2D matrix: Churn vs. Cyclomatic Complexity.

Top 3 Riskiest Files in React:

  1. ReactFiberBeginWork.js: Extremely high complexity (the massive switch statement for all component types) and high churn.
  2. ReactFiberCommitWork.js: High churn due to evolving side-effect logic (Suspense, Transitions).
  3. ReactChildFiber.js: The core of the diffing algorithm.
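
One simple way to rank files on a churn-versus-complexity matrix is to normalize both axes and multiply them, so that only files high on both dimensions score near 1. The sketch below does exactly that. The multiplication rule is an assumption made for illustration, not repowise's documented formula, and the file names and numbers are invented.

```python
# Hypothetical metrics: file -> (commits touching it, cyclomatic complexity).
METRICS = {
    "complex_and_busy.js": (120, 90),
    "complex_but_stable.js": (10, 95),
    "simple_but_busy.js": (130, 5),
}

def hotspot_scores(metrics):
    """Return {file: score in [0, 1]} as normalized churn * normalized complexity."""
    max_churn = max(churn for churn, _ in metrics.values()) or 1
    max_complexity = max(cx for _, cx in metrics.values()) or 1
    return {
        name: (churn / max_churn) * (cx / max_complexity)
        for name, (churn, cx) in metrics.items()
    }

if __name__ == "__main__":
    scores = hotspot_scores(METRICS)
    for name in sorted(scores, key=scores.get, reverse=True):
        print(f"{name:24} {scores[name]:.2f}")
```

A file that is only complex or only busy scores low, which matches the intuition that the risk lives in the upper-right corner of the matrix.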

Why These Files?

These files are the "hot zones" of the library. If a PR touches ReactFiberBeginWork.js, repowise’s get_risk() tool will flag it. It will also list the "Co-change partners"—files that almost always change whenever this file changes—helping engineers avoid "breaking the world" with a seemingly local change. You can see this in action by exploring the hotspot analysis demo for a similar real-world example.
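
Co-change partners can be approximated from commit history alone: for every commit that touches the target file, count which other files appear alongside it. The sketch below assumes the commits have already been extracted as lists of file names (for example from `git log --name-only`); the sample history and the 50% threshold are hypothetical.

```python
from collections import Counter

# Hypothetical history: each inner list is the set of files in one commit.
COMMITS = [
    ["begin_work.js", "commit_work.js"],
    ["begin_work.js", "commit_work.js", "hooks.js"],
    ["begin_work.js"],
    ["hooks.js"],
]

def co_change_partners(commits, target, min_ratio=0.5):
    """Return {file: share of target's commits that also touched file}."""
    touching = [set(files) for files in commits if target in files]
    if not touching:
        return {}
    counts = Counter(f for files in touching for f in files if f != target)
    total = len(touching)
    return {f: n / total for f, n in counts.items() if n / total >= min_ratio}

if __name__ == "__main__":
    print(co_change_partners(COMMITS, "begin_work.js"))
```

Here only the file that shows up in at least half of the target's commits is reported; occasional companions are filtered out.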


Ownership Map: The Human Element

Who actually knows how the Scheduler works? If you run repowise init, the platform mines the Git "blame" and "contribution" data to build an ownership map.

  • Core Contributors: Identifies the top 5 engineers with the most "Deep Knowledge" (measured by lines of code survived over time, not just total commits).
  • Bus Factor Analysis: Repowise flags modules where 90% of the code was written by a single person who hasn't committed in over 6 months.
  • Knowledge Distribution: In React, the knowledge is surprisingly distributed in the core reconciler but highly concentrated in the scheduler and react-art packages.
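
The bus-factor rule described above can be sketched in a few lines: find the dominant author of a module, then check both the ownership share and how long they have been inactive. The data shapes, author names, and the 182-day stand-in for "6 months" are assumptions for illustration; how repowise actually attributes surviving lines is not shown here.

```python
from datetime import date, timedelta

def bus_factor_risk(lines_by_author, last_commit, today,
                    share=0.9, idle_days=182):
    """Return the at-risk author's name, or None if the module looks safe.

    lines_by_author: {author: surviving lines in the module}
    last_commit:     {author: date of most recent commit}
    """
    total = sum(lines_by_author.values())
    if total == 0:
        return None
    author, lines = max(lines_by_author.items(), key=lambda kv: kv[1])
    dominant = lines / total >= share
    idle = (today - last_commit[author]) > timedelta(days=idle_days)
    return author if dominant and idle else None

if __name__ == "__main__":
    # Hypothetical module: one author owns 95% and left long ago.
    lines = {"alice": 950, "bob": 50}
    seen = {"alice": date(2023, 1, 1), "bob": date(2024, 5, 1)}
    print(bus_factor_risk(lines, seen, today=date(2024, 6, 1)))
```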

To see git intelligence in action, you can view the ownership map we generated for Starlette.


Generated Documentation Quality

One of the most powerful features of running repowise init is the Auto-Generated Wiki. For every one of the 847 files, repowise uses an LLM to generate a "Freshness-Aware" documentation page.

File Page Example: ReactFiberHooks.js

  • Summary: "Manages the state hook linked list during the render phase."
  • Key Responsibilities: Handling useMemo, useEffect, and useState transitions.
  • Confidence Score: 0.94 (High).
  • Freshness Score: 0.88 (Matches the latest major architectural shift).

Architecture Diagram

Using the get_architecture_diagram() tool, repowise can generate a Mermaid diagram of how a specific module interacts with the rest of the system. For a developer trying to understand how a "Hook" actually triggers a "Re-render," this visual path is worth a thousand lines of code.
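
For readers unfamiliar with Mermaid, the sketch below shows how a list of dependency edges maps onto Mermaid's flowchart syntax. It is a hypothetical helper, not the implementation behind get_architecture_diagram(), and the edge list is invented.

```python
def to_mermaid(edges):
    """Render (source, target) edges as a top-down Mermaid flowchart."""
    ids = {}

    def node_id(name):
        # Mermaid node ids cannot contain dots, so map each file to n0, n1, ...
        return ids.setdefault(name, f"n{len(ids)}")

    lines = ["graph TD"]
    for src, dst in edges:
        lines.append(f'    {node_id(src)}["{src}"] --> {node_id(dst)}["{dst}"]')
    return "\n".join(lines)

if __name__ == "__main__":
    # Hypothetical path from a hook call to the work loop.
    print(to_mermaid([
        ("hooks.js", "work_loop.js"),
        ("work_loop.js", "begin_work.js"),
    ]))
```

The output can be pasted into any Mermaid renderer to get a small box-and-arrow diagram.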


Dead Code Found: The Zombie Hunt

Even in a project as polished as React, dead code accumulates. repowise init runs a reachability analysis across the dependency graph.

Using the get_dead_code() tool, we found:

  • Unused Exports: Several internal utility functions in react-reconciler that are exported but never imported by any other file in the monorepo.
  • Zombie Files: Experimental files or old "scripts" folders that are no longer part of the build pipeline but still occupy the source tree.

Cleaning these up reduces the "cognitive load" for new developers and speeds up build times.
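
Reachability analysis for zombie files boils down to a graph traversal: start from the known entry points, follow imports, and report whatever was never visited. Below is a minimal breadth-first sketch; the import map and entry points are hypothetical, and real tooling would also need to account for dynamic imports and build-time injection.

```python
from collections import deque

# Hypothetical import map: file -> files it imports.
IMPORTS = {
    "index.js": ["reconciler.js"],
    "reconciler.js": ["types.js"],
    "old_script.js": ["types.js"],  # imports live code, but nothing imports it
    "types.js": [],
}

def unreachable_files(imports, entry_points):
    """Return files that cannot be reached from any entry point."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        current = queue.popleft()
        for dep in imports.get(current, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    all_files = set(imports) | {d for deps in imports.values() for d in deps}
    return sorted(all_files - seen)

if __name__ == "__main__":
    print(unreachable_files(IMPORTS, ["index.js"]))
```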


What Happens Next: The MCP Server

The real magic happens after the initialization. Once the 847 files are indexed, repowise starts an MCP (Model Context Protocol) Server.

This allows you to open Claude Code, Cursor, or Cline and ask questions like:

"Where is the logic that handles priority-based interruption in the Fiber loop?"

Instead of guessing or trying to read 847 files into its context window (which would fail), the AI works through the structured MCP tools. A typical sequence uses four of the eight:

  1. get_overview() to find the right module.
  2. search_codebase() to find the specific function.
  3. get_context() to read the LLM-generated docs and history.
  4. get_dependency_path() to see how that function connects to the public API.

This is how you turn a massive codebase into a partner you can chat with. You can see all 8 MCP tools in action to understand how this changes the developer experience.

[Figure: AI Agent Context Retrieval]


What We Learned

Running repowise on React taught us that even the most well-documented projects in the world have "shadow knowledge"—architecture that exists in the minds of the maintainers but isn't explicitly mapped.

By running repowise init, we were able to:

  • Identify the React architecture hotspots that require the most rigorous testing.
  • Map the React dependency graph to understand how the reconciler is decoupled from the renderer.
  • Provide a "GPS" for AI agents to navigate 847 files without getting lost.

Try It Yourself

You don't need a project as big as React to see the value. Whether you have 50 files or 5,000, repowise provides the same level of deep intelligence.

  1. Install: npm install -g @repowise/cli
  2. Initialize: repowise init
  3. Explore: Open the generated wiki or connect it to your favorite AI agent via MCP.

Key Takeaways

  • Scale isn't the enemy; the lack of a map is. 847 files are manageable if you know where the "hubs" are.
  • Git is a goldmine. History reveals risk that the code alone cannot: plotting churn against complexity shows which files are both hard to change and changed often.
  • AI needs structure. Don't just feed your code to an LLM; feed it a structured intelligence layer.
  • Dependency analysis is key. PageRank isn't just for Google; it’s for finding the ReactInternalTypes.js of your own codebase.

Ready to see your codebase in a new light? Learn about repowise's architecture and how the MCP server fits into your workflow today.


FAQ

Q: Does repowise send my code to a server? A: Parsing and graph construction happen entirely locally, and repowise is open-source (AGPL-3.0) and self-hostable. The only step that calls an external API is LLM documentation generation (OpenAI or Anthropic), and you can keep that local too by pointing it at an Ollama instance.

Q: How does this compare to basic "Search"? A: Search finds strings. Repowise finds relationships. It understands that Function A calls Function B, which is owned by Developer C, and has a high risk of breaking Module D.

Q: Can I use this with private repos? A: Absolutely. Since it's self-hosted, your code stays within your infrastructure.
