The only code-health score proven to predict real bugs.
25 deterministic biomarkers score every file 1 to 10. The weights are calibrated against a real defect corpus, validated cross-project, and run in under 30 seconds on a 3,000-file repo. No LLM, no cloud.
Most code-quality tools hand you a score and ask you to trust it. None of them show whether the score actually finds the bugs.
A maintainability number is only useful if the files it flags are the files that break. repowise treats that as a measurable claim. It validates the score against real defect labels, publishes the results head to head against a leading commercial tool, and lets you rerun the check on your own repo.
A score you can defend in review.
Deterministic, calibrated against real defects, and open from end to end.
One score, twenty-five deterministic signals
Each file is scored from structural, behavioral, and historical signals, not a single complexity number. The weights are learned offline from a defect corpus, so the score reflects what actually predicts bugs rather than what is easy to measure.
- McCabe complexity, deep nesting, brain methods, LCOM4 cohesion, god classes
- Rabin-Karp clone detection and DRY violations
- Change entropy, co-change scatter, function-level churn, code-age volatility
- Ownership dispersion and prior-defect history
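Here is a minimal sketch of Rabin-Karp clone detection over lines of code, to make one of these signals concrete. It is not repowise's implementation. The line-level granularity, the 3-line window, the base, the modulus, and the function names are all illustrative assumptions. A real detector would also normalize tokens rather than only stripping whitespace.

```python
from collections import defaultdict

BASE = 257              # polynomial base (illustrative assumption)
MOD = (1 << 61) - 1     # large prime modulus (illustrative assumption)

def line_hash(line: str) -> int:
    """Hash one whitespace-stripped line to an integer."""
    h = 0
    for ch in line.strip():
        h = (h * BASE + ord(ch)) % MOD
    return h

def find_clones(lines: list[str], window: int = 3) -> list[tuple[int, int]]:
    """Return (earlier_start, later_start) pairs of identical `window`-line blocks."""
    values = [line_hash(l) for l in lines]
    if len(values) < window:
        return []
    high = pow(BASE, window - 1, MOD)   # weight of the line leaving the window
    h = 0
    for v in values[:window]:
        h = (h * BASE + v) % MOD
    seen: dict[int, list[int]] = defaultdict(list)
    clones = []
    for start in range(len(values) - window + 1):
        if start > 0:
            # Roll the hash: drop the outgoing line, append the incoming one.
            h = (h - values[start - 1] * high) % MOD
            h = (h * BASE + values[start + window - 1]) % MOD
        for prev in seen[h]:
            # Compare the actual text on a hash match to rule out collisions.
            if [l.strip() for l in lines[prev:prev + window]] == \
               [l.strip() for l in lines[start:start + window]]:
                clones.append((prev, start))
                break
        seen[h].append(start)
    return clones
```

The rolling hash makes each window step constant-time. The explicit text comparison on a hash match removes false positives caused by collisions.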
Measurably better than the leading commercial tool
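On the same 2,770 files across 9 languages with real defect labels, ranking files by repowise health surfaces 2.3x as many defects as the commercial tool under the same review budget. The lift is statistically significant under a paired comparison. In each pair below, the first figure is repowise's and the second is the commercial tool's.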
- Recall at a 20%-of-lines budget: 0.173 vs 0.074
- Effort-aware ranking (Popt): 0.607 vs 0.462
- Defect density: 2.18x vs 0.56x
- ROC AUC: 0.731 vs 0.705
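The first two metrics are effort-aware: they count defects found per line reviewed rather than per file. Below is a minimal sketch of both. It assumes each file carries a line count, a defect count, and a health score. It also assumes the budget walk stops at the first file that would overflow the budget; repowise's exact boundary and tie-breaking rules are not stated here. Popt uses the commonly cited normalized form, which compares the model's ordering against the optimal ordering (highest defect density first) and the worst ordering (lowest density first). FileStat, recall_at_loc_budget, and popt are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class FileStat:
    loc: int        # lines of code (review effort)
    defects: int    # defect label, e.g. bug-fix count
    health: float   # 1 to 10, lower = riskier

def recall_at_loc_budget(files: list[FileStat], budget: float = 0.20) -> float:
    """Share of all defects found by reviewing worst-health files first within `budget` of total LOC."""
    total_loc = sum(f.loc for f in files)
    total_defects = sum(f.defects for f in files)
    used = found = 0
    for f in sorted(files, key=lambda f: f.health):
        if used + f.loc > budget * total_loc:
            break  # assumption: stop at the first file that would overflow the budget
        used += f.loc
        found += f.defects
    return found / total_defects

def _area(ordered: list[FileStat]) -> float:
    """Area under the cumulative (LOC share, defect share) curve, by trapezoids."""
    total_loc = sum(f.loc for f in ordered)
    total_defects = sum(f.defects for f in ordered)
    x = y = area = 0.0
    for f in ordered:
        nx, ny = x + f.loc / total_loc, y + f.defects / total_defects
        area += (nx - x) * (y + ny) / 2
        x, y = nx, ny
    return area

def popt(files: list[FileStat]) -> float:
    """Normalized Popt: 1 - (A_opt - A_model) / (A_opt - A_worst)."""
    def density(f: FileStat) -> float:
        return f.defects / f.loc
    a_model = _area(sorted(files, key=lambda f: f.health))
    a_opt = _area(sorted(files, key=density, reverse=True))
    a_worst = _area(sorted(files, key=density))
    return 1 - (a_opt - a_model) / (a_opt - a_worst)
```

A Popt of 1 means the ranking matches the optimal effort-aware ordering. A Popt of 0 means it matches the worst ordering.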
From a score to a worklist
Health is only useful if it tells you what to do next. repowise ingests coverage, watches for decline, and ranks the fixes that buy you the most for the least effort.
- Coverage ingestion (LCOV, Cobertura) for untested-hotspot detection
- Declining-health and predicted-decline alerts
- Refactoring targets ranked by impact for effort
- Surfaced in get_health and in PR risk reviews
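Coverage ingestion starts from per-file line coverage. Below is a rough sketch of the LCOV side only; Cobertura is XML and would need a separate parser. The function reads the SF:, DA:, and end_of_record records of an LCOV tracefile and returns a coverage ratio per file. The function name is hypothetical, and so is the choice to score files with no instrumented lines as 0.0.

```python
def parse_lcov(text: str) -> dict[str, float]:
    """Map each source file (SF: record) to its line-coverage ratio from DA: records."""
    coverage: dict[str, float] = {}
    current, found, hit = None, 0, 0
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("SF:"):
            # Start of a new file's section.
            current, found, hit = line[3:], 0, 0
        elif line.startswith("DA:") and current is not None:
            # DA:<line number>,<execution count>[,<checksum>]
            count = int(line[3:].split(",")[1])
            found += 1
            if count > 0:
                hit += 1
        elif line == "end_of_record" and current is not None:
            # Assumption: a file with no instrumented lines counts as 0.0 covered.
            coverage[current] = hit / found if found else 0.0
            current = None
    return coverage
```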
Deterministic, from index to score.
Index
repowise parses your repo into a graph and reads its git history. No code is sent anywhere.
Score
25 biomarkers run with defect-calibrated weights, fully deterministic, in under 30 seconds on 3,000 files.
Validate
On your own repo, the bundled check shows whether the worst-scored files are the ones with recent bug fixes.
Act
Refactoring targets ranked by impact for effort, plus alerts when any file's health slips.
One score, everywhere it helps.
In your AI agent
get_health and get_risk give agents the riskiest files and what to fix first, over MCP.
On every PR
The Repowise PR Bot comments when health regresses. It runs deterministically, with zero LLM calls.
In the dashboard
KPIs, lowest-scoring files, and a module-level health rollup.
Over time
Health trends and declining-health alerts catch decay before it compounds.
For prioritization
Refactoring targets ranked by impact for effort, not gut feel.
For leaders
A defect-validated signal you can put in front of the board, tied to ownership and AI provenance.
Other tools publish a score. repowise publishes how well its score predicts real bug labels, head to head against a commercial tool, and every heuristic is open source so you can reproduce the result on your own repo.
Questions, answered
What does the code-health score actually measure?
Every file gets a single 1 to 10 score computed from 25 deterministic biomarkers: McCabe complexity, deep nesting, brain methods, class cohesion (LCOM4), god classes, Rabin-Karp clone detection, change entropy, co-change scatter, ownership dispersion, prior-defect history, test-quality smells, and more. Lower scores mean the file is more likely to harbor defects.
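Change entropy is commonly formulated in defect-prediction research as the Shannon entropy of how a period's changes are spread across files: a period in which changes scatter across many files scores higher than one in which they concentrate in a few. Below is a minimal sketch of that formulation. The function name is hypothetical, and how repowise windows periods or turns this into a per-file signal is not stated here.

```python
import math
from collections import Counter

def change_entropy(changed_files: list[str]) -> float:
    """Shannon entropy (bits) of how one period's file changes are spread across files."""
    counts = Counter(changed_files)           # changes per file in the period
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```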
How do you know the score actually predicts bugs?
It is defect-validated. On the same 2,770 files across 9 languages with real defect labels, ranking by repowise health surfaces 2.3x as many defects as a leading commercial tool under the same review budget (recall at a 20%-of-lines budget 0.173 vs 0.074, effort-aware Popt 0.607 vs 0.462, ROC AUC 0.731 vs 0.705). Across 21 open-source repos, the mean cross-project ROC AUC is 0.74 (95% confidence interval 0.68 to 0.79), reaching 0.90 on individual repos.
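ROC AUC can be read as a ranking probability. It is the chance that a randomly chosen defective file gets a lower health score than a randomly chosen clean one, with ties counted as half. Below is a minimal pairwise sketch of that Mann-Whitney formulation. The function name is hypothetical, and the quadratic loop is there for clarity; a rank-based computation scales better on thousands of files.

```python
def roc_auc(health: list[float], has_defect: list[bool]) -> float:
    """P(defective file has lower health than a clean file); ties count 0.5."""
    bad = [h for h, d in zip(health, has_defect) if d]
    good = [h for h, d in zip(health, has_defect) if not d]
    wins = 0.0
    for b in bad:
        for g in good:
            if b < g:        # lower health = flagged as riskier
                wins += 1.0
            elif b == g:
                wins += 0.5
    return wins / (len(bad) * len(good))
```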
Does it use an LLM?
No. Scoring is fully deterministic: 25 biomarkers with weights calibrated offline against a defect corpus. Only the learned constants ship, so the same code always produces the same score, in under 30 seconds on a 3,000-file repo. No cloud, no API calls, no drift.
How is this different from CodeScene?
Both score code health, but repowise is open source so every heuristic is inspectable and reproducible on your own repo, and the score ships inside a broader platform: an architecture-aware wiki, git intelligence, architectural decisions, agent provenance, and nine MCP tools for AI agents. repowise does not offer AI auto-refactoring.
Will it just flag big files?
No. The score still discriminates after controlling for file size (partial Spearman rho of -0.16). It also significantly out-discriminates recent churn alone (+0.10 AUC) and prior-defect history alone (+0.12 AUC), with DeLong test p-values below 1e-9.
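"Controlling for file size" can be made concrete with a partial rank correlation. One standard approach computes Spearman correlations between score, defects, and size, then applies the first-order partial-correlation formula to remove the part explained by size. The sketch below follows that approach in pure Python. It is an assumption about how such a figure can be computed, not repowise's code, and all function names are hypothetical. If SciPy is available, scipy.stats.spearmanr gives the plain rho.

```python
import math

def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x: list[float], y: list[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x: list[float], y: list[float]) -> float:
    # Spearman rho is the Pearson correlation of the ranks.
    return pearson(ranks(x), ranks(y))

def partial_spearman(x: list[float], y: list[float], z: list[float]) -> float:
    """Rank correlation of x and y with z (e.g. file size) partialled out."""
    rxy, rxz, ryz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
```

A negative partial rho between health and defect counts means that lower health still goes with more defects among files of similar size.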
Can it use my test coverage?
Yes. repowise ingests LCOV and Cobertura coverage to compute untested-hotspot risk (the intersection of low coverage and high hotspot score), alerts when a file's health starts declining, and ranks refactoring targets by impact for effort.
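Untested-hotspot risk is described as the intersection of low coverage and high hotspot score. The sketch below expresses that intersection under loud assumptions. Hotspot scores are taken to lie on a 0 to 1 scale. The thresholds are arbitrary. Files missing from the coverage report count as uncovered. The function name and both thresholds are hypothetical and are not repowise's settings.

```python
def untested_hotspots(coverage: dict[str, float], hotspot: dict[str, float],
                      max_coverage: float = 0.5, min_hotspot: float = 0.8) -> list[str]:
    """Files that are both hot and poorly covered, hottest first (thresholds are illustrative)."""
    flagged = [
        path for path, score in hotspot.items()
        # Assumption: a file absent from the coverage report is treated as 0% covered.
        if score >= min_hotspot and coverage.get(path, 0.0) <= max_coverage
    ]
    return sorted(flagged, key=lambda p: hotspot[p], reverse=True)
```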
Can I prove these numbers on my own codebase?
Yes, that is the point. Every heuristic is open source under AGPL-3.0, and the validation runs on your own repo. On a typical project, 16 of the 20 lowest-health files (80%) had a bug fix in the last 6 months, 3.3x the 24% baseline rate.
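The 3.3x figure is the bug-fix rate among the 20 flagged files divided by the 24% baseline rate:

```latex
\frac{16}{20} = 0.80, \qquad \frac{0.80}{0.24} \approx 3.33 \approx 3.3\times
```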