Code Health: The Complete Guide (2026)
Code health is a measure of how risky a codebase is to change, scored from 25 deterministic biomarkers with no model and no LLM in the loop. Unlike most "debt ratio" metrics, repowise's code-health score is defect-validated: it reaches a cross-project mean ROC AUC of 0.74 at predicting which files get bug-fixed over the next six months, rising as high as 0.90 within a single repo. This guide explains what the score is, the three pillars it splits into, and how to reproduce the benchmark on your own repository.
What is code health?
Code health is a per-file, 0-to-10 measure of how much friction and defect risk the next change to that file is likely to carry. It is computed deterministically from 25 biomarkers spanning structure, dependency shape, test signal, and git history — the same inputs always produce the same score. It is a profile you can take apart, not a single dashboard number to trust blindly.
The word "deterministic" is load-bearing. There is no model and no LLM anywhere in the scorer, so the score is auditable, reproducible, and cheap to recompute. On a 3,000-file repo, an incremental rescore runs in under 30 seconds.
That matters because most "code quality" numbers are either subjective labels ("messy", "clean") or a rolled-up average that hides the one file dragging delivery down. Code health is designed to do the opposite: surface the reason next to the number, per file, so you can act on it.
The three pillars
A single headline number would blur signals that mean different things. So the 25 biomarkers feed three co-equal pillars, and only the first becomes the headline score.
1. Defect risk (the headline 1-10 score)
Defect risk is the validated pillar. It blends structural and evolutionary biomarkers — complexity, coupling, churn, change entropy, ownership dispersion, prior defects — into the 1-10 number that ranks which files are most likely to break. This is the score the merge gate and the hotspot view read from.
The headline deliberately does not blend in maintainability or performance. Mixing them would dilute the one signal that history can actually validate against bugs.
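Two of the evolutionary biomarkers named above, change entropy and ownership dispersion, are commonly measured with Shannon entropy. The sketch below illustrates that idea only; it is not taken from the repowise scorer, and the function name `normalized_entropy` and the per-author commit counts are hypothetical.

```python
import math


def normalized_entropy(counts):
    """Shannon entropy of a list of counts, scaled to the range [0, 1].

    0.0 means all activity sits in one bucket; 1.0 means it is spread
    evenly across every bucket.
    """
    total = sum(counts)
    if len(counts) < 2 or total == 0:
        return 0.0
    # p * log2(1 / p), summed over the non-empty buckets.
    entropy = sum((c / total) * math.log2(total / c) for c in counts if c > 0)
    return entropy / math.log2(len(counts))


# Hypothetical commit counts per author for two files.
single_owner = [40, 0, 0, 0]    # one author made every change
shared_file = [10, 10, 10, 10]  # four authors contributed equally

print(normalized_entropy(single_owner))  # 0.0
print(normalized_entropy(shared_file))   # 1.0
```

A file near 1.0 has no clear owner, which is the kind of history signal a single structural snapshot cannot show.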
2. Maintainability
Maintainability tracks smells that raise reading and change cost without necessarily predicting bugs: long methods, too many parameters, primitive obsession, low cohesion, duplication. These hurt, but they hurt your week, not your incident count. Keeping them out of the defect headline is a feature, not an omission.
3. Static performance risk
The third pillar is a static scan for performance shapes — N+1 query patterns and I/O-in-loop constructs that quietly waste work at scale. It is tuned for high precision over high recall: it would rather stay quiet than flood you with false alarms. Like the other two, it is fully deterministic.
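To make the I/O-in-loop shape concrete, here is a minimal sketch of a static check built on Python's standard `ast` module. It is an illustration, not the repowise scanner: the `IO_CALL_NAMES` set and both function names are assumptions, and the set is kept deliberately small to mirror the precision-first stance described above.

```python
import ast

# Hypothetical, deliberately short list of call names treated as I/O.
# A real scanner would need a framework-aware catalogue.
IO_CALL_NAMES = {"execute", "executemany", "fetchone", "fetchall", "urlopen", "open"}


def call_name(node):
    """Return the bare name of a call: 'execute' for db.execute(...)."""
    func = node.func
    if isinstance(func, ast.Attribute):
        return func.attr
    if isinstance(func, ast.Name):
        return func.id
    return None


def find_io_in_loops(source):
    """Return sorted (line, name) pairs for I/O-like calls in loop bodies."""
    findings = set()
    for loop in ast.walk(ast.parse(source)):
        if not isinstance(loop, (ast.For, ast.While)):
            continue
        # Only the body repeats; the iterable of a for loop runs once.
        for stmt in loop.body:
            for node in ast.walk(stmt):
                if isinstance(node, ast.Call) and call_name(node) in IO_CALL_NAMES:
                    findings.add((node.lineno, call_name(node)))
    return sorted(findings)


SAMPLE = '''
rows = db.execute("SELECT id FROM orders")
for row in rows:
    items = db.execute("SELECT * FROM items WHERE order_id = ?", (row[0],))
    total = sum(i[2] for i in items)
'''

print(find_io_in_loops(SAMPLE))  # [(4, 'execute')]
```

The query on line 2 of the sample is not reported because it runs once; only the per-row query inside the loop body is, which is the N+1 shape.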
| Pillar | What it answers | Becomes the headline? |
|---|---|---|
| Defect risk | Where are bugs most likely to land next? | Yes |
| Maintainability | Where is the code expensive to read and change? | No (co-equal view) |
| Static performance risk | Where do N+1 / I/O-in-loop shapes waste work? | No (co-equal view) |
How this differs from a typical "debt ratio"
Most static-analysis platforms produce a maintainability or technical-debt ratio: count the rule violations, estimate remediation minutes, divide by the cost to rewrite. It is a tidy number. It is also rarely validated against where bugs actually occur, and it leans almost entirely on the structure of a single snapshot.
The key difference is that code health weighs git history as heavily as structure, and then checks the resulting ranking against real defects.
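The debt-ratio arithmetic described above can be written out in a few lines. This is a generic sketch, not any specific vendor's formula: estimating rewrite cost as lines of code times a fixed number of minutes per line is an assumption, and the 30-minute figure and the sample violations are illustrative values only.

```python
def debt_ratio(remediation_minutes, lines_of_code, minutes_per_line=30):
    """Estimated fix cost divided by estimated rewrite cost.

    remediation_minutes: one estimate per rule violation.
    minutes_per_line: assumed cost to rewrite a single line (illustrative).
    """
    rewrite_cost = lines_of_code * minutes_per_line
    if rewrite_cost <= 0:
        raise ValueError("rewrite cost must be positive")
    return sum(remediation_minutes) / rewrite_cost


# Four hypothetical violations in a 1,000-line file.
violations = [30, 45, 15, 60]
ratio = debt_ratio(violations, lines_of_code=1000)
print(f"{ratio:.1%}")  # 0.5%
```

Nothing in the function looks at commits, authors, or past defects, which is the gap the comparison table below spells out.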
| Dimension | Typical static-analysis debt ratio | repowise code health |
|---|---|---|
| Inputs | Rule violations on a single snapshot | 25 biomarkers: structure + coupling + tests + git history |
| Determinism | Deterministic | Deterministic (zero LLM) |
| Defect validation | Rarely published | ROC AUC 0.74 cross-project, up to 0.90 per repo |
| Signal split | One blended maintainability/debt number | Three pillars; only defect risk is the headline |
| History awareness | Limited | Churn, entropy, ownership, prior defects are first-class |
| Licensing | Usually proprietary | AGPL-3.0, self-hostable |
Does it actually predict bugs?
This is the question that separates a metric from a decoration, so it is worth being precise. repowise scored 21 open-source repos across nine languages six months before their bugs landed, using a leakage-free protocol in which no information from the prediction window reaches the score, and measured whether the score ranked the soon-to-be-buggy files above the clean ones.
The headline result: a cross-project mean ROC AUC of 0.74 (95% CI 0.68-0.79), rising as high as 0.90 within an individual repository. Under a fixed review budget — the realistic case where you can only inspect so many files — the score surfaces 2.3x more defects than the baseline.
The effort-aware numbers tell the same story: recall of 0.173 versus 0.074 for the baseline (a ratio of about 2.3), and a Popt of 0.607 versus 0.462. The full benchmark, including the confounds it does not escape, is open and reproducible, so you can run it on your own repos rather than take the numbers on faith.
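For readers who want to see what these metrics measure, here is a small, dependency-free sketch of ROC AUC and of recall under a fixed review budget. It is not the repowise benchmark harness. The function names, the lines-of-code budget, and the five-file toy dataset are all assumptions made up for illustration, and the sketch treats a higher `risk` value as riskier, so a health score where low means unhealthy would need to be inverted first.

```python
def roc_auc(risk, buggy):
    """Probability that a random buggy file outranks a random clean file.

    Ties count as half a win.
    """
    pos = [r for r, b in zip(risk, buggy) if b]
    neg = [r for r, b in zip(risk, buggy) if not b]
    if not pos or not neg:
        raise ValueError("need at least one buggy and one clean file")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def recall_at_budget(risk, loc, buggy, budget_loc):
    """Share of buggy files found when reviewing riskiest-first until the
    next file would exceed the lines-of-code budget."""
    total_bugs = sum(buggy)
    if total_bugs == 0:
        raise ValueError("need at least one buggy file")
    order = sorted(range(len(risk)), key=lambda i: risk[i], reverse=True)
    spent = 0
    found = 0
    for i in order:
        if spent + loc[i] > budget_loc:
            break
        spent += loc[i]
        found += buggy[i]
    return found / total_bugs


# Made-up toy data, not benchmark results.
risk = [0.9, 0.8, 0.7, 0.2, 0.1]
loc = [100, 100, 200, 300, 300]
buggy = [1, 0, 1, 0, 0]

print(round(roc_auc(risk, buggy), 3))                      # 0.833
print(recall_at_budget(risk, loc, buggy, budget_loc=200))  # 0.5
print(round(0.173 / 0.074, 2))                             # 2.34
```

The last line is plain arithmetic on the two recall figures reported above.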
The code-health cluster
This guide is the hub. Each spoke below goes deep on one part of the picture.
- What is code health? A tour of the biomarkers: the per-file signals, grouped, and why averages hide trouble.
- Spotting declining code-health trends: reading the time series before a slow-burn decline bites.
- Nested vs cyclomatic complexity: why two complexity metrics disagree, and when each one matters.
- Inside the code-health scorer: the normalization, weighting, and rollup mechanics under the hood.
- Does our code-health score predict bugs? The leakage-free benchmark across 21 repos, in full.
- Process metrics beat structural metrics: why git history out-predicts code shape for defects.
- Best code-health tools in 2026: how the platforms compare on signal, not marketing.
- Best static-analysis tools for large codebases: where scanners stop and codebase intelligence begins.
For the product view of all three pillars and the merge gate, see the code-health feature page.
How to use it
Start with the worst files, not the average. Sort ascending by defect-risk score so the least healthy files come first, filter to high churn or high fan-in, and inspect the top three to five. The score tells you where; the biomarker breakdown tells you why.
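The triage step above can be sketched as a filter followed by a sort. Everything here is hypothetical: the `FileHealth` record, the `triage` function, the churn and fan-in thresholds, and the sample files are assumptions for illustration, not a repowise export format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileHealth:
    path: str
    score: float  # lower means less healthy
    churn: int    # e.g. commits touching the file in a recent window
    fan_in: int   # number of modules that depend on this file


def triage(files, min_churn=20, min_fan_in=10, top_n=5):
    """Keep files that are volatile or widely depended on, worst score first."""
    hot = [f for f in files if f.churn >= min_churn or f.fan_in >= min_fan_in]
    return sorted(hot, key=lambda f: (f.score, f.path))[:top_n]


files = [
    FileHealth("billing/invoice.py", 3.1, churn=42, fan_in=4),
    FileHealth("core/session.py", 4.0, churn=5, fan_in=31),
    FileHealth("legacy/report.py", 2.2, churn=1, fan_in=0),  # unhealthy but dormant
    FileHealth("api/routes.py", 8.7, churn=60, fan_in=12),
]

for f in triage(files, top_n=3):
    print(f.path, f.score)
```

In this toy data the lowest-scoring file is dropped because nobody changes or depends on it, so it does not make the short list despite its score.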
Then decide between three moves: add tests before touching a fragile hotspot, schedule cleanup for a file that is both complex and volatile, or split a boundary where co-change pressure says the module shape no longer matches reality. Re-score after each change — the trend is more honest than any single snapshot.
Because the whole thing is AGPL-3.0 and self-hostable, you can run it in CI as a merge gate that judges a PR on the same signals you read by hand.
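A merge gate of this kind can be pictured as a comparison of scores before and after a PR. The sketch below is an assumption about how such a gate might be wired, not the repowise gate itself: the function name `gate_failures`, the `floor` and `max_drop` rules, and the sample scores are all hypothetical.

```python
def gate_failures(before, after, floor=4.0, max_drop=1.0):
    """Return one message per changed file that violates the gate.

    before, after: dicts mapping file path to score for files the PR touches.
    floor: hypothetical minimum acceptable score.
    max_drop: hypothetical largest allowed decrease for an existing file.
    """
    failures = []
    for path in sorted(after):
        new = after[path]
        old = before.get(path)  # None for files added in the PR
        if new < floor:
            failures.append(f"{path}: score {new:.1f} is below the floor of {floor:.1f}")
        elif old is not None and old - new > max_drop:
            failures.append(f"{path}: score dropped from {old:.1f} to {new:.1f}")
    return failures


before = {"core/session.py": 7.5, "billing/invoice.py": 5.0}
after = {"core/session.py": 5.9, "billing/invoice.py": 5.2, "api/new_handler.py": 3.4}

failures = gate_failures(before, after)
for line in failures:
    print(line)

# A CI wrapper would pass this value to sys.exit to fail the job.
exit_code = 1 if failures else 0
```

Because the scorer is deterministic, rerunning a gate like this on the same commit should give the same verdict.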
Last reviewed: June 2026.
FAQ
What is code health in software engineering?
Code health is a per-file measure of how risky and costly the next change to a file will be, combining structure, coupling, test signal, and git history. In repowise it is a 0-to-10 score built deterministically from 25 biomarkers. It is meant to be read as a profile, not a single blended dashboard number.
How many biomarkers does the code-health score use?
The score is built from 25 deterministic biomarkers, with no model and no LLM in the loop. They span structural signals (complexity, coupling, size), evolutionary signals from git history (churn, change entropy, ownership), and test-and-coverage signals. Determinism means the same inputs always produce the same score.
What are the three pillars of code health?
The three pillars are defect risk, maintainability, and static performance risk. Defect risk is the validated headline 1-to-10 score; maintainability and static performance risk are co-equal views that are never blended into the headline. Keeping them separate stops a readability smell or a performance shape from diluting the one signal validated against real bugs.
Does the code-health score actually predict bugs?
Yes, within measured limits. On a leakage-free benchmark across 21 open-source repos in nine languages, the score reached a cross-project mean ROC AUC of 0.74 (95% CI 0.68-0.79), up to 0.90 per repo, and surfaced 2.3x more defects under a fixed review budget. The benchmark is open so you can reproduce it on your own repositories.
How is code health different from a technical-debt ratio?
A typical debt ratio counts rule violations on a single snapshot and estimates remediation effort, but it rarely validates against where bugs occur. Code health weighs git history as heavily as structure and publishes its defect-prediction accuracy. It also keeps maintainability and performance as separate pillars rather than blending everything into one number.
Is it fast enough to run in CI?
Yes. The scorer is fully deterministic with no LLM calls, so an incremental rescore runs in under 30 seconds on a 3,000-file repository. Because repowise is AGPL-3.0 and self-hostable, you can wire it into CI as a merge gate without sending code to a third party.
