Code Health: The Complete Guide (2026)

repowise team · 9 min read
Tags: code health, code health score, defect prediction, technical debt metric, maintainability, static performance risk

Code health is a measure of how risky a codebase is to change, scored from 25 deterministic biomarkers with no model and no LLM in the loop. Unlike most "debt ratio" metrics, repowise's code-health score is defect-validated: it reaches a cross-project mean ROC AUC of 0.74 at predicting which files get bug-fixed over the next six months, rising as high as 0.90 within a single repo. This guide explains what the score is, the three pillars it splits into, and how to reproduce the benchmark on your own repository.

What is code health?

Code health is a per-file, 0-to-10 measure of how much friction and defect risk the next change to that file is likely to carry. It is computed deterministically from 25 biomarkers spanning structure, dependency shape, test signal, and git history — the same inputs always produce the same score. It is a profile you can take apart, not a single dashboard number to trust blindly.

The word "deterministic" is load-bearing. There is no model and no LLM anywhere in the scorer, so the score is auditable, reproducible, and cheap to recompute. On a 3,000-file repo, an incremental rescore runs in under 30 seconds.

That matters because most "code quality" numbers are either subjective labels ("messy", "clean") or a rolled-up average that hides the one file dragging delivery down. Code health is designed to do the opposite: surface the reason next to the number, per file, so you can act on it.

The three pillars

A single headline number would blur signals that mean different things. So the 25 biomarkers feed three co-equal pillars, and only the first becomes the headline score.

1. Defect risk (the headline 1-10 score)

Defect risk is the validated pillar. It blends structural and evolutionary biomarkers — complexity, coupling, churn, change entropy, ownership dispersion, prior defects — into the 1-10 number that ranks which files are most likely to break. This is the score the merge gate and the hotspot view read from.

The headline deliberately does not blend in maintainability or performance. Mixing them would dilute the one signal that history can actually validate against bugs.
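To make the idea of "blending" concrete, here is a minimal sketch of a deterministic weighted blend. It is not repowise's scorer: the weights, the [0, 1] normalization, and the convention that 10 is healthiest and 1 is riskiest are all assumptions made for illustration. Only the six biomarker names come from the paragraph above.

```python
from typing import Mapping

# Hypothetical weights. The guide does not publish the real weighting,
# so these numbers are illustrative only. They sum to 1.0.
ASSUMED_WEIGHTS = {
    "complexity": 0.20,
    "coupling": 0.15,
    "churn": 0.25,
    "change_entropy": 0.15,
    "ownership_dispersion": 0.10,
    "prior_defects": 0.15,
}


def defect_risk_score(biomarkers: Mapping[str, float]) -> float:
    """Map biomarkers normalized to [0, 1] (1 = worst) onto a 1-10 scale.

    Assumes 10 is healthiest and 1 is riskiest, which fits the advice
    to triage the lowest scores first.
    """
    risk = 0.0
    for name, weight in ASSUMED_WEIGHTS.items():
        # Clamp each input so a bad value cannot push the score off the scale.
        value = min(max(biomarkers.get(name, 0.0), 0.0), 1.0)
        risk += weight * value
    return round(10.0 - 9.0 * risk, 2)
```

Because the function is pure (no randomness, no clock, no network), calling it twice with the same biomarkers always returns the same number, which is the property the guide means by "deterministic".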

2. Maintainability

Maintainability tracks smells that raise reading and change cost without necessarily predicting bugs: long methods, too many parameters, primitive obsession, low cohesion, duplication. These hurt, but they hurt your week, not your incident count. Keeping them out of the defect headline is a feature, not an omission.

3. Static performance risk

The third pillar is a static scan for performance shapes — N+1 query patterns and I/O-in-loop constructs that quietly waste work at scale. It is tuned for high precision over high recall: it would rather stay quiet than flood you with false alarms. Like the other two, it is fully deterministic.
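As a rough illustration of what a static "I/O-in-loop" check can look like, the sketch below walks a Python syntax tree and flags I/O-looking calls inside loop bodies. It is a simplified stand-in, not repowise's scanner: the set of call names is an assumption, and it is kept deliberately short to favor precision over recall.

```python
import ast

# Assumed call names that suggest I/O. A real scanner would need a richer,
# language-aware catalogue; this short list is illustrative only.
ASSUMED_IO_CALLS = {"execute", "query", "fetch", "open", "urlopen"}


def find_io_in_loops(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, call_name) pairs for I/O-looking calls in loop bodies."""
    findings = set()
    for loop in ast.walk(ast.parse(source)):
        if not isinstance(loop, (ast.For, ast.AsyncFor, ast.While)):
            continue
        # Only the loop body counts; the iterable expression runs once.
        for stmt in loop.body:
            for node in ast.walk(stmt):
                if not isinstance(node, ast.Call):
                    continue
                func = node.func
                if isinstance(func, ast.Attribute):
                    name = func.attr
                else:
                    name = getattr(func, "id", None)
                if name in ASSUMED_IO_CALLS:
                    findings.add((node.lineno, name))
    return sorted(findings)


SAMPLE = """
for user in users:
    orders = db.query(user.id)
total = db.query(0)
"""

print(find_io_in_loops(SAMPLE))  # [(3, 'query')]
```

The call on the last line of the sample is not flagged because it sits outside the loop, which is the distinction an N+1 check cares about.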

| Pillar | What it answers | Becomes the headline? |
| --- | --- | --- |
| Defect risk | Where are bugs most likely to land next? | Yes |
| Maintainability | Where is the code expensive to read and change? | No (co-equal view) |
| Static performance risk | Where do N+1 / I/O-in-loop shapes waste work? | No (co-equal view) |

How this differs from a typical "debt ratio"

Most static-analysis platforms produce a maintainability or technical-debt ratio: count the rule violations, estimate remediation minutes, divide by the cost to rewrite. It is a tidy number. It is also unvalidated against where bugs actually occur, and it leans almost entirely on the structure of a single snapshot.
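The arithmetic behind such a ratio is simple enough to write down. The sketch below uses made-up numbers and a hypothetical function name; individual tools differ in how they estimate both the remediation minutes and the rewrite cost.

```python
def debt_ratio(remediation_minutes: list[float], rewrite_minutes: float) -> float:
    """Estimated remediation effort divided by estimated cost to rewrite."""
    if rewrite_minutes <= 0:
        raise ValueError("rewrite_minutes must be positive")
    return sum(remediation_minutes) / rewrite_minutes


# Illustrative inputs: three violations estimated at 30, 45, and 15 minutes,
# in a file estimated to take 1,800 minutes to rewrite from scratch.
ratio = debt_ratio([30, 45, 15], 1800)
print(f"{ratio:.1%}")  # 5.0%
```

Note that nothing in this calculation looks at how often the file changes or whether it has ever been bug-fixed, which is the gap the next section describes.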

The key difference is that code health weighs git history as heavily as structure, and then proves it ranks real defects.

| Dimension | Typical static-analysis debt ratio | repowise code health |
| --- | --- | --- |
| Inputs | Rule violations on a single snapshot | 25 biomarkers: structure + coupling + tests + git history |
| Determinism | Deterministic | Deterministic (zero LLM) |
| Defect validation | Rarely published | ROC AUC 0.74 cross-project, up to 0.90 per repo |
| Signal split | One blended maintainability/debt number | Three pillars; only defect risk is the headline |
| History awareness | Limited | Churn, entropy, ownership, prior defects are first-class |
| Licensing | Usually proprietary | AGPL-3.0, self-hostable |

Does it actually predict bugs?

This is the question that separates a metric from a decoration, so it is worth being precise. repowise scored 21 open-source repos across nine languages six months before their bugs landed, using a leakage-free protocol, and measured whether the score ranked the soon-to-be-buggy files above the clean ones.

The headline result: a cross-project mean ROC AUC of 0.74 (95% CI 0.68-0.79), rising as high as 0.90 within an individual repository. Under a fixed review budget — the realistic case where you can only inspect so many files — the score surfaces 2.3x more defects than the baseline.

The effort-aware numbers tell the same story: recall of 0.173 versus 0.074 for the baseline, and a Popt of 0.607 versus 0.462. The full benchmark, including the confounds it does not escape, is open and reproducible, so you can run it on your own repos rather than take the numbers on faith.
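For readers who want to see what these metrics measure, here is a small, dependency-free sketch of ROC AUC and of recall under a fixed review budget. It is not the benchmark's code: it assumes each file has a risk value where higher means more likely to be defective (for a health-style score where lower is worse, pass something like `10 - score`), and it measures the budget in files, whereas effort-aware evaluations often budget by lines of code.

```python
def roc_auc(risk: list[float], defective: list[bool]) -> float:
    """Probability that a random defective file outranks a random clean one.

    Ties count as half a win (the Mann-Whitney formulation of AUC).
    """
    pos = [r for r, d in zip(risk, defective) if d]
    neg = [r for r, d in zip(risk, defective) if not d]
    if not pos or not neg:
        raise ValueError("need at least one defective and one clean file")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def recall_at_budget(risk: list[float], defective: list[bool], budget: int) -> float:
    """Share of defective files caught when only the `budget` riskiest are reviewed."""
    ranked = sorted(zip(risk, defective), key=lambda pair: pair[0], reverse=True)
    found = sum(1 for _, is_defective in ranked[:budget] if is_defective)
    total = sum(defective)
    return found / total if total else 0.0


# Toy data, invented for illustration: five files, two later bug-fixed.
risk = [0.9, 0.8, 0.3, 0.2, 0.1]
defective = [True, False, True, False, False]

print(round(roc_auc(risk, defective), 3))      # 0.833
print(recall_at_budget(risk, defective, 2))    # 0.5
```

An AUC of 0.5 means the ranking is no better than chance and 1.0 means every defective file outranks every clean one, which is the scale on which the 0.74 and 0.90 figures above should be read.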

The code-health cluster

For the product view of all three pillars and the merge gate, see the code-health feature page.

How to use it

Start with the worst files, not the average. Sort by lowest defect-risk score, filter to high churn or high fan-in, and inspect the top three to five. The score tells you where; the biomarker breakdown tells you why.
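The triage step above can be sketched in a few lines. The record type, field names, and thresholds below are hypothetical; they stand in for whatever export your scoring run produces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileHealth:
    path: str
    defect_risk: float  # headline score; lower means riskier (assumed orientation)
    churn: int          # assumed: commits touching the file in the review window
    fan_in: int         # assumed: number of modules that depend on this file


def worst_files(
    files: list[FileHealth],
    min_churn: int = 10,   # hypothetical threshold
    min_fan_in: int = 5,   # hypothetical threshold
    limit: int = 5,
) -> list[FileHealth]:
    """Keep high-churn or high-fan-in files, then return the lowest scores first."""
    candidates = [f for f in files if f.churn >= min_churn or f.fan_in >= min_fan_in]
    # Path is a tiebreaker so the ordering stays deterministic.
    return sorted(candidates, key=lambda f: (f.defect_risk, f.path))[:limit]
```

Filtering before sorting matters: a low-scoring file that nobody touches and nothing depends on is a weaker candidate than a slightly better file that changes every week.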

Then decide between three moves: add tests before touching a fragile hotspot, schedule cleanup for a file that is both complex and volatile, or split a boundary where co-change pressure says the module shape no longer matches reality. Re-score after each change — the trend is more honest than any single snapshot.

Because the whole thing is AGPL-3.0 and self-hostable, you can run it in CI as a merge gate that judges a PR on the same signals you read by hand.
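One way to picture a merge gate is as a pure comparison between per-file scores on the base branch and on the PR. The sketch below is an assumption about how such a gate could work, not a description of repowise's gate: the function name, the `max_drop` and `floor` thresholds, and the score orientation (lower is riskier) are all hypothetical.

```python
def gate_violations(
    before: dict[str, float],
    after: dict[str, float],
    max_drop: float = 0.5,  # hypothetical: largest tolerated score drop per file
    floor: float = 3.0,     # hypothetical: minimum score for newly added files
) -> list[str]:
    """Return human-readable violations; an empty list means the PR passes."""
    violations = []
    for path, new_score in sorted(after.items()):
        old_score = before.get(path)
        if old_score is None:
            # File is new in this PR: hold it to an absolute floor.
            if new_score < floor:
                violations.append(
                    f"{path}: new file scores {new_score:.1f}, below floor {floor:.1f}"
                )
        elif old_score - new_score > max_drop:
            violations.append(f"{path}: score fell {old_score:.1f} -> {new_score:.1f}")
    return violations
```

A CI step could call this with the two score maps and exit non-zero when the returned list is non-empty, printing each violation so the author sees the same reasons a reviewer would.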

Last reviewed: June 2026.

FAQ

What is code health in software engineering?

Code health is a per-file measure of how risky and costly the next change to a file will be, combining structure, coupling, test signal, and git history. In repowise it is a 0-to-10 score built deterministically from 25 biomarkers. It is meant to be read as a profile, not a single blended dashboard number.

How many biomarkers does the code-health score use?

The score is built from 25 deterministic biomarkers, with no model and no LLM in the loop. They span structural signals (complexity, coupling, size), evolutionary signals from git history (churn, change entropy, ownership), and test-and-coverage signals. Determinism means the same inputs always produce the same score.

What are the three pillars of code health?

The three pillars are defect risk, maintainability, and static performance risk. Defect risk is the validated headline 1-to-10 score; maintainability and static performance risk are co-equal views that are never blended into the headline. Keeping them separate stops a readability smell or a performance shape from diluting the one signal validated against real bugs.

Does the code-health score actually predict bugs?

Yes, within measured limits. On a leakage-free benchmark across 21 open-source repos in nine languages, the score reached a cross-project mean ROC AUC of 0.74 (95% CI 0.68-0.79), up to 0.90 per repo, and surfaced 2.3x more defects under a fixed review budget. The benchmark is open so you can reproduce it on your own repositories.

How is code health different from a technical-debt ratio?

A typical debt ratio counts rule violations on a single snapshot and estimates remediation effort, but it rarely validates against where bugs occur. Code health weighs git history as heavily as structure and publishes its defect-prediction accuracy. It also keeps maintainability and performance as separate pillars rather than blending everything into one number.

Is it fast enough to run in CI?

Yes. The scorer is fully deterministic with no LLM calls, so an incremental rescore runs in under 30 seconds on a 3,000-file repository. Because repowise is AGPL-3.0 and self-hostable, you can wire it into CI as a merge gate without sending code to a third party.
