AGENT PROVENANCE GUIDE

Agent Provenance: The AI-Debt Radar for AI-Written Code

How repowise attributes commits to the AI agents that wrote them and fuses that with a defect-validated health score and bus-factor ownership to rank the AI-written code most likely to break.

96.2%: Provenance precision, blind-validated across 124 commits by six reviewers, six of eight channels perfect
112,382: Commits across 28 repos in the is-AI-code-buggier study behind the radar
0.74: ROC AUC of the defect-validated health score the radar fuses with, across 21 repos
0: IDE plugins required, reads your existing history retroactively
By Raghav Chamadiya · Updated June 2026 · 10 min
TL;DR

repowise reads your git history alone to attribute commits to the AI agents that wrote them, then fuses that authorship with the defect-validated 1-to-10 health score and bus-factor ownership. That intersection is the AI-debt radar: AI-written code that is also a low-health hotspot owned by a single person, ranked. It needs no IDE plugins and works retroactively on the code AI already wrote.

DEFINITION

Agent provenance is repowise attributing each commit to the AI agent that produced it, computed deterministically from git history rather than IDE instrumentation. Fused with the defect-validated health score and bus-factor ownership, it forms the AI-debt radar: a ranking of AI-written code that is also low-health and single-owner, the actual risk surface rather than a raw AI percentage.

Why does agent provenance matter?

A growing share of your codebase is now written by AI agents, and almost no one can tell you which code that is. Knowing your repo is 40-something percent AI-written is a headline, not a plan.

Industry-wide, roughly 42% of code is AI-written in 2026, and surveys report roughly 1.7x more issues in AI-generated code than in human-written code. But a raw percentage cannot tell you where to look, and a survey cannot tell you which file will break.

  • A percentage is a vanity metric: it ranks nothing and directs no review.
  • AI agents do not carry the context a long-tenured engineer would, so their code lands without the same shared familiarity.
  • When AI code lands in a low-health file owned by one person, the usual safety nets, review familiarity and shared ownership, are thinnest exactly where the risk is highest.

How does the AI-debt radar work?

repowise treats AI authorship as a signal to fuse, not a number to display. The mechanism is deterministic end to end, computed from the git log you already have.

1. Index. repowise parses your repo into a graph and reads its git history. No code is sent anywhere and nothing is installed on developer machines.

2. Attribute by agent. A per-commit provenance detector reads eight signals from history alone and assigns authorship deterministically, with no model in the loop.

Signal class | What it reads
Account identity | Bot account identities and service email addresses
Commit metadata | Commit-message footers and co-author trailers
Merged-PR evidence | Agent branch prefixes and PR-body markers
Confidence handling | A human follow-up commit inside an agent's PR is downgraded so it can be filtered

Blind-validated across 124 commits by six independent reviewers, the detector returned 96.2% precision, with six of the eight channels perfect.
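A minimal sketch of how rule-based attribution of this kind can work, covering three of the channels in the table above. Every name and pattern here (AGENT_EMAILS, BRANCH_PREFIXES, the example-agent identity, the confidence labels) is a hypothetical placeholder for illustration, not repowise's actual rules.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup tables; a real detector would carry its own curated lists.
AGENT_EMAILS = {"agent@example.com": "example-agent"}
BRANCH_PREFIXES = {"example-agent/": "example-agent"}

# Matches standard git "Co-authored-by: Name <email>" trailer lines.
TRAILER = re.compile(
    r"^Co-authored-by:\s*(?P<name>.+?)\s*<(?P<email>[^>]+)>\s*$",
    re.IGNORECASE | re.MULTILINE,
)


@dataclass
class Commit:
    author_email: str
    message: str
    pr_branch: Optional[str] = None  # head branch of the merged PR, if known


@dataclass
class Attribution:
    agent: Optional[str]
    channel: Optional[str]
    confidence: str  # "high", "low", or "none" (labels are assumptions)


def attribute(commit: Commit) -> Attribution:
    # Channel: account identity (bot account or service email authored the commit).
    agent = AGENT_EMAILS.get(commit.author_email.lower())
    if agent:
        return Attribution(agent, "account-identity", "high")

    # Channel: commit metadata (a co-author trailer names a known agent).
    for match in TRAILER.finditer(commit.message):
        agent = AGENT_EMAILS.get(match.group("email").lower())
        if agent:
            return Attribution(agent, "co-author-trailer", "high")

    # Channel: merged-PR evidence (the commit landed on an agent-prefixed branch).
    # No commit-level signal fired, so this may be a human follow-up commit
    # inside an agent's PR: keep the attribution but downgrade its confidence.
    if commit.pr_branch:
        for prefix, agent in BRANCH_PREFIXES.items():
            if commit.pr_branch.startswith(prefix):
                return Attribution(agent, "pr-branch-prefix", "low")

    return Attribution(None, None, "none")
```

Because each rule is a plain lookup or pattern match, the same history always yields the same attribution, and low-confidence rows can be filtered out downstream.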

3. Fuse three signals. AI authorship on its own is a vanity metric; the radar intersects it with two signals already in repowise.

Signal | What it contributes
AI authorship | Which commits, and which resulting code, an agent produced
Code health | The defect-validated 1-to-10 score (cross-project ROC AUC 0.74), so low-health files surface
Bus-factor ownership | Files owned more than 80% by one author, flagged from git blame
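The bus-factor flag can be illustrated with a short sketch. It assumes ownership is measured as each author's share of blamed lines in a file, which is one plausible reading of "owned more than 80% by one author"; the function name and input shape are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def dominant_owner(
    blame_authors: Iterable[str], threshold: float = 0.80
) -> Optional[Tuple[str, float]]:
    """Return (author, share) if one author owns more than `threshold` of lines.

    `blame_authors` holds one author name per blamed line of a single file,
    for example as collected from `git blame` output.
    """
    counts = Counter(blame_authors)
    total = sum(counts.values())
    if total == 0:
        return None  # empty file: nothing to flag
    author, lines = counts.most_common(1)[0]
    share = lines / total
    # Strictly greater than the threshold, matching "more than 80%".
    return (author, share) if share > threshold else None
```

A file with nine of ten lines from one author would be flagged; a file split exactly 80/20 would not, because the comparison is strict.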

4. Rank the radar. The intersection, AI-written and low-health and single-owner, is the actual risk surface. repowise ranks it so leaders and reviewers know where to look first.
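As a rough sketch of the intersection-then-rank idea, assuming lower health scores mean higher risk: the FileSignals fields, the cutoffs other than the 80% ownership threshold, and the sort order are all illustrative assumptions, since this guide does not specify the ranking formula repowise uses.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class FileSignals:
    path: str
    ai_share: float         # fraction of lines attributed to an agent, 0..1
    health: float           # 1-to-10 health score; lower is assumed riskier
    top_owner_share: float  # largest single-author share of blamed lines, 0..1


def radar(
    files: Iterable[FileSignals],
    health_cutoff: float = 5.0,  # hypothetical "low-health" cutoff
    ai_cutoff: float = 0.0,      # hypothetical: any AI-written lines qualify
    owner_cutoff: float = 0.80,  # "more than 80% by one author"
) -> List[FileSignals]:
    # Keep only files where all three signals fire at once.
    hits = [
        f
        for f in files
        if f.ai_share > ai_cutoff
        and f.health < health_cutoff
        and f.top_owner_share > owner_cutoff
    ]
    # Rank: lowest health first, then the most AI-written code first.
    return sorted(hits, key=lambda f: (f.health, -f.ai_share))
```

A file that is heavily AI-written but healthy, or low-health but broadly owned, drops out of the list entirely, which is what separates a risk surface from a raw AI percentage.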

This is a directional risk lens on files and agents, not a per-developer productivity ledger. It reads commits; it does not watch people.

How does it help you?

The radar turns a statistic into something a team can act on the same day, and it lands everywhere the work happens.

A risk surface, not a percentage

Provenance fused with health and ownership ranks the AI code most likely to cause an incident.

  • AI-written, low-health, and single-owner files are ranked together, highest risk first.
  • Tied to the same 25-biomarker score that surfaces 2.3x the defects under a fixed review budget.
  • The output is "review these files first," not "your repo is X% AI."

Surfaced where you already work

An AI-authorship flag is never an island; it sits on the same surface as the rest of repowise.

  • In your AI agent: get_risk surfaces AI-authored hotspots and what to check first, over MCP, alongside health and ownership.
  • On every PR: the Repowise PR Bot can flag deterministically when AI-written code lands in a low-health, single-owner file.
  • In the dashboard: your AI-code footprint per module, ranked by the radar rather than raw percentage.

Grounded in the data, not the hype

The radar exists because the real risk is concentration, not a wave of extra bugs.

Guardrail: this is a directional risk signal, not a per-developer ledger. It reads commits; it does not watch people.

  • Across 112,382 commits in 28 repos, agent commits were no more bug-inducing than human commits in the same repo.
  • Human-driven agents, where a person is still reviewing the diff, came in at an odds ratio of 0.57 [95% CI 0.42-0.76], meaning lower odds of a bug-inducing commit than the human baseline, and agent lines outlived human lines by 17.9 percentage points.
  • So the radar targets the dangerous intersection, AI code in a low-health, single-owner file, rather than treating all AI code as suspect.
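For readers less familiar with odds ratios, here is the textbook calculation from a 2x2 table with a Wald 95% confidence interval. This is a generic sketch, not the study's method, which may use a different model (for example, one that controls for repository). Any counts passed in are up to the caller; none are taken from the study.

```python
import math
from typing import Tuple


def odds_ratio_ci(
    agent_buggy: int,
    agent_clean: int,
    human_buggy: int,
    human_clean: int,
    z: float = 1.96,  # normal quantile for a 95% interval
) -> Tuple[float, float, float]:
    """Return (odds_ratio, ci_low, ci_high) for agent vs human commits.

    All four cells must be positive for the Wald interval to be defined.
    """
    odds_ratio = (agent_buggy * human_clean) / (agent_clean * human_buggy)
    # Standard error of log(odds ratio) for a 2x2 table.
    se = math.sqrt(
        1 / agent_buggy + 1 / agent_clean + 1 / human_buggy + 1 / human_clean
    )
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)
```

An odds ratio below 1 with an interval that stays below 1, as in the 0.57 [0.42-0.76] result above, indicates lower odds of a bug-inducing commit in the agent group than in the comparison group.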

Walkthrough: from commit to AI-debt view

Step 1 — Index your repos. Run repowise init to build the graph, git history, and health from your existing history. No IDE plugins, no telemetry.

[Figure: repowise init indexing a repository and building the git history and health layers. One command indexes the repo and reads the history the radar is computed from.]

Step 2 — Read the AI-code footprint. Provenance rolls up per file, module, and repo, so you see how much AI wrote before any fusion.

Step 3 — Open the AI-debt radar. Switch to the radar to see AI-written, low-health, single-owner files ranked together, highest risk first.

Step 4 — Act. Direct review and tests at the top of the radar, spread single-owner files, and let declining-health alerts track the shift over time.
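The per-file, per-module, and per-repo roll-up in Step 2 can be sketched as below. It assumes the footprint is the share of lines attributed to an agent and that a module is a top-level directory; both are illustrative assumptions, as is the input shape.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Tuple


def share(ai_lines: int, total_lines: int) -> float:
    return ai_lines / total_lines if total_lines else 0.0


def rollup(line_counts: Dict[str, Tuple[int, int]]) -> dict:
    """Roll AI-attributed line counts up to file, module, and repo level.

    `line_counts` maps a repo-relative path to (ai_lines, total_lines).
    """
    modules = defaultdict(lambda: [0, 0])
    repo = [0, 0]
    for path, (ai, total) in line_counts.items():
        parts = PurePosixPath(path).parts
        # Files at the repo root are grouped under ".".
        module = parts[0] if len(parts) > 1 else "."
        modules[module][0] += ai
        modules[module][1] += total
        repo[0] += ai
        repo[1] += total
    return {
        "files": {p: share(ai, total) for p, (ai, total) in line_counts.items()},
        "modules": {m: share(ai, total) for m, (ai, total) in modules.items()},
        "repo": share(repo[0], repo[1]),
    }
```

Summing line counts before dividing keeps large files from being outweighed by small ones, which averaging per-file percentages would not.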

Proof: what the radar stands on

Each result below is reproducible: the health heuristics are open source under AGPL-3.0, and the provenance study ships its analysis scripts.

Result | Value
Provenance precision (blind-validated, 124 commits, 6 reviewers) | 96.2%, six of eight channels perfect
Provenance signals read from history | 8 (identities, emails, footers, trailers, PR markers)
Study behind the radar | 112,382 commits across 28 repos
AI vs human bug induction (same repo) | Agent commits no more bug-inducing; human-driven tier OR 0.57 [0.42-0.76]
Agent line survival vs human | +17.9 percentage points longer-lived
Health score the radar fuses with | Cross-project ROC AUC 0.74, up to 0.90 per repo
Bus-factor flag | Files owned more than 80% by one author
Instrumentation required | 0 IDE plugins, retroactive on existing history

The full provenance study, per-repo numbers, and analysis scripts are open in the is-AI-code-buggier study; the health score it fuses with is documented in the defect-prediction validation study.
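To make the ROC AUC figures concrete: AUC is the probability that a randomly chosen defective file is scored as riskier than a randomly chosen clean one, with ties counted as half. The sketch below computes it directly from that definition. It is a generic illustration, not the validation study's code, and it assumes a risk score where higher means riskier (for a 1-to-10 health score where low is bad, the negated score would serve).

```python
from typing import Sequence


def roc_auc(risk_scores: Sequence[float], defective: Sequence[bool]) -> float:
    """Pairwise ROC AUC: P(defective file outranks clean file), ties = 0.5."""
    pos = [s for s, y in zip(risk_scores, defective) if y]
    neg = [s for s, y in zip(risk_scores, defective) if not y]
    if not pos or not neg:
        raise ValueError("need at least one defective and one clean file")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))
```

A score of 0.5 is no better than chance and 1.0 is a perfect ordering, so 0.74 means the health score ranks a defective file above a clean one roughly three times out of four.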

FREQUENTLY ASKED

Questions, answered

How does repowise know which agent wrote the code?

It reads it from your git history alone. A per-commit provenance detector reads eight signals: bot account identities, service email addresses, commit-message footers, co-author trailers, and merged-PR evidence like agent branch prefixes and PR-body markers. There is no IDE plugin, no keystroke capture, and no model deciding authorship. Blind-validated across 124 commits by six reviewers, it came back at 96.2% precision, with six of the eight channels perfect.

What is the AI-debt radar?

It is the fusion of three signals already in repowise: AI authorship, the 1-to-10 defect-validated health score, and bus-factor ownership. Provenance on its own is a vanity metric, so the radar intersects those signals to rank the AI-written code that is also a low-health hotspot owned by a single person. That intersection is the actual risk surface, not the raw percentage of AI code.

Is AI-written code actually buggier than human code?

Across 112,382 commits in 28 repos, agent commits were no more likely to introduce a defect than human commits in the same repo, and the point estimates leaned protective. Human-driven agents, where a person is still reviewing the diff, came in at an odds ratio of 0.57 [95% CI 0.42-0.76], and agent-written lines outlived human ones by 17.9 percentage points. The fear that agents are flooding codebases with extra bugs is not visible in the blame history. The risk that remains is concentration, not volume, which is exactly what the radar surfaces.

Is this developer surveillance?

No. Agent provenance is a risk-management signal for the codebase, not a per-developer productivity ledger. The unit of analysis is the file and the agent, never the engineer's hour. It reads commits, it does not watch people, and it is built so it cannot be turned into a per-developer ranking. The question it answers is which AI-written code is risky, not who typed fastest.

Does it work on our existing history?

Yes, retroactively. Because attribution is computed from git history rather than live instrumentation, repowise analyzes the code AI already wrote, all the way back through your existing commits. You install nothing on developer machines and wait for no new data to accumulate. The AI-code footprint is visible on day one, including across contractors and past contributors.

How accurate is the attribution?

It is a directional risk signal tied to the defect-validated health score and ownership, not a precise per-developer ledger. The provenance detector hit 96.2% precision under blind validation; its one real failure mode, a human pushing a follow-up commit inside an agent's PR, has its confidence downgraded so it can be filtered. The value is in the fusion: AI-written code that is also a low-health hotspot owned by one person is the code most likely to bite, and that ranking holds up even when any single attribution is approximate.

Last reviewed: June 2026
