Task-Shaped MCP Tools Beat Entity CRUD for Code Agents
A good way to test MCP tool design is to watch what happens when an agent has to explain a real regression. If the server only offers nouns — file, symbol, commit, owner — the model spends its budget assembling the answer the hard way. It Greps, reopens the same files, checks a commit, loses the thread, then asks for more context it could have had up front.
That failure mode is not subtle. It is the agent re-reading files and missing context because the server exposed objects instead of jobs.
CRUD-style MCP tools make code agents do the assembly work
Entity-shaped tools are what you get when you mirror a data model into MCP: get_file, list_symbols, fetch_commit, update_record, maybe get_owner. They are familiar, composable, and usually wrong for code agents.
They are wrong for the same reason a database schema is not a user interface. A code agent is not trying to enumerate entities. It is trying to answer a question: what changed, what depends on it, who owns it, what decision explains it, and what is risky if I touch it?
CRUD-style MCP tools force the model to chain search, read, infer, and remember. That is where the wheels come off. The agent starts with Grep, opens a file, finds a symbol, opens the caller, then the callee, then the commit, then the owner. By the time it has enough context, it has often reread the same file three times and still missed the dependency edge that mattered.
The core problem is not that entity-shaped tools are low level. It is that they are shaped like nouns in your database, while the agent’s job is a verb.
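To make the contrast concrete, here is a minimal sketch of an entity-shaped surface. This is plain Python, not a real MCP SDK; the tool names come from the examples above, and the tiny in-memory repo is a stand-in for illustration. The point is the call chain the agent is forced into, not the implementation.

```python
# Hypothetical entity-shaped tool surface: each tool returns one raw object,
# so answering "what changed and what depends on it?" takes many round trips.
REPO = {
    "files": {"app.py": "def handle(): return route()"},
    "symbols": {"handle": {"file": "app.py", "line": 1}},
    "commits": {"abc123": {"touched": ["app.py"], "message": "refactor routing"}},
    "owners": {"app.py": "platform-team"},
}

def get_file(path):      # noun: file
    return REPO["files"][path]

def get_symbol(name):    # noun: symbol
    return REPO["symbols"][name]

def fetch_commit(sha):   # noun: commit
    return REPO["commits"][sha]

def get_owner(path):     # noun: owner
    return REPO["owners"][path]

# The agent must chain four calls, and remember intermediate state,
# just to line up symbol -> file -> commit -> owner for one question.
sym = get_symbol("handle")
src = get_file(sym["file"])
commit = fetch_commit("abc123")
owner = get_owner(sym["file"])
```

Each call is individually cheap; the cost is the orchestration the model has to carry in its own context window.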
Repowise’s MCP tool design starts from the opposite assumption: if the model needs to explain a change, the server should return the change-shaped bundle, not the raw ingredients. That is the same logic behind what MCP is and why a codebase needs a server: Model Context Protocol defines a tool interface, but it does not require you to expose a thin object store as your product.
*Figure: CRUD vs. task shapes*
A failing-change task on a real repo: what the agent had to figure out
Take a mundane but realistic task from the pallets/flask style of benchmark work: explain why a change in a request path started breaking tests after a refactor. Nothing exotic. The agent needs to answer three questions:
- What code path changed?
- What depends on it?
- Was there a prior decision or commit that explains the shape of the change?
The initial path with entity-shaped tools looks familiar enough to anyone who has watched a coding agent flail:
- Grep for the function name.
- Open the module.
- Inspect the symbol definition.
- Fetch the commit that touched it.
- Open the owner file or nearby config.
- Still not know whether the failure is in the caller, the callee, or a stale assumption in docs.
What it lacked was not more nouns. It lacked the context that makes the nouns meaningful:
- callers and callees around the target
- related files in the same dependency neighborhood
- git history beyond the one commit the model happened to inspect
- decision history explaining why the code is shaped that way
- ownership and churn signals that tell you where to look first
This is where generic entity CRUD becomes expensive. The model can find the symbol, but it cannot infer the blast radius without assembling half the repository by hand.
For code agents, that is not just slow. It is brittle. The agent re-reads files because it does not know which file matters yet, then misses the ownership edge because ownership was never part of the tool response, then guesses at the root cause because the decision history lives in a different system.
*Figure: Failing-change investigation*
Task-shaped tools collapse search, read, and reason into one call
Task-shaped tools are opinionated on purpose. They do not ask the agent to discover the shape of the answer. They return the answer shape directly.
Repowise exposes tools like:
- `get_overview` for the first pass on an unfamiliar codebase
- `get_context` for the workhorse bundle around a target
- `get_risk` for hotspot and blast-radius analysis
- `get_why` for decisions and rationale
- `get_answer` for cited, confidence-gated Q&A
- `search_codebase` when semantic search is the right fallback
- `get_dead_code` when the question is cleanup, not comprehension
A contract like this is the point:
`get_context(target, include=[source, callers, callees, ownership, freshness])`
That is not a file API. It is a job API. The model asks for the target and the dimensions that matter, and the server returns a bundled answer with the relevant graph nodes, docs, ownership, and staleness signals.
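What a bundled response might look like, sketched in plain Python: the field names mirror the `include=` dimensions in the contract above, and the small lookup tables are stand-ins for a real graph, git, and ownership backend.

```python
from dataclasses import dataclass

# Hypothetical bundled response for a task-shaped get_context call.
@dataclass
class ContextBundle:
    target: str
    source: str
    callers: list
    callees: list
    ownership: str
    freshness_days: int

# Stand-in indexes; a real server would back these with graph + git layers.
GRAPH = {"handle": {"callers": ["dispatch"], "callees": ["route"]}}
SOURCES = {"handle": "def handle(): return route()"}
OWNERS = {"handle": "platform-team"}
LAST_TOUCHED_DAYS = {"handle": 12}

def get_context(target, include=("source", "callers", "callees", "ownership", "freshness")):
    """One call returns the whole neighborhood instead of four noun lookups."""
    edges = GRAPH[target]
    return ContextBundle(
        target=target,
        source=SOURCES[target] if "source" in include else "",
        callers=edges["callers"] if "callers" in include else [],
        callees=edges["callees"] if "callees" in include else [],
        ownership=OWNERS[target] if "ownership" in include else "",
        freshness_days=LAST_TOUCHED_DAYS[target] if "freshness" in include else -1,
    )

bundle = get_context("handle")
```

The `include` list is what keeps a job API from becoming a firehose: the agent names the dimensions it needs, and the server stays free to add dimensions without breaking callers.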
This is where MCP tool design starts to matter more than tool count. A smaller number of better-shaped tools usually beats a larger menu of nouns, because each call reduces the number of follow-up calls the agent has to invent. That is especially true in Model Context Protocol, where tool semantics are meant to be explicit. If the server knows that get_risk is a real task, it can package the blast radius, reviewers, co-change partners, and test gaps in one response instead of making the model stitch them together from four separate endpoints.
The result is fewer tool calls, fewer rereads, and fewer failure modes. Repowise’s pallets/flask SWE-QA benchmark found 49% fewer tool calls, 89% fewer files read, 36% lower cost, and parity answer quality. That is the sort of result that makes the API shape look less like aesthetics and more like latency control.
If you want the deeper benchmark context, see our token-efficiency benchmark on pallets/flask.
Worked example: explain a change with four entity calls versus one task call
Here is the same investigation, two ways.
| Approach | Calls | Files read | Context returned | Likely failure mode |
|---|---|---|---|---|
| Entity-shaped flow | 4-6 | 6-12 | File text, symbol definition, commit diff, maybe owner | Re-reading, stale context, missed caller/callee edges |
| Task-shaped flow | 1-2 | 1-3 | Source, callers, callees, ownership, freshness, related docs, decision pointers | Smaller failure surface, but depends on tool quality |
| Repowise benchmark shape | — | 89% fewer | Bundled context, cited answer, graph + git + docs + decisions | Parity quality, fewer dead ends |
A typical before trace looks like this:
- `Grep` for the symbol.
- Open the file.
- Inspect the symbol definition.
- Inspect the commit.
- Open the owner or nearby config.
- Ask a follow-up because the answer still feels uncertain.
The after trace is shorter:
- `get_context(target=..., include=[source, callers, callees, ownership, freshness])`
- `get_why(target=...)` if the question is about rationale
- `get_risk(target=...)` if the question is about blast radius
The interesting part is not that the task-shaped path is shorter. It is that the answer is better aligned with the real question. The model gets the dependency neighborhood, not just the file. It gets the docs, not just the code. It gets the decision trail, not just the latest diff.
We got one thing wrong initially: we assumed the agent would always want the most complete context bundle. In practice, that can be too much for simple questions. The better design was not “always return everything.” It was “return the right bundle for the job, and let the agent ask for more when confidence is low.” That is why get_answer exists as a confidence-gated first pass, and search_codebase is there when semantic search needs to take over.
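The confidence gate described above can be sketched as a simple escalation rule. The threshold and the toy scoring function are illustrative assumptions, not Repowise's actual scoring; the shape that matters is "answer when confident, otherwise name the next tool."

```python
# Hypothetical confidence gate: return a concise, cited answer when the
# score clears a threshold, otherwise point the agent at the escalation
# path. The threshold and the evidence-count score are made up for
# illustration.
CONFIDENCE_THRESHOLD = 0.7

def get_answer(question, evidence):
    score = min(1.0, len(evidence) / 4)  # toy score: more citations, more confidence
    if score >= CONFIDENCE_THRESHOLD:
        return {"answer": f"Answer to: {question}",
                "citations": evidence, "confidence": score}
    return {"answer": None, "confidence": score,
            "next": ["get_context", "search_codebase"]}  # escalation path

high = get_answer("why did the route change?",
                  ["commit abc", "ADR-7", "call graph", "test diff"])
low = get_answer("why did the route change?", ["commit abc"])
```

Returning the escalation path explicitly is what keeps the gate from being a dead end: a low-confidence response still tells the agent what to call next instead of leaving it to guess.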
*Figure: Before and after tool trace*
Why task-shaped tools work better for code than for generic CRUD
Code is not a record store. It is a graph with history.
That is why task-shaped tools work better when the server is built on four intelligence layers:
- Graph intelligence for symbols, dependencies, communities, and execution flow
- Git intelligence for churn, ownership, co-change pairs, and bus factor
- Documentation intelligence for a living wiki with freshness and confidence scoring
- Decision intelligence for architectural rationale, staleness tracking, and `get_why()`
Those layers make task-shaped tools possible because they let the server answer questions that are already opinionated:
- trace dependency
- explain change
- prepare patch
- assess risk
- find decision
- identify dead code
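Those opinionated questions amount to a routing table from jobs to the four layers above. A minimal sketch, assuming each layer is independently queryable (the mapping choices are illustrative, not Repowise's actual routing):

```python
# Hypothetical routing of task-shaped jobs onto the four intelligence
# layers. A real server would fan out to each layer and merge results;
# here the mapping itself is the point.
JOB_LAYERS = {
    "trace_dependency":   ["graph"],
    "explain_change":     ["graph", "git", "decisions"],
    "prepare_patch":      ["graph", "git", "docs"],
    "assess_risk":        ["graph", "git"],
    "find_decision":      ["decisions", "docs"],
    "identify_dead_code": ["graph", "git"],
}

def layers_for(job):
    # Unknown jobs fall back to semantic search rather than failing.
    return JOB_LAYERS.get(job, ["search"])
```

Note that no job maps to a single raw object store: every task-shaped answer needs at least one layer of context the noun itself does not carry.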
A generic CRUD server is good at “give me the object.” A codebase intelligence server should be good at “tell me what this object means in context.”
That distinction matters more in multi-repo workspaces, where the failure mode gets worse faster. If the service boundary spans backend, frontend, and shared packages, entity CRUD pushes the agent into manual correlation across repos. Task-shaped tools can bundle cross-repo co-change pairs, API contract extraction, package dependency mapping, and federated queries into something the model can actually use without building its own mini data warehouse in the prompt.
This is also why the best tools are opinionated at the API surface. Repowise’s seven MCP tools are not trying to be a complete object model. They are trying to be the shortest path from question to answer. The count will grow, but the principle should not: expose jobs, not nouns.
For setup and wiring, how to set up an MCP server for Claude Code, Cursor, and Cline is the practical companion piece, and the earlier list of MCP tools for code agents shows how different task shapes map to actual agent behavior.
Where entity primitives still belong in an opinionated MCP server
This is not an argument for deleting all low-level primitives. You still want them, just not as the main thing.
Keep entity primitives like:
- file
- symbol
- commit
- decision
- search
They belong in the server when the task cannot be answered confidently, when the user wants inspection-level control, or when you need a fallback for debugging the higher-level tools themselves.
That boundary matters. If a question is “show me the raw file,” a file tool is fine. If the question is “why did this change break the dependency chain,” a file tool is a trap unless it is paired with graph and history context.
Hooks and automation fit here too, but as support infrastructure. PreToolUse can enrich Grep and Glob with related files from local SQLite. PostToolUse can mark the wiki stale after a commit. Those are useful because they reduce friction around the task tools. They should not become the main API surface, because the main API surface should answer the job directly.
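A PreToolUse enrichment like the one described can be sketched against a local SQLite index. The table name, schema, and hook signature here are assumptions for illustration, not a specific agent framework's API.

```python
import sqlite3

# Hypothetical PreToolUse enrichment: when the agent runs Grep, look up
# co-change partners in a local SQLite index and attach them as hints.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE related (path TEXT, partner TEXT)")
conn.executemany(
    "INSERT INTO related VALUES (?, ?)",
    [("app.py", "routes.py"), ("app.py", "test_app.py")],
)

def pre_tool_use(tool_name, args):
    """Return extra context to attach before the tool call runs."""
    if tool_name != "Grep":
        return {}  # only enrich search-style tools
    rows = conn.execute(
        "SELECT partner FROM related WHERE path = ? ORDER BY partner",
        (args.get("path", ""),),
    ).fetchall()
    return {"related_files": [r[0] for r in rows]}

hints = pre_tool_use("Grep", {"path": "app.py", "pattern": "handle"})
```

The enrichment is cheap because it is a local lookup, and it is safe because it only adds hints; the underlying tool call runs unchanged either way.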
The rule of thumb is simple: expose primitives when inspection matters; expose task tools when comprehension matters.
What a senior platform team should keep when redesigning its MCP surface
If you are redesigning an MCP server for a codebase, start with the questions your agents ask most often, not the objects you already store.
A practical checklist:
- Ship `get_overview` first for unknown repos.
- Add `get_context` for any target the agent is likely to touch.
- Add `get_why` for decisions and `get_risk` for blast radius.
- Keep `get_answer` for concise, cited answers when confidence is high.
- Keep entity primitives only as fallback and inspection tools.
- Bundle ownership, freshness, callers, callees, and related docs by default where relevant.
- Measure tool calls per task, files read, wall time, answer quality, and stale-context incidents.
- Watch multi-repo workflows separately, because entity CRUD breaks there fastest.
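The measurement item in the checklist is easy to operationalize if tool traces are logged. A sketch, assuming each trace is a list of call records (the record fields are made up for illustration):

```python
# Hypothetical trace metrics: count tool calls, distinct files read, and
# re-reads per task, so a redesign can be judged on numbers rather than
# on vibes.
def trace_metrics(trace):
    reads = [step["file"] for step in trace if step.get("file")]
    return {
        "tool_calls": len(trace),
        "files_read": len(set(reads)),
        "rereads": len(reads) - len(set(reads)),  # same file opened twice
    }

before = [
    {"tool": "Grep"},
    {"tool": "Read", "file": "app.py"},
    {"tool": "Read", "file": "app.py"},   # the classic re-read
    {"tool": "Read", "file": "routes.py"},
]
after = [{"tool": "get_context", "file": "app.py"}]
```

Tracking rereads separately from files read matters: a falling reread count is the most direct signal that the server, not the agent, is now doing the assembly work.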
If the redesign works, you will see the same pattern Repowise saw on pallets/flask: fewer tool calls, fewer rereads, lower cost, and answer quality that does not move much. That is the signal that the server is finally shaped around the job instead of the schema.
For teams evaluating the hosted path, the broader context is in giving agents codebase context without prompt stuffing. If you are specifically comparing task surfaces with hosted vs self-hosted setup tradeoffs, the setup guide above is the right next read.
FAQ
What are task-shaped MCP tools?
They are MCP tools designed around what the agent is trying to do, not around the objects in your database. Examples include get_context, get_risk, get_why, and get_overview, which return bundled context instead of forcing the model to assemble it from files and commits.
Why are entity-shaped tools bad for code agents?
Because they make the agent do the assembly work. The model ends up chaining Grep, file reads, symbol inspection, commit inspection, and owner lookup, which leads to the problem of agent re-reading files and missing context.
How should I design MCP tools for a codebase?
Start from the top recurring jobs: explain change, trace dependency, assess risk, find decision, prepare patch, and locate dead code. Then bundle the context each job needs, using graph, git, documentation, and decision layers underneath.
When should an MCP server expose files and symbols instead of task tools?
When the user explicitly wants inspection-level control, when a task cannot be answered confidently, or when you need fallback primitives for debugging and composability. Files and symbols should be secondary, not the default interface.
What is the Model Context Protocol?
Model Context Protocol is the standard that lets models connect to external tools and data sources through a defined server interface. The protocol gives you the plumbing; MCP tool design is the part where you decide whether the server speaks in nouns or in jobs.


