Task-Shaped MCP Tools Beat Entity CRUD for Code Agents
A good way to test MCP tool design is to watch what happens when an agent has to explain a real regression. If the server only offers nouns — file, symbol, commit, owner — the model spends its budget assembling the answer the hard way. It Greps, reopens the same files, checks a commit, loses the thread, then asks for more context it could have had up front.
That failure mode is not subtle. It is the agent re-reading files and missing context because the server exposed objects instead of jobs.
CRUD-style MCP tools make code agents do the assembly work
Entity-shaped tools are what you get when you mirror a data model into MCP: get_file, list_symbols, fetch_commit, update_record, maybe get_owner. They are familiar, composable, and usually wrong for code agents.
They are wrong for the same reason a database schema is not a user interface. A code agent is not trying to enumerate entities. It is trying to answer a question: what changed, what depends on it, who owns it, what decision explains it, and what is risky if I touch it?
CRUD-style MCP tools force the model to chain search, read, infer, and remember. That is where the wheels come off. The agent starts with Grep, opens a file, finds a symbol, opens the caller, then the callee, then the commit, then the owner. By the time it has enough context, it has often reread the same file three times and still missed the dependency edge that mattered.
The core problem is not that entity-shaped tools are low level. It is that they are shaped like nouns in your database, while the agent’s job is a verb.
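To make the contrast concrete, here is a minimal sketch of an entity-shaped surface. This is plain Python, not a real MCP SDK; the tool names come from the examples above, and the tiny in-memory repo is a stand-in for illustration. The point is the call chain the agent is forced into, not the implementation.

```python
# Hypothetical entity-shaped tool surface: each tool returns one raw object,
# so answering "what changed and what depends on it?" takes many round trips.
REPO = {
    "files": {"app.py": "def handle(): return route()"},
    "symbols": {"handle": {"file": "app.py", "line": 1}},
    "commits": {"abc123": {"touched": ["app.py"], "message": "refactor routing"}},
    "owners": {"app.py": "platform-team"},
}

def get_file(path):      # noun: file
    return REPO["files"][path]

def get_symbol(name):    # noun: symbol
    return REPO["symbols"][name]

def fetch_commit(sha):   # noun: commit
    return REPO["commits"][sha]

def get_owner(path):     # noun: owner
    return REPO["owners"][path]

# The agent must chain four calls, and remember intermediate state,
# just to line up symbol -> file -> commit -> owner for one question.
sym = get_symbol("handle")
src = get_file(sym["file"])
commit = fetch_commit("abc123")
owner = get_owner(sym["file"])
```

Each call is individually cheap; the cost is the orchestration the model has to carry in its own context window.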
Repowise’s MCP tool design starts from the opposite assumption: if the model needs to explain a change, the server should return the change-shaped bundle, not the raw ingredients. That is the same logic behind what MCP is and why a codebase needs a server: Model Context Protocol defines a tool interface, but it does not require you to expose a thin object store as your product.
*Figure: CRUD vs. task shapes*
A failing-change task on a real repo: what the agent had to figure out
Take a mundane but realistic task from the pallets/flask style of benchmark work: explain why a change in a request path started breaking tests after a refactor. Nothing exotic. The agent needs to answer three questions:
- What code path changed?
- What depends on it?
- Was there a prior decision or commit that explains the shape of the change?
The initial path with entity-shaped tools looks familiar enough to anyone who has watched a coding agent flail:
- Grep for the function name.
- Open the module.
- Inspect the symbol definition.
- Fetch the commit that touched it.
- Open the owner file or nearby config.
- Still not know whether the failure is in the caller, the callee, or a stale assumption in docs.
What it lacked was not more nouns. It lacked the context that makes the nouns meaningful:
- callers and callees around the target
- related files in the same dependency neighborhood
- git history beyond the one commit the model happened to inspect
- decision history explaining why the code is shaped that way
- ownership and churn signals that tell you where to look first
This is where generic entity CRUD becomes expensive. The model can find the symbol, but it cannot infer the blast radius without assembling half the repository by hand.
For code agents, that is not just slow. It is brittle. The agent re-reads files because it does not know which file matters yet, then misses the ownership edge because ownership was never part of the tool response, then guesses at the root cause because the decision history lives in a different system.
*Figure: Failing-change investigation*
Task-shaped tools collapse search, read, and reason into one call
Task-shaped tools are opinionated on purpose. They do not ask the agent to discover the shape of the answer. They return the answer shape directly.
Repowise exposes tools like:
- `get_overview` for the first pass on an unfamiliar codebase
- `get_context` for the workhorse bundle around a target
- `get_risk` for hotspot and blast-radius analysis
- `get_why` for decisions and rationale
- `get_answer` for cited, confidence-gated Q&A
- `search_codebase` when semantic search is the right fallback
- `get_dead_code` when the question is cleanup, not comprehension
A contract like this is the point:
`get_context(target, include=[source, callers, callees, ownership, freshness])`
That is not a file API. It is a job API. The model asks for the target and the dimensions that matter, and the server returns a bundled answer with the relevant graph nodes, docs, ownership, and staleness signals.
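What a bundled response might look like, sketched in plain Python: the field names mirror the `include=` dimensions in the contract above, and the small lookup tables are stand-ins for a real graph, git, and ownership backend.

```python
from dataclasses import dataclass

# Hypothetical bundled response for a task-shaped get_context call.
@dataclass
class ContextBundle:
    target: str
    source: str
    callers: list
    callees: list
    ownership: str
    freshness_days: int

# Stand-in indexes; a real server would back these with graph + git layers.
GRAPH = {"handle": {"callers": ["dispatch"], "callees": ["route"]}}
SOURCES = {"handle": "def handle(): return route()"}
OWNERS = {"handle": "platform-team"}
LAST_TOUCHED_DAYS = {"handle": 12}

def get_context(target, include=("source", "callers", "callees", "ownership", "freshness")):
    """One call returns the whole neighborhood instead of four noun lookups."""
    edges = GRAPH[target]
    return ContextBundle(
        target=target,
        source=SOURCES[target] if "source" in include else "",
        callers=edges["callers"] if "callers" in include else [],
        callees=edges["callees"] if "callees" in include else [],
        ownership=OWNERS[target] if "ownership" in include else "",
        freshness_days=LAST_TOUCHED_DAYS[target] if "freshness" in include else -1,
    )

bundle = get_context("handle")
```

The `include` list is what keeps a job API from becoming a firehose: the agent names the dimensions it needs, and the server stays free to add dimensions without breaking callers.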
This is where MCP tool design starts to matter more than tool count. A smaller number of better-shaped tools usually beats a larger menu of nouns, because each call reduces the number of follow-up calls the agent has to invent. That is especially true in Model Context Protocol, where tool semantics are meant to be explicit. If the server knows that get_risk is a real task, it can package the blast radius, reviewers, co-change partners, and test gaps in one response instead of making the model stitch them together from four separate endpoints.
The result is fewer tool calls, fewer rereads, and fewer failure modes. Repowise’s pallets/flask SWE-QA benchmark found 49% fewer tool calls, 89% fewer files read, 36% lower cost, and parity answer quality. That is the sort of result that makes the API shape look less like aesthetics and more like latency control.
If you want the deeper benchmark context, see our token-efficiency benchmark on pallets/flask.
Worked example: explain a change with four entity calls versus one task call
Here is the same investigation, two ways.
| Approach | Calls | Files read | Context returned | Likely failure mode |
|---|---|---|---|---|
| Entity-shaped flow | 4-6 | 6-12 | File text, symbol definition, commit diff, maybe owner | Re-reading, stale context, missed caller/callee edges |
| Task-shaped flow | 1-2 | 1-3 | Source, callers, callees, ownership, freshness, related docs, decision pointers | Smaller failure surface, but depends on tool quality |
| Repowise benchmark shape | — | 89% fewer | Bundled context, cited answer, graph + git + docs + decisions | Parity quality, fewer dead ends |
A typical before trace looks like this:
- `Grep` for the symbol.
- Open the file.
- Inspect the symbol definition.
- Inspect the commit.
- Open the owner or nearby config.
- Ask a follow-up because the answer still feels uncertain.
The after trace is shorter:
- `get_context(target=..., include=[source, callers, callees, ownership, freshness])`
- `get_why(target=...)` if the question is about rationale
- `get_risk(target=...)` if the question is about blast radius
The interesting part is not that the task-shaped path is shorter. It is that the answer is better aligned with the real question. The model gets the dependency neighborhood, not just the file. It gets the docs, not just the code. It gets the decision trail, not just the latest diff.
We got one thing wrong initially: we assumed the agent would always want the most complete context bundle. In practice, that can be too much for simple questions. The better design was not “always return everything.” It was “return the right bundle for the job, and let the agent ask for more when confidence is low.” That is why get_answer exists as a confidence-gated first pass, and search_codebase is there when semantic search needs to take over.
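The confidence gate described above can be sketched as a simple escalation rule. The threshold and the toy scoring function are illustrative assumptions, not Repowise's actual scoring; the shape that matters is "answer when confident, otherwise name the next tool."

```python
# Hypothetical confidence gate: return a concise, cited answer when the
# score clears a threshold, otherwise point the agent at the escalation
# path. The threshold and the evidence-count score are made up for
# illustration.
CONFIDENCE_THRESHOLD = 0.7

def get_answer(question, evidence):
    score = min(1.0, len(evidence) / 4)  # toy score: more citations, more confidence
    if score >= CONFIDENCE_THRESHOLD:
        return {"answer": f"Answer to: {question}",
                "citations": evidence, "confidence": score}
    return {"answer": None, "confidence": score,
            "next": ["get_context", "search_codebase"]}  # escalation path

high = get_answer("why did the route change?",
                  ["commit abc", "ADR-7", "call graph", "test diff"])
low = get_answer("why did the route change?", ["commit abc"])
```

Returning the escalation path explicitly is what keeps the gate from being a dead end: a low-confidence response still tells the agent what to call next instead of leaving it to guess.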
*Figure: Before and after tool trace*
Why task-shaped tools work better for code than for generic CRUD
Code is not a record store. It is a graph with history.
That is why task-shaped tools work better when the server is built on four intelligence layers:
- Graph intelligence for symbols, dependencies, communities, and execution flow
- Git intelligence for churn, ownership, co-change pairs, and bus factor
- Documentation intelligence for a living wiki with freshness and confidence scoring
- Decision intelligence for architectural rationale, staleness tracking, and `get_why()`
Those layers make task-shaped tools possible because they let the server answer questions that are already opinionated:
- trace dependency
- explain change
- prepare patch
- assess risk
- find decision
- identify dead code
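Those opinionated questions amount to a routing table from jobs to the four layers above. A minimal sketch, assuming each layer is independently queryable (the mapping choices are illustrative, not Repowise's actual routing):

```python
# Hypothetical routing of task-shaped jobs onto the four intelligence
# layers. A real server would fan out to each layer and merge results;
# here the mapping itself is the point.
JOB_LAYERS = {
    "trace_dependency":   ["graph"],
    "explain_change":     ["graph", "git", "decisions"],
    "prepare_patch":      ["graph", "git", "docs"],
    "assess_risk":        ["graph", "git"],
    "find_decision":      ["decisions", "docs"],
    "identify_dead_code": ["graph", "git"],
}

def layers_for(job):
    # Unknown jobs fall back to semantic search rather than failing.
    return JOB_LAYERS.get(job, ["search"])
```

Note that no job maps to a single raw object store: every task-shaped answer needs at least one layer of context the noun itself does not carry.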
A generic CRUD server is good at “give me the object.” A codebase intelligence server should be good at “tell me what this object means in context.”
That distinction matters more in multi-repo workspaces, where the failure mode gets worse faster. If the service boundary spans backend, frontend, and shared packages, entity CRUD pushes the agent into manual correlation across repos. Task-shaped tools can bundle cross-repo co-change pairs, API contract extraction, package dependency mapping, and federated queries into something the model can actually use without building its own mini data warehouse in the prompt.
This is also why the best tools are opinionated at the API surface. Repowise’s seven MCP tools are not trying to be a complete object model. They are trying to be the shortest path from question to answer. The count will grow, but the principle should not: expose jobs, not nouns.
For setup and wiring, how to set up an MCP server for Claude Code, Cursor, and Cline is the practical companion piece, and the earlier list of MCP tools for code agents shows how different task shapes map to actual agent behavior.
Where entity primitives still belong in an opinionated MCP server
This is not an argument for deleting all low-level primitives. You still want them, just not as the main thing.
Keep entity primitives like:
- file
- symbol
- commit
- decision
- search
They belong in the server when the task cannot be answered confidently, when the user wants inspection-level control, or when you need a fallback for debugging the higher-level tools themselves.
That boundary matters. If a question is “show me the raw file,” a file tool is fine. If the question is “why did this change break the dependency chain,” a file tool is a trap unless it is paired with graph and history context.
Hooks and automation fit here too, but as support infrastructure. PreToolUse can enrich Grep and Glob with related files from local SQLite. PostToolUse can mark the wiki stale after a commit. Those are useful because they reduce friction around the task tools. They should not become the main API surface, because the main API surface should answer the job directly.
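A PreToolUse enrichment like the one described can be sketched against a local SQLite index. The table name, schema, and hook signature here are assumptions for illustration, not a specific agent framework's API.

```python
import sqlite3

# Hypothetical PreToolUse enrichment: when the agent runs Grep, look up
# co-change partners in a local SQLite index and attach them as hints.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE related (path TEXT, partner TEXT)")
conn.executemany(
    "INSERT INTO related VALUES (?, ?)",
    [("app.py", "routes.py"), ("app.py", "test_app.py")],
)

def pre_tool_use(tool_name, args):
    """Return extra context to attach before the tool call runs."""
    if tool_name != "Grep":
        return {}  # only enrich search-style tools
    rows = conn.execute(
        "SELECT partner FROM related WHERE path = ? ORDER BY partner",
        (args.get("path", ""),),
    ).fetchall()
    return {"related_files": [r[0] for r in rows]}

hints = pre_tool_use("Grep", {"path": "app.py", "pattern": "handle"})
```

The enrichment is cheap because it is a local lookup, and it is safe because it only adds hints; the underlying tool call runs unchanged either way.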
The rule of thumb is simple: expose primitives when inspection matters; expose task tools when comprehension matters.
What a senior platform team should keep when redesigning its MCP surface
If you are redesigning an MCP server for a codebase, start with the questions your agents ask most often, not the objects you already store.
A practical checklist:
- Ship `get_overview` first for unknown repos.
- Add `get_context` for any target the agent is likely to touch.
- Add `get_why` for decisions and `get_risk` for blast radius.
- Keep `get_answer` for concise, cited answers when confidence is high.
- Keep entity primitives only as fallback and inspection tools.
- Bundle ownership, freshness, callers, callees, and related docs by default where relevant.
- Measure tool calls per task, files read, wall time, answer quality, and stale-context incidents.
- Watch multi-repo workflows separately, because entity CRUD breaks there fastest.
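The measurement item in the checklist is easy to operationalize if tool traces are logged. A sketch, assuming each trace is a list of call records (the record fields are made up for illustration):

```python
# Hypothetical trace metrics: count tool calls, distinct files read, and
# re-reads per task, so a redesign can be judged on numbers rather than
# on vibes.
def trace_metrics(trace):
    reads = [step["file"] for step in trace if step.get("file")]
    return {
        "tool_calls": len(trace),
        "files_read": len(set(reads)),
        "rereads": len(reads) - len(set(reads)),  # same file opened twice
    }

before = [
    {"tool": "Grep"},
    {"tool": "Read", "file": "app.py"},
    {"tool": "Read", "file": "app.py"},   # the classic re-read
    {"tool": "Read", "file": "routes.py"},
]
after = [{"tool": "get_context", "file": "app.py"}]
```

Tracking rereads separately from files read matters: a falling reread count is the most direct signal that the server, not the agent, is now doing the assembly work.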
If the redesign works, you will see the same pattern Repowise saw on pallets/flask: fewer tool calls, fewer rereads, lower cost, and answer quality that does not move much. That is the signal that the server is finally shaped around the job instead of the schema.
For teams evaluating the hosted path, the broader context is in giving agents codebase context without prompt stuffing. If you are specifically comparing task surfaces with hosted vs self-hosted setup tradeoffs, the setup guide above is the right next read.
FAQ
What are task-shaped MCP tools?
They are MCP tools designed around what the agent is trying to do, not around the objects in your database. Examples include get_context, get_risk, get_why, and get_overview, which return bundled context instead of forcing the model to assemble it from files and commits.
Why are entity-shaped tools bad for code agents?
Because they make the agent do the assembly work. The model ends up chaining Grep, file reads, symbol inspection, commit inspection, and owner lookup, which leads to the problem of agent re-reading files and missing context.
How should I design MCP tools for a codebase?
Start from the top recurring jobs: explain change, trace dependency, assess risk, find decision, prepare patch, and locate dead code. Then bundle the context each job needs, using graph, git, documentation, and decision layers underneath.
When should an MCP server expose files and symbols instead of task tools?
When the user explicitly wants inspection-level control, when a task cannot be answered confidently, or when you need fallback primitives for debugging and composability. Files and symbols should be secondary, not the default interface.
What is the Model Context Protocol?
Model Context Protocol is the standard that lets models connect to external tools and data sources through a defined server interface. The protocol gives you the plumbing; MCP tool design is the part where you decide whether the server speaks in nouns or in jobs.


