Skills or Subagents

Every agent I build hits the same fork in the road. A new capability is needed. Should it run in the agent’s own context (a skill) or in a separate process (a subagent)?

The two patterns look similar from a distance. A skill is a markdown file with instructions the agent reads inline. A subagent is a separate invocation with its own context window, its own system prompt, and potentially its own model. Both let you extend what an agent can do. The choice between them looks like a stylistic preference until you ship the first one and find out it isn’t.

Skills

Zero startup latency
Conversation context preserved
Ship by dropping a file
Same model as the parent

Subagents

~15 seconds startup per call
Context isolation
Different model possible
Parallel execution

The default I’ve landed on is “skills first.” Not because skills are better in every case.

The cost of crossing an agent boundary

Every time I delegate something to a subagent, I pay three costs the skill version doesn’t pay.

The first is startup latency. A subagent invocation has about fifteen seconds of overhead before the work begins. If I’m doing fifty of them in a session, that’s twelve minutes of nothing happening before the work happens.

The second is context loss. The subagent doesn’t know what the user just said. It doesn’t know what decisions the parent agent already made. It doesn’t know which constraints are implicit in the conversation. Anything I want it to know has to fit in the briefing message. UC Berkeley’s MAST taxonomy of multi-agent failures named this specifically: “Loss of History” is a recurring failure mode where agents forget previous context or experience unexpected conversation resets during handoffs.

The third is verification difficulty. The parent agent receives the subagent’s final output and has to trust it. If I can’t tell whether the subagent did a good job from the final message alone, the delegation is poorly scoped. Google DeepMind’s delegation framework requires it: don’t hand off unless the outcome can be precisely verified.

Skills don’t pay any of these. The instructions load into the agent’s existing context. The agent uses its existing tools. The output isn’t a handoff to summarize, it’s a continuation of the same conversation. Zero latency, zero context loss, zero verification overhead.

When subagents earn their keep

The case for a subagent is specific. Four conditions, any one of them sufficient.

Promote a skill to a subagent when...

Self-evaluation

Subagent

Anthropic’s harness design guide says: “Tuning a standalone evaluator to be skeptical turns out to be far more tractable than making a generator critical of its own work.” Models have a measurable preference for their own output (Panickssery et al., NeurIPS 2024). They recognize their own writing and rate it higher. Skills can’t do this honestly.

Large intermediate output

Subagent

Some capabilities generate enormous payloads on the way to a small answer. Reading every file in a repo to find one thing. A subagent absorbs that payload in its own context window and returns the distilled finding to the parent. Chroma’s “context rot” study found every frontier model degrades as input length grows, well before the advertised window limit.

Different model

Subagent

If a capability can run on a cheap model, a subagent is the clean wrapper. RouteLLM demonstrated 85% cost reduction at 95% quality by routing between expensive and cheap models. Bulk classification, filtering, formatting. None of these need the expensive model the parent is running on.

Parallel execution

Subagent

A skill runs sequentially in the main agent. Subagents can run in parallel. If the work is N independent searches or N independent file analyses, the right pattern is N subagents, not one skill that loops.

None of the above

Skill

If none of those apply, the capability should be a skill. Drop a SKILL.md file in the right directory. The agent picks it up next session. No handoff, no overhead, no risk of context drift.

What this looks like in practice

I built Q, a Slack-native agent for my team. The agent needed to file tickets, query our wiki, send messages, classify reactions, look up users. Almost all of that lives as skills. Single agent, single context, one continuous conversation per thread. Q is fast because nothing it does crosses an agent boundary.

Q’s skills

File a ticket
Query the wiki
Send a thread message
Classify a reaction
Look up a user

Q’s subagents

!reflect self-evaluation cycle
Memory search across thousands of records

The exceptions are where the four conditions hit. Q has a !reflect command that runs a self-evaluation cycle: Q reads recent sessions, notices where it was corrected, proposes updates to its own knowledge base. That’s a subagent, because the evaluation step shouldn’t happen in the same context that produced the work being evaluated. Memory search runs through an isolated retrieval layer too, because the volume of records being searched would corrupt the parent’s working context.

My adversarial harness is the other clear case. The Generator and the Evaluator can’t be the same agent. The whole point of separating them is to break the self-praise bias. So they’re separate agents in separate sessions. But within each of those agents, almost every capability is a skill. The Generator has a TDD skill, a commit-message skill, a parallel-execution skill. The Evaluator has a rubric skill, a verification skill. The agents are separated only where separation is the whole point. Everything else stays in-context.

The default I keep coming back to

The framing I use is: skills are how the agent learns more about what it can do. Subagents are how the agent admits there’s a job that doesn’t belong in the conversation it’s having.

Most new capabilities I’m tempted to add are about expanding what the agent knows. Those are skills. The ones that genuinely need separation are about the parent agent doing its actual job uninterrupted while something else handles a different kind of work. Those are subagents.

If I’m not sure which one, I write the skill version first. It’s twelve lines of markdown. If the skill turns out to be the wrong choice, the upgrade to a subagent is straightforward. The reverse, building a subagent and discovering it should have been a skill, costs an actual refactor and the operational habit of remembering to call it.

The cheaper bet is the right starting bet.