· ai-agents · pm-workflow · claude-code

What I've Learned Building Agents

A year of building AI agents, four rules of thumb I keep coming back to, and the one that ties the other three together.

I’ve been building AI agents for about a year. Q, a Slack-native triage agent for my team. The Autobots harness, an adversarial multi-agent system for shipping code. A handful of smaller things in between. Most of what I learned came from shipping something wrong, reading the research after, and refactoring.

Four rules of thumb keep coming back to me whenever I have to make a design decision. They’re not original. Each one is published research I read after the fact. I’ve written three of them up in full. The fourth doesn’t have its own piece, because it isn’t really a separate rule. It’s the thing the other three are all about.

The three I’ve written up

Each of those is a real decision I got wrong before I got it right. But they rhyme. Pull on any one of them and the same thread comes loose.

The one that ties them together: context

Every decision above is a decision about how context flows. Function over persona keeps the prompt focused on the task instead of a character. Skills over subagents keeps the work in one continuous context instead of fragmenting it across handoffs. Single-agent over multi-agent keeps the reasoning in one place instead of leaking it across boundaries. Three rules, one underlying constraint.

That framing is Anthropic’s, and the data underneath it is sobering. Chroma’s “context rot” study tested eighteen frontier models and found every single one degrades as input length grows, well before the advertised window limit.

18 of 18
frontier models degrade as input length grows
Chroma, context rot
200K → 50K
where a 200K-context model can already lose precision
Chroma, context rot

The practical version of this is a question I ask about every piece of context I’m about to add: can I say why it’s needed for the task in front of me right now? If I can’t, it’s probably hurting. Skills that pull in tens of thousands of tokens of tool output get reconsidered. Subagent handoffs return distilled summaries, not raw dumps. Persona text that spends 200 tokens on “you take pride in your work” is spending attention budget that could have gone to the actual problem.

This is also why I don’t treat any of these rules as permanent. Model upgrades change the math. A capability that needed subagent isolation at 100K context might work fine as a skill at 1M. The right architecture today can be the wrong one in six months. I revisit my subagent boundaries every time the underlying model gets a major bump.

What I’d tell myself six months ago

Don’t reach for the human-team analogy. Define by function. Default to skills. Don’t escalate to multi-agent without a specific reason. And underneath all of it, watch the context budget, because that’s what the other three rules are protecting.

None of this is settled. The research is still moving, the models are still improving, and the failure modes I’m worried about today might be artifacts of this generation of capabilities. The rules I’d give myself six months ago I’ll probably rewrite six months from now too.

I keep building agents anyway. The four rules just keep showing up.