Whack-a-mole is the wrong game

Chibi sticker art: a white-ponytailed agent stands unaware on a sci-fi map platform while a hooded purple figure crouches in the void underneath, grinning and aiming a gun up through the floor.

The problems I keep hitting building AI agents feel oddly familiar. Not the tech, the problems. I’ll run into something, sit with it, and realize I already worked through a version of it years ago making games, and that I’m reaching for the same fix without thinking about it. I did ten years of games before I touched an agent. Here’s the first fix that came to mind.

In VALORANT, players kept finding ways to use Omen’s teleport to get outside the map boundary. From outside, they could shoot up through the floor, or through a wall, at people who couldn’t see them and couldn’t shoot back. We patched a spot. They found another one. We patched that. They found a third.

Omen under the map teleport

VALORANT’s playable characters are also called Agents, by the way. It’s just a coincidence I caught after writing this down, but it’s funny that I’m thinking about agents in a completely different way now.

This is the part of game design nobody warns you about. You ship a game, and plenty of the players are good and live within the bounds you intend. There are tons more whose entire hobby is finding the edge of every system you built and looking for a way past it. They’ll test every map boundary, every ability interaction, every economy, all the way to the limit, and they’ll find things you didn’t design and couldn’t have predicted. That impulse spawned a hobby of its own, speedrunning: turning every possible edge, every exploit, every interaction into a coordinated dance for the fastest completion time possible.

For a long time I filed all of that under games. Players are bored and relentless, and there are always more of them than there are of you. Then I started building with AI agents, and the same problem showed up with no players in it at all. The system finds the edges on its own now.

Stop patching pixels

The first instinct with the teleport exploit was to patch the spot. That’s the losing game. There’s always another spot. You can’t enumerate every way a few thousand people will find to break a system, the same way you can’t enumerate every strange thing a model will do across every input it’ll ever see. Chasing them one at a time is whack-a-mole, and the moles are smarter and more numerous than you are.

The same instinct shows up in my AI work. The agent does something I didn’t want, so I add a rule to stop that exact thing. Then it does a slightly different version, so I add another rule. A few months later I’ve got a pile of special cases and no actual boundary.

Build the wall, not the patch

What actually fixed the under-the-map problem wasn’t a better patch. It was a different kind of rule. We made it so that if any part of your character left the playable space, you died instantly. Not “block this teleport spot.” Leave the valid area, by any method, and you’re dead. One rule that made the whole category of exploit pointless. We stopped defending individual pixels and defined the boundary of the world, then enforced it absolutely.

That’s the first move games teach you. When you can’t list all the bad cases, stop trying. Define the edge of the space you actually want, and put a hard wall there.
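Here is a minimal sketch of that kind of rule in Python. It is not VALORANT’s actual code. The `Box` type, the single-box playable space, and the `enforce_boundary` name are all assumptions made for illustration. The point is only that the check asks "is every part of the character still inside?" and never asks how the character got out.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Box:
    """Axis-aligned box described by its minimum and maximum corners."""
    lo: Vec3
    hi: Vec3

    def contains(self, other: "Box") -> bool:
        # True only if `other` lies entirely inside this box on every axis.
        return all(
            s_lo <= o_lo and o_hi <= s_hi
            for s_lo, s_hi, o_lo, o_hi in zip(self.lo, self.hi, other.lo, other.hi)
        )


def enforce_boundary(character: Box, playable: Box) -> str:
    """Hypothetical rule: if any part of the character leaves, they die."""
    return "alive" if playable.contains(character) else "dead"


playable = Box((0.0, 0.0, 0.0), (100.0, 100.0, 50.0))
standing_on_floor = Box((10.0, 10.0, 0.0), (11.0, 11.0, 2.0))
clipping_through_floor = Box((10.0, 10.0, -0.5), (11.0, 11.0, 1.5))

print(enforce_boundary(standing_on_floor, playable))       # alive
print(enforce_boundary(clipping_through_floor, playable))  # dead
```

Nothing in the check mentions teleports, walls, or specific coordinates. A new way out of the map gets the same answer as the old ones.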

I built the same thing into my own agent setup last month and didn’t notice until later. The agents I run can touch real files. A wrong command can delete real work. I didn’t try to teach the model every dangerous command, because I’d have missed one. Instead I put a guardrail in front of everything it runs: anything that tries to force-delete a directory gets killed before it executes. No list of cases. One boundary. Leave the safe space and you die instantly.

That’s the VALORANT map edge. I drew the same line in 2021 and in 2026, and only saw it was the same line the second time.
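A guardrail like the one described above could be sketched roughly as follows. This is an illustration under assumptions, not my exact setup: it assumes commands arrive as shell strings, it treats "force-delete a directory" as an `rm` with both a recursive and a force flag, and the names `is_force_delete`, `guard`, and `BlockedCommand` are hypothetical. When in doubt it errs toward blocking.

```python
import shlex

# Standalone tokens that end one command and start another.
SEPARATORS = {"&&", "||", ";", "|"}


class BlockedCommand(Exception):
    """Raised when a command crosses the boundary (hypothetical name)."""


def is_force_delete(command: str) -> bool:
    """Return True if the command looks like a recursive, forced `rm`."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unparseable input (e.g. unbalanced quotes): refuse rather than guess.
        return True

    i = 0
    while i < len(tokens):
        if tokens[i] == "rm" or tokens[i].endswith("/rm"):
            recursive = force = False
            i += 1
            # Inspect the flags that belong to this rm invocation.
            while i < len(tokens) and tokens[i] not in SEPARATORS:
                tok = tokens[i]
                if tok == "--recursive":
                    recursive = True
                elif tok == "--force":
                    force = True
                elif tok.startswith("-") and not tok.startswith("--"):
                    recursive = recursive or "r" in tok or "R" in tok
                    force = force or "f" in tok
                i += 1
            if recursive and force:
                return True
        else:
            i += 1
    return False


def guard(command: str) -> str:
    """Pass the command through unchanged, or kill it before it runs."""
    if is_force_delete(command):
        raise BlockedCommand(f"refused to run: {command!r}")
    return command
```

The executor would call `guard(command)` before handing anything to a shell. A string check like this is a sketch of the idea rather than a complete defense: it won’t catch deletions done by other tools or scripts.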

Give the bad behavior a room

The hard wall works when there’s a clean edge to defend. Sometimes there isn’t. Sometimes the behavior you don’t want is going to happen no matter what, and your only real choice is what you do with it once it does.

On Idle Frontier, an idle game I worked on at Kongregate, people cheated. We built detection, they found new methods, we built more detection. Real whack-a-mole, the kind with no clean edge, because “cheating” is a moving target and a determined player always finds the next exploit. Banning them felt right and solved nothing. They’d make new accounts, or they’d churn and tell their friends the studio was hostile. Either way we lost a paying customer and the arms race kept going.

So we stopped trying to remove the cheaters and built them their own world instead. We flagged behavior that looked like cheating and quietly moved those players onto separate leaderboards, where they only ever competed against each other. The cheaters got to keep cheating, against other cheaters, and felt like they were winning. The honest players never saw them and kept a leaderboard that meant something. And the cheaters stayed paying customers.

We didn’t eliminate the bad behavior. We gave it a room with a door, and the door kept it away from everyone else.
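The routing is simple enough to sketch. This is not the code we shipped. The `Leaderboards` class, its method names, and the two board names are assumptions for illustration, and the detection step that decides whom to flag is left out entirely.

```python
from typing import Dict, List, Set, Tuple


class Leaderboards:
    """Two boards: one for everyone, one only flagged players ever see."""

    def __init__(self) -> None:
        self._scores: Dict[str, Dict[str, int]] = {"main": {}, "flagged": {}}
        self._flagged: Set[str] = set()

    def _board_for(self, player_id: str) -> str:
        return "flagged" if player_id in self._flagged else "main"

    def flag(self, player_id: str) -> None:
        """Quietly move a player, and any score they already have, off main."""
        self._flagged.add(player_id)
        score = self._scores["main"].pop(player_id, None)
        if score is not None:
            self._scores["flagged"][player_id] = score

    def submit(self, player_id: str, score: int) -> None:
        """Record a player's best score on whichever board they belong to."""
        board = self._scores[self._board_for(player_id)]
        board[player_id] = max(score, board.get(player_id, score))

    def view_for(self, player_id: str, top: int = 10) -> List[Tuple[str, int]]:
        """Return the ranking this player sees, highest score first."""
        board = self._scores[self._board_for(player_id)]
        return sorted(board.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

The player never picks a board and never sees a flag. The same `view_for` call returns a different world depending on who is asking.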

Chibi sticker art: two mischievous cheaters penned into their own fenced leaderboard, gleefully competing against each other, while honest players compete fairly on a separate podium across the fence.

That’s the second move. For agents, the room is a sandbox. You can’t make a capable model never produce a strange or risky output, so you decide where being wrong is cheap and let it run free there, and you fence the places where being wrong is expensive. I let my agents take their odd paths where a bad result costs nothing. I put a wall around the places where a bad result costs money or deletes something real. Same as the cheaters. The unpredictable behavior gets a room of its own.
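One small piece of a fence like that is a path check: the agent can write anywhere inside a scratch directory and nowhere else. The sketch below assumes file access goes through a single helper. The name `resolve_in_sandbox` and the idea of one sandbox root are hypothetical, and a real sandbox would also need process and network isolation that this does not attempt.

```python
from pathlib import Path


def resolve_in_sandbox(sandbox_root: str, requested: str) -> Path:
    """Resolve `requested` against the sandbox root, or refuse if it escapes."""
    root = Path(sandbox_root).resolve()
    # Joining handles relative paths; an absolute `requested` replaces the root
    # entirely, and resolve() collapses any `..` segments and symlinks.
    target = (root / requested).resolve()
    try:
        target.relative_to(root)
    except ValueError:
        raise PermissionError(f"{requested!r} is outside the sandbox") from None
    return target
```

Inside the root the agent can be as strange as it likes, because `resolve_in_sandbox(root, "notes/draft.txt")` just returns a path. A request such as `"../real-work/main.py"` raises before anything is touched.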

In the game

The wall: leave the map by any method, you die instantly.

The room: flagged cheaters get moved to a leaderboard of their own.

In my agents

The wall: a command that tries to force-delete a directory is killed before it runs.

The room: risky work runs where being wrong is cheap, fenced off from where it’s expensive.

Why I keep reaching for games

The default instinct with any system is to want it to do the same thing every time. When it doesn’t, the reflex is to treat that as a defect and stamp it out until it behaves. I had that reflex too. Most of computing runs on it, and for most software it’s exactly right.

Games don’t let you keep that reflex. A game with no variance isn’t a game. Player creativity, emergent strategies, the thing somebody does on stream you never imagined: that unpredictability is the entire point. It’s also where every exploit comes from. So building games is partly the practice of shipping a system you can’t fully predict and deciding, on purpose, where the unpredictability gets to live and where it’s walled off. That’s the muscle I built for a decade before I ever touched an agent, so it’s the one I reach for now.

I don’t think games are special here. Plenty of people work with systems they can’t fully predict for a living and have their own ways through it. Statisticians, distributed-systems people, anyone who’s run a large live service. A lot of roads reach the same place. Games are just the road I know in my hands, so it’s the one this clicked through for me.

I also don’t want to oversell the analogy. The two stories I trust most are both about the adversarial edge: players, or reality, pushing on a system to find where it breaks. That covers a lot of what goes wrong with agents, but not all of it. Some of the variance isn’t anyone pushing on anything. It’s just the model being strange on a perfectly normal input, and the map edge and the cheater room don’t obviously solve that one. I haven’t worked that part out yet.

The core of it I’m pretty sure about. The job was never to make the agent deterministic. You can’t, and chasing it is whack-a-mole forever. The job is the thing games taught me to do. Decide where the variance gets to live, build a hard wall at the edge of the rest, and give the behavior you can’t prevent a room of its own. I keep finding out I already knew how to do this. I learned it years ago, patching a map that players had figured out how to stand underneath.