The Agent Tending Problem

Why coding agents need an orchestrator, not more agents

27 March 2026


The Feeling Nobody Has Named

You have three coding agents running. One is refactoring an authentication module. Another is writing integration tests. A third is migrating a database schema. Each can work autonomously for ten, maybe fifteen minutes before it needs your attention: a decision about an interface, a clarification about expected behaviour, a judgment call about whether to break backwards compatibility.

You are cycling between terminal tabs. You check on the auth refactor. It has finished and is waiting. You switch to the test agent. It hit an ambiguous requirement and stalled seven minutes ago. You give it a one-sentence instruction and switch to the database agent. It is still running. You switch back to the auth agent, give it the next task, and glance at the test agent. Still working.

There is a feeling that accompanies this workflow, and as far as I can tell nobody has named it. It is the low-grade anxiety of knowing that somewhere, right now, one of your agents is idle because you have not attended to it yet. Every second an agent waits for your instruction is a second of compute you are paying for and not using. The anxiety scales with the number of agents. It is worse than ordinary multitasking stress because the cost of your inattention is visible, quantifiable, and growing.

Call it idle-agent anxiety. It is the defining emotional experience of the multi-agent era, and it is driven by a precise mathematical structure.


Little's Law for Human Attention

The number of agents you can productively manage in parallel is not a vague function of your multitasking ability. It is governed by a ratio:

N = T_work / T_instruct

Where T_work is the average duration an agent works autonomously before needing your input, and T_instruct is the average time you spend giving it the next instruction (including context-switching cost). If your agents work for 10 minutes between instructions and each instruction takes you 2 minutes (including reading their output, deciding what to do, and typing the prompt), you can manage 5 agents. If instructions take 30 seconds, you can manage 20.

This is structurally identical to Little's Law in queueing theory: L = λW, where L is the average number of items in a system, λ is the arrival rate, and W is the average time each item spends in the system. Here, you are the server. Agents needing attention are the queue. Your throughput is bounded by how fast you can process each "attention request" from an agent.

The formula has three immediate consequences.

First, the bottleneck is always the human. Compute is cheap. Attention is not. You can spin up 50 agents on Codex or Claude Code. You cannot spin up 50 units of your own attention. The system is always attention-bound, never compute-bound.

Second, the highest-leverage investment is increasing T_work, not decreasing T_instruct. Better prompts help (they lower T_instruct). But extending the autonomous work interval (by giving agents more context, better tools, clearer specifications, and the ability to make safe decisions without asking) is worth far more. Doubling T_work doubles your effective parallelism. Halving T_instruct also doubles it, but T_instruct has a floor (you still have to read and think) while T_work has no ceiling.

Third, there is a cognitive ceiling around 5 to 7. This matches Miller's number for working memory. Even if the formula says you could manage 12 agents, your ability to maintain mental models of 12 concurrent workstreams degrades sharply after 7. The practical limit is the minimum of the formula and your cognitive span.
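A toy calculation makes the interaction between the ratio and the cognitive cap concrete. The numbers are the essay's own; the cap of 7 is Miller's span:

```python
# Effective parallelism: the N = T_work / T_instruct ratio, capped by
# human working memory. Times are in minutes; 7 is Miller's span.

def effective_parallelism(t_work: float, t_instruct: float, span: int = 7) -> int:
    """Agents you can productively tend: min(T_work / T_instruct, cognitive span)."""
    return min(int(t_work / t_instruct), span)

print(effective_parallelism(10, 2))    # 10 / 2 = 5 agents
print(effective_parallelism(10, 0.5))  # formula says 20, but capped at 7
```

Note how the cap bites exactly when instructions get cheap: shaving T_instruct below about 90 seconds buys nothing once the formula exceeds your cognitive span.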


The CNC Machine Tending Analogy

This problem is not new. It has been solved before, in a different domain, with a different name.

In CNC machining, a single operator tends multiple machines. The operator loads raw material into Machine 1, starts the programme, walks to Machine 2, unloads the finished part, loads new material, starts it, checks Machine 3, adjusts tooling, and cycles back to Machine 1. The operator's job is not to run the machines. It is to keep them from being idle.

The parallels are exact:

- The CNC machine is the coding agent: it does the actual work.
- The operator is you: your job is to keep the machines from being idle.
- The machining cycle time is T_work: the interval of autonomous operation.
- The load/unload time is T_instruct: the attention each unit of work costs you.
- An idle machine is an idle agent: paid-for capacity producing nothing.

Manufacturing engineers solved this decades ago. The solution was not "give the operator more machines." It was to build machine-tending systems: robotic loaders, automated material handling, centralised dashboards showing machine status, and supervisory control software that routes the operator's attention to the machine that needs it most.

The equivalent in the AI agent world is an orchestration layer: software that manages agent lifecycles, routes your attention to the agent that is blocked, and handles the mechanical parts of instruction (context loading, state persistence, failure recovery) so you can focus on the judgment calls that only a human can make.


The Gap All Three Platforms Created by Design

Here is the structural fact that makes this essay more than an analogy exercise. Claude Code, Codex, and Gemini Code Assist all block recursive sub-agent spawning. This is not an oversight. It is a deliberate design decision, and it is the correct one.

Claude Code allows spawning sub-agents, but those sub-agents cannot spawn further sub-agents. The recursion is capped at depth one. Codex runs each task in an isolated cloud sandbox with no ability to launch additional Codex instances from within. Gemini Code Assist operates within a bounded session context. In all three cases, the architectural choice is the same: the agentic loop is a leaf node, not a recursive structure.

Why? Because unbounded recursive agent spawning is a denial-of-service vector, a cost explosion vector, and an accountability nightmare. If Agent A can spawn Agent B which can spawn Agent C, you have lost the ability to reason about resource consumption, you have lost the ability to attribute actions to decisions, and you have created a system where a single misspecified goal can generate unbounded computational work. The platforms are right to prevent this.

But the consequence is significant. If the individual agent cannot orchestrate other agents, and if humans are attention-bounded at 5 to 7 concurrent agents, then multi-agent coordination must happen at a layer above the individual agent. The orchestration layer is not optional. It is the structurally necessary complement to the platforms' own design constraint.

The platforms ship the engine. They do not ship the factory floor.


What the Orchestration Layer Must Do

Not everything labelled "agent orchestration" qualifies. Most existing tools are workflow engines with LLM calls bolted on: directed acyclic graphs of prompts, static routing, no fault tolerance. The machine-tending analogy tells us what the orchestration layer actually needs.

Durable state across agent restarts

A CNC machine does not lose its programme when the operator walks away. An orchestrated agent must not lose its context when its process is interrupted, whether by an LLM provider 503, an SSH tunnel drop, or a Cloud Run instance being recycled. State must be persisted by default, not opted into. Recovery must replay from the last known-good checkpoint without re-executing side effects.
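The recovery requirement can be sketched in a few lines. This is an illustrative event-sourcing toy, not Elan's implementation: state is rebuilt by replaying the log, and side effects recorded in the log are never run a second time.

```python
# Minimal sketch of checkpoint-and-replay recovery. All names are
# illustrative. Events are appended to a durable log; recovery replays
# the log to rebuild state but skips side effects already executed.

class AgentState:
    def __init__(self):
        self.log = []       # durable event log (persisted to disk/DB in reality)
        self.context = {}   # in-memory state, rebuilt on recovery

    def record(self, event: dict):
        """Append an event, then apply it (normal operation)."""
        self.log.append(event)
        self._apply(event, execute_effects=True)

    def _apply(self, event: dict, execute_effects: bool):
        self.context[event["key"]] = event["value"]
        if execute_effects and event.get("effect"):
            pass  # e.g. run a tool, write a file: happens exactly once

    def recover(self) -> dict:
        """Rebuild state after a crash by replaying the log.
        Effects in the log are NOT re-executed."""
        self.context = {}
        for event in self.log:
            self._apply(event, execute_effects=False)
        return self.context

s = AgentState()
s.record({"key": "branch", "value": "agent-auth-refactor", "effect": "git_checkout"})
s.record({"key": "step", "value": 3})
# ...process dies here; a fresh instance replays the same log...
recovered = s.recover()
print(recovered)  # {'branch': 'agent-auth-refactor', 'step': 3}
```

The essential property is the separation of state transitions (always replayed) from side effects (replayed never): that is what "without re-executing side effects" means operationally.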

A supervision hierarchy

In a machine shop, there is a foreman who monitors all operators, and operators who monitor all machines. The hierarchy exists because attention is scarce and must be allocated efficiently. In the agent world, this means a supervisor process that monitors agent health, restarts failed agents, and escalates to the human only when automated recovery is insufficient. The human should be the court of last resort, not the first responder.

Attention routing

The orchestrator must answer the question: "Which agent needs me most right now?" This is a scheduling problem. Priority might be determined by how long the agent has been blocked (fairness), how critical its task is (urgency), or how much downstream work depends on its output (impact). A good orchestrator makes this decision for you, surfacing the highest-priority blocked agent rather than making you poll all of them.
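One plausible scoring rule, with made-up weights, looks like this:

```python
# Toy attention router: score each blocked agent by fairness (time
# blocked), urgency, and impact, and surface the top one. The weights
# are illustrative assumptions, not a prescribed policy.

def next_agent(blocked, w_fair=1.0, w_urgency=5.0, w_impact=3.0):
    """Return the blocked agent with the highest priority score."""
    def score(agent):
        return (w_fair * agent["minutes_blocked"]       # fairness
                + w_urgency * agent["urgency"]          # criticality, e.g. 0..3
                + w_impact * agent["dependents"])       # downstream tasks waiting
    return max(blocked, key=score)

fleet = [
    {"name": "auth-refactor", "minutes_blocked": 2, "urgency": 1, "dependents": 4},
    {"name": "test-writer",   "minutes_blocked": 7, "urgency": 0, "dependents": 0},
    {"name": "db-migration",  "minutes_blocked": 1, "urgency": 3, "dependents": 2},
]
print(next_agent(fleet)["name"])  # db-migration: urgency outweighs wait time
```

Notice that the longest-blocked agent does not win; with these weights, a critical task blocked for one minute outranks an unimportant one blocked for seven. Tuning that trade-off is the orchestrator's job, not yours.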

Protocol-native integration

The orchestrator must speak the protocols that coding agents already understand. Today, that means MCP (Model Context Protocol). An orchestrator that exposes itself as an MCP server can be called by Claude Code, Cursor, Copilot, or any MCP-compatible client with a single configuration entry. No SDK to install, no binary to distribute, no onboarding friction. Tomorrow, it means A2A (Agent-to-Agent protocol, originated at Google and contributed to open governance), which adds structured task delegation and cross-vendor agent discovery.

Policy governance

Different agents should have different permissions. The agent refactoring authentication code should be allowed to modify auth/ but not billing/. The test-writing agent should be allowed to read production code but not modify it. The database migration agent should be gated on human approval before executing destructive DDL. These are not application-level concerns. They are orchestration-level concerns, because they govern what agents are permitted to do across the entire fleet.
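A toy version of such a policy check, where the agent names, path scopes, and approval keywords are all illustrative assumptions:

```python
# Toy fleet-level policy: per-agent path scopes plus approval gates on
# dangerous operations. Illustrative only; a real orchestrator would
# enforce this at the tool-call boundary.

POLICIES = {
    "auth-refactor": {"write": ["auth/"], "read": ["auth/", "shared/"]},
    "test-writer":   {"write": ["tests/"], "read": ["auth/", "billing/", "tests/"]},
    "db-migration":  {"write": ["migrations/"], "requires_approval": ["DROP", "TRUNCATE"]},
}

def allowed(agent: str, action: str, target: str) -> bool:
    policy = POLICIES.get(agent, {})
    # Destructive keywords are gated on explicit human approval.
    if any(keyword in target for keyword in policy.get("requires_approval", [])):
        return False
    return any(target.startswith(prefix) for prefix in policy.get(action, []))

print(allowed("auth-refactor", "write", "auth/session.ex"))           # True
print(allowed("auth-refactor", "write", "billing/invoice.ex"))        # False
print(allowed("db-migration", "write", "migrations/DROP_users.sql"))  # False: needs approval
```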


Why BEAM Maps onto This Problem

I claimed above that the agent-tending problem is "solved." Let me be precise about what I mean. I am not claiming a formal isomorphism in the algebraic sense. I am claiming that the engineering constraints of the two domains are the same constraints, and that a runtime designed to satisfy one set inherits the properties needed for the other.

The BEAM virtual machine (the runtime for Erlang and Elixir) was designed in the late 1980s at Ericsson for telephone switching. A telephone switch manages millions of concurrent connections, any of which can fail at any time, and the system must continue operating with no perceptible interruption. The engineering constraints were: massive concurrency, per-connection isolation, hierarchical fault recovery, and zero-downtime upgrades.

Map "telephone connection" to "AI agent" and the correspondence is direct:

- Millions of concurrent connections: a fleet of concurrent agents, each an isolated process.
- Any connection can fail at any time: any agent can crash, stall, or hit a provider error.
- Hierarchical fault recovery: supervisors that restart failed agents and escalate only when automated recovery fails.
- Zero-downtime upgrades: evolving the orchestrator without dropping in-flight agent work.

I am not claiming this is the only runtime that could work. Akka (now Apache Pekko) on the JVM offers actor-model concurrency. Go's goroutines with careful supervision could approximate some of these properties. But none of these alternatives provide all five properties (process isolation, supervision hierarchies, durable state machines, hot code upgrades, and preemptive scheduling) as built-in runtime guarantees rather than application-level concerns. The BEAM does not require you to build fault tolerance. It provides it.


What This Means for the Platforms

Claude Code, Codex, and Gemini are converging on a shared architecture: a powerful single-agent loop with rich tool use, bounded context, and explicit sub-agent spawning at depth one. The competition between them is at the agent level: better models, better tool integration, better context management, longer autonomous work intervals.

But the competition at the orchestration level has barely started. The platforms have, correctly, decided not to build recursive orchestration into the agent. That means orchestration is an external concern. And external concerns become ecosystems.

Consider the analogy to container orchestration. Docker shipped the container. Kubernetes shipped the orchestrator. Docker was necessary but insufficient for production workloads. Kubernetes filled the gap between "I can run a container" and "I can run a fleet of containers reliably." The same structural gap exists today between "I can run an AI agent" and "I can run a fleet of AI agents reliably."

The orchestration layer for AI agents needs to be:

- Platform-agnostic: able to tend Claude Code, Codex, and Gemini agents alike, rather than being tied to a single vendor.
- Protocol-native: speaking MCP today and A2A as it matures, so adoption is a configuration entry, not an SDK.
- Fault-tolerant by construction: durable state, supervision hierarchies, and recovery as runtime guarantees, not application code.
- Attention-aware: routing the human to the highest-priority blocked agent instead of forcing a polling loop across the fleet.


The Uncomfortable Economics

There is a harder version of this argument that I want to make explicit.

As T_work increases (agents become more capable, work longer without interruption), the formula N = T_work / T_instruct says you can manage more agents. But "more agents" means more concurrent computational work, which means more API calls, more tokens, more compute. The cost of running 20 agents for an hour is not 20 times the cost of running one agent. It is 20 times the token cost plus the orchestration overhead plus the failure-recovery cost plus the cost of the human attention required to manage the fleet.

This is where most "just use more agents" thinking breaks down. The implicit assumption is that agents are free and human attention is expensive, so the solution is always more agents. In reality, agents are cheap but not free, human attention is bounded but not zero-cost, and the combinatorial interaction between agents (merge conflicts, shared state, incompatible changes) grows superlinearly with fleet size.

The orchestration layer's job is not to maximise the number of agents. It is to maximise the total useful work per unit of human attention. That is a different optimisation target, and it often means running fewer agents more effectively rather than more agents with less oversight.
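A toy model shows why the optimum can sit below the attention cap. The conflict coefficient here is an assumption chosen for illustration, not a measured value:

```python
# Toy fleet-size trade-off: gross work grows linearly with agents, but
# interaction cost (merge conflicts, shared state, incompatible changes)
# grows superlinearly. All constants are illustrative assumptions.

T_WORK = 10.0      # minutes of autonomous work per agent per cycle
T_INSTRUCT = 2.0   # human minutes per instruction
CONFLICT = 1.5     # interaction cost coefficient (assumed)

def net_useful_work(n: int) -> float:
    """Agent-minutes of useful work per cycle, after conflict losses."""
    return n * T_WORK - CONFLICT * n ** 2

attention_cap = min(int(T_WORK / T_INSTRUCT), 7)   # tending limit: 5 agents
best = max(range(1, attention_cap + 1), key=net_useful_work)
print(best, net_useful_work(best))  # 3 16.5: the optimum is below the cap
```

With these (assumed) numbers, the fifth agent you could tend actually destroys value: three agents produce 16.5 net agent-minutes per cycle, five produce only 12.5. That is the "fewer agents more effectively" claim in miniature.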


From Tending to Delegation

The machine-tending model assumes a human in the loop. But there is a trajectory beyond tending, and it is visible if you follow the manufacturing analogy forward.

CNC machine tending evolved into automated manufacturing cells, then into flexible manufacturing systems, then into lights-out factories where no human is present on the floor. Each step delegated more judgment to the control system. The human moved from operator to supervisor to plant designer to someone who checks a dashboard once a day.

The same trajectory applies to agent orchestration. Today, you tend agents: give instructions, check results, make judgment calls. Tomorrow, the orchestrator handles routine decisions autonomously and escalates only genuine ambiguities. Eventually, the orchestrator becomes a planning system that decomposes high-level goals into agent-sized tasks, assigns them, monitors execution, and reports results. You become the person who sets objectives and reviews outcomes, not the person who types prompts.

But, and this is critical, that trajectory does not eliminate the need for orchestration. It intensifies it. A lights-out factory requires more sophisticated control software than a manually tended one, not less. The orchestrator absorbs complexity that the human used to handle. If the orchestrator is fragile, the entire system is fragile. If the orchestrator loses state, every agent loses context. If the orchestrator has no supervision hierarchy, a single failure cascades.

This is why "just chain some API calls together" is not orchestration. It is scripting. And the gap between scripting and orchestration is the gap between a machine shop where someone manually moves parts between machines and a manufacturing execution system with real-time scheduling, fault recovery, and quality control.


What Already Exists and What Does Not

To be honest about the status of this work: Elan is a working BEAM-native multi-agent runtime. It exists today as a tested Elixir application with 120 passing tests, durable state via event-sourced gen_statem processes, git-native provenance (one branch per agent), policy-governed tool orchestration, and Postgres-backed persistence. Benchmarks show ~2.75ms recovery time, 623 tasks/second scheduling throughput, and 162ms tool timeout enforcement. Live Vertex AI integration is confirmed. The previous essay describes these internals.

What does not yet exist: the MCP server interface, the Claude Code hook receiver, the A2A protocol compliance, and the attention-routing dashboard described in this essay. These are the next phase. The runtime is real. The orchestration layer built on top of it is planned. I am describing the problem and the architecture, not claiming to have shipped the solution.

The next phase has three concrete deliverables:

- The MCP server interface, together with the Claude Code hook receiver that feeds it.
- A2A protocol compliance for structured, cross-vendor task delegation.
- The attention-routing dashboard that answers "what needs me?"

The atomic primitive is a durable task: an idempotent unit of work with a lifecycle (pending → assigned → running → completed | failed), a lease (time-bounded ownership that prevents orphaned work), and a provenance trail (every state transition recorded as an immutable event). The orchestrator's job reduces to: accept tasks, assign them to agents, supervise execution, and route human attention to blocked tasks. Everything else is a protocol adapter on this primitive.
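The primitive can be sketched as follows. The class and field names are illustrative; the actual primitive is an event-sourced gen_statem process on the BEAM:

```python
# Sketch of the durable-task primitive: a lifecycle state machine, a
# time-bounded lease, and an immutable provenance trail. Illustrative
# names only; not Elan's implementation.
import time

TRANSITIONS = {
    "pending": {"assigned"},
    "assigned": {"running", "pending"},   # lease expiry returns work to the queue
    "running": {"completed", "failed"},
}

class DurableTask:
    def __init__(self, task_id: str):
        self.id, self.state, self.events = task_id, "pending", []
        self.lease_expires = None

    def transition(self, new_state: str, agent: str = None):
        if new_state not in TRANSITIONS.get(self.state, set()):
            raise ValueError(f"illegal transition: {self.state} -> {new_state}")
        # Provenance: every transition is recorded as an immutable event.
        self.events.append((time.time(), self.state, new_state, agent))
        self.state = new_state

    def assign(self, agent: str, lease_seconds: float = 600):
        self.transition("assigned", agent)
        self.lease_expires = time.time() + lease_seconds  # bounded ownership

    def reclaim_if_expired(self):
        """Orphaned work returns to the queue when its lease lapses."""
        if self.state == "assigned" and time.time() > self.lease_expires:
            self.transition("pending")

task = DurableTask("refactor-auth")
task.assign("agent-1")
task.transition("running", "agent-1")
task.transition("completed", "agent-1")
print(task.state, len(task.events))  # completed 3
```

The lease is what makes the primitive safe under failure: an agent that dies mid-task does not orphan the work, because ownership expires and the task becomes assignable again.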

The API is the canonical surface. An MCP server is a protocol adapter on the API. A web dashboard is a frontend on the API. A CLI is a typed wrapper around the API. Every downstream client compounds the investment in the API.

What the first use looks like concretely: you add Elan as an MCP server in your Claude Code configuration, a single JSON entry pointing at the Elan endpoint. From that point, Claude Code can call elan.create_task, elan.get_next_blocked, and elan.complete_task. The orchestrator persists all state, restarts failed tasks, and surfaces the highest-priority blocked agent when you ask "what needs me?" No SDK, no binary, no onboarding beyond the config entry.
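Such an entry might look roughly like this. The server name and endpoint are assumptions (the interface is not yet shipped), and the exact schema depends on your MCP client:

```json
{
  "mcpServers": {
    "elan": {
      "type": "http",
      "url": "https://elan.example.com/mcp"
    }
  }
}
```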


The Strongest Objections

There are three serious arguments against this thesis. I want to address them directly.

"The platforms will build orchestration themselves." Perhaps. But the history of platform companies suggests otherwise. AWS did not build Kubernetes. Kubernetes emerged as an external orchestration layer for containers because container orchestration is a different problem from container execution, with different design constraints and a different user. The same structural argument applies: agent execution (what Claude Code, Codex, and Gemini do) is a different problem from agent orchestration (fleet management, fault recovery, attention routing). The platforms' core competence is model quality and agent capabilities. Orchestration is a distributed systems problem that benefits from a different runtime (fault-tolerant, concurrent, durable) than the one optimised for single-agent inference. It is possible that one of the platforms will build this. It is unlikely that all three will, and a platform-agnostic orchestrator has a structural advantage over one tied to a single vendor's agent.

"LangGraph, CrewAI, and other frameworks already do this." They occupy the space but do not solve the problem. LangGraph is a directed graph of LLM calls with optional checkpointing bolted onto a Python runtime. CrewAI is a role-assignment framework for prompt orchestration. Neither has per-agent process isolation, hierarchical fault recovery, or durable state as a runtime invariant. They are workflow engines, not orchestration runtimes. The difference is the same as the difference between a shell script that runs Docker containers in sequence and Kubernetes. Both "orchestrate containers." One survives a node failure. The distinction matters precisely when things go wrong, which in long-running autonomous agent systems is not an edge case but a certainty.

"Agents will become autonomous enough that orchestration becomes trivial." This is the strongest objection. If T_work approaches infinity (agents that never need human input), the tending problem dissolves. But even in a lights-out scenario, you still need fault recovery (what happens when an agent crashes?), resource management (how many agents should run concurrently?), conflict resolution (what happens when two agents modify the same file?), and audit (what did each agent do and why?). Manufacturing automation did not eliminate the need for manufacturing execution systems. It made them more sophisticated. The same applies here: more capable agents require more sophisticated orchestration, not less.


The Bet

The bet is simple. The major platforms will continue to improve the individual agent: longer context, better tool use, more capable models, longer autonomous work intervals. They will not build recursive orchestration into the agent, because doing so correctly is a distributed systems problem that conflicts with their incentive to keep the agent simple, reliable, and attributable.

This means the orchestration layer will be external. It will be built by someone. And the team that builds it on a runtime designed for exactly this class of problem (massive concurrency, per-process isolation, hierarchical fault recovery, zero-downtime evolution) will have a structural advantage over teams that bolt orchestration onto runtimes that were designed for request-response workloads.

I could be wrong. The platforms could absorb orchestration. Agents could become so reliable that fault recovery is unnecessary. A Python-based framework could add enough infrastructure to match BEAM's concurrency guarantees. These are real possibilities. But they require the platforms to solve a problem orthogonal to their core competence, agents to defy the reliability patterns of every other software system in history, and Python to acquire properties that its runtime was explicitly not designed to provide. I would rather bet on the structural alignment between problem and tool.

The BEAM was built for telephone switches. Telephone switches were the original machine-tending problem: millions of concurrent stateful processes, any of which can fail, managed by a control system that must never go down.

AI agent orchestration is the same problem, with language models instead of phone calls.

The solution is the same too.

About the author: Eduardo Aguilar Pelaez is CTO and co-founder at Legal Engine Ltd and Honorary Research Fellow at the Department of Surgery & Cancer, Faculty of Medicine, Imperial College London. Elan is a BEAM-native multi-agent runtime. The previous essay, Elan: Why AI Agents Need an Operating System, Not a Framework, describes the runtime architecture. This essay describes the orchestration problem it exists to solve.