Cool, great, I sure love "pip install"ing on every run instead of just baking a single container image with everything already installed.
This isn't any sort of fancy or interesting sandboxing; it's shelling out to "docker run", and not even using Docker as well as it could.
Quoting from the linked page:
> The tradeoff is ~5-10 seconds of container startup overhead
Sure, maybe it's 5-10 seconds if you use containers wrong. Unpacking a root filesystem and spinning up a clean mount namespace on Linux takes a few ms, and taking more than a second means something is going wrong, like "pip install"ing at runtime instead of at build time for some reason.
I can spin up a full Linux VM and run some code in it in under 5 seconds.
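To make the build-time vs. runtime point concrete, here is a rough Python sketch of what I mean. It is not the linked project's code: the base image, the `agent-sandbox` tag, and the helper names are all made up for illustration. The idea is just that dependencies go into the image once, and each run is a plain "docker run" with no pip and no network.

```python
import subprocess
import time

BASE_IMAGE = "python:3.12-slim"  # assumption: any Python base image would do


def dockerfile_for(packages):
    """Return Dockerfile text that installs the packages once, at build time."""
    lines = [f"FROM {BASE_IMAGE}"]
    if packages:
        lines.append("RUN pip install --no-cache-dir " + " ".join(packages))
    return "\n".join(lines) + "\n"


def run_argv(image, code):
    """Build the argv for one run in the prebuilt image: no pip, no network."""
    return ["docker", "run", "--rm", "--network", "none",
            image, "python", "-c", code]


def timed_run(image, code):
    """Run the snippet and return (stdout, seconds). Needs a local Docker daemon."""
    start = time.perf_counter()
    done = subprocess.run(run_argv(image, code),
                          capture_output=True, text=True, check=True)
    return done.stdout, time.perf_counter() - start
```

Write the output of `dockerfile_for(["requests"])` to a Dockerfile, build it once (for example `docker build -t agent-sandbox .`), then call `timed_run("agent-sandbox", "print('hi')")` as often as you like. Whatever number you get, the install cost is paid once at build time rather than on every run.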
The "2 lines of code" framing is appealing but hides the real complexity: what happens when the agent needs to make external API calls at runtime?
Sandboxed execution solves the safety problem (the agent cannot destroy your filesystem). But autonomous agents also need compute resources (inference, embeddings, image generation) that run outside the sandbox. The payment and authentication for those external calls are where the interesting engineering happens.
An agent running in a sandbox with a funded wallet (USDC on Base L2 via x402) can pay for its own compute without any human in the loop. That is the missing piece between "launch an agent" and "agent runs autonomously for weeks."
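As a rough illustration of the pay-per-call loop, here is a minimal Python sketch. It is deliberately not the x402 wire format: the `x-price-cents` and `x-payment-proof` headers, the `Wallet` class, and the transport signature are placeholders I made up. The only real thing it relies on is the HTTP 402 Payment Required status code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# (status_code, headers, body) as returned by a hypothetical transport function
Response = Tuple[int, Dict[str, str], bytes]
Transport = Callable[[str, Dict[str, str]], Response]


@dataclass
class Wallet:
    """Hypothetical stand-in for a funded wallet; balance tracked in cents."""
    balance_cents: int

    def pay(self, amount_cents: int) -> str:
        if amount_cents > self.balance_cents:
            raise RuntimeError("insufficient funds")
        self.balance_cents -= amount_cents
        return f"proof-{amount_cents}"  # placeholder, not a real signed payment


def fetch_with_payment(url: str, transport: Transport, wallet: Wallet) -> Response:
    """Call the service; on 402, pay the quoted price and retry once."""
    status, headers, body = transport(url, {})
    if status != 402:
        return status, headers, body
    price = int(headers["x-price-cents"])  # hypothetical header name
    proof = wallet.pay(price)
    return transport(url, {"x-payment-proof": proof})  # hypothetical header name
```

The wallet is the only thing a human has to fund up front. In this sketch the agent stops when `pay` raises for insufficient funds, which is also the natural place to hang a spending cap.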
Works great when you have a clear verification signal (tests passing), but what drives convergence when that signal isn’t well-defined?