I put in 15 hours or so with gas town this weekend, starting right around the 0.1 release.
Think of it as an extended bipolar-optimism-fueled glimpse into the future. Steve's MO is laid out in the Medium post - but basically, it's okay to lose things, rewrite whole subsystems, whatever; this is the future. It's really fun and interesting to watch the speed of development.
I've made a few multi-agent coding setups in the last year, and I think gas town has the team side about right: big boss (mayor), operations boss (deacon), relatively linear keeper of truth (witness), single point for merges (refiner), lots of coders with their code held lightly.
I love the idea of formulas - a lot of what makes gas town work and informs how well it ultimately will work is the formulas. They're close conceptually to skills.
I don't love the Mad Max branding, but meh, whatever, it's fun, and a perk of the brave new world where you can make stuff like this for a few hundred bucks a month sent to Anthropic - software can have personality again, yay.
Conceptually I think there is a product team element to this still missing - deploy engineers, product managers, visual testing. Everything is sort of out there, janky in parts, but workable to glue together right now, and will only improve. That said, the mad max town analogy is going to get overstretched at some point; we already have pretty good names for all the parts that are needed, and as coordination improves, we're going to want to add more stuff into the coordination. So, I'd like to see a version of this with normal names and expanded.
Upshot - worth a look - if beads is any indication, give it a month or two or four to settle down unless you like living on the bleeding bleeding edge.
As someone who never saw Mad Max, Slow Horses, Cat’s Cradle, or Breaking Bad, and only saw Waterworld when I was a kid, all the references in this post went completely over my head, and I just treat the words used in there as their own terminology - like a non-engineer reading about chemical production.
The article was pretty OK. Kubernetes has its own share of obnoxious terminology that often comes across as "we name it different so that it doesn't sound like AWS". At some point you just accept the terminology in relation to the tool you use and move on.
How do you do the multi-agent setups in containers? I keep trying to figure out ways to start with stuff like this, but it always boils down to: I don't want to give entirely autonomous agents access to my entire filesystem and/or GitHub perms. I just want them to be able to hack away in their own container and produce a PR I can read or test. I think something like a local git with the remote in the container pointing at the version on the machine could be a start, but setting all that up is not trivial. As far as I can tell, Steve is just running everything on the base machine in multiple worktrees/multiple clones of the project - which seems to put enormous amounts of trust in agents to actually create branches in a disciplined way. I can't imagine they can be trusted to?
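For what it's worth, here's a minimal sketch of that "local remote" idea, assuming git and docker on the host. The image name, paths, and branch are all made up, and a real setup would want network and credential lockdown on top of this:

```python
# Sketch: the agent works in a container whose only git remote is a bare repo
# on the host. Assumes git and docker are installed; everything named here
# (paths, image, branch) is hypothetical.
import subprocess

HOST_REPO = "/home/me/projects/myapp"            # your real working copy
BARE_REPO = "/home/me/agent-remotes/myapp.git"   # the only thing the agent can push to

def run(*cmd: str) -> None:
    subprocess.run(cmd, check=True)

# 1. Create a bare clone the container will treat as "origin".
run("git", "clone", "--bare", HOST_REPO, BARE_REPO)

# 2. Start the agent container with ONLY the bare repo mounted. The agent
#    clones from /remote, hacks on a branch, and pushes back to /remote.
#    It never sees your filesystem or your GitHub credentials.
run("docker", "run", "-d", "--name", "agent-sandbox",
    "-v", f"{BARE_REPO}:/remote",
    "my-agent-image",                            # hypothetical image with git + the agent
    "sh", "-c",
    "git clone /remote /work && cd /work && git checkout -b agent/feature && run-agent")

# 3. When the agent is done, fetch its branch into your real repo and review
#    it like any other PR.
run("git", "-C", HOST_REPO, "fetch", BARE_REPO, "agent/feature:agent/feature")
```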
Yes, definitely. I spent about half that time poking around, understanding the setup, doing some bug fixing and put in a PR for gas town itself, although I used Claude Code separately for making the PR.
I pointed it at a Postgres time series project I was working on, and it deployed a much better UI and (with some nudging) fixed docker errors on a remote server, which involved logging in to the server to check logs. It probably opened and fixed 50 or so beads in total.
I'd reach for it first to do something complicated ("convoy" or epic) over Claude Code even as is -- like, e.g., "copy this data ingestion we do for site x, and implement it for sites y, z, a, b, c, d. Start with a formal architecture that respects our current one and remains extensible for all these sites" is something I think it would do a fair job at.
As to cost - I did not run out of my Claude Pro Max subscription poking around with it. It infers ... a lot ... though. I pulled together a PR that would let you point some or all of the agent types at local or other endpoints, but it's a little early for the codebase, I think. I'd definitely reach for some cheaper and/or faster inference for some of the use cases.
Gergely Orosz (The Pragmatic Engineer) interviewed Yegge [1] and Kent Beck [2], both experienced engineers before vibe coding, and they express similar sentiments about how LLMs reinvigorated their enjoyment of programming. This introduction to Gas Town is very clear on its intended audience with plenty of warnings against overly eager adoption. I agree that using tools like this haphazardly could lead to disaster, but I would not dismiss the possibility that they could be used productively.
Anecdote, but some of the time, when I'm blasted after thinking for my job all day, a design session randomly throwing shit at an LLM hits the spot. I usually make some meaningful progress on a pet project. I rarely let the LLM do much pure vibe coding; I iterate with several LLMs until it looks and feels right, then hack on it myself or let the LLM do drudgery like refactoring or boilerplate to get me over the humps. In that sense I do strongly agree.
Beck was in Melbourne a few weeks ago, and his take on LLM usage was so far divorced from what Yegge is doing that their views on what LLMs are capable of in early 2026 are irreconcilable.
It's techno-freemasonry. One must break through the symbolism. The author wielding it and transmitting it cannot just plainly say the knowledge. We don't have the vocabulary or grammar for these new things, so storytelling and story universes convey it. The zoomorphism and cinematic references ground us in what all these bots are doing mimetically.
I'm excited the author shared, and so exuberantly; that said, I did quick-scroll a bunch of it. It is its own kind of mind-altering substance, but we have access to mind-bending things.
If you look at my AgentDank repo [1], one could see a tool for finding weed, or you could see connecting world intelligence with SQL fluency and pairing it with curated structured data to merge the probabilistic with the deterministic computing forms. Which I quickly applied to the OSX Screentime database [2].
Vibe coding turned a corner in November and I'm creating software in ways I would have never imagined. Along with the multimodal capabilities, things are getting weirder than ever.
Mr Yegge now needs to add a whole slew of characters to Gas Town to maintain multi-modal inputs and outputs and artifacts.
Just two days ago, I had LLMs positioning virtual cameras to render 3D models they created using Swift after looking at a picture of what to make, and then "looking" at the results to decide the next code changes. Crazy. [3]
ETA: It was only 14 months earlier that I was amazed that a multi-modal model could identify a trend in a chart [4].
"Lets make extreme generalizations about tens of thousands of people because of an extremely unique outlier (who doesn't even belong to that group of people)."
The shocking changes to the culture over the last 20 years start to make a lot more sense when you realize someone decided to flood the society with mass quantities of prescription Amphetamines.
The writing doesn't feel particularly out of character for Yegge, who has always been at least a bit like this. (Though I don't know if that's just him, or drugs as well.)
This reminds me so much of my own experience with AI-fueled dev mania. Rapidly build semi-functional wonders, then pivot to something shiny and new to avoid the QA trudge of polishing that wondrous turd.
This thing is so long and poorly written I honestly quit reading. I got all the way to Gas Town 101. If some LLM didn't write most of this I'd be surprised. These things all have the same tone, the breathless, maniacally uncritical, young Robin Williams aura but none of the comedy.
The article seems to be about fun, which I'm all for, and I highly appreciate the usage of MAKER as an evaluation task (finally, people are actually evaluating their theories on something quantitative) but the messaging here seems inherently contradictory:
> Gas Town helps with all that yak shaving, and lets you focus on what your Claude Codes are working on.
Then:
> Working effectively in Gas Town involves committing to vibe coding. Work becomes fluid, an uncountable that you sling around freely, like slopping shiny fish into wooden barrels at the docks. Most work gets done; some work gets lost. Fish fall out of the barrel. Some escape back to sea, or get stepped on. More fish will come. The focus is throughput: creation and correction at the speed of thought.
I see -- so where exactly is my focus supposed to sit?
As someone who sits comfortably in the "Stage 8" category that this article defines, my concern has never been throughput; it has always been about retaining a high degree of quality while organizing work so that, when context switching occurs, it transitions me to near-orthogonal tasks which are easy to remember, so I can give high-quality feedback before switching again.
For instance, I know Project A -- these are the concerns of Project A. I know Project B -- these are the concerns of Project B. I have the insight to design these projects so they compose, so I don't have to keep track of a hundred parallel issues in a mono Project C.
On each of those projects, run a single agent -- with review gates for 2-3 independent agents (fresh context, different models! Codex and Gemini). Use a loop, let the agents go back and forth.
This works and actually gets shit done. I'm not convinced that 20 Claudes or massively parallel worktrees or whatever improves on quality, because, indeed, I always have to intervene at some point. The blocker for me is not throughput, it's me -- a human being -- my focus, and the random points of intervention which ... by definition ... occur stochastically (because agents).
Finally:
> Opus 4.5 can handle any reasonably sized task, so your job is to make tasks for it. That’s it.
This is laughably not true, for anyone who has used Opus 4.5 for non-trivial tasks. Claude Code constantly gives up early, corrupts itself with self-bias, the list goes on and on. It's getting better, but it's not that good.
A response like this is confusing to me. What you are saying makes sense, but seems irrelevant. Something like Gas Town is clearly not attempting to be a production-grade tool; it's an opinionated glimpse into the future. I think the aesthetic was fitting and intentional.
This is the equivalent of some crazy inventor in the 19th century strapping a steam engine onto a unicycle and telling you that some day you'll be able to go 100mph on a bike. He was right in the end, but no one was actually going to build something usable with the technology of the day.
Opus 4.5 isn't there. But will there be a model in 3-5 years that's smart enough, fast enough, and cheap enough for a refined vision of this to be possible? I'm going to bet on yes.
> Something like Gas Town is clearly not attempting to be a production-grade tool.
Compare to the first two sentences:
> Gas Town is a new take on the IDE for 2026. Gas Town helps you with the tedium of running lots of Claude Code instances. Stuff gets lost, it’s hard to track who’s doing what, etc. Gas Town helps with all that yak shaving, and lets you focus on what your Claude Codes are working on.
Compared to your read, my read is confused: is it or is it not intending to be a useful tool (we can debate "production" quality, here I'm just thinking something I'd actually use meaningfully -- like Claude Code)?
I think the author wants us to take this post seriously, so I'm taking it seriously, and my critique in the original post was a serious reaction.
... no one ever used crypto to buy things. Most engineers are currently already using AI. Such a dumb comparison that really just doesn't pass the sniff test.
People use crypto all the time to buy dollars. That's its main purpose: spend sanctioned rubles to buy crypto to buy dollars; use ransomware to coercively obtain crypto to buy dollars, etc.
Inside scoop: the pub group who owned that pub (still going, owns four in Cambridge and environs) was cofounded by Steve Early, a Cambridge computer scientist who wrote his own POS software, so it was very much a case of "yeah, that sounds like fun, I'll add it". (Until tax and primary rate risk made it not fun, so it was removed.)
For anyone who takes doing their taxes seriously, this is a nightmare. Every pint ordered involves a capital gain (or loss) for the buyer. At a certain point you're doing enough accounting that you might as well be running the bar yourself (or just paying in cash)!
Meanwhile here I am at stage 0. I work on several projects where we are contractually obliged to not use any AI tools, even self-hosted ones. And AFAIK there's now a growing niche of mostly government projects with strict no-AI policy.
I’m luckily in a situation where I can afford to explore this stuff without the concerns that come from using it within an organization (and those concerns are 100% valid and haven’t been solved yet, especially not by this blog post)
> For instance, I know Project A -- these are the concerns of Project A. I know Project B -- these are the concerns of Project B. I have the insight to design these projects so they compose, so I don't have to keep track of a hundred parallel issues in a mono Project C. On each of those projects, run a single agent -- with review gates for 2-3 independent agents (fresh context, different models! Codex and Gemini). Use a loop, let the agents go back and forth.
Can you talk more about the structure of your workflow and how you evolved it to be that?
I've tried most of the agentic "let it rip" tools. Quickly I realized that GPT 5~ was significantly better at reasoning and more exhaustive than Claude Code (Opus, RL finetuned for Claude Code).
"What if Opus wrote the code, and GPT 5~ reviewed it?" I started evaluating this question, and started to get higher quality results and better control of complexity.
I could also trust this process to a greater degree than my previous process of trying to drive Opus, look at the code myself, try and drive Opus again, etc. Codex was catching bugs I would not catch with the same amount of time, including bugs in hard math, etc -- so I started having a great degree of trust in its reasoning capabilities.
It's a Claude Code plugin -- it combines the "don't let Claude stop until condition" (Stop hook) with a few CLI tools to induce (what the article calls) review gates: Claude will work indefinitely until the reviewer is satisfied.
In this case, the reviewer is a fresh Opus subagent which can invoke and discuss with Codex and Gemini.
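To make that concrete, here's roughly what such a gate could look like -- a sketch, not the actual plugin, and it assumes my reading of Claude Code's hook protocol is right (a Stop hook reads JSON on stdin and can emit {"decision": "block", ...} to keep the session going; check the docs for the exact schema). `review-cli` is a stand-in for however you invoke the fresh-context reviewer:

```python
# Sketch of a Stop-hook review gate. Assumes the Claude Code hook protocol:
# JSON on stdin, and a {"decision": "block", "reason": ...} response keeps
# the agent working. "review-cli" is hypothetical; substitute your own
# codex/gemini reviewer invocation.
import json
import subprocess
import sys

hook_input = json.load(sys.stdin)  # session metadata from Claude Code (unused here)

# Ask an independent reviewer (fresh context, different model) to judge the diff.
result = subprocess.run(
    ["review-cli", "--diff", "HEAD"],
    capture_output=True, text=True,
)

if result.returncode != 0:
    # Block the stop: Claude keeps working, with the reviewer's complaints as context.
    print(json.dumps({
        "decision": "block",
        "reason": "Reviewer is not satisfied:\n" + result.stdout,
    }))
else:
    # No objection: exit cleanly and let the session stop.
    sys.exit(0)
```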
One perspective I have which relates to this article is that the thing one wants to optimize for is minimizing the error per unit of work. If you have a dynamic programming style orchestration pattern for agents, you want the thing that solves the small unit of work (a task) to have as low error as possible, or else I suspect the error compounds quickly with these stochastic systems.
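To put rough numbers on that compounding intuition (pure arithmetic, assuming each task succeeds independently with probability p):

```python
# If each task succeeds independently with probability p, a pipeline of n
# chained tasks succeeds with probability p**n. Small per-task error rates
# blow up fast.
for p in (0.99, 0.95, 0.90):
    for n in (10, 50, 100):
        print(f"p={p:.2f}, n={n:3d} -> chain success {p**n:6.1%}")

# e.g. p=0.99, n=100 -> ~36.6%; p=0.95, n=50 -> ~7.7%; p=0.90, n=50 -> ~0.5%
```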
I'm trying this stuff for fairly advanced work (in a PhD), so I'm dogfooding ideas (like the ones presented in this article) in complex settings. I think there is still a lot of room to learn here.
I'm sure we're just working with the same tools thinking through the same ideas. Just curious if you've seen my newsletter/channel @enterprisevibecode https://www.enterprisevibecode.com/p/let-it-rip
This is clearly going to develop the same problem Beads has. I've used it. I'm in stage 7. Beads is a good idea with a bad implementation. It's not a designed product in the sense we are used to, it's more like a stream of consciousness converted directly into code. There are many features that overlap significantly, strange bugs, and the docs are also AI generated so have fun reading them. It's a program that isn't only vibe coded, it was vibe designed too.
Gas Town is clearly the same thing multiplied by ten thousand. The number of overlapping and ad hoc concepts in this design is overwhelming. Steve is ahead of his time, but we aren't going to end up using this stuff. Instead a few of the core insights will get incorporated into other agents in a simpler but no less effective way.
And anyway the big problem is accountability. The reason everyone makes a face when Steve preaches agent orchestration is that he must be in an unusual social situation. Gas Town sounds fun if you are accountable to nobody: not for code quality, design coherence or inferencing costs. The rest of us are accountable for at least the first two and even in corporate scenarios where there is a blank check for tokens, that can't last. So the bottleneck is going to be how fast humans can review code and agree to take responsibility for it. Meaning, if it's crap code with embarrassing bugs then that goes on your EOY perf review. Lots of parallel agents can't solve that fundamental bottleneck.
>This is clearly going to develop the same problem Beads has. I've used it. I'm in stage 7. Beads is a good idea with a bad implementation. It's not a designed product in the sense we are used to, it's more like a stream of consciousness converted directly into code. There are many features that overlap significantly, strange bugs, and the docs are also AI generated so have fun reading them. It's a program that isn't only vibe coded, it was vibe designed too.
Yeah, this describes my feeling on beads too. I actually really like the idea - a lightweight task/issue tracker integrated with a coding agent does seem more useful than a pile of markdown todos/plans/etc. But it just doesn't work that well. It's really buggy, and the bugs seem to confuse the agent, since it was given instructions to do things a certain way that don't work consistently.
I tried using beads. I kept hitting merge conflicts, and the agent just kept one or the other change instead of merging intelligently, killing any work I did on making tasks or resolving others. I still haven't seen how beads solves this problem... and it's an unnecessary one anyway. This should be a separate piece of it that doesn't rely on the agent not fudging up the merge.
How long until Atlassian makes "JIRA for Agents", where all your tasks and updates and memory aren't stored in Git (so no merge conflicts) but are still centralized and shareable between all your agents/devs/teams?
You might like linear-beads[1] better. It's a simpler and less invasive version of beads I made to solve some of the unusual design choices. It can also (optionally) use linear as the storage backend for the agent's tasks, which has the excellent side effect that you as a human can actually see what the agent is working on and direct the agent from within linear.
Despite its quirks, I think beads is going to go down as one of the first pieces of software that got some adoption where the end user is an agent.
Linear is great, it's what JIRA should've been. Basically task management for people who don't want to deal with task management. It's also full featured, fast (they were famously one of the earlier apps to use a local-first sync-engine style architecture), and keyboard-centric.
Definitely suitable for hobby projects, but can also scale to large teams and massive codebases.
> Course, I’ve never looked at Beads either, and it’s 225k lines of Go code that tens of thousands of people are using every day. I just created it in October. If that makes you uncomfortable, get out now.
It unlocks a (still) hidden multi-agent orchestration function in Claude Code. The person making it unminified the code and figured out how to unlock it.
I find it quite well done - I started an orchestrator project a few days ago and scrapped it, because it seems this will be fully integrated soon.
Why can't you use an issue tracker that is built into the git hosting, like Forgejo? It feels like it would be easy to use this with an API key or direct database access (I'm doing both with agents). If you self-host, you've got a very standard and reliable issue tracker. Why does beads need to exist? What can't an agent do with the setup I've described above?
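E.g., something like this is all an agent needs to file and close issues against a self-hosted Forgejo instance via its (Gitea-compatible) REST API. Host, repo, and token are placeholders; check /api/swagger on your instance for the exact schema:

```python
# Sketch: an agent driving a self-hosted Forgejo issue tracker through its
# Gitea-compatible API. All names here are placeholders.
import json
import urllib.request

FORGEJO = "https://git.example.com"    # hypothetical self-hosted instance
REPO = "me/myproject"
TOKEN = "..."                          # a scoped API token for the agent

def api(method: str, path: str, payload: dict | None = None) -> dict:
    req = urllib.request.Request(
        f"{FORGEJO}/api/v1/{path}",
        method=method,
        data=json.dumps(payload).encode() if payload else None,
        headers={"Authorization": f"token {TOKEN}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# The agent files and closes issues like any other API client:
issue = api("POST", f"repos/{REPO}/issues",
            {"title": "Refactor ingestion for site Y", "body": "Split from #12"})
api("PATCH", f"repos/{REPO}/issues/{issue['number']}", {"state": "closed"})
```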
I believe the idea is that it's for managing many fine-grained todo's to keep the agents on track. When multiple code agents are working at the same time and when there's a merge conflict, the code agents can do the merges, too.
But yeah, I'm only running one code agent at a time, so that's not a problem I have. I should probably start with just a todo list as plain text.
I am wondering if it would be a viable strategy to vibe code almost "in reverse": take a giant ball of slop such as beads, and use agents to strip away feature after feature until you are left with only exactly what you need, streamlined to your exact workflow. Maybe it'd be faster to just start from scratch, but it might be an interesting experiment. Most of my struggles with beads so far have come from being off its #1 use case, from too many of its features, and from having to slog through too much documentation to know what to actually use.
> I actually have six species of bamboo on my property.
I have enjoyed Steve's rants since "Execution in the Kingdom of Nouns" and the Google "Platform rant", but he may need someone to talk to him about bamboo and what a terrible life choice it is. Unless you can keep it the hell away from you and your neighbours, it is bad, very bad. And I'm talking about clumping varieties; the runners are a whole other level.
It's perfectly on brand for an AI advocate to have a fast-growing invasive species that's going to externalize costs onto his neighbors and damage the local ecosystem.
So, can anyone point me to a video where someone is doing this to produce something meaningful, real-world stuff? I want to see the requirements being decided upfront, followed by the prompting and then the "coding" by the agents. Not some fly-by-the-seat-of-my-pants, stop-when-it-kinda-looks-okay thing. E.g. an end-to-end SaaS or whatever non-trivial.
I've replaced a couple of apps that I would have previously paid for with my own vibe coded agent based setup similar to GasTown.
What I am finding most beneficial almost immediately is that I have a dedicated Telegram channel I can post all sorts of unstructured data into; it's automatically routed via LLMs and stored into the right channel, and then other agents work on that data to provide me insights. I have a calorie counter, workout capture, reminders, and daily diary prompts all up and running as of right now, and honestly it's better than anything I could have bought "off the shelf".
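The routing step is less magic than it sounds; a sketch of what that layer could look like, where classify() wraps a single LLM call (the channel names and helpers here are invented examples, not the commenter's actual code):

```python
# Sketch of an LLM routing layer for unstructured messages. call_llm() is a
# stub for whatever LLM client you use; channel names are examples.
CHANNELS = ["calories", "workouts", "reminders", "diary"]

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire up your LLM client of choice here")

def classify(message: str) -> str:
    # One LLM call decides which channel a message belongs to.
    prompt = (f"Route this message to exactly one of {CHANNELS}. "
              f"Reply with the channel name only.\n\nMessage: {message}")
    return call_llm(prompt).strip().lower()

def route(message: str, store: dict[str, list[str]]) -> str:
    channel = classify(message)
    if channel not in CHANNELS:
        channel = "diary"  # fall back rather than drop data on a bad LLM answer
    store.setdefault(channel, []).append(message)  # downstream agents read these
    return channel
```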
Last night I needed a C# console app to convert PDFs to a sprite sheet. I spent 30 seconds writing the prompt and another 30 seconds later the app was running and successfully converting PDFs on the first try. I then spent about another 2 mins adding a progress bar, tweaking the output format and moving the main logic into a new library.
Sure. I do that too. However, the article talks about something very different. What you describe is Stage 2 or 3 as listed in the article. I want to see a demonstration of Stage 8 in action.
> First, you should locate yourself on the chart. What stage are you in your AI-assisted coding journey?
> Stage 1: Zero or Near-Zero AI: maybe code completions, sometimes ask Chat questions
> Stage 2: Coding agent in IDE, permissions turned on. A narrow coding agent in a sidebar asks your permission to run tools.
> Stage 3: Agent in IDE, YOLO mode: Trust goes up. You turn off permissions, agent gets wider.
> Stage 4: In IDE, wide agent: Your agent gradually grows to fill the screen. Code is just for diffs.
> Stage 5: CLI, single agent. YOLO. Diffs scroll by. You may or may not look at them.
> Stage 6: CLI, multi-agent, YOLO. You regularly use 3 to 5 parallel instances. You are very fast.
> Stage 7: 10+ agents, hand-managed. You are starting to push the limits of hand-management.
> Stage 8: Building your own orchestrator. You are on the frontier, automating your workflow.
> *If you’re not at least Stage 7, or maybe Stage 6 and very brave, then you will not be able to use Gas Town. You aren’t ready yet.*
Boy, this smells a lot like the early days of blogging about blockchains, specifically Ethereum and friends.
It's not that there's nothing useful, maybe even important, in there; it's just that so far it's all the easy parts: playing around inside a computer.
I've noticed a certain trend over the years where you get certain types of projects that get lots of hype and excitement and much progress seems to be made, but when you dig deep enough you find out that it's all just the fun, easy sort of progress.
The fun progress, which not at all coincidentally tends to also be the easy progress, is the type that happens solely inside a computer.
What do I mean by that? I mean programs that only operate at the level of artificial computer abstractions.
The hard part is always dealing with "the real world": hardware that returns "impossible" results to your nicely abstract api functions, things that stop working in places they really shouldn't be able to, or even, and this is the really tricky bit, dealing with humans.
Databases are a good example of this kind of thing. It's easy to start off a database writing all the clever (and fun) bits like btrees and hash maps and chained hashes that spill to disk to optimize certain types of tables and so on, but I'd wager that at least half of the code in a "real" database like sqlite or postgresql is devoted to dealing with strange hardware errors or leaky api abstractions across multiple platforms or the various ways a human can send nonsensical input into the system and really screw things up.
I'd also bet that this type of code is a lot less fun to write and took much longer than the rest (which incidentally is why I always get annoyed when programming language demos show code with only a happy path, but that's another rant and this comment is already excessive).
Anyways, this AI thing is definitely a gold rush, and it's important to keep in mind that there was in fact a lot of gold that got dug up. But, as everyone constantly repeats, the more consistent way to benefit is to sell the shovels - and this is very definitely an ad for a shovel.
I love this take. I think what you're describing is very much what happened with the Industrial Revolution. It was just a bunch of powerful, dangerous machines at first, doing small jobs faster. Scaling it up took the whole planet and a long time.
I think we are at the beginning of the second such journey. Lots of people will get hurt while we learn how to scale it up. It's why I've gone with dangerous-sounding theming and lots of caution with Gas Town.
It's not a coincidence that all those articles and tutorials urge you to use agents to spend tokens and write more agents that spend more tokens and talk to even more LLMs, and write even more agents and wrappers... I don't know to what end. Probably to spend tokens until your wallet bleeds dry, I guess.
Agents and wrappers that put you deeper into LLM spending frenzy is like the new "todo app".
With dozens of agents running at a time, it must cost thousands of dollars to build anything non-trivial. Is there a business model behind this project, or is he just made of money?
Is there a term for AI-fueled dev psychosis? "AI architecture astronaut"? There should be one if not. Or maybe just AI-fueled hucksterism...
I recognize 100% that a tool to manage AI agents with long-term context tracking is going to be a big thing. Many folks have written versions of this already. But mashing together the complexity of k8s with a hodgepodge of LotR and Mad Max references is not it.
It's like the complexity of J2EE combined with AI-fueled solipsism and a microdosing mushroom regime gone off the rails. What even are all the layers of abstraction here? And to build what? What actual apps or systems has this thing built? AFAICT it has built Gas Town, and nothing else. Not surprising that it has eaten its own tail.
The amount of jargon, AI art, pop culture references, and excessive complexity going on here is truly amazing, and I would assume it's satire if I didn't know Yegge's style and previous writings. It's like someone looked at the amount of overlapping and confusing tools Anthropic has released around Claude Code and said "hold my beer, hand me 3 Red Bulls and a shot of espresso, I can top that!".
I do think a friend of mine nailed it though with this quote:
"This whole "I'm using agents to write so much software" building-in-public trend, but without actually showing what they built, reminds me of the people selling courses on stock trading or drop shipping."
The number of get-rich-quick schemes around any new tech is boundless. As Yegge himself points out towards the end of the post, you'd be surprised what you can pull off with a ridiculous blog post, a big-tech reputation, and excessive-LOC dev tools in a hype-driven market. How could it be wrong if it aligns so closely with so many CEOs' dreams?
Naming your energy-guzzling "just throw more agents at it" thingamajig after a location in the post-apocalyptic Mad Max universe is certainly a choice.
That's the OLD way of thinking! The future is bigger and bigger vibe-coded machines for faster and faster vibe coding, oceans of unread code piped back into the intake valve, for the glorification of itself and its own inevitability. "Practical" "applications" are merely speedbumps in the way of our new Singularity Engines, shooting out million-line diffs that will not, and SHOULD NOT, be useful for anything. We will know when we have achieved success when we no longer even consider computer programming a tool for solving real-world problems.
Everyone keeps being angry at me when I mention that, the way things are going, future development will just be based on "did something wrong while writing code? all good, throw everything out and rewrite, keep pulling the lever of the slot machine and eventually it'll work". It's a fair tactic, and it might work if we make the coding agents cheap enough.
I'll add a personal anecdote - 2 years ago, I wrote a SwiftUI app by myself (mind you, I'm mostly an infrastructure/backend guy with some expertise in front end, where I get the general stuff, but never really made anything big out of it other than stuff on LAMP back in the 2000s) and it took me a few weeks to get it to do what I wanted, with the bare minimum of features. As I was playtesting my app, I kept writing a wishlist of features for myself, and later, when I put it on the App Store, people around the world would email me asking for other features. But life, work, etc. would get in the way, and I would have no time to actually do them, as some of the features would take me days/weeks.
Fast forward to 2 weeks ago. At this point I'm very familiar with Claude Code - how to steer multiple agents at a time, quickly review their outputs, stitch things together in my head, and ask for the right things. I've completed almost all of the features, rewritten the app, and it's already been submitted to the App Store. The code isn't perfect, but it's also not that bad. Honestly, it's probably better than what I would've written myself. It's an app that can be memory-intensive in some parts, and it's been doing well in my testing. On top of that, since I've been steering 2-3 agents actively myself, I have the entire codebase in my mind. I also have an overwhelming amount of notes on what I would do better, etc.
My point is, if you have enough expertise and experience, you'll be able to "stitch things together" more cleanly than others with no expertise. This also means user acquisition, marketing, and data will be more valuable than the product itself, since it'll be easier to develop competing products. Finding users for your product will be the hard part. Which kinda sucks, if I'm honest, but it is what it is.
> It's a fair tactic, and it might work if we make the coding agents cheap enough.
I don’t see how we get there, though, at least in the short term. We’re still living in the heavily-corporate-subsidized AI world with usage-based pricing shenanigans abound. Even if frontier models providers find a path to profitability (which is a big “if”), there’s no way the price is gonna go anywhere but up. It’s moviepass on steroids.
Consumer hardware capable of running open models that compete with frontier models is still a long ways away.
Plus, and maybe it’s just my personal cynicism showing, but when did tech ever reduce pricing while maintaining quality on a provided service in the long run? In an industry laser focused on profit, I just don’t see how something so many believe to be a revolutionary force in the market will be given away for less than it is today.
Billions are being invested with the expectation that it will fetch much more revenue than it’s generating today.
Many of the billions being invested are for the power bill of training new models, not to mention the hardware needed to do so. Any hardware training a new model isn't being used for inference.
If training of new models ceased, and hardware was just dedicated to inference, what would that do to prices and speed? It's not clear to me how much inference is actually being subsidized over the actual cost to run the hardware to do it. If there's good data on that I'd love to learn more though.
Or, if it does work _now_, how long will it be before it works well using downloadable models that'll run on, say, a new car's worth of Mac Studios with a bunch of RAM in them, allowing a small fleet of 70B and 120B models (or larger) to run locally? Perhaps even specialised models for each of the roles this uses?
There are already providers that are comically cheap at the moment. Minimax/M2.1 for example is good-enough for many things and 15min long sessions still cost something like 2c. Another version or two and Claude is likely going to start feeling the pressure (or minimax will raise the prices - we'll see)
> There are already providers that are comically cheap at the moment.
But how many of those providers are likewise subsidizing their offering through investment capital? I don't know offhand of anyone in this space that is running at or close to breakeven.
It feels very much like the early days of streaming when you could watch everything with a single Netflix account. Those days are long gone and never coming back.
I don't think it matters in practice. Yesterday we could run gpt-oss, today we can run M2.1, tomorrow we'll run something comparable to Opus 4.5. The open models are getting so good that what we get at the end of this year will be "good enough" locally for years, even if everything else burns down.
This seems too pessimistic? Moore's law is a thing. Price reductions with better quality have been going on for a long time in computing and networking.
We're also seeing significant price reductions every year for LLMs. Not for frontier models, but you can get the equivalent of last year's model for cheaper. Hard to tell from the outside, but I don't think it's all subsidized?
I think maybe people over-updated on Bitcoin mining. Most tech is not inherently expensive.
> We’re still living in the heavily-corporate-subsidized AI world
There's little evidence this is true. Even OpenAI, who is spending more than anyone, is only losing money because of the free version of ChatGPT. Anthropic says they will be profitable next year.
> Plus, and maybe it’s just my personal cynicism showing, but when did tech ever reduce pricing while maintaining quality on a provided service in the long run? In an industry laser focused on profit, I just don’t see how something so many believe to be a revolutionary force in the market will be given away for less than it is today.
Really?
I mean, I guess I'm showing my age, but the idea that I can get a VM for a couple of dollars a month and expect it to be reliable makes me love the world I live in. When I started working, there was no cloud, and getting root on a server meant investing thousands of dollars.
According to Ed Zitron, Anthropic spent more than its total revenue in the first 9 months of 2025 on AWS alone: $2.66 billion on AWS compute against an estimated $2.55 billion in revenue. That's just AWS - not payroll, not other software or hardware spend. He's regularly reporting concrete numbers that look horrible for the industry, while hyperscalers and foundation model companies continue to make general statements and refuse to get specific or release real revenue figures. If you only listen to what the CEOs are saying, then sure, it sounds great.
Anthropic also said that AI would be writing 95% of code in 3 months or something, however many months ago that was.
> but when did tech ever reduce pricing while maintaining quality on a provided service in the long run
That's an old world we experienced in the 2000s, and maybe the early 2010s, when we cared about the quality of a provided service in the long run. For anything web-app-general-stuff related, that's long gone, as everyone (read: mostly everyone) has a very short attention span, and what matters is whether the thing I desire can be done right now. In the long run? Who cares. I keep seeing this in everyday life, at work, in discussions with my previous clients, etc.
Once again, I wish it weren't true, but nothing points to it not being true.
I think for beginners, it might be more like a roguelike? You go off in the wrong direction entirely and die, but you learn something and start again.
Since we have version control, you can restart anywhere if you think it's a good place to fork from. I like greenfield development, but I suspect that there are going to be a lot more forks from now on, much like the game modding scene.
The thing about beginners (and I'm sure we can all relate to them from our past) is they won't really know which path is good or bad. In a roguelike, when you make a mistake, you kinda know why you made it and how you got there. For beginners, even with version control, you never developed that "sense of what feels right". Or the "there HAS TO BE a simpler way of doing this, I just have to ask" sense. I have no idea how to describe it, but I think you might get what I mean?
Well, at some point you'd learn by maxing out your credit card on your cloud bill, or getting hacked and losing all your users' data, or...
Companies with money-making businesses are gonna find themselves in an interesting spot when the "vibe juniors" are the vast majority of the people they can find to hire. New ways will be needed to reduce the risk.
The difficulty comes in managing the agent. Ensuring it knows the state of the codebase, conventions to follow, etc. Steering it.
I've had the same experience as you. I've applied it to old projects which I have some frame of reference for and it's like a 200x speed boost. Just absolutely insane - that sort of speed can overcome a lot of other shortcomings.
I don't know if I'm understanding you correctly, but my experience reflects what (I think) you're saying. Given a fully formed older project and a clean set of feature requests, Claude can be a beast. On the other hand, steering it through a greenfield project feels more labor intensive than writing the code myself.
I'm a full-stack dev, and solo, so I write data schemas, backends, and frontends at the same time, usually flipping between them to test parts of new features. As far as AI use goes, I'm really just at the level of using a single Claude agent in an IDE - and only occasionally, because it writes a lot of nonsense. So maybe I'm missing out on the benefits of multiple agents. But where I currently see value in it is in writing (a) boilerplate and (b) sugar - where it has full access to a large and stable codebase. Where I think it fails is in writing overarching logical structures, especially early on in a project. It isn't good at writing elegant code with a clear view of how data, back, and front should work together. When I've tried to start projects from scratch with Claude, it feels like I'm fighting against its micro-view of each piece of code, where it's unable to gain a macro-view of how to orchestrate the whole system.
So like, maybe a bottomless wallet and a dozen agents would help with that, but there isn't so much room for errors or bugs in my work code as there is in my fun/play/casual game code. As a result I'm not really seeing that much value in it for paid work.
I've found it to do quite well if you form a detailed design doc and you state all your implementation detail opinions up front. Architecture, major third party libraries, technologies, etc. But it can generate a lot of code very fast - it's hard to steer everything. There is certainly a tradeoff between speed and control. At one end, if you want to specify how every single line is written then yeah, it's going to be faster if you do it yourself. On the other hand, if you want to let it make more assumptions on implementation details, it can go extremely fast.
If your end goal is to produce some usable product, then the implementation details matter less. Does it work? Yes? OK, then maybe don't wrestle with the agent over specific libraries or coding patterns.
We intend to sing the love of danger, the habit of energy and fearlessness.
Courage, audacity, and revolt will be essential elements of our poetry.
Up to now literature has exalted a pensive immobility, ecstasy, and sleep. We intend to exalt aggressive action, a feverish insomnia, the racer’s stride, the mortal leap, the punch and the slap.
We affirm that the world’s magnificence has been enriched by a new beauty: the beauty of speed. A racing car whose hood is adorned with great pipes, like serpents of explosive breath—a roaring car that seems to ride on grapeshot is more beautiful than the Victory of Samothrace.
…
https://www.arthistoryproject.com/artists/filippo-tommaso-ma...
>It’s also 100% vibe coded. I’ve never seen the code, and I never care to, which might give you pause. ‘Course, I’ve never looked at Beads either, and it’s 225k lines of Go code that tens of thousands of people are using every day.
225k lines for a cli issue tracker? What the fuck?
Curious what fidelity/precision the author finds necessary with Claude 4.5 Opus/GPT 5.2.
Looking at the screenshot of "Tracked Issues", it seems many of the "tasks" are likely overlapping in terms of code locality.
Based on my own experience, I've found the current crop of models to work well at a slightly higher-level of complexity than the tasks listed there, and they often benefit from having a shared context vs. when I've tried to parallelize down to that level of work (individual schema changes/helper creation/etc.).
Maybe I'm still just unclear on the inner workings, but it's my understanding each of those tasks is passed to Claude Code and developed separately?
In either case, I think this project is a glimpse into the future of software development (albeit with a grungy desert punk tinted lens).
For context, I've been "full vibe-coding"[0] for the past 6 months, and though it started painfully, the models are now good enough that not reading the code isn't much of an issue anymore.
A Steve Yegge blog post was made to be shortened with AI. :)
I think Gas Town looks interesting directionally and as a PoC. Like it or not, that's the world we'll end up in. Some products will do it well and some will be horrible monsters. (Like I'm already dreading Oracle Gas Town and Azure Gas Town).
I think the Amp coding agent trends in the direction of Gas Town already. Powerful but expensive, uses a mix of models and capabilities to do something that's greater than the sum of the parts.
> I got tired of trying to make them line up, sorry!
IMHO, it's less disorienting to have the post dated after the comments than it is to see a comment you thought you wrote a couple days ago but is dated today. So you're welcome to stop trying to line up timestamps.
How about recording two dates on the post, the original post date and the re-upped date, and then putting something like "14 hours ago; originally 2 days ago" in the post header?
How about simply not tampering with the date on the post?
The most I imagine most folks saying is "Didn't I see this post on the front page days ago?". For many other discussion fora, it's not uncommon for posts to be at the top of the pile for many days... so a days-old post date should be nothing unusual.
That's exactly what we tried, and it didn't work. The amount of noise it generated was vastly greater than the noise the status quo generates, unsatisfactory though it is.
Agreed, although perhaps not as strongly; I remember seeing these comments from a few days ago, so I was feeling a little gaslit seeing the new timestamps and wondered if I'd hallucinated the original thread.
I had a lot of fun reading the articles about Gas Town, although I started to lose track of the odd naming - odd only because it makes sense to Steve and others who have seen the Mad Max and Waterworld movies.
I promptly gave Claude the text of the articles and had him rewrite them using idiomatic distributed-systems naming.
Thank you! Is this the future? Everyone gets to have their own cutesy translation of everything? If I want "kubectl apply" to have a Tron theme, while my coworker wants a Disney theme. Is the runbook going to be in Klingon if I'm fluent in that?
Are many, many Agents going to produce better quality outputs than 1 Agent?
Assuming this isn't a parody project, maybe this just isn't for me, and that's fine. I'm struggling to understand a production use case where I'd be comfortable letting this thing loose.
No. It is going to produce a mind-boggling amount of code in a very short amount of time, and hopefully in the process the bad variants will somehow get pruned out as the horde of agents engages in an orgy of edits.
Keyboards are highly deterministic. And when they're not, e.g. due to physical wear or software glitches, this makes them basically unusable for touch typists.
Loved the themes of this article. It's inspiring me to make my own agentic coding system modeled after the Napoleonic Wars. You'll be able to command an army of agents to battle! My boss will be so happy
There are no concepts in this blog post. It is the author's opinions in the form of a pseudo-Erlang program with probabilities. If one reads it like it is a program, you realize that the underlying core has been obfuscated by implementation details.
I'm looking for "the Emacs" of whatever this is, and I haven't read a blog post which isolates the design yet.
It's nice to see someone else going mad, even deeper down the well.
I don't know the details, but I was wondering why people aren't "just" writing chat venues and comms protocols for the chats - so the fundamental unit is a chat that humans and agents can be members of.
You can also have DMs etc to avoid chattiness.
But fundamentally, if you start with this kind of madness, you don't have a strict hierarchy, and it might also be fun to see how it goes.
I briefly started building this but just spun out and am stuck using PAL MCP for now and some dumb scripts. Not super content with any of it yet.
I used Claude Code yesterday to increase the duration the invite-email token is valid when the invite is sent from another user and the account is new. It simply made all tokens valid for longer instead. If you don't look at the code, how do you spot that?
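A hypothetical reconstruction of what that kind of miss looks like in code (all names invented; the point is that both versions pass a test that only exercises the narrow path):

```python
# Hypothetical reconstruction of the bug described above - all names invented.
from dataclasses import dataclass
from datetime import timedelta

@dataclass
class Invite:
    sent_by_other_user: bool
    account_is_new: bool

DEFAULT_TTL = timedelta(hours=24)
EXTENDED_TTL = timedelta(days=7)

# What the agent did: a single global change - every token now lives longer.
def token_ttl_wrong(invite: Invite) -> timedelta:
    return EXTENDED_TTL

# What was asked for: extend validity only in the narrow case.
def token_ttl_right(invite: Invite) -> timedelta:
    if invite.sent_by_other_user and invite.account_is_new:
        return EXTENDED_TTL
    return DEFAULT_TTL

# A test that only checks the other-user/new-account path passes for both,
# which is exactly why this class of bug is invisible unless you read the diff.
assert token_ttl_wrong(Invite(True, True)) == token_ttl_right(Invite(True, True))
```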
I had the same thought of using beads to build a multi-agent orchestrator with a defined set of workflows.
But to keep things tractable, I've kept the orchestration within a collection of subagents in a single Claude Code session. The orchestration system is called Pied-Piper and you can find the code here - https://github.com/sathish316/pied-piper
I use Zed (completely optional, since Claude Code can work 100% standalone), Claude Code (I have Max), and Beads. I also take advantage of the .claude/instructions.md file and let Claude know to ALWAYS use Beads, and to use rg instead of grep, which is kind of slow (if anyone from Anthropic is reading this: for the love of GOD, use ripgrep instead of grep), plus a small summary of the project and some ground rules. If there are key tickets that matter for the project, I tell it to note them in the instructions. The instructions file is the first thing Claude reads when you first open up a chat window with it; if you make amendments, ask it to reread the file.
Outside of that it's trial and error, but I've learned you don't need to kick off a new chat instance very much, if at all. I also like Beads because if I have to run or go offline, I can tell it to pause and log where it left off / where it's at.
For some projects I tell Claude not to close tickets without my direct approval, because sometimes it closes them without testing; my baseline across all projects is that it compiles and runs without major errors.
Also, I forgot this: I make them ALWAYS commit changes. Every single time. If this horrifies you, just remember you can always revert code; people need to stop being scared of version control. Use it to your full advantage.
Upon reading the nth "Web development is fun again because LLMs make the complexity go away" article here on Hacker News, I started thinking, "LLM-based development is going to get even more complicated, fiddly, and tedious, and then we'll have to paper that complexity over with abstractions, which we'll then manage using LLMs in another turn of an infinite spiral of setting the earth progressively more on fire, just to stand up a basic web app which we could have done in 2000 with some HTML and PHP/Perl/Python knowledge". And here we are.
I've been reading Steve for a long long time. He's had a lot of good ideas, issued some solid advice, but has always had a quirky sense of humor. A few pages into it I thought "this has to be a joke". But I couldn't find the punchline. This is the most depressing comment section I've read in a long time. I might have to enter the lottery to become an apprentice electrician.
I was trying to be patient, hoping this entire thing would collapse sooner rather than later - but now I think I'm just going to start planning my exit from this industry forever.
Nah, believe me, the next few years are going to be very, but very interesting. We are building megatons of technical debt. On the application side, people who have no idea how all of this works under the hood are using LLMs for things any chimp could do with deterministic code, and even when they use them for what they're really good at - that is, dealing with natural language and other unstructured input - nobody really has the discipline to maintain extensive validation suites, guardrails, and quality gates.
The simplicity of just plugging a few lines of code into a framework or a workflow engine means the barrier to entry is really, really low, which guarantees that we will have thousands of business processes running through those duct-taped agents in almost every industry you can imagine.
Mountains of code nobody understands, and even more Byzantine post-training to shoehorn more complex tool usage into the models.
Compliance issues galore. Security incidents by the ton.
The future is going to be very, very interesting pretty soon. Why would you leave your front-row seat right now?
I would instead invest some good time and money in buying, and learning to play, a modern replica of a Greek kithara.
Even if you do exit, all the software around you will steadily get worse and worse. Software engineering is already really bad, especially for consumer products, and all this vibe agent crud is only going to accelerate the badness.
I tried it out, but despite what the README says (https://github.com/steveyegge/gastown), the mayor didn't create a convoy or anything; the mayor just did all the work itself, appearing no different from a `claude` invocation.
Update: I was hoping it'd at least be smart enough to automatically test that the project still builds, but it did not. It also didn't commit the changes.
> are you the mayor?
Yes. I violated the Mayor protocol - I should have dispatched this work to the gmailthreading crew worktree instead of implementing it directly myself.
The CLAUDE.md is clear: "Mayor Does NOT Edit Code" and "Coordinate, don't implement."
Maybe Yegge should have built it around Codex instead - Codex is a lot better at adhering to instructions.
Pros: The overall system architecture is similar to my own latest attempt at solving this problem. I like the tmux-based console-monitoring approach (rather than going full SDK + custom UI), it makes it easier to inspect what is going on. The overlap between my ideas and Steve's is around 75%.
Cons: Arguing with "The Mayor" about some other detached process's poor workmanship seems like a major disconnect and architectural gap. A game of telephone is unlikely to be better than simply using claude. I was also hoping gastown would amplify my intent to complete the task of "Add feature X" without early-stopping, but so far it's more work than both 1. vibing with claude directly and 2. creating a highly detailed spec with checkboxes and piping in "do the next task" until it's done.
Definitely looking forward to seeing how the tools in this space evolve. Eventually someone is bound to get it right!
P.S. The choice of nomenclature throughout the article is a bit odd, making it hard to follow. Movie characters, dogs, and raccoons, huh? How about striving for descriptive SWE clarity?
That's what got us CQRS, "command query responsibility segregation", which is a technically correct name but absolutely fucking meaningless to anyone who doesn't know what it means already.
It should have been called "read here, write there", but noooooooOOOOOooooo, we need descriptive SWE clarity, so only people with CS degrees who know all the acronyms already can understand wtf is being said.
What I dislike about Claude Code and vibe coding in general is that I haven't seen Claude Code users learn a whole lot about how to do their jobs better. A terminal pane is just too small to be a good place to learn.
With vibe coding you just give the code some constraints, and the system will try to work within those constraints - but what if those constraints are wrong? What if you're asking the wrong question? Then you'll end up with overcomplicated slop.
It’s a shame that vibe coded slop seems to be a new standard, when in fact you can use AI tools to produce much higher quality code if you actually care to engage in thoughtful conversations with the AIs and take a growth mindset.
I don't know about you, but when the creator of a piece of software says "I have not read any of the code", I don't want to install or use it. Call me old-fashioned. Really hoping this terrifying vibe-coding future dies an early death before the incurred technical debt makes every digital interaction a landmine.
To be fair, the author says: "Do not use Gas Town."
I started "fully vibecoding" 6 months ago, on a side-project, just to see if it was possible.
It was painful. The models kept breaking existing functionality, overcomplicating things, and generally just making spaghetti ("You're absolutely right! There are 4 helpers across 3 files that have overlapping logic").
A combination of adjusting my process (read: context management) and the models getting better has led me to prefer "fully vibecoding" for all new side-projects.
Note: I still read the code that gets merged for my "real" work, but it's no longer difficult for me to imagine a future where that's not the case.
I have noticed in just the past two weeks or so, a lot of the naysayers have changed their tunes. I expect over the next 2 months there will be another sea change as the network effect and new frameworks kick in.
I think we have crossed the chasm and the pragmatists have adopted these tools because they are actually useful now. They've thrown out a lot of their previously held principles and norms to do so and I doubt the more conservative crowd will be so quick to compromise.
2 years sounds more likely than 2 months since the established norms and practices need to mature a lot more than this to be worthy of the serious consideration of the considerably serious.
No. If anything, we are getting "new" models but hardly any improvements. Things are "improving" on scores, rankings, and whatever other metrics the AI industry has invented, but nothing is really materializing in real work.
Agreed. If the author did not bother to write, much less read, their work, why should we spend time reading it?
In the past, a large codebase indicated that maybe you might take the project seriously, as some human effort was expended in its creation. There were still some outliers, like Urbit and its 144 KLOC of Hoon code, perverse loobeans and all.
Now if I get so much as a whiff of AI scent of a project, I lot all interest. It indicates that the author did not a modicum of their own time in the project, so therefore I should waste my own time on it.
(I use LLM-based coding tools in some of my projects, but I have the self-respect to review the generated code before publishing init.)
I’ve come to appreciate that there is a new totally valid (imo) kind of software development one can do now where you simply do not read the code at all. I do this when prototyping things with vibe coding for example for personal use, and I’ve posted at least one such project on GitHub for others who may want to run the code.
Of course as a developer you still have to take responsibility for your code, minimally including a disclaimer, and not dumping this code in to someone else’s code base. For example at work when submitting MRs I do generally read the code and keep MRs concise.
I’ve found that there is a certain kind of coder that hears of someone not reading the code and this sounds like some kind of moral violation to them. It’s not. It’s some weird new kind of coding where I’m more creating a detailed description of the functionality I want and incrementally refining it and iterating on it by describing in text how I want it to change. For example I use it to write GUI programs for Ubuntu using GTK and python. I’m not familiar with python-gtk library syntax or GTK GUI methods so there’s not really much of a point in reading the code - I ask the machine to write that precisely because I’m unfamiliar with it. When I need to verify things I have to come up with ways for the machine to test the code on its own.
Point is I think it’s honestly one new legitimate way of using these tools, with a lot of caveats around how such generated code can be responsibly used. If someone vibe coded something and didn’t read it and I’m worried it contains something dangerous, I can ask Claude to analyze it and then run it in a docker container. I treat the code the same way the author does - as a slightly unknown pile of functions which seem to perform a function but may need further verification.
I’m not sure what this means for the software world. On the face of it it seems like it’s probably some kind of problem, but I think at the same time we will find durable use cases for this new mode of interacting with code. Much the same as when compilers abstracted away the assembly code.
Many years ago, Java compilers, though billed as a multi-platform write-once-run-anywhere solution, would output different bytecode that behaved in interesting and sometimes unpredictable ways. You would be inside jdb, trying to debug why the compiler did what it did.
This is not exactly that, but it is one step up. Having agents output code that then gets compiled/interpreted/whatever, based upon contextual instruction, feels very, very familiar to engineers who have ever worked close to the metal.
"Old fashioned", in this aspect, would be putting guardrails in place so that you knew that what the agent/compiler was creating was what you wanted. Many years ago, that was binaries or bytecode packaged with lots of symbols for debugging. Today, that's more automated testing.
You are ignoring the obvious difference between errors introduced while translating one near-formal, intent-clear language to another, and those introduced when ambiguous natural language is turned into code by a non-deterministic intermediary. At some point in the future the non-deterministic intermediary will become stable enough (when temperature is low and model versions won't affect output much), but the ambiguity of the prompting language is still going to remain an issue. Hence, read-before-commit will always be a requirement, I think.
A good friend of mine wrote somewhere that at about 5 agents or so per project is when he is the bottleneck. I respect that assessment. Trust but verify. This way of getting faster output by removing that bottleneck altogether is, at least for me, not a good path forward.
Unfortunately, reading before merge commit is not always a firm part of human team work. Neither reading code nor test coverage by themselves are sufficient to ensure quality.
I've been vibe coding my own personal assistant platform, still haven't read any of the code but who cares it's just for me and it works.
Now I've got tools and functionality that I would have paid for before as separate apps that are running "for free" locally.
I can't help but think this is the way forward and we'll just have to deal with the landmine as/when it comes, or hope that the tooling gets drastically better so that the landmine isn't as powerful as we fear.
Same here. I'm "happy" that I'm old "enough" to be able to wrap up my career in a few years time and likely be able to get out of this mess before this "agentic AI slop" becomes the expected workflow.
On my personal project I do sometimes chat with ChatGPT and it works as a rubber duck. I explain, put my thoughts into words and typically I already solve my problem when I'm thinking it through while expressing it in words. But I must also admit that ChatGPT is very good at producing prose and I often use it for recommending names of abstractions/concepts, modules, functions, enums etc. So there's some value there.
But when it comes to code, I want to understand everything that goes into my project. So at the end of the day I'm always going to be the "bottleneck", whether I think through the problem myself and write the code or I review and try to understand the AI-generated code slop.
It seems to me that the AI slop generation workflow is a great fit for the industry though: more quantity rather than quality, and continuous churn. Make it cheaper to replace code so that the replacement can be replaced a week later with more vibe-coded slop. Quality might drop, bugs might proliferate, but who cares?
And to be fair, code itself has no value, it's ephemeral, data and its transformations are what matter. Maybe at some point we can just throw out the code and just use the chatbots to transform the data directly!
This is pretty much how I use LLMs as well. These interactions have convinced me that while the LLMs are very convincing with persuasive arguments, they are often wrong on things I am good at; so much so that I would have a hard time opening PRs for code edited by them without reading it carefully. Gell-Mann amnesia and all that seems appropriate here, even though that anthropomorphizes LLMs to an uncomfortable extent. At some point in the future I can see them becoming very good at recognizing my intent and also reasoning correctly. Not there yet.
Yeah, the assumption is that it eventually will be the same or better. It's basically how this software was created, he seems to have made a few different versions before he was happy.
Compilers only obtained that level of trust through huge amounts of testing and deterministic execution. You don't look at compiler output because it's nearly always correct. People find compiler bugs horrifying for that reason.
LLMs are far from being as trustworthy as compilers.
If I use the same codebase and the same compiler version and the same compiler flags over and over again to produce a binary, I expect the binary to deterministically be the same machine code. I would not expect that from an LLM.
You're old fashioned, and that's ok, if it's ok with you.
But when high-level languages were getting started, we had to read and debug the transformed lower-level output they made (hello Cfront). At a certain point, most of us stopped debugging the layer below, and now most LLVM IR and assembly flows by without anyone reading it.
I use https://exe.dev to orchestrate several agents, and I am seeing the same benefits as Steve (with a better UI). My code smell triggers with lots of diffs that flow by, but just as often this feeling of, "oh, that's a nice feature, it's much better than I could have made" is also triggered. If you work with colleagues who occasionally delight and surprise you with excellent work, it's the same thing.
Maybe if you are not used to the feeling of being surprised and mostly delighted by your (human) colleagues, orchestrated agentic coding is hard to get your head around.
I have nothing against automated code completion on steroids or agents. What I cannot condone is not reading and understanding the generated code. If you have not understood your agent generated code, you will be "surprised" for sure, sooner or later.
Sounds like "madness" (or better, fun) but - if this work all converged into "something", wouldn't the product/system improve so much, that in a matter of days, really nothing would be left to do..?
Most likely, tens of other bugs are being introduced at each step, etc etc, right?
I love Steve Yegge, but even if this thing was working how it should (and it isn't), it is just overly complicated and with a lot of misdirection.
He as a dev should know that adding a layer of names on top of already named entities is not a good practice. But he just had fun and this came up. Which is fantastic. But I don't want to have to translate names in my head all the time.
Just not useful. Beads also... really sorry to say this, but it is a task runner with labels, and it has zero awareness of the actual tasks.
I don't know, maybe I am wrong, but this just doesn't seem like a thing that will work. Which is why I think it will be popular: nobody will be able to make it work, but they will not want to look dumb, so they will say it is awesome and amazing. Like another AI thingy, which I could name but will not, that everyone is using.
But I love Yegge and hope he does well. Amp, for the little bit that I used it, is a really solid agent and delivered much better results than many others.
Someone here has lost the plot and at this point I wonder if it is me. Is software supposed to be deterministic anymore? Are incremental steps expected to be upgrades and not regressions? Is stability of behavior and dependability desirable? Should we culturally reward striving to get more done with less?
...no, I haven't lost the plot. I'm seeing another fad of the intoxicated parting with their money bending a useful tool into a golden hammer of a caricature. I dread seeing the eventual wreckage and self-realization from the inevitable hangover.
I always thought my job was to be able to prove the correctness of the system, but maybe the reality is that my job was actually just to sling shit at someone until they were satisfied.
I've never understood this argument. Do you ever work with other humans? They are very much not deterministic, yet they can often produce useful code that helps you achieve more than you could by yourself.
I am interested in this as well. From the article:
> Gas Town is also expensive as hell. You won’t like Gas Town if you ever have to think, even for a moment, about where money comes from. I had to get my second Claude Code account, finally; they don’t let you siphon unlimited dollars from a single account, so you need multiple emails and siphons, it’s all very silly. My calculations show that now that Gas Town has finally achieved liftoff, I will need a third Claude Code account by the end of next week. It is a cash guzzler.
Since I am quite capable of shitting up my own code for free, and I've got zero interest in this stupid AI nonsense anyway, I'm vanishingly unlikely to actually use this. But, still: I like to keep half an eye on what is going on, even if I hate it. And I am more than somewhat intrigued about what the numbers actually look like.
There's a simpler design here begging to show itself.
We're trying to orchestrate a horde of agents. The workers (polecats?) are the main problem solvers. Now you need a top-level agent (mayor) to break down the problem and delegate work, and then a merger to resolve conflicts in the resulting code (refinery). Sometimes agents get stuck and need encouragement.
The molecules stuff confused me, but I think they're just "policy docs," checklists to do common tasks.
But this is baby stuff. Only one level of hierarchy? Show me a design for your VP agent and I'll be impressed for real.
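That one-level hierarchy is small enough to sketch. A minimal, hypothetical outline of the loop described above - mayor decomposes, polecats solve in parallel, refinery merges - with every agent call stubbed out (this is not Gas Town's actual implementation):

```python
# Hypothetical one-level orchestration: decompose, fan out, merge.
from concurrent.futures import ThreadPoolExecutor

def mayor_decompose(goal: str) -> list[str]:
    return [f"{goal}: subtask {i}" for i in range(3)]  # stub: planner agent

def polecat_solve(task: str) -> str:
    return f"patch for {task}"  # stub: one worker agent per task

def refinery_merge(patches: list[str]) -> str:
    return "\n".join(patches)  # stub: conflict resolution goes here

def run(goal: str) -> str:
    tasks = mayor_decompose(goal)
    with ThreadPoolExecutor() as pool:
        patches = list(pool.map(polecat_solve, tasks))
    return refinery_merge(patches)

print(run("add CSV export"))
```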
Try to find actual screenshots of this shit or what it really does in the 200,000-word diarrhea (funnily enough, he agrees it's diarrhea [1]).
---
He also references his previous slop called beads. To quote, "Course, I’ve never looked at Beads either, and it’s 225k lines of Go code that tens of thousands of people are using every day".
Do not listen to the newly converted or accept anything from them. Steve Yegge used to be a good engineer with a great understanding of the world. Now it's all gupps and polecats.
[1] Quote from the article: "it’s a bunch of bullshit I pulled out of my arse over the past 3 weeks, and I named it after badgers and stuff."
Our civilization is doomed if this is the future. Zero quality, zero resiliency, zero coherent vision, zero cohesive intent. Just chaotic slop everywhere, the ultimate Ouroboros.
I thought I was working in a respectable trade with mostly conscientious adults doing a job with a lot of responsibility. What is this shit? Are we actually just a bunch of children who value fun over anything else?
As to cost - I did not run out of my Claude Pro Max subscription poking around with it. It infers ... a lot ... though. I pulled together a PR that would let you point some or all of the agent types at local or other endpoints, but it's a little early for the codebase, I think. I'd definitely reach for some cheaper and/or faster inference for some of the use cases.
1. https://www.youtube.com/watch?v=TZE33qMYwsc
2. https://www.youtube.com/watch?v=aSXaxOdVtAQ
It's far from a homogenous crowd. Yegge stands out with extreme opinions even from people who adopted the new tools daily.
I'm excited the author shared, and so exuberantly; that said, I did quick-scroll a bunch of it. It is its own kind of mind-altering substance, but we have access to mind-bending things.
If you look at my AgentDank repo [1], you could see a tool for finding weed, or you could see world intelligence connected with SQL fluency and paired with curated structured data, merging the probabilistic and deterministic computing forms. Which I quickly applied to the OSX Screentime database [2].
Vibe coding turned a corner in November and I'm creating software in ways I would have never imagined. Along with the multimodal capabilities, things are getting weirder than ever.
Mr Yegge now needs to add a whole slew of characters to Gas Town to maintain multi-modal inputs and outputs and artifacts.
Just two days ago, I had LLMs positioning virtual cameras to render 3D models they created in Swift after looking at a picture of what to make, and then "looking" at the results to decide the next code changes. Crazy. [3]
ETA: It was only 14 months earlier that I was amazed that a multi-modal model could identify a trend in a chart [4].
[1] https://github.com/AgentDank/dank-mcp
[2] https://github.com/AgentDank/screentime-mcp
[3] https://github.com/ConAcademy/WeaselToonCadova/
[4] https://github.com/NimbleMarkets/ollamatea/blob/main/cmd/ot-...
https://news.ycombinator.com/item?id=44530767
(posted here a few months back)
"Psychedelics are the latest employee health benefit" (tech company) https://www.ft.com/content/e17e5187-8aa7-4564-9e63-eec294226...
"A new psychedelic era dawns in America" (specifically about use in california) https://www.ft.com/content/5b64945f-da21-46d9-853f-c949a95b9...
"How Silicon Valley rediscovered LSD" https://www.ft.com/content/0a5a4404-7c8e-11e7-ab01-a13271d1e...
I could go on, but the knowledge that psychedelic drugs are prominent in the tech community is not a new fact.
This is instantly recognizable as the work of someone who's been up for a couple days on Adderall.
Of course, there may be other explanations, including other drugs. But if I were one to bet...
I'm on my second agent orchestration framework, Omnispect - https://omnispect.dev/
Example created by Omnispect:
Oneshot - https://omnispect.dev/battleclone00.html
Polished - https://omnispect.dev/battleclone04.html
> Gas Town helps with all that yak shaving, and lets you focus on what your Claude Codes are working on.
Then:
> Working effectively in Gas Town involves committing to vibe coding. Work becomes fluid, an uncountable that you sling around freely, like slopping shiny fish into wooden barrels at the docks. Most work gets done; some work gets lost. Fish fall out of the barrel. Some escape back to sea, or get stepped on. More fish will come. The focus is throughput: creation and correction at the speed of thought.
I see -- so where exactly is my focus supposed to sit?
As someone who sits comfortably in the "Stage 8" category that this article defines, my concern has never been throughput, it has always been about retaining a high-degree of quality while organizing work so that, when context switching occurs, it transitions me to near-orthogonal tasks which are easy to remember so I can give high-quality feedback before switching again.
For instance, I know Project A -- these are the concerns of Project A. I know Project B -- these are the concerns of Project B. I have the insight to design these projects so they compose, so I don't have to keep track of a hundred parallel issues in a mono Project C.
On each of those projects, run a single agent -- with review gates for 2-3 independent agents (fresh context, different models! Codex and Gemini). Use a loop, let the agents go back and forth.
This works and actually gets shit done. I'm not convinced that 20 Claudes or massively parallel worktrees or whatever improves on quality, because, indeed, I always have to intervene at some point. The blocker for me is not throughput, it's me -- a human being -- my focus, and the random points of intervention which ... by definition ... occur stochastically (because agents).
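A minimal sketch of that loop, assuming headless CLI agents (`claude -p` is Claude Code's print mode; the reviewer wiring and the APPROVED convention are made up for illustration - swap in codex/gemini invocations as appropriate):

```python
# Coder/reviewer loop: one agent writes, a fresh-context reviewer
# critiques, repeat until the reviewer signs off or we give up.
import subprocess

def run_agent(cmd: list[str], prompt: str) -> str:
    return subprocess.run(cmd + [prompt], capture_output=True,
                          text=True, check=True).stdout

def review_gate(task: str, max_rounds: int = 5) -> None:
    feedback = ""
    for _ in range(max_rounds):
        run_agent(["claude", "-p"], f"Implement: {task}\n{feedback}")
        verdict = run_agent(["claude", "-p"],  # or a codex/gemini CLI
                            f"Review the latest diff for: {task}. "
                            "Reply APPROVED only if it is correct.")
        if "APPROVED" in verdict:
            return
        feedback = f"Reviewer feedback to address:\n{verdict}"
    raise RuntimeError("review gate not passed; human needed")
```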
Finally:
> Opus 4.5 can handle any reasonably sized task, so your job is to make tasks for it. That’s it.
This is laughably not true, for anyone who has used Opus 4.5 for non-trivial tasks. Claude Code constantly gives up early, corrupts itself with self-bias, the list goes on and on. It's getting better, but it's not that good.
This is the equivalent of some crazy inventor in the 19th century strapping a steam engine onto a unicycle and telling you that someday you'll be able to go 100mph on a bike. He was right in the end, but no one is actually going to build something usable with current technology.
Opus 4.5 isn't there. But will there be a model in 3-5 years that's smart enough, fast enough, and cheap enough for a refined vision of this to be possible? I'm going to bet on yes to that question.
> something like gas town is clearly not attempting to be a production grade tool.
Compare to the first two sentences:
> Gas Town is a new take on the IDE for 2026. Gas Town helps you with the tedium of running lots of Claude Code instances. Stuff gets lost, it’s hard to track who’s doing what, etc. Gas Town helps with all that yak shaving, and lets you focus on what your Claude Codes are working on.
Compared to your read, my read is confused: is it or is it not intending to be a useful tool (we can debate "production" quality, here I'm just thinking something I'd actually use meaningfully -- like Claude Code)?
I think the author wants us to take this post seriously, so I'm taking it seriously, and my critique in the original post was a serious reaction.
This tool is dangerous, largely untested, and yet may be of interest if you are already doing similar things in production.
https://www.wired.com/story/london-bitcoin-pub/
The POS software's on GitHub: https://github.com/sde1000/quicktill
Can you talk more about the structure of your workflow and how you evolved it to be that?
"What if Opus wrote the code, and GPT 5~ reviewed it?" I started evaluating this question, and started to get higher quality results and better control of complexity.
I could also trust this process to a greater degree than my previous process of trying to drive Opus, look at the code myself, try and drive Opus again, etc. Codex was catching bugs I would not catch with the same amount of time, including bugs in hard math, etc -- so I started having a great degree of trust in its reasoning capabilities.
I've codified this workflow into a plugin which I've started developing recently: https://github.com/evil-mind-evil-sword/idle
It's a Claude Code plugin -- it combines the "don't let Claude stop until condition" (Stop hook) with a few CLI tools to induce (what the article calls) review gates: Claude will work indefinitely until the reviewer is satisfied.
In this case, the reviewer is a fresh Opus subagent which can invoke and discuss with Codex and Gemini.
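The Stop-hook mechanism behind this is small. As I understand Claude Code hooks, the Stop hook runs a command when the agent tries to finish; exiting with code 2 blocks the stop and feeds stderr back to the model. A hypothetical version, with `review_passed` standing in for the real reviewer subagent:

```python
#!/usr/bin/env python3
# Hypothetical Stop hook: keep Claude working until a reviewer signs off.
import json, pathlib, sys

def review_passed() -> bool:
    # Stand-in for invoking a fresh reviewer agent; here we just check
    # for an approval marker the reviewer would write.
    return pathlib.Path(".review-approved").exists()

hook_input = json.load(sys.stdin)  # session metadata; unused in this sketch
if not review_passed():
    print("Reviewer has not approved yet. Address outstanding feedback "
          "and request another review before stopping.", file=sys.stderr)
    sys.exit(2)  # block the stop; stderr goes back to the model
sys.exit(0)      # allow the stop
```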
One perspective I have which relates to this article is that the thing one wants to optimize for is minimizing the error per unit of work. If you have a dynamic programming style orchestration pattern for agents, you want the thing that solves the small unit of work (a task) to have as low error as possible, or else I suspect the error compounds quickly with these stochastic systems.
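That compounding claim is easy to make precise. Assuming (simplistically) that tasks fail independently with per-task error rate ε, a feature needing n sequential tasks succeeds with probability:

```latex
% Sketch under an independence assumption:
%   eps = 0.05 per task, n = 20 tasks  =>  0.95^{20} \approx 0.36
\[
  P(\text{all } n \text{ tasks succeed}) = (1-\varepsilon)^{n} \approx e^{-n\varepsilon}
\]
```

Success decays exponentially in n, which is why driving down per-task error tends to beat adding more parallel workers once n gets large.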
I'm trying this stuff for fairly advanced work (in a PhD), so I'm dogfooding ideas (like the ones presented in this article) in complex settings. I think there is still a lot of room to learn here.
It's cool to see others thinking the same thing!
Gas Town is clearly the same thing multiplied by ten thousand. The number of overlapping and adhoc concepts in this design is overwhelming. Steve is ahead of his time but we aren't going to end up using this stuff. Instead a few of the core insights will get incorporated into other agents in a simpler but no less effective way.
And anyway the big problem is accountability. The reason everyone makes a face when Steve preaches agent orchestration is that he must be in an unusual social situation. Gas Town sounds fun if you are accountable to nobody: not for code quality, design coherence or inferencing costs. The rest of us are accountable for at least the first two and even in corporate scenarios where there is a blank check for tokens, that can't last. So the bottleneck is going to be how fast humans can review code and agree to take responsibility for it. Meaning, if it's crap code with embarrassing bugs then that goes on your EOY perf review. Lots of parallel agents can't solve that fundamental bottleneck.
Yeah, this describes my feeling on beads too. I actually really like the idea - a lightweight task/issue tracker integrated with a coding agent does seem more useful than a pile of markdown todos/plans/etc. But it just doesn't work that well. It's really buggy, and the bugs seem to confuse the agent, since it was given instructions to do things a certain way that don't work consistently.
And also auditable, trackable, reportable, etc..
I was sort of kidding with "JIRA for Agents"; obviously, using the API and existing tools, you can make agents use it.
We use Github at my current job and similarly have Claude Code update issues and PRs when it does work.
Show HN: I replaced Beads with a faster, simpler Markdown-based task tracker - https://news.ycombinator.com/item?id=46487580 - Jan 2026 (2 comments) (<-- I've put this one in the SCP - see https://news.ycombinator.com/item?id=26998308 for explanation)
Solving Agent Context Loss: A Beads and Claude Code Workflow for Large Features - https://news.ycombinator.com/item?id=46471286 - Jan 2026 (1 comment)
Beads – A memory upgrade for your coding agent - https://news.ycombinator.com/item?id=46075616 - Nov 2025 (68 comments)
Beads: A coding agent memory system - https://news.ycombinator.com/item?id=45566864 - Oct 2025 (1 comment)
Despite its quirks, I think beads is going to go down as one of the first pieces of software that got some adoption where the end user is an agent.
[1]: https://github.com/nikvdp/linear-beads
What do you like about Linear? Is it suitable for hobby projects?
Linear is great, it's what JIRA should've been. Basically task management for people who don't want to deal with task management. It's also full featured, fast (they were famously one of the earlier apps to use a local-first sync-engine style architecture), and keyboard-centric.
Definitely suitable for hobby projects, but can also scale to large teams and massive codebases.
> Course, I’ve never looked at Beads either, and it’s 225k lines of Go code that tens of thousands of people are using every day. I just created it in October. If that makes you uncomfortable, get out now.
It's 2025, accountability is a thing of the past. The future belongs to the unaccountable and their AI swarm.
Facebook burned something like $70bn on "metaverse" with seemingly zero results. There's a lot more capital (and biosphere) to burn on AI agents.
Or did you find one that's good?
It unlocks a (still) hidden multiagent orchestration function in Claude code. The person making it unminified the code and figured out how to unlock it.
I find it quite well done - I started an orchestrator project a few days ago and scrapped it because it'll be fully integrated soon, it seems.
But yeah, I'm only running one code agent at a time, so that's not a problem I have. I should probably start with just a todo list as plain text.
There are a lot of strange things going on in that project.
Try to add some common sense, and you'll get shouted down.
Which is fine; I'll just make my own version without the slop.
I have enjoyed Steve's rants since "Execution in the Kingdom of Nouns" and the Google "Platform rant", but he may need someone to talk to him about bamboo and what a terrible life choice it is. Unless you can keep it the hell away from you and your neighbours it is bad, very bad. I'm talking about clumping varieties, the runners are a whole other level.
What I am finding most beneficial almost immediately is that I have a dedicated Telegram channel that I can post all sorts of unstructured data into; it's automatically routed via LLMs and stored into the right channel, and then other agents work on that data to provide me insights. I have a calorie counter, workout capture, reminders, and daily diary prompts all up and running as of right now, and honestly it's better than anything I could have bought "off the shelf".
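A minimal sketch of that routing layer (everything hypothetical: a keyword check stands in for the LLM classifier, and plain functions stand in for the downstream agents):

```python
# Hypothetical message router: classify an unstructured message,
# then dispatch it to the matching handler.
def classify(message: str) -> str:
    # Stand-in for an LLM call that returns one routing label.
    if "kcal" in message or "ate" in message:
        return "calories"
    if "remind" in message:
        return "reminders"
    return "diary"

HANDLERS = {
    "calories": lambda m: print("logged calories:", m),
    "reminders": lambda m: print("scheduled reminder:", m),
    "diary": lambda m: print("appended to diary:", m),
}

def route(message: str) -> None:
    HANDLERS[classify(message)](message)

route("ate 2 eggs, ~180 kcal")   # -> logged calories
route("remind me to stretch")    # -> scheduled reminder
```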
Last night I needed a C# console app to convert PDFs to a sprite sheet. I spent 30 seconds writing the prompt and another 30 seconds later the app was running and successfully converting PDFs on the first try. I then spent about another 2 mins adding a progress bar, tweaking the output format and moving the main logic into a new library.
> First, you should locate yourself on the chart. What stage are you in your AI-assisted coding journey?
> Stage 1: Zero or Near-Zero AI: maybe code completions, sometimes ask Chat questions
> Stage 2: Coding agent in IDE, permissions turned on. A narrow coding agent in a sidebar asks your permission to run tools.
> Stage 3: Agent in IDE, YOLO mode: Trust goes up. You turn off permissions, agent gets wider.
> Stage 4: In IDE, wide agent: Your agent gradually grows to fill the screen. Code is just for diffs.
> Stage 5: CLI, single agent. YOLO. Diffs scroll by. You may or may not look at them.
> Stage 6: CLI, multi-agent, YOLO. You regularly use 3 to 5 parallel instances. You are very fast.
> Stage 7: 10+ agents, hand-managed. You are starting to push the limits of hand-management.
> Stage 8: Building your own orchestrator. You are on the frontier, automating your workflow.
> *If you’re not at least Stage 7, or maybe Stage 6 and very brave, then you will not be able to use Gas Town. You aren’t ready yet.*
Once in a blue moon it will not completely fail and might even spit out something impressive at first glance, so some people will latch on to that.
It's not that there's nothing useful, maybe even important, in there, it's just so far it's all just the easy parts: playing around inside a computer.
I've noticed a certain trend over the years where you get certain types of projects that get lots of hype and excitement and much progress seems to be made, but when you dig deep enough you find out that it's all just the fun, easy sort of progress.
The fun progress, which not at all coincidentally tends to also be the easy progress, is the type that happens solely inside a computer.
What do I mean by that? I mean programs that only operate at the level of artificial computer abstractions.
The hard part is always dealing with "the real world": hardware that returns "impossible" results to your nicely abstract api functions, things that stop working in places they really shouldn't be able to, or even, and this is the really tricky bit, dealing with humans.
Databases are a good example of this kind of thing. It's easy to start off a database writing all the clever (and fun) bits like btrees and hash maps and chained hashes that spill to disk to optimize certain types of tables and so on, but I'd wager that at least half of the code in a "real" database like sqlite or postgresql is devoted to dealing with strange hardware errors or leaky api abstractions across multiple platforms or the various ways a human can send nonsensical input into the system and really screw things up.
I'd also bet that this type of code is a lot less fun to write and took much longer than the rest (which incidentally is why I always get annoyed when programming language demos show code with only a happy path, but that's another rant and this comment is already excessive).
Anyways, this AI thing is definitely a gold rush and it's important to keep in mind that there was in fact a lot of gold that got dug up but, as everyone constantly repeats, the more consistent way to benefit is sell the shovels and this is very definitely an ad for a shovel.
I think we are at the beginning of the second such journey. Lots of people will get hurt while we learn how to scale it up. It's why I've gone with dangerous-sounding theming and lots of caution with Gas Town.
I only think it takes 2 years this time though.
Agents and wrappers that put you deeper into LLM spending frenzy is like the new "todo app".
I recognize 100% that a tool to manage AI agents with long-term context tracking is going to be a big thing. Many folks have written versions of this already. But mashing together the complexity of k8s with a hodgepodge of LotR and Mad Max references is not it.
It's like the complexity of J2EE combined with AI-fueled solipsism and a microdosing mushroom regime gone off the rails. What even are all the layers of abstraction here? And to build what? What actual apps or systems has this thing built? AFAICT it has built Gas Town, and nothing else. Not surprising that it has eaten its own tail.
The amount of jargon, AI art, pop culture references, and excessive complexity going on here is truly amazing, and I would assume it's satire if I didn't know Yegge's style and previous writings. It's like someone looked at the number of overlapping and confusing tools Anthropic has released around Claude Code and said "hold my beer, hand me 3 Red Bulls and a shot of espresso, I can top that!".
I do think a friend of mine nailed it though with this quote: "This whole "I'm using agents to write so much software" building-in-public trend, but without actually showing what they built, reminds me of the people selling courses on stock trading or drop shipping."
The number of get-rich-quick schemes around any new tech is boundless. As Yegge himself points out in the post towards the end, you'd be surprised what you can pull off with a ridiculous blog post, a big-tech reputation, and excessive-LOC dev tools in a hype-driven market. How could it be wrong if it aligns so closely with so many CEOs' dreams?
WARNING DANGER CAUTION GET THE F** OUT YOU WILL DIE
I have never met Steve, but this warning alone is :chefskiss:
Has to be close to the shortest time from first commit to HN front page.
I'll add a personal anecdote - 2 years ago, I wrote a SwiftUI app by myself (mind you, I'm mostly an infrastructure/backend guy with some front-end expertise, where I get the general stuff but never really made anything big out of it other than stuff on LAMPP back in the 2000s) and it took me a few weeks to get it to do what I wanted it to do, with a bare minimum of features. As I was playtesting my app, I kept writing a wishlist of features for myself, and later, when I put it on the App Store, people around the world would email me asking for other features. But life, work, etc. would get in the way, and I would have no time to actually do them, as some of the features would take me days/weeks.
Fast forward to 2 weeks ago: at this point I'm very familiar with Claude Code, how to steer multiple agents at a time, quickly review their outputs, stitch things together in my head, and ask for the right things. I've completed almost all of the features, rewritten the app, and it's already been submitted to the App Store. The code isn't perfect, but it's also not that bad. Honestly, it's probably better than what I would've written myself. It's an app that can be memory intensive in some parts, and it's been doing well in my testing. On top of that, since I've been steering 2-3 agents actively myself, I have the entire codebase in my mind. I also have an overwhelming amount of notes on what I would do better.
My point is, if you have enough expertise and experience, you'll be able to "stitch things together" cleaner than others with no expertise. This also means, user acquisition, marketing and data will be more valuable than the product itself, since it'll be easier to develop competing products. Finding users for your product will be the hard part. Which kinda sucks, if I'll be honest, but it is what it is.
I don’t see how we get there, though, at least in the short term. We’re still living in the heavily-corporate-subsidized AI world with usage-based pricing shenanigans abound. Even if frontier models providers find a path to profitability (which is a big “if”), there’s no way the price is gonna go anywhere but up. It’s moviepass on steroids.
Consumer hardware capable of running open models that compete with frontier models is still a long ways away.
Plus, and maybe it’s just my personal cynicism showing, but when did tech ever reduce pricing while maintaining quality on a provided service in the long run? In an industry laser focused on profit, I just don’t see how something so many believe to be a revolutionary force in the market will be given away for less than it is today.
Billions are being invested with the expectation that it will fetch much more revenue than it’s generating today.
If training of new models ceased, and hardware was just dedicated to inference, what would that do to prices and speed? It's not clear to me how much inference is actually being subsidized over the actual cost to run the hardware to do it. If there's good data on that I'd love to learn more though.
Or, if it does _now_, how long it'll be before it will work well using downloadable models that'll run on, say, a new car's worth of Mac Studios with a bunch of RAM in them, to allow a small fleet of 70B and 120B models (or larger) to run locally? Perhaps even specialised models for each of the roles this uses?
But how many of those providers are too subsidizing their offering through investment capital? I don't know offhand of anyone in this space that is running at or close to breakeven.
It feels very much like the early days of streaming when you could watch everything with a single Netflix account. Those days are long gone and never coming back.
We're also seeing significant price reductions every year for LLMs. Not for frontier models, but you can get the equivalent of last year's model for cheaper. Hard to tell from the outside, but I don't think it's all subsidized?
I think maybe people over-updated on Bitcoin mining. Most tech is not inherently expensive.
There's little evidence this is true. Even OpenAI, which is spending more than anyone, is only losing money because of the free version of ChatGPT. Anthropic says they will be profitable next year.
> Plus, and maybe it’s just my personal cynicism showing, but when did tech ever reduce pricing while maintaining quality on a provided service in the long run? In an industry laser focused on profit, I just don’t see how something so many believe to be a revolutionary force in the market will be given away for less than it is today.
Really?
I mean, I guess I'm showing my age, but the idea that I can get a VM for a couple of dollars a month and expect it to be reliable makes me love the world I live in. But I guess when I started working there was no cloud, and getting root on a server meant investing thousands of dollars.
According to Ed Zitron, Anthropic spent more than its total revenue in the first 9 months of 2025 on AWS alone: $2.66 billion on AWS compute against an estimated $2.55 billion in revenue. That's just AWS - not payroll, not other software or hardware spend. He's regularly reporting concrete numbers that look horrible for the industry, while hyperscalers and foundation model companies continue to make general statements while refusing to get specific or release real revenue figures. If you only listen to what the CEOs are saying, then sure, it sounds great.
Anthropic also said that AI would be writing 95% of code in 3 months or something, however many months ago that was.
Yes, but it's unclear how much of that is training costs vs operational costs. They are very different things.
That's an old world that we experienced in the 2000s, and maybe the early 2010s, when we cared about the long-run quality of a provided service. For anything web-app-general-stuff related, that's long gone, as everyone (read: mostly everyone) has a very short attention span, and what matters is whether the thing I desire can be done right now. In the long run? Who cares. I keep seeing this in everyday life, at work, in discussions with my previous clients, and so on.
Once again, I wish it weren't true, but nothing suggests that it isn't.
Since we have version control, you can restart anywhere if you think it's a good place to fork from. I like greenfield development, but I suspect that there are going to be a lot more forks from now on, much like the game modding scene.
Companies with money-making businesses are gonna find themselves in an interesting spot when the "vibe juniors" are the vast majority of the people they can find to hire. New ways will be needed to reduce the risk.
...go to jail?
I've had the same experience as you. I've applied it to old projects which I have some frame of reference for and it's like a 200x speed boost. Just absolutely insane - that sort of speed can overcome a lot of other shortcomings.
I'm a full-stack dev, and solo, so I write data schema, backends, and frontends at the same time, usually flipping between them to test parts of new features. As far as AI use, I'm really just at the level of using a single Claude agent in an IDE - and only occasionally, because it writes a lot of nonsense. So maybe I'm missing out on the benefits of multiple agents. But where I currently see value is in writing (a) boilerplate and (b) sugar - where it has full access to a large and stable codebase. Where I think it fails is in writing overarching logical structures, especially early on in a project. It isn't good at writing elegant code with a clear view of how data, back, and front should work together. When I've tried to start projects from scratch with Claude, it feels like I'm fighting against its micro-view of each piece of code, where it's unable to gain a macro-view of how to orchestrate the whole system.
So like, maybe a bottomless wallet and a dozen agents would help with that, but there isn't so much room for errors or bugs in my work code as there is in my fun/play/casual game code. As a result I'm not really seeing that much value in it for paid work.
If your end goal is to produce some usable product, then the implementation details matter less. Does it work? Yes? OK, then maybe don't wrestle with the agent over specific libraries or coding patterns.
We intend to sing the love of danger, the habit of energy and fearlessness.
Courage, audacity, and revolt will be essential elements of our poetry.
Up to now literature has exalted a pensive immobility, ecstasy, and sleep. We intend to exalt aggressive action, a feverish insomnia, the racer’s stride, the mortal leap, the punch and the slap.
We affirm that the world’s magnificence has been enriched by a new beauty: the beauty of speed. A racing car whose hood is adorned with great pipes, like serpents of explosive breath—a roaring car that seems to ride on grapeshot is more beautiful than the Victory of Samothrace. … https://www.arthistoryproject.com/artists/filippo-tommaso-ma...
225k lines for a CLI issue tracker? What the fuck?
Looking at the screenshot of "Tracked Issues", it seems many of the "tasks" are likely overlapping in terms of code locality.
Based on my own experience, I've found the current crop of models to work well at a slightly higher-level of complexity than the tasks listed there, and they often benefit from having a shared context vs. when I've tried to parallelize down to that level of work (individual schema changes/helper creation/etc.).
Maybe I'm still just unclear on the inner workings, but it's my understanding each of those tasks is passed to Claude Code and developed separately?
In either case, I think this project is a glimpse into the future of software development (albeit with a grungy desert punk tinted lens).
For context, I've been "full vibe-coding"[0] for the past 6 months, and though it started painfully, the models are now good enough that not reading the code isn't much of an issue anymore.
I think Gas Town looks interesting directionally and as a PoC. Like it or not, that's the world we'll end up in. Some products will do it well and some will be horrible monsters. (Like I'm already dreading Oracle Gas Town and Azure Gas Town).
I think the Amp coding agent trends in the direction of Gas Town already. Powerful but expensive, uses a mix of models and capabilities to do something that's greater than the sum of the parts.
(This explains why some of the comments have timestamps that appear older than the post itself. I got tired of trying to make them line up, sorry!)
IMHO, it's less disorienting to have the post dated after the comments than it is to see a comment you thought you wrote a couple days ago but is dated today. So you're welcome to stop trying to line up timestamps.
Status quo sucks also, it just sucks less. Haven't yet figured out an actually good solution. Sorry!
The most I imagine most folks saying is "Didn't I see this post on the front page days ago?". For many other discussion fora, it's not uncommon for posts to be at the top of the pile for many days... so a days-old post date should be nothing unusual.
Re artificial uplifting a.k.a. re-upping, see https://news.ycombinator.com/item?id=26998308 and https://news.ycombinator.com/pool
I promptly gave Claude the text of the articles and had it rewrite them using idiomatic distributed-systems naming.
Fun times!
Assuming this isn't a parody project, maybe this just isn't for me, and that's fine. I'm struggling to understand a production use case where I'd be comfortable letting this thing loose.
Who is the intended audience for this design?
I'm looking for "the Emacs" of whatever this is, and I haven't read a blog post which isolates the design yet.
I don't know the details, but I was wondering why people aren't "just" writing chat venues and comms protocols for the chats? So the fundamental unit is a chat that humans and agents can be members of.
You can also have DMs etc to avoid chattiness.
But fundamentally, if you start with this kind of madness you don't have a strict hierarchy, and it might also be fun to see how it goes.
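A sketch of that fundamental unit: a venue is just an append-only message log that humans and agents both subscribe to (all names hypothetical):

```python
# Hypothetical chat venue: the only coordination primitive is a shared
# message log; members (human or agent) post to it and poll it.
from dataclasses import dataclass, field

@dataclass
class Message:
    sender: str
    body: str

@dataclass
class Venue:
    name: str
    log: list = field(default_factory=list)

    def post(self, sender: str, body: str) -> None:
        self.log.append(Message(sender, body))

    def since(self, cursor: int):
        """Return messages after `cursor` and the new cursor."""
        return self.log[cursor:], len(self.log)

standup = Venue("project-x")
standup.post("human", "please pick up the flaky test fix")
new, cursor = standup.since(0)  # an agent polls for new messages
for m in new:
    print(f"[{m.sender}] {m.body}")
```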
I briefly started building this but just spun out and am stuck using PAL MCP for now and some dumb scripts. Not super content with any of it yet.
But to keep things tractable, I've kept the orchestration within a collection of subagents in a single Claude Code session. The orchestration system is called Pied-Piper and you can find the code here - https://github.com/sathish316/pied-piper
It is only 1.6k lines of Go code.
Gas Town is from the creator of beads.
Outside of that it's trial and error, but I've learned you don't need to kick off a new chat instance very much, if at all. I also like Beads because if I have to "run" or go offline, I can tell it to pause and log where it left off / where it's at.
For some projects I tell claude not to close tickets without my direct approval because sometimes it closes them without testing, my baseline across all projects is that it compiles and runs without major errors.
The simplicity of just plugging a few lines of code into a framework or a workflow engine means the barrier to entry is really, really low, which guarantees that we will have thousands of business processes running through those duct-taped agents in almost every kind of industry you can imagine.
Mountains of code nobody understands, even more Byzantine post-training to shoehorn more complex tool usage into the models.
Compliance issues galore. Security incidents by the ton.
The future is going to be very, very interesting pretty soon. Why would you leave your front-row seat right now?
I would instead invest some good time and money in buying and learning to play a modern replica of a Greek kithara.
How would you go about doing that?
Update: I was hoping it'd at least be smart enough to automatically test that the project still builds, but it did not. It also didn't commit the changes.
Maybe Yegge should have built it around Codex instead - Codex is a lot better at adhering to instructions.
Pros: The overall system architecture is similar to my own latest attempt at solving this problem. I like the tmux-based console-monitoring approach (rather than going full SDK + custom UI); it makes it easier to inspect what is going on. The overlap between my ideas and Steve's is around 75%.
Cons: Arguing with "The Mayor" about some other detached process's poor workmanship seems like a major disconnect and architectural gap. A game of telephone is unlikely to be better than simply using Claude. I was also hoping Gas Town would amplify my intent to complete the task of "Add feature X" without early stopping, but so far it's more work than both 1. vibing with Claude directly and 2. creating a highly detailed spec with checkboxes and piping in "do the next task" until it's done.
[0] https://github.com/kucherenko/jscpd
https://github.com/shepherdjerred/scout-for-lol/blob/main/es...
2 years sounds more likely than 2 months since the established norms and practices need to mature a lot more than this to be worthy of the serious consideration of the considerably serious.
In the past a large codebase indicated that maybe you might take the project serious, as some human effort was expended in its creation. There were still some outliers like Urbit and it's 144 KLOC of Hoon code, perverse loobeans and all.
Now if I get so much as a whiff of AI scent of a project, I lot all interest. It indicates that the author did not a modicum of their own time in the project, so therefore I should waste my own time on it.
(I use LLM-based coding tools in some of my projects, but I have the self-respect to review the generated code before publishing init.)
Of course as a developer you still have to take responsibility for your code, minimally including a disclaimer, and not dumping this code in to someone else’s code base. For example at work when submitting MRs I do generally read the code and keep MRs concise.
I’ve found that there is a certain kind of coder that hears of someone not reading the code and this sounds like some kind of moral violation to them. It’s not. It’s some weird new kind of coding where I’m more creating a detailed description of the functionality I want and incrementally refining it and iterating on it by describing in text how I want it to change. For example I use it to write GUI programs for Ubuntu using GTK and python. I’m not familiar with python-gtk library syntax or GTK GUI methods so there’s not really much of a point in reading the code - I ask the machine to write that precisely because I’m unfamiliar with it. When I need to verify things I have to come up with ways for the machine to test the code on its own.
Point is I think it’s honestly one new legitimate way of using these tools, with a lot of caveats around how such generated code can be responsibly used. If someone vibe coded something and didn’t read it and I’m worried it contains something dangerous, I can ask Claude to analyze it and then run it in a docker container. I treat the code the same way the author does - as a slightly unknown pile of functions which seem to perform a function but may need further verification.
I’m not sure what this means for the software world. On the face of it it seems like it’s probably some kind of problem, but I think at the same time we will find durable use cases for this new mode of interacting with code. Much the same as when compilers abstracted away the assembly code.
This is not exactly that, but it is one step up. Having agents output code that then gets compiled/interpreted/whatever, based upon contextual instruction, feels very, very familiar to engineers who have ever worked close to the metal.
"Old fashioned", in this aspect, would be putting guardrails in place so that you knew that what the agent/compiler was creating was what you wanted. Many years ago, that was binaries or bytecode packaged with lots of symbols for debugging. Today, that's more automated testing.
Now I've got tools and functionality that I would have paid for before as separate apps that are running "for free" locally.
I can't help but think this is the way forward and we'll just have to deal with the landmine as/when it comes, or hope that the tooling gets drastically better so we the landmine isn't as powerful as we fear.
On my personal project I do sometimes chat with ChatGPT and it works as a rubber duck. I explain, put my thoughts into words and typically I already solve my problem when I'm thinking it through while expressing it in words. But I must also admit that ChatGPT is very good at producing prose and I often use it for recommending names of abstractions/concepts, modules, functions, enums etc. So there's some value there.
But when it comes to code I want to understand everything that goes into my project. So in the end of the day I'm always going to be the "bottle neck", whether I think through the problem myself and write the code or I review and try to understand the AI generated code slop.
It seems to me that using the AI slop generation workflow is a great fit for the industry though, more quantity rather quality and continuous churn. Make it cheaper to replace code so that the replacement can be replaced a week later with another vibe-coded slop. Quality might drop, bugs might proliferate but who cares?
And to be fair, code itself has no value, it's ephemeral, data and its transformations are what matter. Maybe at some point we can just throw out the code and just use the chatbots to transform the data directly!
LLMs are far from being as trustworthy as compilers.
But when high level languages were getting started, we had to read and debug the the transformed lower level output they made (hello C-front). At a certain point, most of us stopped debugging the layer below and most LLVM IR and assembly flow by without anyone reading it.
I use https://exe.dev to orchestrate several agents, and I am seeing the same benefits as Steve (with a better UI). My code smell triggers with lots of diffs that flow by, but just as often this feeling of, "oh, that's a nice feature, it's much better than I could have made" is also triggered. If you work with colleagues who occasionally delight and surprise you with excellent work, it's the same thing.
Maybe if you are not used to the feeling of being surprised and mostly delighted by your (human) colleagues, orchestrated agentic coding is hard to get your head around.
Most likely, tens of other bugs are being introduced at each step, etc etc, right?
He as a dev should know that adding a layer of names on top of already named entities is not a good practice. But he just had fun and this came up. Which is fantastic. But I don't want to have to translate names in my head all the time.
Just not useful. Beads too... really sorry to say this, but it's a task runner with labels, and it has zero awareness of the actual tasks.
I don't know, maybe I am wrong, but this just doesn't seem like a thing that will work. Which is why I think it will be popular: nobody will be able to make it work, but they won't want to look dumb, so they'll say it's awesome and amazing. Like another AI thingy I could name but won't, which everyone is using.
But I love Yegge and hope he does well. And Amp, for the little bit that I used it, is a really solid agent and delivered much better results than many others.
...no, I haven't lost the plot. I'm seeing another fad of the intoxicated parting with their money bending a useful tool into a golden hammer of a caricature. I dread seeing the eventual wreckage and self-realization from the inevitable hangover.
I've never understood this argument. Do you ever work with other humans? They are very much not deterministic, yet they can often produce useful code that helps you achieve more than you could by yourself.
```
Gas Town is also expensive as hell. You won’t like Gas Town if you ever have to think, even for a moment, about where money comes from. I had to get my second Claude Code account, finally; they don’t let you siphon unlimited dollars from a single account, so you need multiple emails and siphons, it’s all very silly. My calculations show that now that Gas Town has finally achieved liftoff, I will need a third Claude Code account by the end of next week. It is a cash guzzler.
```
Since I am quite capable of shitting up my own code for free, and I've got zero interest in this stupid AI nonsense anyway, I'm vanishingly unlikely to actually use this. But, still: I like to keep half an eye on what is going on, even if I hate it. And I am more than somewhat intrigued about what the numbers actually look like.
We're trying to orchestrate a horde of agents. The workers (polecats?) are the main problem solvers. Then you need a top-level agent (the mayor) to break down the problem and delegate work, and a merger (the refinery) to resolve conflicts in the resulting code. Sometimes agents get stuck and need encouragement.
The molecules stuff confused me, but I think they're just "policy docs," checklists to do common tasks.
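For the curious, here's roughly what that one-level hierarchy looks like as code: a minimal sketch in Python, with the role names borrowed purely as labels. `run_agent()` is a stand-in for whatever LLM call or subprocess you actually use; none of this is actual Gas Town code.

```
# Minimal sketch of the one-level hierarchy described above.
# Role names are labels borrowed from the post; the plumbing is mine.
from concurrent.futures import ThreadPoolExecutor

def run_agent(role: str, prompt: str) -> str:
    """Placeholder for a real LLM/agent invocation."""
    return f"[{role}] result for: {prompt}"

def mayor(goal: str) -> list[str]:
    """Break the goal into independent work items (here: trivially)."""
    return [f"{goal} -- part {i}" for i in range(3)]

def polecat(task: str) -> str:
    """A worker agent hacking on one task, nominally in its own worktree."""
    return run_agent("polecat", task)

def refinery(patches: list[str]) -> str:
    """Single merge point: fold worker output together, one patch at a time."""
    merged = []
    for patch in patches:
        merged.append(run_agent("refinery", f"merge {patch}"))
    return "\n".join(merged)

if __name__ == "__main__":
    tasks = mayor("refactor the parser module")
    with ThreadPoolExecutor(max_workers=3) as pool:
        patches = list(pool.map(polecat, tasks))  # workers run in parallel
    print(refinery(patches))                      # merges stay sequential
```

The sequential refinery is the design choice that matters: exactly one place where merges happen, so parallel workers can't step on each other.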
But this is baby stuff. Only one level of hierarchy? Show me a design for your VP agent and I'll be impressed for real.
He is so in love with his own voice.
Try to find actual screenshots of this shit, or of what it really does, in the 200,000-word diarrhea (funnily, he agrees it's diarrhea [1]).
---
He also references his previous slop called beads. To quote, "Course, I’ve never looked at Beads either, and it’s 225k lines of Go code that tens of thousands of people are using every day".
It's slop to the point that people create extensive scripts to try and purge it from their systems, since it infects everything you do: https://gist.github.com/banteg/1a539b88b3c8945cd71e4b958f319...
Do not listen to the newly converted or accept anything from them. Steve Yegge used to be a good engineer with a great understanding of the world. Now it's all gupps and polecats.
[1] Quote from the article: "it’s a bunch of bullshit I pulled out of my arse over the past 3 weeks, and I named it after badgers and stuff."
There is a repo, but I am not sure; probably the only way to resolve it is to spend some of that money he's talking about.
Our civilization is doomed if this is the future. Zero quality, zero resiliency, zero coherent vision, zero cohesive intent. Just chaotic slop everywhere, the ultimate Ouroboros.