Orchestrate teams of Claude Code sessions

(code.claude.com)

360 points | by davidbarker 18 hours ago

24 comments

  • bluerooibos 13 hours ago
    This is great and all but, who can actually afford to let these agents run on tasks all day long? Is anyone here actually using this or are these rollouts aimed at large companies?

    I'm burning through so many tokens on Cursor that I've had to upgrade to Ultra recently - and I'm convinced they're tweaking the burn rate behind the scenes - usage allowance doesn't seem proportional.

    Thank god the open source/local LLM world isn't far behind.

    • MarkMarine 7 hours ago
      A Claude max 20x plan and you’ll be fine. I’d been doing my normal process of running 4 Claude sessions in parallel because that was about the right amount of concurrent sessions for me to watch what’s going on and approve/deny plans and code… and this blows it out of the water. With an agent swarm it’s so fast at executing and testing I’m limited by my idea and review capabilities now. I tried running 2 and I can’t keep up, I’m defining specs and the other window is done, tested, validated and waiting for me.
    • anupamchugh 7 hours ago
      Real numbers from today. FastAPI codebase, ~50k LOC. 4 agents, 6 tasks, ~6 min wall clock vs ~18-20 min sequential. 24 tests, 0 file conflicts. Token cost: roughly 4x a single session.

      To your cost question — agent teams are sprinters, not marathon runners. You use them for a 6-minute burst of parallel work, not all day. A 6-minute burst at 4x cost is still cheaper than 20 minutes at 1x if your time matters more than tokens.
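
      A minimal sketch of that trade-off. Only the 6 min, 20 min and 4x figures come from the numbers above; the per-session token cost and the function names are hypothetical, chosen just to show where the break-even point sits.

```python
def total_cost(wall_minutes: float, token_multiplier: float,
               hourly_rate: float, session_token_cost: float) -> float:
    """Cost of one run: developer time spent waiting plus tokens burned."""
    time_cost = hourly_rate * wall_minutes / 60
    token_cost = session_token_cost * token_multiplier
    return time_cost + token_cost


def break_even_hourly_rate(seq_minutes: float, team_minutes: float,
                           token_multiplier: float,
                           session_token_cost: float) -> float:
    """Hourly rate above which the parallel run is cheaper overall."""
    extra_tokens = session_token_cost * (token_multiplier - 1)
    minutes_saved = seq_minutes - team_minutes
    return extra_tokens * 60 / minutes_saved


if __name__ == "__main__":
    # Assumption: one sequential session burns $2.00 of tokens (made-up figure).
    rate = break_even_hourly_rate(20, 6, 4, session_token_cost=2.0)
    print(f"break-even hourly rate: ${rate:.2f}")
```

      Under that assumed token cost, anyone whose time is worth more than the printed rate comes out ahead with the 4-agent burst.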

      The constraint nobody mentions: tasks must be file-disjoint. Two agents editing the same file means overwrites. Plan decomposition matters more than the agents themselves.
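
      One way to check a plan for that constraint before launching anything, as a rough sketch: the task names and file paths are hypothetical, and this is a plain pre-flight check, not something Claude Code provides.

```python
from itertools import combinations


def file_conflicts(plan: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Return every pair of tasks that touch at least one common file."""
    conflicts = {}
    for (a, files_a), (b, files_b) in combinations(plan.items(), 2):
        shared = files_a & files_b
        if shared:
            conflicts[(a, b)] = shared
    return conflicts


# Hypothetical decomposition of a FastAPI codebase into three tasks.
plan = {
    "auth-routes": {"app/routes/auth.py", "tests/test_auth.py"},
    "billing-routes": {"app/routes/billing.py", "tests/test_billing.py"},
    "shared-models": {"app/models.py", "app/routes/auth.py"},
}

if __name__ == "__main__":
    for (a, b), shared in file_conflicts(plan).items():
        print(f"{a} and {b} both edit: {sorted(shared)}")
```

      An empty result means the plan is file-disjoint; any reported pair should be merged into one task or sequenced.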

      One thing to watch: Claude Code crashed mid-session with a React reconciler error (#23555). 4 agents + MCP servers pushes the UI past its limits.

      • simianwords 7 hours ago
        Need it be actually disjoint? Interested in learning about the limitation here because apparently the agents can coordinate.

        Otherwise what’s the difference between what they are providing vs me creating two independent pull requests using agents and having an agent resolve merge conflicts?

        • anupamchugh 6 hours ago
          It does need to be disjoint. The docs (https://code.claude.com/docs/en/agent-teams) are explicit: "Two teammates editing the same file leads to overwrites. Break the work so each teammate owns a different set of files."

          Locking is for task claiming (preventing two agents from grabbing the same task), not for file writes:

          "Task claiming uses file locking to prevent race conditions when multiple teammates try to claim the same task simultaneously."

          The coordination layer (TaskList, blockedBy, SendMessage) handles logical task sequencing, not concurrent file access. You can make agent B wait for agent A via dependencies, but that serializes the work and kills the parallelism benefit.
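
          To illustrate why dependencies cost parallelism, here is a small sketch that groups tasks into "waves" that could run concurrently. The blockedBy name comes from the text above, but the dictionary shape is an assumption for illustration, not Claude Code's actual task format.

```python
from graphlib import TopologicalSorter


def parallel_waves(blocked_by: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; every task in a wave has all blockers done."""
    sorter = TopologicalSorter(blocked_by)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical task graphs: each task maps to the set of tasks blocking it.
independent = {"a": set(), "b": set(), "c": set()}
chained = {"a": set(), "b": {"a"}, "c": {"b"}}

if __name__ == "__main__":
    print(parallel_waves(independent))  # a single wave of three tasks
    print(parallel_waves(chained))      # three waves of one task each
```

          Three independent tasks fit in one wave, while a fully chained set needs three waves and runs no faster than a single session.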

          • Aditya_Garg 4 hours ago
            Anthropic themselves were able to write a C compiler using teams all at the same time

            https://www.anthropic.com/engineering/building-c-compiler

            Here is the relevant excerpt:

            "To prevent two agents from trying to solve the same problem at the same time, the harness uses a simple synchronization algorithm:

            Claude takes a "lock" on a task by writing a text file to current_tasks/ (e.g., one agent might lock current_tasks/parse_if_statement.txt, while another locks current_tasks/codegen_function_definition.txt). If two agents try to claim the same task, git's synchronization forces the second agent to pick a different one. Claude works on the task, then pulls from upstream, merges changes from other agents, pushes its changes, and removes the lock. Merge conflicts are frequent, but Claude is smart enough to figure that out."
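
            A rough sketch of the lock-file idea on a single machine. Note the swap: the harness quoted above relies on git to arbitrate between agents, whereas this uses atomic exclusive file creation on a local filesystem. The function names and agent IDs are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def try_claim(task: str, agent: str, lock_dir: Path) -> bool:
    """Claim a task by creating its lock file; fail if it already exists."""
    lock_dir.mkdir(parents=True, exist_ok=True)
    path = lock_dir / f"{task}.txt"
    try:
        # O_CREAT | O_EXCL makes creation fail if the file is already there.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as lock_file:
        lock_file.write(agent)
    return True


def release(task: str, lock_dir: Path) -> None:
    """Remove the lock so the task name can be claimed again."""
    (lock_dir / f"{task}.txt").unlink(missing_ok=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        locks = Path(tmp) / "current_tasks"
        print(try_claim("parse_if_statement", "agent-1", locks))
        print(try_claim("parse_if_statement", "agent-2", locks))
```

            This only prevents duplicate claims; as the excerpt says, the file edits themselves are still reconciled through merges.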

    • logicx24 13 hours ago
      I can't even get through my Claude Max quota, and that's only $200/mo. And I code every day and use it for various other pretty-intensive tasks.
      • dangus 12 hours ago
        only $200/mo…$200 a month is a used car payment.

        I guarantee you that price will double by 2027. Then it’ll be a new car payment!

        I’m really not saying this to be snarky, I’m saying this to point out that we’re really already in the enshittification phase before the rapid growth phase has even ended. You’re paying $200 and acting like that’s a cheap SaaS product for an individual.

        I pay less for Autocad products!

        This whole product release is about maximizing your bill, not maximizing your productivity.

        I don’t need agents to talk to each other. I need one agent to do the job right.

        • __turbobrew__ 10 hours ago
          $200/month is peanuts when you are a business paying your employees $200k/year. I think LLMs make me at least 10% more effective and therefore the cost to my employer is very worth it. Lots of trades have much more expensive tools (including cars).
          • dangus 10 hours ago
            > I think LLMs make me at least 10% more effective

            I know this was last year but...

            https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...

            • spiderfarmer 9 hours ago
              I don’t need external research to validate or invalidate my own experience.
              • legulere 14 minutes ago
                One of the outcomes of that study is that your own productivity estimate might not match up with reality.
              • tehjoker 8 hours ago
                I think it depends on the tasks you use it for. Bootstrapping or translating projects between languages is amazing. New feature development? Questionable.
                • __turbobrew__ 7 hours ago
                  I don’t write frontend stuff, but sometimes need to fix a frontend bug.

                  Yesterday I fed claude very surgical instructions on how the bug happens, and what I want to happen instead, and it oneshot the fix. I had a solution in about 5 minutes, whereas it would have taken me at least an hour, but most likely more time to get to that point.

                  Literally an hour or two of my day was saved yesterday. I am salaried at around $250/hour, so in that one interaction AI saved my employer $250-500 in wages.

                  AI allows me to be a T shaped developer, I have over a decade of deep experience in infrastructure, but know fuck all about front end stuff. But having access to AI allows me as an individual who generally knows how computers work to fix a simple problem which is not in my domain.

                • samtheprogram 7 hours ago
                  New feature development in web and mobile apps is absolutely 10% more productive with these tools, and anyone who says otherwise is coping. That's a large fraction of software development.
            • __turbobrew__ 7 hours ago
              Honestly, that is a “skill issue” as the kids these days say. When used properly and with skill, agents can increase your productivity. Like any tool, use it wrong and your life will be worse off. The logically consistent view if you want to believe this study and my experience is that the average person is hindered by using AI because they do not have the skills, but there are people out there who gain a net benefit.
        • kesslern 12 hours ago
          Not saying $200/mo isn't a lot, but I think you're underestimating used car payments these days. The average US used car payment is above $500 now.
        • yomismoaqui 12 hours ago
          As company owner the math is simple:

          If I pay $3k/month to a developer and a $200/month tool makes them 10% more productive I will pay it without thinking.

        • nlh 12 hours ago
          I pay $200/month, don’t come near the limits (yet), and if they raised the price to $1000/month for the exact same product I’d gladly pay it this afternoon (Don’t quote me on this Anthropic!)

          If you’re not able to get US$thousands out of these models right now either your expectations are too high or your usage is too low, but as a small business owner and part/most-time SWE, the pricing is a rounding error on value delivered.

          • rune-dev 12 hours ago
            As a business expense to make profit, I can understand being ok with this price point.

            But as an individual with no profit motive, no way.

            I use these products at work, but not as much personally because of the bill. And even if I decided I wanted to pursue a for-profit side project, I’d have to validate its viability before even considering a $200 monthly subscription.

            • Wowfunhappy 10 hours ago
              I'm paying $100 per month even though I don't write code professionally. It is purely personal use. I've used the subscription to have Claude create a bunch of custom apps that I use in my daily life.

              This did require some amount of effort on my part, to test and iterate and so on, but much less than if I needed to write all the code myself. And, because these programs are for personal use, I don't need to review all the code, I don't have security concerns and so on.

              $100 every month for a service that writes me custom applications... I don't know, maybe I'm being stupid with my money, but at the moment it feels well worth the price.

            • yomismoaqui 11 hours ago
              You can do it for $40 month. What I'm doing:

              - $20 for Claude Pro (Claude Code)
              - $20 for ChatGPT Plus (Codex)
              - Amp Free Plan (with ads and you get about $10 of daily value)

              So you get to use 3 of the top coding agents for $40 month.

            • __turbobrew__ 10 hours ago
              Some tools are not meant for individuals. That 100k software defined radio isn’t meant for you either.
          • geraneum 12 hours ago
            We’re gonna see an economic boom any minute.
          • dangus 10 hours ago
            "Rounding error" lol, you can hire an actual full time human in India for $1000/month.
            • nmfisher 9 hours ago
              Will they be better than Opus though?
            • bdangubic 10 hours ago
              wouldn’t hire one for $15/month…

              with the US salaries for SWEs $1000/month is not a rounding error for all but definitely for some. say you make $100/hr and CC saves you say 30hrs / month? not rounding error but no brainer. if you make $200+/hr it starts to become a rounding error. I have multiple max accounts at my disposal and at this point would for sure pay $1000/month for max plan. it comes down to simple math

          • imiric 12 hours ago
            I'm curious: what concrete value have you extracted using these tools that is worth US$thousands?
        • bryanlarsen 11 hours ago
          That's one of 3 possible futures.

          1. 1-3 LLM vendors are substantially higher quality than other vendors and none of those are open source. This is an oligarchy and the scenario you described will play out.

          2. >3 LLM vendors are all high quality and suitable for the tasks. At least one of these is open source. This is the "commodity" scenario, and we'll end up paying roughly the cost of inference. This still might be hundreds per month, though.

          3. Somewhere in between. We've got >3 vendors, but 1-3 of them are somewhat better than the others, so the leaders can charge more. But not as much more than they can in scenario #1.

        • Wowfunhappy 10 hours ago
          > I’m saying this to point out that we’re really already in the enshittification phase before the rapid growth phase has even ended. You’re paying $200 and acting like that’s a cheap SaaS product for an individual.

          Traditional SaaS products don't write code for me. They also cost much less to run.

          I'm having a lot of trouble seeing this as enshittification. I'm not saying it won't happen some day, but I don't think we're there. $200 per month is a lot, but it depends on what you're getting. In this case, I'm getting a service that writes code for me on demand.

        • buzzerbetrayed 12 hours ago
          If you can’t get $200 of value out of Claude Code Max, then you need to really step up your game. That’s user error.
        • meowface 12 hours ago
          I could write an essay about how almost everything you wrote either is extremely incorrect or is extremely likely to be incorrect. I am too lazy to, though, so I will just have to wait for another commenter to do the equivalent.
          • dangus 10 hours ago
            Why not make your AI tool do it for you?
            • meowface 9 hours ago
              Because, while I have been a huge AI optimist for decades, I generally don't like their current writing output. And even if I did, it would feel like plagiarism unless I prepended it with "an AI responded with this:", which would make me seem lazy. (Though I did already just admit I am very lazy in my first post, so perhaps that is what I will do going forward once they become better writers.)
    • emp17344 13 hours ago
      Especially for what’s basically an experiment. Gas town didn’t really work, so there’s no guarantee this will even produce anything of value.
    • rahimnathwani 13 hours ago
      Many many companies can afford to hire a junior engineer for $150k/year (plus employer payroll taxes, employee benefits etc.).

      Are you spending more than $150k per year on AI?

      (Also, you're talking about the cost of your Cursor subscription, when the article is about Claude Code. Maybe try Claude Max instead?)

      • freeone3000 13 hours ago
        If it could do anything that a junior dev could, that’d be a valid point of comparison. But it continually, wildly performs slower and falls short every time I’ve tried.
        • rahimnathwani 12 hours ago
          > But it continually, wildly performs slower and falls short every time I’ve tried.

          If it falls short every time you've tried, it's likely that one or more of these is true:

          A. You're working on some really deep thing that only world-class experts can do, like optimizing graphics engines for AAA games.

          B. You're using a language that isn't in the top ~10 most popular in AI models' training sets.

          C. You have an opportunity to improve your ability to use the tools effectively.

          How many hours have you spent using Claude Code?

          • RollAHardSix 9 hours ago
            Trying to make a media player, media server, all by using ffmpeg and a pre-built media streaming engine as its core. Python and SQLite. About a week's worth of effort every time until it begins to go too far off the rails to be reliable to continue to develop with. It never did get the ffmpeg commands right, I had to go back to crafting those by hand, it never did get the streaming engine to play in the browser's video player in the supported hls and dash formats. Asked it to build a file and file metadata caching layer and then had to continue to re-prompt it to poll the caching layers before trying to get values from the database. Never even got to the library, metadata, or library image functionality. Had to ask it to create the rbac permissions model I wanted despite it being very junior-level common sense (super-admin, user-admin, metadata admin, image admin).

            Not exactly world-class software.

            • frankc 5 hours ago
              I recently built something in the same universe - using ffmpeg to receive streams from obs to capture audio and video - I don't want to get into details except to say it involved a fairly involved pipeline of ray actors and a significant admin interface with nicegui. I had no problem doing this with claude. You need to give it access to look up how to do things, like context7. If you are doing something very specific, you need to have a session that does research to build a skill so it doesn't need to redo that research every time. And yes, you do need to tell it the architecture and be fairly detailed with something like how you want rbac.

              Using these tools takes quite a bit of effort but even after doing all those steps to use the tool well, I still got this project done in a few days when it otherwise would have taken me 1-2 months and likely simply would never have happened at all.

            • rahimnathwani 9 hours ago
              I'm curious which harness and which model(s) you've been using.

              And whether you have a decent PRD or spec. Are you trying to prompt the harness with one bit at a time, or did you give it a complete spec and ask it to analyze it and break it down into individual issues with dependencies (e.g. using beads and beads_viewer)?

              I'm not looking for reasons to criticize your approach or question your experience, but your answers may point to opportunities for you to get more out of these tools.

              If you're using Claude Code and you have a friend who has had more success with these tools, consider exporting your transcripts and letting them have a look: https://simonwillison.net/2025/Dec/25/claude-code-transcript...

          • astrange 10 hours ago
            > A. You're working on some really deep thing that only world-class experts can do, like optimizing graphics engines for AAA games.

            This is a relatively common skill. One thing I always notice about the video game industry is it's much more globally distributed than the rest of the software industry.

            Being bad at writing software is Japan's whole thing but they still make optimized video games.

          • freeone3000 11 hours ago
            It’s a simple compiler optimization over bayesian statistics. It’s masters-level stuff at best, given that I’m on it instead of some expert. The codebase is mixed python and rust, neither of which are uncommon.

            The issues I ran into are primarily “tail-chasing” ones - it gets into some attractor that doesn’t suit the test case and fails to find its way out. I re-benchmark every few months, but so far none of the frontier models have been able to make changes that have solved the issue without bloating the codebase and failing the perf tests.

            It’s fine for some boilerplate dedup or spinning up some web api or whatever, but it’s still not suitable for serious work.

            • rahimnathwani 10 hours ago
              Would you expect a junior engineer to perform better than this?
          • bryanlarsen 11 hours ago
            > like optimizing graphics engines for AAA games.

            Claude would be worse than an expert at this, but this is a benchmarkable task. Claude can do experiments a lot quicker than a human can. The hard part would be to ensure that the results aren't just gaming your benchmark.

          • imiric 11 hours ago
            The possibility that the performance of these tools still isn't at the level some people need it to be is not an option?

            It's insulting that criticism is often met with superficial excuses and insinuation that the user lacks the required skills.

            • indemnity 5 hours ago
              When really solid programmers who started skeptical (and even have a ban policy if PR submitters don’t disclose they used AI) now show how their workflows have been improved by AI agents, it may be worth trying to understand what they are doing and you are not.

              https://mitchellh.com/writing/my-ai-adoption-journey

              My experience mirrors that of Mitchell. It absolutely is at the level now where AI can free up time to do the really interesting stuff.

            • rahimnathwani 11 hours ago
              That possibility is covered by A and B.

              GP said 'falls short every time I’ve tried'. Note the word 'every'.

        • andkenneth 12 hours ago
          Companies are not comparing it straight to juniors. They're more making a comparison between a Senior with the assistance of one or more juniors, vs a Senior with the assistance of AI Agents.

          I feel like comparison just to a junior developer is also becoming a fairly outdated comparison. Yes, it is worse in some ways, but also VASTLY superior in others.

          • taurath 12 hours ago
            It’s funny so many companies making people RTO and spending all this money on offices to get “hallway” moments of innovation, while emptying those offices of the people most likely to have a new perspective.
        • buzzerbetrayed 12 hours ago
          I am way more productive with $200/month of AI than I would be with $5,000/month of junior developer. And it isn’t close.
          • poslathian 9 hours ago
            What if you are going to spend 5400 either way, you go all agent or get an apprentice and an agent for them too.
    • reactordev 12 hours ago
      You know those VC funded startups with just two founders… them.
    • jwpapi 13 hours ago
      I mean what you get for Claude Code Max is insane, it’s 30x on the token price. If you don’t spend all of that, it’s your own fault. That must be below electricity cost.
  • mcintyre1994 15 hours ago
    I’ve been mostly holding off on learning any of the tools that do this because it seemed so obvious that it’ll be built natively. Will definitely give this a go at some point!
  • pronik 16 hours ago
    To the folks comparing this to GasTown: keep in mind that Steve Yegge explicitly pitched agent orchestrators to Anthropic, among others, months ago:

    > I went to senior folks at companies like Temporal and Anthropic, telling them they should build an agent orchestrator, that Claude Code is just a building block, and it’s going to be all about AI workflows and “Kubernetes for agents”. I went up onstage at multiple events and described my vision for the orchestrator. I went everywhere, to everyone. (from "Welcome to Gas Town" https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16d...)

    That Anthropic releases Agent Teams now (as rumored a couple of weeks back), after they've already adopted a tiny bit of beads in the form of Tasks, means that either they've been building them already back when Steve pitched orchestrators or they've decided that he's been right and it's time to scale the agents. Or they've arrived at the same conclusions independently -- it won't matter in the larger scale of things. I think Steve greatly appreciates it existing; if anything, this is a validation of his vision. We'll probably be herding polecats in a couple of months officially.

    • mohsen1 15 hours ago
      It's not like he was the only one who came up with this idea. I built something like that without knowing about GasTown or Beads. It's just an obvious next step

      https://github.com/mohsen1/claude-code-orchestrator

      • gbnwl 13 hours ago
        I also share your confusion about him somehow managing to dominate credit in this space, when it doesn't even seem like Gastown ended up being very effective as a tool relative to its insane token usage. Everyone who's used an agentic tool for longer than a day will have had the natural desire for them to communicate and coordinate across context windows effectively. I'm guessing he just wrote the punchiest article about it and left an impression on people who had hitherto been ignoring the space entirely.
      • behnamoh 14 hours ago
        Exactly! I built something similar. These are such low hanging fruit ideas that no one company/person should be credited for coming up with them.
        • yks 8 hours ago
          Seriously, I thought that was what langchain was for back in 2023.
          • simianwords 7 hours ago
            Seriously, what is langchain? It’s so completely useless. Clearly none of the new agents care about it or need it. Irrelevant.
            • yieldcrv 2 hours ago
              > what is langchain?

              an incantation you put on your resume to double your salary for a few months before the company you jumped ship to gets obsoleted by the foundational model

    • bonesss 16 hours ago
      Compare both approaches to mature actor frameworks and they don’t seem to be breaking much ice. These kinds of supervisor trees and hierarchies aren’t new for actor based systems and they’re obvious applications of LLM agents working in concert.

      The fact that Anthropic and OpenAI have been going on this long without such orchestration, considering the unavoidable issues of context windows and unreliable self-validation, without matching the basic system maturity you get from a default Akka installation shows us that these leading LLM providers (with more money, tokens, deals, access, and better employees than any of us), are learning in real time. Big chunks of the next gen hype machine wunder-agents are fully realizable with cron and basic actor based scripting. Deterministically, write once run forever, no subscription needed.
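
      For readers who have not used an actor framework, the core supervisor idea fits in a few lines. This is a bare-bones stand-in, not Akka or OTP: a single parent that restarts a failing worker up to a limit. The worker here only simulates an agent, and all names are hypothetical.

```python
from typing import Callable


def supervise(worker: Callable[[], str], max_restarts: int = 3) -> str:
    """Run worker, restarting it on failure up to max_restarts times."""
    restarts = 0
    while True:
        try:
            return worker()
        except Exception as exc:
            restarts += 1
            if restarts > max_restarts:
                raise RuntimeError(
                    f"worker gave up after {max_restarts} restarts"
                ) from exc


def make_flaky_worker(failures: int) -> Callable[[], str]:
    """Build a worker that fails a fixed number of times, then succeeds."""
    state = {"left": failures}

    def worker() -> str:
        if state["left"] > 0:
            state["left"] -= 1
            raise ValueError("simulated agent failure")
        return "done"

    return worker


if __name__ == "__main__":
    print(supervise(make_flaky_worker(2)))
```

      Real supervisor trees add restart strategies, backoff and nesting on top of this, which is the maturity gap described above.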

      Kubernetes for agents is, speaking as a krappy kubernetes admin, not some leap, it’s how I’ve been wiring my local doom-coding agents together. I have a hypothesis that people at Google (who are pretty ok with kubernetes and maybe some LLM stuff), have been there for a minute too.

      Good to see them building this out, excited to see whether LLM cluster failures multiply (like repeating bad photocopies), or nullify (“sorry Dave, but we’re not going to help build another Facebook, we’re not supposed to harm humanity and also PHP, so… no.”).

      • ttoinou 16 hours ago
        If it was so obvious and easy, why didn't we have this a year ago ? Models were mature enough back then to make this work
        • CuriouslyC 15 hours ago
          Orchestration definitely wasn't possible a year ago, the only tool that even produced decent results that far back was Aider, it wasn't fully agentic, and it didn't really shine until Gemini 2.5 03-25.

          The truth is that people are doing experiments on most of this stuff, and a lot of them are even writing about it, but most of the time you don't see that writing (or the projects that get made) unless someone with an audience already (like Steve Yegge) makes it.

          • ttoinou 15 hours ago
            Roo Code in VSCode was working fine a year ago, even back in November 2024 with Sonnet 3.5 or 3.7
        • bcrosby95 14 hours ago
          The high level idea is obvious but doing it is not easy. "Maybe agents should work in teams like humans with different roles and responsibilities and be optimized for those" isn't exactly mind bending. I experimented with it too when LLM coding became a thing.

          As usual, the hard part is the actual doing and producing a usable product.

        • lossolo 15 hours ago
          Because gathering training data and doing post-training takes time. I agree with OP that this is the obvious next step given context length limitations. Humans work the same way in organizations, you have different people specializing in different things because everyone has a limited "context length".
        • troupo 12 hours ago
          Because they are not good engineers [1]

          Also, because they are stuck in a language and an ecosystem that cannot reliably build supervisors, hierarchies of processes etc. You need Erlang/Elixir for that. Or similar implementations like Akka that they mention.

          [1] Yes, they claim their AI-written slop in Claude Code is "a tiny game engine" that takes 16ms to output a couple of hundred of characters on screen: https://x.com/trq212/status/2014051501786931427

      • ruined 16 hours ago
        what mature actor frameworks do you recommend?
        • jghn 15 hours ago
          They did mention Akka in their post, so I would assume that's one of them.
        • troupo 12 hours ago
          Elixir/Erlang. It's table stakes for them.
    • isoprophlex 16 hours ago
      There seems to be a lot of convergent evolution happening in the space. Days before the gas town hype hit, I made a (less baroque, less manic) "agent team" setup: a shell script to kick off a ralph wiggum loop, and CLAUDE-MESSAGE-BUS.md for inter-ralph communication (Thread safety was hacked into this with a .claude.lock file).

      The main claude instance is instructed to launch as many ralph loops as it wants, in screen sessions. It is told to sleep for a certain amount of time to periodically keep track of their progress.
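
      The original setup was a shell script; the sketch below is a Python approximation of just the message-bus part, assuming the lock works by exclusive creation of .claude.lock. The file names come from the description above, while the function names, timeout and message format are hypothetical.

```python
import os
import tempfile
import time
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def file_lock(lock_path: Path, timeout: float = 5.0, poll: float = 0.05):
    """Hold an exclusive lock file for the duration of the with-block."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll)
    try:
        yield
    finally:
        os.close(fd)
        os.unlink(lock_path)


def post_message(workdir: Path, sender: str, text: str) -> None:
    """Append one line to the shared bus file while holding the lock."""
    with file_lock(workdir / ".claude.lock"):
        bus_path = workdir / "CLAUDE-MESSAGE-BUS.md"
        with open(bus_path, "a", encoding="utf-8") as bus:
            bus.write(f"- **{sender}**: {text}\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        post_message(Path(tmp), "ralph-1", "finished the parser task")
        print((Path(tmp) / "CLAUDE-MESSAGE-BUS.md").read_text())
```

      A crashed holder would leave the lock file behind, so a real version would need stale-lock cleanup.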

      It worked reasonably well, but I don't prefer this way of working... yet. Right now I can't write spec (or meta-spec) files quick enough to saturate the agent loops, and I can't QA their output well enough... mostly a me thing, i guess?

      • CuriouslyC 15 hours ago
        Not a you thing. Fancy orchestration is mostly a waste, validation is the bottleneck. You can do E2E tests and all sorts of analytic guardrails but you need to make sure the functionality matches intent rather than just being "functional" which is still a slow analog process.
      • pronik 16 hours ago
        > Right now I can't write spec (or meta-spec) files quick enough to saturate the agent loops, and I can't QA their output well enough... mostly a me thing, i guess?

        Same for me. However, the velocity of the whole field is astonishing and things change as we get used to them. We are not talking that much about hallucinating anymore; just 4-5 months ago you couldn't trust coding agents with extracting functionality to a separate file without typos, and now splitting Git commits works almost without a hitch. The more we get used to agents getting certain things right 100% of the time, the more we'll trust them. There are many many things that I know I won't get right, but I'm absolutely sure my agent will. As soon as we start trusting e.g. a QA agent to do his job, our "project management" velocity will increase too.

        Interestingly enough, the infamous "bowling score card" text on how XP works has demonstrated inherently agentic behaviour in more ways than one (they just didn't know what "extreme" was back then). You were supposed to implement a failing test and then implement just enough functionality for this test to not fail anymore, even if the intended functionality was broader -- which is exactly what agents reliably do in a loop. Also, you were supposed to be pair-driving a single machine, which has been incomprehensible to me for almost decades -- after all, every person has their own shortcuts, hardware, IDEs, window managers and what not. Turns out, all you need is a centralized server running a "team manager agent" and multiple developers talking to him to craft software fast (see tmux requirement in Gas Town).

    • tyre 12 hours ago
      Sorry, are you saying that engineers at Anthropic who work on coding models every day hadn’t thought of multiple of them working together until someone else suggested it?

      I remember having conversations about this when the first ChatGPT launched and I don’t work at an AI company.

      • astrange 10 hours ago
        Claude Code has already had subagent support. Mostly because you have to do very aggressive context window management with Claude or it gets distracted.
    • segmondy 16 hours ago
      This is nothing new, folks have been doing this since 2023. Lots of papers on arxiv and lots of code on github with implementations of multiagents.

      ... the "limit" was that agents were not as smart then, the context window was much smaller and RLVR wasn't a thing, so agents were trained for just function calling, but not agent calling/coordination.

      we have been doing it since then, the difference really is that the models have gotten really smart and good enough to handle it.

    • yieldcrv 1 hour ago
      Why is Yegge so.... loud?

      Like, who cares? Judging from his blog recount of this it doesn't seem like anybody actually does. He's an unnecessarily loud and enthused engineer inserting himself into AI conversations instead of just playing office politics to join the AI automation effort inside of a big corporation?

      "wow he was yelling about agent orchestration in March 2025", I was about 5 months behind him, the company I was working for had its now seemingly obligatory "oh fuck, hackathon" back in August 2025

      and we all came to the same conclusions. conferences had everyone having the same conclusion, I went to the local AWS Invent, all the panels from AWS employees and Developer Relations guys were about that

      it stands to reason that any company working on foundational models and an agentic coding framework would also have talent thinking about that sooner than the rest of us

      so why does Yegge want all of this attention and think it's important at all, it seems like it would have been a waste of energy to bother with, like in advance everyone should have been able to know that. "Anthropic! what are you doing! listen to meeeehhhh let me innnn!"

      doesn't make sense, and gastown's branding is further unhinged goofiness

      yeah I can't really play the attribution games on this one, can't really get behind who cares. I'm glad it's available in a more benign format now

    • aaaalone 16 hours ago
      Honestly this is one of plenty of ideas I also have.

      But this shows how much stuff is still to do in the ai space

    • dingnuts 16 hours ago
      [dead]
  • GoatOfAplomb 16 hours ago
    I wonder if my $20/mo subscription will last 10 minutes.
    • mohsen1 15 hours ago
      At this point, if you're paying out of pocket you should use Kimi or GLM for it to make sense
      • andai 11 hours ago
        GLM is OK (haven't used it heavily but seems alright so far), a bit slow with ZAI's coding plan, amazingly fast on Cerebras but their coding plan is sold out.

        Haven't tried Kimi, hear good things.

      • bluerooibos 13 hours ago
        These are super slow to run locally, though, unless you've got some great hardware - right?

        At least, my M1 Pro seems to struggle and take forever using them via Ollama.

    • tclancy 15 hours ago
      Ah ok, same. I keep wondering about how this would ever accomplish anything.
    • simlevesque 16 hours ago
      I've had good results with Haiku for certain tasks.
  • traviscline 5 hours ago
    Been using these types of flows across agent harnesses for a while. Check out https://github.com/tmc/it2
  • ottah 17 hours ago
    I absolutely cannot trust Claude Code to independently work on large tasks. Maybe other people work on software that's not significantly complex, but for me to maintain code quality I need to guide more of the design process. Teams of agents just sounds like adding a lot more review and refactoring that can just be avoided by going slower and thinking carefully about the problem.
    • nickstinemates 15 hours ago
      You write a generic architecture document on how you want your code base to be organized, when to use pattern x vs pattern y, examples of what that looks like in your code base, and you encode this as a skill.

      Then, in your prompt you tell it the task you want, then you say, supervise the implementation with a sub agent that follows the architecture skill. Evaluate any proposed changes.

      There are people who maximize this, and this is how you get things like teams. You make agents for planning, design, qa, product, engineering, review, release management, etc. and you get them to operate and coordinate to produce an outcome.

      That's what this is supposed to be, encoded as a feature instead of a best practice.

      • satellite2 15 hours ago
        Aren't you just moving the problem a little bit further? If you can't trust it will implement carefully specified features, why would you believe it would properly review those?
        • frde_me 14 hours ago
          It's hard to explain, but I've found LLMs to be significantly better in the "review" stage than the implementation stage.

          So the LLM will do something and not catch at all that it did it badly. But the same LLM asked to review against the same starting requirement will catch the problem almost always

          The missing thing in these tools is that automatic feedback loop between the two LLMs: one in review mode, one in implementation mode.

          • resonious 14 hours ago
            I've noticed this too and am wondering why this hasn't been baked into the popular agents yet. Or maybe it has and it just hasn't panned out?
            • bashtoni 14 hours ago
              Anecdotally I think this is in Claude Code. It's pretty frequent to see it implement something, then declare it "forgot" a requirement and go back and alter or add to the implementation.
            • bethekidyouwant 11 hours ago
              You have to dump the context window for the review to work well.
      • tclancy 15 hours ago
        How does this not use up tokens incredibly fast though? I have a Pro subscription and bang up against the limits pretty regularly.
        • doctoboggan 15 hours ago
          It _does_ use up tokens incredibly fast, which is probably why Anthropic is developing this feature. This is mostly for corporations using the API, not individuals on a plan.
          • digdugdirk 15 hours ago
            I'd love to see a breakdown of the token consumption of inaccurate/errored/unused task branches for claude code and codex. It seems like a great revenue source for the model providers.
            • shafyy 14 hours ago
              Yeah, that's what I was thinking. They do have an incentive to not get everything right on the first try, as long as they don't overdo it... I also feel like they try to get more token usage by asking unnecessary follow-up questions that the user may say yes to etc.
        • indemnity 5 hours ago
          I had to go to Max, Pro is more like a taster.

          At work tho we use Claude Code thru a proxy that uses the model hosted on AWS bedrock. It’s slower than consumer direct-to-Anthropic and you have to wait a bit for the latest models (Opus 4.5 took a while to get), but if our stats are to be believed it’s much much cheaper.

        • nickstinemates 10 hours ago
          I don't know, all I can say is with API-based billing, doing multi-thousand line refactors that would take days to do costs like $4. In terms of value : effort, it's incredible.
        • andyferris 14 hours ago
          It does use tokens faster, yes.
    • aqme28 16 hours ago
      I agree, but I've found that making an "adversarial" model within claude helps with the quality a lot. One agent makes the change, the other picks holes in it, and cycle. In the end, I'm left with less to review.

      This sounds more like an automation of that idea than just N-times the work.
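
      A minimal sketch of that make-a-change / pick-holes cycle, assuming two hypothetical callables, `implement` and `review`, that stand in for the two agents. Neither is a real Claude Code API; they are placeholders for however you invoke each agent.

```python
from typing import Callable, List, Tuple


def adversarial_loop(
    task: str,
    implement: Callable[[str, List[str]], str],  # hypothetical: returns a proposed change
    review: Callable[[str, str], List[str]],     # hypothetical: returns a list of objections
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Alternate implementer and reviewer until the reviewer has no objections."""
    objections: List[str] = []
    change = ""
    for _ in range(max_rounds):
        # The implementer sees the task plus whatever the reviewer objected to last round.
        change = implement(task, objections)
        # The reviewer judges the change against the original task, not the implementer's notes.
        objections = review(task, change)
        if not objections:
            break
    # Any objections still open after max_rounds are handed back for human review.
    return change, objections
```

      The round cap is there so the two agents cannot argue (and burn tokens) forever; whatever is still unresolved is what a human ends up reviewing.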

      • Keyframe 15 hours ago
        Glad I'm not the only one. I do the same, but I tend to have gemini be the one that critiques.
      • diego898 15 hours ago
        Do you do this manually? Or some abstraction above that? skills, some light orchestration, etc?
        • aqme28 15 hours ago
          I just tell it to do so, but you could even add that as a requirement to CLAUDE.md
    • stpedgwdgfhgdd 16 hours ago
      Exactly, one out of four or three prompts require tuning, nudging or just stopping it. However it takes seniority to see where it goes astray. I suspect that lots of folks don't even notice that CC is off. It works, it passes the tests, so it is good.
    • turtlebits 16 hours ago
      Humans can't handle large tasks either, which is why you break them into manageable chunks.

      Just ask claude to write a plan and review/edit it yourself. Add success criteria/tests for better results.

    • nprz 16 hours ago
      There is research[0] currently being done on how to divide tasks among LLMs and combine their answers. This approach allows LLMs to reach outcomes (solving a problem that requires 1 million steps) which would be impossible otherwise.

      [0]https://arxiv.org/abs/2511.09030

      • woah 15 hours ago
        All they did was prompt an LLM over and over again to execute one iteration of a towers of hanoi algorithm. Literally just using it as a glorified scripting language:

        ```

        Rules:

        - Only one disk can be moved at a time.

        - Only the top disk from any stack can be moved.

        - A larger disk may not be placed on top of a smaller disk.

        For all moves, follow the standard Tower of Hanoi procedure: If the previous move did not move disk 1, move disk 1 clockwise one peg (0 -> 1 -> 2 -> 0).

        If the previous move did move disk 1, make the only legal move that does not involve moving disk1.

        Use these clear steps to find the next move given the previous move and current state.

        Previous move: {previous_move} Current State: {current_state} Based on the previous move and current state, find the single next move that follows the procedure and the resulting next state.

        ```

        This is buried down in the appendix while the main paper is full of agentic swarms this and millions of agents that and plenty of fancy math symbols and graphs. Maybe there is more to it, but the fact that they decided to publish with such a trivial task which could be much more easily accomplished by having an llm write a simple python script is concerning.
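
        For comparison, the procedure in that prompt fits in a few lines of ordinary code. This is a minimal sketch, not the paper's implementation; the names `next_move` and `solve` and the list-of-lists peg representation are made up for illustration.

```python
def next_move(pegs, previous_disk):
    """Return (disk, src, dst) for the next move; each peg is a list, top disk last."""
    # Disk 1 is the smallest, so it is always on top of whichever peg holds it.
    smallest = next(i for i, peg in enumerate(pegs) if peg and peg[-1] == 1)
    if previous_disk != 1:
        # Previous move did not move disk 1: move disk 1 clockwise one peg.
        return 1, smallest, (smallest + 1) % 3
    # Previous move moved disk 1: make the only legal move between the other two pegs.
    a, b = [i for i in range(3) if i != smallest]
    if not pegs[a]:
        src, dst = b, a
    elif not pegs[b] or pegs[a][-1] < pegs[b][-1]:
        src, dst = a, b
    else:
        src, dst = b, a
    return pegs[src][-1], src, dst


def solve(n):
    """Run the procedure from the start state until all n disks sit on one target peg."""
    pegs = [list(range(n, 0, -1)), [], []]
    moves, previous_disk = [], None
    while not any(len(peg) == n for peg in pegs[1:]):
        disk, src, dst = next_move(pegs, previous_disk)
        pegs[dst].append(pegs[src].pop())
        moves.append((disk, src, dst))
        previous_disk = disk
    return moves, pegs
```

        `solve(n)` takes 2**n - 1 moves, so a run of about a million steps corresponds to 20 disks (1,048,575 moves).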

        • Spoom 14 hours ago
          Good lord, I can only imagine the wasted electricity.
      • ottah 16 hours ago
        No offense to the academic profession, but they're not a good source of advice for best practices in commercial software development. They don't have the experience or the knowledge sufficient to understand my workplace and tasks. Their skill set and job is orthogonal to the corporate world.
        • nprz 16 hours ago
          Yes, the problem solved in the paper (Tower of Hanoi) is far more easily defined than 99% of actual problems you would find in commercial software development. Still proof of "theoretically possible" and seems like an interesting area of research.
    • BonoboIO 16 hours ago
      You definitely have to create some sort of PLAN.md and PROGRESS.md via a command and an implement command that delegates work. That is the only way that I can get bigger things done no matter how „good“ their task feature is.

      You run out of context so quickly and if you don’t have some kind of persistent guidance things go south

      • ottah 16 hours ago
        It's not sufficient, especially if I am not learning about the problem by being part of the implementation process. The models are still very weak reasoners, writing code faster doesn't accelerate my understanding of the code the model wrote. Even with clear specs I am constantly fighting with it duplicating methods, writing ineffective tests, or implementing unnecessarily complex solutions. AI just isn't a better engineer than me, and that makes it a weak development partner.
        • vonneumannstan 15 hours ago
          >AI just isn't a better engineer than me, and that makes it a weak development partner.

          This would also be true of Junior Engineers. Do you find them impossible to work with as well?

      • koakuma-chan 16 hours ago
        I tried doing that and it didn't work. It still adds "fallbacks" that just hide errors or the fact that there is no actual implementation and "In a real app, we would do X, just return null for now"
    • findjashua 15 hours ago
      you need a reviewer agent for every step of the process - review the plan generated by the planner, the update made by the task worker subagent, and a final reviewer once all tasks are done.

      this does eat up tokens _very_ quickly though :(

  • d4rkp4ttern 13 hours ago
    This sounds very promising. Using multiple CC instances (or mix of CLI-agents) across tmux panes has always been a workflow of mine, where agents can use the tmux-cli [1] skill/tool to delegate/collaborate with others, or review/debug/validate each other's work.

    This new orchestration feature makes it much more useful since they share a common task list and the main agent coordinates across them.

    [1] https://github.com/pchalasani/claude-code-tools?tab=readme-o...

    • vardalab 4 hours ago
      Yeah, I've been using your tools for a while. They've been nice.
  • bhasi 17 hours ago
    Seems similar to Gas Town
    • rafram 16 hours ago
      I'm not anti-whimsy, but if your project goes too hard on the whimsy (and weird AI-generated animal art), it's kind of inevitable that someone else is going to create a whimsy-free clone, and their version will win because it's significantly less embarrassing to explain to normal people.
    • reissbaker 16 hours ago
      Where are the polecats, though? What about the mayor's dog?
    • koakuma-chan 16 hours ago
      I don't know what Gas Town is, but Claude Code Agent Teams is what I was doing for a while now. You use your main conversation only to spawn sub agents to plan and execute, allowing you to work for a long time without losing context or compacting, because all token-heavy work is done by sub agents in their own context. Claude Code Agent Teams just streamlines this workflow as far as I can tell.
    • nickorlow 17 hours ago
      yeah, seems like a much simpler design though (i.e. only seems like one 'special/leader' agent, and the rest are all workers vs gastown having something like 8 different roles mayor, polecat, witnesses, etc).

      Wonder how they compare?

      • greenfish6 17 hours ago
        i would have to imagine the gastown design isn't optimal though? why 8, and why do there need to be multiple hops of agent communications before two arbitrary agents communicate with each other as opposed to a single shared filespace?
        • Ethee 16 hours ago
          I've been using Gas Town a decent bit since it was released. I'd agree with you that its design is sub-optimal, but I believe that's more due to the way the actual agents/harnesses have been designed as opposed to optimal software design. The problem you often run into is that agents will sometimes hang thinking they need human input for a problem they are on, or they think they're at a natural stopping point. If you're trying to do fully orchestrated agentic coding where you don't look at the code at all (putting aside whether that's good or not for a second) then this is sub-optimal behavior, and so these extra roles have been designed to 'keep the machine going' as it were.

          Often times if I'm only working on a single project or focus, then I'm not using most of those roles at all and it's as you describe, one agent divvying out tasks to other agents and compiling reports about them. But due to the fact that my velocity with this type of coding is now based on how fast I can tell that agent what I want, I'm often working on 3 or 4 projects simultaneously, and Gas Town provides the perfect orchestration framework for doing this.

          • cstejerean 13 hours ago
            the problem with gastown is it tries to use agents for supervision when it should be possible to use much simpler and deterministic approaches to supervision, and also being a lot more token efficient
        • nickorlow 14 hours ago
          yegge's article does come off as complicated design for the sake of complication
    • temuze 17 hours ago
      Yeah but worse

      No polecats smh

    • ramesh31 17 hours ago
      >"Seems similar to Gas Town"

      I love that we are in this world where the crazy mad scientists are out there showing the way that the rest of us will end up at, but ahead of time and a bit rough around the edges, because all of this is so new and unprecedented. Watching these wholly new abstractions be discovered and converged upon in real time is the most exciting thing I've seen in my career.

      • bredren 17 hours ago
        The action is hot, no doubt. This reminds me of Spacewar! -> Galaxy Game / Computer Space.
  • Sol- 17 hours ago
    With stuff like this, might be that all the infra build-out is insufficient. Inference demand will go up like crazy.
    • RGamma 16 hours ago
      Unlocking the next order of magnitude of software inefficiency!

      Though I do hope the generated code will end up being better than what we have right now. It mustn't get much worse. Can't afford all that RAM.

      • Sol- 15 hours ago
        Dunno, it's probably less energy efficient than a human brain, but being able to turn electricity into intelligence is pretty amazing. RAM and power generation are engineering problems to be solved for civilization to benefit from this.
    • kylehotchkiss 17 hours ago
      It'd be nice if CC could figure out all the required permissions upfront and then let you queue the job to run overnight
    • Der_Einzige 17 hours ago
      Anyone paying attention has known that demand for all types of compute that can run LLMs (i.e. GPUs, TPUs, hell even CPUs) was about to blow up, and will remain extremely large for years to come.

      It's just HN that's full of "I hate AI" or wrong contrarian types who refuse to acknowledge this. They will fail to reap what they didn't sow and will starve in this brave new world.

      • sciencejerk 15 hours ago
        Agreed, agent scaling and orchestration indicates that demand for compute is going to blow up, if it hasn't already. The rationale for building all those datacenters they can't build fast enough is finally making sense.
      • emp17344 16 hours ago
        This reads like a weird cult-ish revenge fantasy.
        • RGamma 16 hours ago
          And what about you? Show your "I used AI today" badge, right now!
      • anthem2025 16 hours ago
        [dead]
      • ffffuuuuuccck 16 hours ago
        [flagged]
        • aaaalone 16 hours ago
          If ai progresses slowly enough, we will end up in a society where high unemployment numbers are the norm and we are stuck in capitalism.

          And if I think about one 'senior' in my team I would prefer an expensive ai subscription over that one person already.

        • Der_Einzige 16 hours ago
          [flagged]
          • sciencejerk 15 hours ago
            Blue collar work won't be safe for long. Just longer.
          • emp17344 16 hours ago
            What the fuck is wrong with you? This guy is either a troll or legitimately mentally ill.
      • mrkeen 16 hours ago
        Oh yeah I mean if you're a webdev and you haven't built several data centres already you're basically asking to be homeless.
  • nkmnz 17 hours ago
    I’m looking for something like this, with opus in the driver seat, but the subagents should be using different LLMs, such as Gemini or Codex. Anyone know of such a tool? just-every/code almost does this, but the lead/orchestrator is always codex, which feels too slow compared to opus or Gemini.
    • nikcub 17 hours ago
      I use opus for coding and codex for reviews. I trigger the reviews in each work task with a review skill that calls out to codex[0]

      I don't need anything more complicated than that and it works fine - also run greptile[1] on PR's

      [0] https://github.com/nc9/skills/tree/main/review

      [1] https://www.greptile.com/

    • eaf7e281 14 hours ago
      These two basically do what you want, let Claude be the manager and Codex/Gemini be the worker. Many say that Coder-Codex-Gemini is easier to understand than CCG-Workflow, which has too many commands to start with.

      https://github.com/FredericMN/Coder-Codex-Gemini https://github.com/fengshao1227/ccg-workflow

      This one also seems promising, but I haven't tried it yet.

      https://github.com/bfly123/claude_code_bridge

      All of them are made by Chinese devs. I know some people are hesitant when they see Chinese products, so I'll address that first. But I have tried all of them, and they have all been great.

    • khaliqgant 14 hours ago
      You can accomplish this with https://github.com/AgentWorkforce/relay and make the Lead/Orchestrator any harness you want. At the core agent-relay is agent to agent communication but it unlocks quite a few multi agent orchestration paradigms. I wrote about some learnings here as well https://x.com/khaliqgant/status/2019124627860050109?s=46
    • fosterfriends 17 hours ago
      I think this is where future cursor features will be great - to coordinate across many different model providers depending on the sub-jobs to be done
      • nkmnz 17 hours ago
        What I want is something else: I want them to work in parallel on the same problem, and the orchestrator to then evaluate and consolidate their responses. I’m currently doing this manually, but it’s tedious.
    • knes 17 hours ago
      At Augment, we've been working on this. Multi-agent orchestration, spec driven, different models for different tasks, etc.

      https://www.augmentcode.com/product/intent

      can use the code AUGGIE to skip the queue. Bring your own agent (powered by codex, CC, etc) coming to it next week.

    • sathish316 17 hours ago
      You can run an ensemble of LLMs (Opus, Gemini, Codex) in Claude Code Router via OpenRouter or any Agent CLI that supports Subagents and not tied to a single LLM like Opencode. I have an example of this in Pied-Piper, a subagent orchestrator that runs in Claude Code or ClaudeCodeRouter and uses distinct model/roles for each Subagent:

      1. GPT-5.2 Codex Max for planning

      2. Opus 4.5 for implementation

      3. Gemini for reviews

      It’s easy to swap models or change responsibilities. Doc and steps here: https://github.com/sathish316/pied-piper/blob/main/docs/play...

  • giancarlostoro 15 hours ago
    I was working on my own alternative to Beads... then I realized I could do exactly this with something similar to Beads, I'm planning on open sourcing it soon because I like what I have so far, I also made it so I can sync my tasks directly to my GitHub projects as well. I think it's more useful to have agent tasks eventually synced back up to real ticketing systems for historical reasons. Besides, it's better to have alternatives that are agent agnostic.
  • asdev 16 hours ago
    I personally have no use for this type of workflow. I like parallel claude code instances in worktrees but nothing beyond that
    • hpdigidrifter 4 hours ago
      Am not a fan of dealing with worktrees. Maybe for larger, longer-lived tasks, but the time spent on merges from different agents is definitely a big headwind for parallel work.

      This seems handled by this new agent which is cool.

      I gave up on worktrees and hacked together a solution with fine-grained lockfiles for editing, running builds, etc that worked surprisingly well for what it was
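
      One way such a per-file lock could look, as a rough sketch only. The helper name `file_lock` and the `.agent-locks` directory are assumptions for illustration, not the setup described above. It relies on `os.open` with `O_CREAT | O_EXCL`, which fails if the lock file already exists, so only one agent can hold a given lock at a time.

```python
import os
from contextlib import contextmanager


@contextmanager
def file_lock(target, lock_dir=".agent-locks"):
    """Hold an exclusive lock for one target path (hypothetical helper)."""
    os.makedirs(lock_dir, exist_ok=True)
    # One lock file per target, e.g. "src/app.py" -> "src__app.py.lock".
    name = target.replace("\\", "__").replace("/", "__") + ".lock"
    lock_path = os.path.join(lock_dir, name)
    try:
        # O_EXCL makes creation atomic: it raises if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise RuntimeError(f"{target} is locked by another agent")
    try:
        # Record the holder's PID so a stale lock can be inspected by hand.
        with os.fdopen(fd, "w") as handle:
            handle.write(str(os.getpid()))
        yield lock_path
    finally:
        os.remove(lock_path)
```

      An agent would wrap each edit or build step in `with file_lock("src/app.py"):` and back off or pick another task on `RuntimeError`. This sketch does no stale-lock cleanup if a process dies while holding a lock.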

  • drbscl 14 hours ago
    I just built a quick plugin to automatically add agents & skills then fire off a team with them, depending on your task: https://github.com/drbscl/dream-team
  • khaliqgant 14 hours ago
    Been waiting for this to drop and excited to test it out. We've been building something in this space - https://github.com/AgentWorkforce/relay, a real-time messaging layer that lets AI coding agents talk to each other across any CLI.

    Assign roles to different models and have them coordinate: Claude as the lead, Codex on backend, Gemini on frontend, etc.

    I wrote about my experiences with multi-agent orchestration here: https://x.com/khaliqgant/status/2019124627860050109?s=46

  • ndesaulniers 16 hours ago
    Subagents are out, put it all on agent teams!
  • greenfish6 17 hours ago
    something i really like from trying it out over the last 10 minutes is that the main agent will continue talking to you while other agents are working, so you don't have to queue a message
  • taikahessu 17 hours ago
    Clean up the team
  • Retr0id 17 hours ago
    Claude Town
  • greenfish6 17 hours ago
    Excited to try this out. I've seen a lot of working systems on my own computer that share files to talk between different Claude Code agents and I think this could work similarly to that.

    (i thought gas town was satire? people in comments here seem to be saying that gas town also had multi-agent file sharing for work tracking)

  • dangus 12 hours ago
    A cynical read of this is that it’s all a ploy to maximize usage.

    Why do agents need to speak to each other if they’re just doing the work correctly the first time?

    Is it an admission that a single agent is not useful and reliable enough?

  • morleytj 17 hours ago
    Gas Town decimated by Claude bomb from orbit
  • avereveard 16 hours ago
    "finish Claude tokens quota in 3 minutes, largely over delegation and result messages instead of code writing"
  • imiric 11 hours ago
    I find it amusing that the innovation in this space for the past year+ has been mostly centered around engineering: MCP, "agents", "skills", etc. Now "agent" orchestration is the new hotness.

    Meanwhile, the same issues that have plagued these tools since their inception are largely ignored: hallucination, inaccuracy, context collapse, etc. These won't be solved by engineering, but by new research and foundational improvements.

    On one hand, solid engineering was sorely needed, and can extract a lot of value from the current tech. But on the other, all these announcements and improvements feel like companies grasping at straws to keep the hype cycle going by any means necessary. Charts must go up and to the right, or investors get antsy.

    It's all adding to the mountain of signs that suggest that this isn't the path to artificial intelligence. It's interesting tech, with possibly many valuable applications, but the "AI" narrative is frankly tiring. I wish I could fast forward on this speculative phase, go past the inevitable crash, and arrive at a timeframe where we've figured out what this tech is actually good for, and where we hopefully use it more for good than evil.

  • IhateAI 17 hours ago
    Any self respecting engineer should recognize that these tools and models only serve to lower the value of your labor. They aren't there to empower you, they aren't going to enable you to join the ruling class with some vibe-rolled slop SaaS.

    Using these things will fry your brain's ability to think through hard solutions. It will give you a disease we haven't even named yet. Your brain will atrophy. Do you want your competency to be correlated 1:1 to the quality and quantity of tokens you can afford (or be loaned!!)?

    Their main purpose is to convince C-suite suits that they don't need you, or they should be justified in paying you less. This will of course backfire on them, but in the meantime, why give them the training data, why give them the revenue??

    I'd bet anything these new models / agentic-tools are designed to optimize for token consumption. They need the revenue BADLY. These companies are valued at 200 X Revenue.. Google IPO'd at 10-11 x lmfao . Wtf are we even doing? Can't wait to watch it crash and burn :) Soon!

    • tjr 17 hours ago
      People often compare working with AI agents to being something like a project manager.

      I've been a project manager for years. I still work on some code myself, but most of it is done by the rest of the team.

      On one hand, I have more bandwidth to think about how the overall application is serving the users, how the various pieces of the application fit together, overall consistency, etc. I think this is a useful role.

      On the other hand, I definitely have felt mental atrophy from not working in the code. I still think; I still do things and write things and make decisions. But I feel mentally out of shape; I lack a certain sharpness that I perceived when I was more directly in tune with the code.

      And I'm talking, all orthogonal to AI. This is just me as a project manager with other humans on the project.

      I think there is truth to, well, operate at a higher level! Be more systems-minded, architecture-minded, etc. I think that's true. And there are surely interesting new problems to solve if we can work not on the level of writing programs, but wielding tools that write programs for us.

      But I think there's also truth to the risk of losing something by giving up coding. Whether that which might be lost is important to you or not is your own decision, but I think the risk is real.

      • sathish316 17 hours ago
        I do think there’s a real risk of Brain Atrophy when you rely on AI coding tools for everything and while learning something new. About a year ago, I dealt with this problem by using Neovim and having shortcuts like below to easily toggle GitHub Copilot on/off. Now that AI is baked into almost every part of the toolchain in VSCode, Cursor, ClaudeCode, Intellij, I don't know how the newer engineers will learn without AI assistance.
        • IhateAI 17 hours ago
          I think in-line autocomplete is likely not that dangerous, if it's used in this manner responsibly, it's the large agentic tools that are problematic for your brain imo. But in-line autocompletes aren't going to raise billions of dollars and aren't flashy.
          • xpct 16 hours ago
            I'd say autocomplete introduces a certain level of fuzziness into the code we work with, though to a lower degree. I used autocomplete for over a year, and initially it did feel like a productivity boost, yet when I later stopped using them, it never felt like my productivity decreased. I stopped because something about losing explicit intent of my code feels uncomfortable to me.
      • majormajor 17 hours ago
        It's very difficult to operate effectively at a higher level for a continued period of time without periodically getting back into the lower levels to try new things and learn new approaches or tools.

        That doesn't even have to be writing a ton of code, but reading the code, getting intimately familiar with the metrics, querying the logs, etc.

      • IhateAI 17 hours ago
        I definitely think what you're losing is extremely important, and can't be compensated with LLMs once its gone.

        Back when automatic piano players came out, if all the world's best piano players had stopped playing and mostly just composed/wrote music instead, would the quality of the music have increased or decreased? I think the latter.

    • M4R5H4LL 17 hours ago
      From an economic standpoint this is basically machines doing work humans used to do. We’ve already gone through this many times. We built machines that can make stuff orders of magnitude faster than humans, and nobody really argues we should preserve obsolete tools and techniques as a valued human craft. Obviously automation messes with jobs and identity for some people, but historically a large chunk of human labor just gets automated as the tech gets better. So I feel that arguing about whether automation is good or bad in the abstract is a bit beside the point. The more interesting question imho is how people and companies adapt to it, because it’s probably going to happen either way.
      • IhateAI_2 14 hours ago
        I had to create a new account, because HN is protecting their investments and basically making it impossible to post for anyone who is critical of LLMs (said I was crawling, I'm on a dedicated proxy that definitely hasn't ever crawled HN lol).

        Automation can be good overall for society, but you also can't ignore the fact that basically all automation has decreased the value of the labor it replaced or subsidized.

        This automation isn't necessarily adding value to society. I don't see any software being built that's increasing the quality of people's life, I don't see research being accelerated. There is no economic data to support this either. The economic gains are only reflected in the values of companies who are selling tokens, or have been able to decrease their employee-counts with token allowances.

        All I see is people sharing CRUD apps on twitter, 50 clones of the same SaaS, people constantly complaining about how their favorite software/OS has more bugs, the cost of hardware and electricity going up and people literally going into psychosis. (I have a list of 70+ people on twitter that I've been adding to that are literally manic and borderline insane because of these tools). I can see LLMs being genuinely useful to society, like helping the blind and disabled in real time, but no one is doing that! It doesn't make money; automation is for the capital-owning class, not for the working class.

        But hey, at least your favorite LLM shill from that podcast you loved can afford the $20,000/night resort this summer...

        I'd be more okay with these mostly useless automation tools if the models were open source and didn't require $500k to run locally, but until then they basically only serve to make existing billionaires pad unnecessary zeros onto their net worth, and help prevent anyone from catching up with them.

        I recommend people read this essay by Thomas Pynchon, actually read it, don't judge it by the title: https://www.nytimes.com/1984/10/28/books/is-it-ok-to-be-a-lu...

    • azan_ 13 hours ago
      Of course it's to save businesses money (and not to empower programmers)! Software engineers for years automated jobs of other people, but when it's SEs that are getting automated, suddenly progress becomes bad?
      • IhateAI 11 hours ago
        So because those people didn't defend their livelihoods we shouldn't either?

        I'd say there are very few jobs that SWEs automated away outside of SOME data entry. SWEs built abstractions on top of existing processes. LLM companies want to abstract away the human entirely.

    • theappsecguy 17 hours ago
      The crash and burn can't come soon enough.
    • aaaalone 16 hours ago
      When I use Google maps, I learn faster.

      And I haven't had to solve real hard problems for ages.

      Some people will have problems, some will not.

      Future will tell.

    • ottah 16 hours ago
      Honestly my job is to ensure code quality and to protect the customer. I love working with claude code, it makes my life easier, but in no way would a team of agents improve code quality or speed up development. I would spend far too much time reviewing and fixing laziness and bad design decisions.

      When you hear execs talking about AI, it's like listening to someone talk about how they bought some magic beans that will solve all their problems. IMO the only thing we have managed to do is spend a lot more money on accelerated compute.

    • fooker 17 hours ago
      It would be tragically ironic if this post is AI generated.
    • wantlotsofcurry 16 hours ago
      I agree on all parts. I do not understand why anyone in the software industry would bend over backwards to show their work is worth less now.
    • ramesh31 17 hours ago
      >I'd bet anything these new models / agentic-tools are designed to optimize for token consumption.

      You would think, but Claude Code has gotten incredibly more efficient over time. They are doing so much dogfooding with these things at this point that it makes more sense to optimize.

    • spelunker 16 hours ago
      How Butlerian of you.
    • markab21 17 hours ago
      Shaking fist at clouds!!
      • IhateAI 17 hours ago
        Wow, a bunch of NFT people used to say the same thing.

        lmao, please explain to me why these companies should be valued at 200x revenue.. They are providing autocomplete APIs.

        How come Google's valuation hasn't increased 100-200x? They provide foundation models + a ton more services as well and are profitable. None of this makes sense, it's destined to fail.

        • OsrsNeedsf2P 17 hours ago
          I like your name, it suggests you're here for a good debate.

          Let me start by conceding on the company value front; they should not have such value. I will also concede that these models lower your value of labor and quality of craft.

          But what they give in return is the ability to scale your engineering impact to new highs - Talented engineers know which implementation patterns work better, how to build debuggable and growable systems. While each file in the code may be "worse" (by whichever metric you choose), the final product has more scope and faster delivery. You can likewise choose to narrow the scope and increase quality, if that's your angle.

          LLMs aren't a blanket improvement - They come with tradeoffs.

          • IhateAI_2 15 hours ago
            (I had to create a new account, because HN doesn't like LLM haters (don't mess with the bag ig))

            the em dashes in your reply scare me, but I'll assume you're a real person lol.

            I think your opinion is valid, but tell that to the C-suite that has laid off 400k tech workers in the last 16 months in the USA. These tools don't seem to be used to empower high quality engineering, only to naively increase the bottom line by decreasing the number of engineers, and increasing workloads on those remaining.

            Full disclosure, I haven't been laid off ever, but I see what's happening. I think when the trade-off is that your labor is worth a fraction of what it used to be and you're also expected to produce more, then that trade-off isn't worth it.

            It would be a lot different if the signaling from business leaders was the reverse. If they believed these tools empowered labor's impact to a business, and planned on rewarding on that, it would be a different story. That's not what we are seeing, and they are very open about their plans for the future of our profession.

            Automation can be good overall for society, but you also can't ignore the fact that basically all automation has decreased the value of the labor it replaced or subsidized.

            This automation isn't necessarily adding value to society. I don't see any software being built that's increasing the quality of people's life, I don't see research being accelerated. There is no economic data to support this either. The economic gains are only reflected in the values of companies who are selling tokens, or have been able to decrease their employee-counts with token allowances.

            All I see is people sharing CRUD apps on twitter, 50 clones of the same SaaS, people constantly complaining about how their favorite software/OS has more bugs, the cost of hardware and electricity going up and people literally going into psychosis. (I have a list of 70+ people on twitter that I've been adding to that are literally manic and borderline insane because of these tools).

            But hey, at least your favorite AI evangelist from that podcast you loved can afford the $20,000/night resort this summer...

        • tock 17 hours ago
          Google is valued at 4T. Up from 1.2T in 2022.
        • hareykrishna 16 hours ago
          it's too late to hateAI!
    • dangoodmanUT 15 hours ago
      username checks out
    • cstrahan 16 hours ago
      > Any self respecting engineer should recognize that these tools and models only serve to lower the value of your labor.

      Depends on what the aim of your labor is. Is it typing on a keyboard, memorizing (or looking up) whether that function was verb_noun() or noun_verb(), etc? Then, yeah, these tools will lower your value. If your aim is to get things done, and generate value, then no, I don't think these tools will lower your value.

      This isn't all that different from CNC machining. A CNC machinist can generate a whole lot more value than someone manually jogging X/Y/Z axes on an old manual mill. If you absolutely love spinning handwheels, then it sucks to be you. CNC definitely didn't lower the value of my brother's labor -- there's no way he'd be able to manually machine enough of his product (https://www.trtvault.com/) to support himself and his family.

      > Using these things will fry your brain's ability to think through hard solutions.

      CNC hasn't made machinists forget about basic principles, like when to use conventional vs climb milling, speeds and feeds, or whatever. Same thing with AI. Same thing with induction cooktops. Same thing with any tool. Lazy, incompetent people will do lazy, incompetent things with whatever they are given. Yes, an idiot with a power tool is dangerous, as that tool magnifies and accelerates the messes they were already destined to make. But that doesn't make power tools intrinsically bad.

      > Do you want your competency to be correlated 1:1 to the quality and quantity of tokens you can afford (or be loaned!!)?

      We are already dependent on electricity. If the power goes out, we work around that as best as we can. If you can't run your power tool, but you absolutely need to make progress on whatever it is you're working on, then you pick up a hand tool. If you're using AI and it stops working for whatever reason, you simply continue without it.

      I really dislike this anti-AI rhetoric. Not because I want to advocate for AI, but because it distracts from the real issue: if your work is crap, that's on you. Blaming a category of tool as inherently bad (with guaranteed bad results) suggests that there are tools that are inherently good (with guaranteed good results). No. That's absolutely incorrect. It is people who fall on the spectrum of mediocrity-to-greatness, and the tools merely help or hinder them. If someone uses AI and generates a bunch of slop, the focus should be on that person's ineptitude and/or poor judgement.

      We'd all be a lot better off if we held each other to higher standards, rather than complaining about tools as a way to signal superiority.

      • sciencejerk 15 hours ago
        Your brother's livelihood is not safe from AI, nor is any other livelihood. A small slice of lucky, smart, well-placed, protected individuals will benefit from AI, and I presume many unlucky people with substantial disabilities or living in poverty will benefit as well. Technology seems to continue to improve the outcomes at the very top and very bottom, while sacrificing the biggest group in the middle. Many HN Software Engineers here immensely benefitted from Big Tech over the past 15 years -- they were a part of that lucky privileged group winning 300k+ USD salaries plus equity for a long time. AI has completely disrupted this space and drastically decreased the value of their work, and it largely did this by stealing open source code for training data. These Software Engineers are right to feel upset and threatened and oppose these AI tools, since they are their replacement. I believe that is why you see so much AI hate on HN.
      • IhateAI 10 hours ago
        I'm not trying to signal superiority, I'm legitimately worried about the value of my livelihood and skills I'm passionate about. What if McDonalds went around telling chefs that they're cooking wrong, that there's no reason to cook food in a traditional manner when you can increase profit and speed with their methods?

        It would be insulting, they'd get screamed out of the kitchen. Now imagine they're telling those chefs they're going to enforce those methods on them regardless of whether they like it or not.

    • hareykrishna 16 hours ago
      [flagged]