31 comments

  • simonw 1 day ago
    This is a notably better demonstration of a coding agent generated browser than Cursor's FastRender - it's a fraction of the size (20,000 lines of Rust compared to ~1.6m), uses way fewer dependencies (just system libraries for rendering images and text) and the code is actually quite readable - here's the flexbox implementation, for example: https://github.com/embedding-shapes/one-agent-one-browser/bl...

    Here's my own screenshot of it rendering my blog - https://bsky.app/profile/simonwillison.net/post/3mdg2oo6bms2... - it handles the layout and CSS gradients really well, renders the SVG feed icon but fails to render a PNG image.

    I thought "build a browser that renders HTML+CSS" was the perfect task for demonstrating a massively parallel agent setup because it couldn't be productively achieved in a few thousand lines of code by a single coding agent. Turns out I was wrong!

    • g947o 19 hours ago
      I think most people would agree that this is far superior to Cursor's "browser" from an engineering perspective -- it doesn't do much, but does it well, as you pointed out.

      What it tells me is that "effectively using agents" can be much more important than just throwing tokens at a problem and seeing what comes out. I myself have completely deleted several small vibe-coded projects without even going over the code, because what often happens is that, two days after the code is generated, I realize I was solving the wrong problem or using the wrong approach.

      A coding agent doesn't care. It most likely just does whatever you ask it to do with no pushback. While in some cases it's worth using them to validate an idea, often you dig a deeper hole for yourself if you go down a wrong path in the first place.

      • embedding-shape 19 hours ago
        Yeah, I agree with all of what you wrote; how these are used seems (to me) to be more important than how they're built. If you don't know software engineering, a software engineering agent isn't suddenly gonna make you one, but someone who already knows the craft can be very effective with one.

        Amplifiers, rather than replacements. I think the community at large still thinks LLMs and agents are gonna be "replacing" knowledge, which I think is far from the truth.

        • menaerus 6 hours ago
          I built a moderately complex and very good-looking website in ~2 hours with a coding agent. The next step would be to write a backend + storage, and given how well the agent performs at these types of tasks, I assume I'll be able to do that in a matter of hours too. I have never touched any of the technology involved in web development, so in my case I can say I no longer need a full-stack dev, which under normal circumstances I definitely would. And the cost is ridiculous: a few hours invested + a $20 subscription.

          I agree, however, that having no prior software engineering skills would make this much more difficult.

          • queenkjuul 1 minute ago
            Nobody ever needed a full stack dev to build a website
          • embedding-shape 5 hours ago
            Yeah, I don't doubt you, it's really effective at knocking out "simple" projects. I've had success vibe-coding for days, but eventually, unless you keep some reins on the architecture/design, it falls down under its own slop, and it's very noticeable as the agent spends more and more time trying to work in changes it's ultimately unable to land.

            So the first day or two, each change takes 20-30 minutes. Next day it takes 30-40 minutes per change, next day up to an hour and so on, as the requirements start to interact with each other, together with the ball of spaghetti they've composed and are now trying to change without breaking other parts.

            Contrast that with when you really own the code and design: then you can keep going for weeks, and all changes take 20-30 minutes, as on day one. But that also means I'm paying attention to what's going on, so it's not vibe-coding but pair programming with LLMs, and it also requires you to understand the domain, what you're actually aiming for, and the basics of design/architecture.

            • menaerus 3 hours ago
              The point was not simplicity but rather whether AI is replacing some people's jobs. I say it certainly is, as shown by the example, but I also acknowledge that the technology is still not at the point where human engineers are no longer required in the loop.

              I built other things too which would not be considered trivial or "simple" - or, as you say, architecturally complex - involving very domain-specific knowledge about programming languages, compilers, ASTs, databases, high-performance optimizations, etc. And I have never felt this productive, to be honest. If I were to set up a company around this, which I believe I could, in the pre-LLM era I'd quite literally have to hire 3-5 experienced engineers with sufficient domain expertise to build this together with me - and I mean for the concrete work I've done in ~2 weeks, not some hypothetical potential.

              • Imustaskforhelp 42 minutes ago
                > The point was not simplicity but rather whether AI is replacing some people's jobs. I say it certainly is, as shown by the example, but I also acknowledge that the technology is still not at the point where human engineers are no longer required in the loop.

                I feel like you have missed emsh's point, which is that AI agents get significantly muddled up once your project is complex.

                I feel the same way personally. If I don't know how the pieces of AI code interact with each other, I feel a growing frustration for as long as the project continues - precisely the pattern they mention, where changes first take less time, then take longer and longer, with errors the agent missed, etc.

                I personally vibe-code projects too, but I will admit this failure mode is real.

                I have this feeling that anything really complex will fall over if complexity grows a lot and you don't unclog the slop.

                This is also why we are seeing "AI slop janitors": humans whose task is to unsloppify the slop.

                Personally I have this intuition that AI will create really good small products - there is no denying that - but those were already un-monetizable, or, if they were monetizable, they were really easy to replicate even in the past; this probably just lowered the friction.

                Now if your project is something commercial and large, I don't know how much people can trust AI slop. At some point, if people depend on your project and it has these issues - and people can tell whether a project is AI-generated or not - that would bring its own problems too.

                And I am speaking from experience, after building something like WHMCS in Golang with AI. At first I was surprised, and felt it was good enough for my own personal use case (gvisor) and maybe some really small providers. But then I wanted it to, say, hook into Proxmox, connect the tmate server to an API to make re-opening easier, do live migration from one box to another, etc., and create drivers for the custom firecrackers-ssh idea that I had implemented, once again using AI.

                One realizes how quickly complexity adds up in projects, and how, as emsh points out, it becomes exponentially harder to use AI.

    • vidarh 1 day ago
      I think the human + agent thing absolutely will make a huge difference. I regularly see that Claude can go totally off piste; with a proper agent setup it will eventually claw itself back, but it takes a lot of time if I don't spot it and get it back on track.

      I have one project Claude is working on right now where I'm testing a setup to attempt to take myself more out of the loop, because that is the hard part. It's "easy" to get an agent to multiply your output. It's hard to make that scale with your willingness to spend on tokens rather than with your ability to read and review and direct.

      I've ended up with roughly this (it's nothing particularly special):

      - Runs an evaluator that evaluates the current state and assigns scores across multiple metrics.

      - If a given score is above a given threshold, expand the test suite automatically.

      - If the score is below a given threshold, spawn a "research agent" that investigates why the scores don't meet expectations.

      - The research agent delivers a report, which is passed to an implementation agent.

      - The main agent re-runs the scoring, and if it doesn't show an improvement on one or more of the metrics, the commit is discarded, and notes are made of what was tried and why it failed.

      It takes a bit of trial and error to get it right (e.g. "it's the test suite that is wrong" came up early, and the main agent was almost talked into revising the test suite to remove the "problematic" tests) but a division sort of like this lets Claude do more sensible stuff for me. Throwing away commits feels drastic - an option is to let it run a little cycle of commit -> evaluate -> redo a few times before the final judgement, maybe - but so far it feels like it'll scale better. Less crap makes it into the project.
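
      For concreteness, a rough sketch of the outer loop in Python (all the helper names here are made up for illustration; the real setup shells out to Claude with a different prompt per role):

          import subprocess

          THRESHOLD = 0.8

          def agent(role, prompt):
              # Stand-in for "run a Claude session with this role/prompt";
              # assumes a CLI that prints the agent's final answer.
              out = subprocess.run(["claude", "-p", f"[{role}] {prompt}"],
                                   capture_output=True, text=True)
              return out.stdout

          def evaluate():
              # Stand-in: in reality an evaluator agent produces the scores.
              return {"layout": 0.6, "test_coverage": 0.9}

          def cycle(history):
              before = evaluate()
              for metric, score in before.items():
                  if score >= THRESHOLD:
                      agent("tests", f"expand the test suite around {metric}")
                  else:
                      report = agent("research", f"why is {metric} low? notes: {history}")
                      agent("implement", report)
              after = evaluate()
              if not any(after[m] > before[m] for m in before):
                  subprocess.run(["git", "reset", "--hard", "HEAD~1"])  # discard commit
                  history.append("what was tried, and why it failed")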

      And I think this will work better than treating these agents as if they are developers whose output costs 100x as much.

      Code so cheap it is disposable should change the workflows.

      So while I agree this is a better demonstration of a good way to build a browser, it's a less interesting demonstration as well. Now that we've seen people show that something like FastRender is possible, expect people to experiment with similarly ambitious projects but with more thought put into scoring/evaluation, including on code size and dependencies.

      • embedding-shape 1 day ago
        > I think the human + agent thing absolutely will make a huge difference.

        Just the day(s) before, I was thinking about this too, and I think what will make the biggest difference is humans who possess "Good Taste". I wrote a bunch about it here: https://emsh.cat/good-taste/

        I think the ending is most apt, and where I think we're going wrong right now:

        > I feel like we're building the wrong things. The whole vibe right now is "replace the human part" instead of "make better tools for the human part". I don't want a machine that replaces my taste, I want tools that help me use my taste better; see the cut faster, compare directions, compare architectural choices, find where I've missed things, catch when we're going into generics, and help me make sharper intentional choices.

        • vidarh 1 day ago
          For some projects, "better tools for the human part" is sufficient and awesome.

          But for other projects, being able to scale with little or no human involvement suddenly turns things that were borderline profitable, or not possible to make profitable at all at current salaries, into viable businesses at token costs.

          Where it works, it's a paradigm shift - for both good and bad.

          So it depends what you're trying to solve for. I have projects in both categories.

          • embedding-shape 1 day ago
            Personally I think the part where you try to eliminate humans from involvement is gonna lead to too much trouble, be too inflexible, and produce bad results. It's what I've seen so far; I haven't seen anything pointing to it being feasible, but I'd be happy to be corrected.
            • vidarh 1 day ago
              It really depends on the type of tasks. There are many tasks LLMs do for me entirely autonomously already, because they do it well enough that it's no longer worth my time.
    • Imustaskforhelp 21 hours ago
      I really like how embedding-shape took things into his own hands and actually built it. It proved a point at a scale that I don't think any recent example comes close to.

      It's great to see Hacker News be such a core part of it, haha.

      > I thought "build a browser that renders HTML+CSS" was the perfect task for demonstrating a massively parallel agent setup because it couldn't be productively achieved in a few thousand lines of code by a single coding agent. Turns out I was wrong!

      I do wonder if tech people, present or future, are gonna see this as a David vs Goliath story: 20K LOC from 1 human + 1 agent beats a $5 million, 1.6-million-LOC browser, changing how even the massive AI users/pioneers of the time thought about the use of AI.

      Maybe it's because I have watched some documentaries recently, but I feel like a documentary about this whole thing could be made in the future.

      But also, more and more I feel like AI is an absolute black box: nobody knows how to do things, but we are all kind of running experiments with it and seeing what sticks (like how we now have strong evidence that 1 human + 1 agent > many agents with no human in the loop).

      And this is when we're just one month into 2026; who knows what other experiments and proofs this year will reveal about this black box, and about its usefulness or lack thereof.

      Simon, it would be interesting if you could revisit the 2026 predictions thread on HN monthly or quarterly, to see how many people were wrong or right about AI as we figure more things out.

    • rananajndjs 23 hours ago
      [dead]
  • embedding-shape 1 day ago
    I set some rules for myself: three days of total time, no 3rd party Rust crates, allowed to use commonly available OS libraries, has to support X11/Windows/macOS and can render some websites.

    After three days, I have it working at around 20K LOC, of which ~14K is the browser engine itself + X11, and ~6K is just Windows+macOS support.

    Source code + CI built binaries are available here if you wanna try it out: https://github.com/embedding-shapes/one-agent-one-browser

    • bhadass 20 hours ago
      very impressive!

      it's amazing how far we've come in 20 years. i was a (very minor) contributor to khtml/konqueror (before apple got involved w/ webkit) in the early 2000s, and back then it was such a labor intensive process to even create a halfway working engine. like, months of work just to get basic rendering somewhat correct on a very small portion of the web (which was obv much smaller)

      in addition to agentic coding, i think for this specific task having css-spec/html-spec/web-platform-tests as machine readable test suites helps a LOT. the agent can actually validate against real specs.

      back in the day, despite having gecko as an open source reference, in practice the "standards" were whatever IE was doing. so you'd spend weeks implementing something only to discover every site was coded for IE's quirks lmao. for all of their other faults, google/apple and other contributors helped bring in discipline to that.

      • embedding-shape 20 hours ago
        > i think for this specific task having css-spec/html-spec/web-platform-tests as machine readable test suites helps a LOT

        You know, I placed the specs in the repository with that goal (even sneaked in a repo that needs compiling before being usable), but as far as I can see, the agent never actually peeked into that directory nor read anything from them in the end.

        It'll be easier to see once I've made all the agent sessions public, and I might be wrong (I didn't observe the agent at all times), but it seems the agent never used them.

        • bhadass 20 hours ago
          oh interesting, so it just... didn't use them? lol. i guess the model's training data already has enough web knowledge baked in that it could wing it. curious if explicitly prompting it to reference the specs would change the output quality or time to solution.

          very excited to see the agentic sessions when you release them.. that kind of transparency is super valuable for the community. i can see "build a browser from scratch" becoming a popular challenge as people explore the limits of agentic coding and try to figure out best practices for workflows/prompting. like the new "build a ray tracer", or say nanoGPT but for agents.

    • chatmasta 21 hours ago
      Did you use Claude code? How many tokens did you burn? What’d it cost? What model did you use?
      • embedding-shape 21 hours ago
        Codex, no idea about tokens, I'll upload the session data probably tomorrow so you could see exactly what was done. I pay ~200 EUR/month for the ChatGPT Pro plan, prorating days I guess it'll be ~19 EUR for three days. Model used for everything was gpt-5.2 with reasoning effort set to xhigh.
        • oneneptune 1 hour ago
          Thanks in advance, I can't wait to see your prompts and how you architected this...
        • forgotpwd16 20 hours ago
          >I'll upload the session data probably tomorrow so you could see exactly what was done.

          That'll be dope. The tokens used (input, output, total) are actually saved within codex's jsonl files.

        • storystarling 19 hours ago
          That 19 EUR figure is basically subscription arbitrage. If you ran that volume through the API with xhigh reasoning the cost would be significantly higher. It doesn't seem scalable for non-interactive agents unless you can stay on the flat-rate consumer plan.
          • embedding-shape 18 hours ago
            Yeah, no way I'd do this if I paid per token. Next experiment will probably be local-only together with GPT-OSS-120b which according to my own benchmarks seems to still be the strongest local model I can run myself. It'll be even cheaper then (as long as we don't count the money it took to acquire the hardware).
            • mercutio2 15 hours ago
              What toolchain are you going to use with the local model? I agree that's a strong model, but it's so slow for me with large contexts that I've stopped using it for coding.
              • embedding-shape 9 hours ago
                I have my own agent harness, and the inference backend is vLLM.
                • storystarling 7 hours ago
                  Curious how you handle sharding and KV cache pressure for a 120b model. I guess you are doing tensor parallelism across consumer cards, or is it a unified memory setup?
                  • embedding-shape 7 hours ago
                    I don't, it fits on my card with the full context. I think the native MXFP4 weights take ~70GB of VRAM (out of 96GB available, RTX Pro 6000), so I still have room to spare to run GPT-OSS-20B alongside for smaller tasks too, and Wayland+Gnome :)
                    • storystarling 5 hours ago
                      I thought the RTX 6000 Ada was 48GB? If you have 96GB available that implies a dual setup, so you must be relying on tensor parallelism to shard the model weights across the pair.
        • soiltype 20 hours ago
          Thank you in advance for that! I barely use AI to generate code so I feel pretty lost looking at projects like this.
    • jacquesm 1 day ago
      Those are excellent constraints.
  • jFriedensreich 2 hours ago
    My community and I have been waiting for a browserBench for a while now, and we're happy to see it finally starting. Browsers are arguably one of the most complex and foundational pieces of software; the ability to create something like this from scratch will be an important evaluation as the limits of what is possible get harder and harder to find.
  • aix1 11 hours ago
    Functionality aside, I'd find it very interesting to see a security audit of a code base like this.

    I searched for "security" and "vuln" in both the article and this discussion thread, and found no matches.

    I guess the code being in Rust helps, but to what extent can one just rely on guarantees provided by the language?

    (I know practically nothing about Rust.)

    • embedding-shape 6 hours ago
      Hah, yeah, zero regard for security, don't run this without a sandbox, and don't load arbitrary websites :)

      I don't think Rust helps much beyond preventing some very basic issues. For example, I don't think it even checks that URLs aren't referencing local files on disk; who knows how the path handling works - you might be able to put absolute paths on remote pages and load local content? Unsure, but it wouldn't surprise me.
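
      The kind of guard I mean, sketched in Python for brevity (the browser is Rust, and as far as I know nothing like this exists in it; the function name is made up):

          # When a remote page references a sub-resource, refuse any scheme
          # that could reach local files.
          from urllib.parse import urljoin, urlparse

          ALLOWED_SCHEMES = {"http", "https"}

          def resolve_subresource(page_url: str, href: str) -> str:
              absolute = urljoin(page_url, href)  # handles relative hrefs too
              scheme = urlparse(absolute).scheme
              if scheme not in ALLOWED_SCHEMES:
                  raise ValueError(f"blocked {scheme!r} sub-resource on {page_url}")
              return absolute

          # resolve_subresource("https://evil.example/", "file:///etc/passwd")
          # raises instead of loading local content.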

      It might be a bit safer due to having no JS engine, so even if someone did what I outlined above, they couldn't really exfiltrate anything; there are no POST/PUT requests or forms or anything :)

      I'm sure if someone did a proper audit they'd find double-digit high severity issues, at least.

  • socalgal2 11 hours ago
    I'm having a hard time imagining how 20k lines of code gets you a browser with no libraries. Just zlib by itself is 12k lines. freetype is 30k lines, or stb_truetype is 5k lines. Something doesn't seem to be adding up. Am I missing something? Is this just calling into the OS for rendering?
    • embedding-shape 9 hours ago
      No Rust dependencies. Commonly available system libraries/frameworks are used for the actual drawing. The README of the repository outlines exactly which ones are being used on what OS.
    • senko 3 hours ago
      On Linux, there are 78 dynamically linked libraries, such as for X11, vector graphics, glib/gobject, graphics formats, crypto, encryption, etc.
  • micimize 2 hours ago
    An obvious nice thing here compared to the Cursor post is that the human involvement gives some minimum threshold of confidence that the writer of the post has actually verified the claims they've made :^) It illustrates how human comprehension is itself a valuable "artifact" we won't soon be able to write off.

    My comment on the cursor post for context: https://news.ycombinator.com/item?id=46625491

  • QuadmasterXLII 22 hours ago
    The rendering is pretty chaotic when I tried it - not that far off from just the text in the html tags, in some size, color, and placement on the screen. This may sound unfair, but there is some motte-and-bailey here: if you claim to be a browser, I get to evaluate on stuff like links being consistently blue and underlined (as is, they are sometimes blue and sometimes underlined, without a clear pattern - if they were never formatted differently from standard text, I would just buy this as a feature not implemented yet). It may be that some of the rendering is not supported on Windows - the back button certainly isn't. I guess if I want to make my criticism actually legitimate I should make a "one human and no agent browser" post that just regexes out stuff that looks like content and formats it at random. The binary I downloaded definitely overperforms on the Hacker News homepage and simonw's blog.
    • embedding-shape 20 hours ago
      It's a really basic browser. It's made less as an independent thing and more as a reply to https://cursor.com/blog/scaling-agents, so as long as it does more or less the same as theirs, but in fewer LOC, it does what I set out for it to do :)

      > I get to evaluate on stuff like links being consistently blue and underlined

      Yeah, this browser doesn't have a "default stylesheet" like a regular browser. I probably should have added one, but I was mostly just curious about rendering websites as they come from the web, rather than applying what browsers think the web should look like.

      > It may be that some of the rendering is not supported on Windows - the back button certainly isn't.

      Hmm, on Windows 11 the back button should definitely work, I tried that just last night. Are you perhaps on Windows 10? I have not tried that myself; it should work, but that might be why.

      • QuadmasterXLII 20 hours ago
        It is both extraordinarily impressive in an absolute sense, and fairly disappointing when specifically comparing my results on a random smattering of other no-JS websites to the expectation I had from the simonw screenshot (which, to be clear, is not an expectation you had control over, as you are not simonw). I'm familiar with this pattern from all the rest of my trying frontier ML results!

        Yep, I ran it on an old Windows 10 VM I had puttering about.

        I think it must have a default link styling somewhere, as some links are the classic blue that, as far as I know, I intentionally styled to be black - but this could be CSS spaghetti in tufte.css finally coming back to haunt me.

        • embedding-shape 19 hours ago
          > I'm familiar with this pattern from all the rest of my trying frontier ML results!

          Well, that's how this browser came to be, because I felt something similar about how Cursor presented their results :) So I guess we're in the same club, somehow.

          And yeah, lots of websites render poorly, for obvious reasons. Whether it's better or worse than Cursor's will, I guess, be up to the public. I'm sure if I actually treated it as a professional project I could probably get it to work quite nicely, rather than the abomination it currently is.

  • fabrice_d 19 hours ago
    This is a cool project, and rendering Simon's blog will likely become the #1 goal of AI-produced "web browsers".

    But we're very far from a browser here, so that's not that impressive. Writing a basic renderer is really not that hard, and matches the effort and low LoC from that experiment. This is similar to countless graphical toolkits that have been written since the 70s.

    I know Servo has a "no AI contribution" policy, but I still would be more impressed by a Servo fork that gets missing APIs implemented by an AI, with WPT tests passing etc. It's a lot less marketable I guess. Go add something like WebTransport for instance, it's a recent API so the spec should be properly written and there's a good test suite.

    • Dave3of5 5 hours ago
      100% agree this isn't a browser. It's better than the previous attempt, but it fails to render even basic HTML websites correctly and crashes constantly.

      The fact that it compiles is better than the Cursor dude's effort. "It compiles" is a very low bar for working software.

      • embedding-shape 5 hours ago
        I think what I wanted to demonstrate here was less "You can build a browser with an agent", and more how bullshit Cursor's initial claim was, that "hundreds of agents" somehow managed to build something good, autonomously. It's more of a continuation of a blog post I wrote some days ago (https://emsh.cat/cursor-implied-success-without-evidence/), than a standalone proof of "agents can build browsers".

        Unfortunately, this context is kind of implicit, I don't actually mention it in the blog post, which I probably should have done, that's my fault.

  • happytoexplain 20 hours ago
    What kind of time frame do you ballpark this would have taken you on your own?

    I know it's a little apples-and-oranges (you and the agent wouldn't produce the exact same thing), but I'm not asking because I'm interested in the man-hour savings. Rather, I want to get a perspective on what kind of expertise went into the guidance (without having to read all the guidance and be familiar with browser implementation myself). "How long this would have taken the author" seems like one possible proxy for "how much pre-existing experience went into this agent's guidance".

    • embedding-shape 20 hours ago
      > What kind of time frame do you ballpark this would have taken you on your own?

      I don't think I'd be able to do this on my own. Not that I don't know Rust, but because I don't know X11 (nor macOS or Windows) well enough to even know where to begin.

      I've been a Linux user for almost two decades, so I know my way around my system, but I never developed X11 applications or anything; I'm mostly a web developer who jumped around various roles through the years. Having spent a lot of time caring deeply about testing, infrastructure, architecture/design, and communication between humans might have given me a slight edge in programming together with agents.

      • happytoexplain 18 hours ago
        Hmm, well I'm more interested in the browser part rather than the windowing part - I feel like it makes more sense that LLMs can be somewhat competent with windowing frameworks even if the prompter is not super experienced. Regardless, there's probably not a concise way to get what I'm looking for - instead, I'm looking forward to seeing your config/input! I'm super curious.
        • embedding-shape 17 hours ago
          Ah :) On the browser part, I've spent huge chunks of time inside the browser viewport as a frontend engineer, also as a backend engineer and finally managing infrastructure, but never much inside browser internals, painting, layout, and that sort of stuff. I wouldn't even say that frontend performance (re: thrashing, re-calculating layouts, etc) is my forte; I've mostly been focusing on being able to mold codebases into something that doesn't turn into spaghetti after a year of various developers working on it.

          The prompts themselves were basically "I'd like this website to render correctly: https://medium.com, here's how it looks for me in Firefox with JavaScript turned off: [Image], figure out what features are missing, add them one-by-one, add regression tests and follow REQUIREMENTS.md and AGENTS.md closely" and various iterations/variations of that, so I didn't expressly ask it to implement specific CSS/HTML features, as far as I can remember. Maybe the first 2-3 prompts I did. I'll upload all the session files in a viewable way so everyone can see for themselves exactly what went on :)

    • simonw 20 hours ago
      I have a fun little tool which runs the year-2000-era sloccount algorithm (which is Perl and C so I run it in WebAssembly) to estimate the time and cost of a project here: https://tools.simonwillison.net/sloccount

      If you paste https://github.com/embedding-shapes/one-agent-one-browser into the "GitHub Repository" tab it estimates 4.58 person-years and $618,599 by year-2000 standards, or 5.61 years and $1,381,079 according to my very non-trustworthy 2025 estimate upgrade.
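
      If I remember the model right, the estimate is simple enough to check by hand - sloccount uses basic COCOMO in "organic" mode, with a year-2000 default salary of $56,286 and a 2.4x overhead factor (the KLOC figure below is my assumption to match the repo's size):

          kloc = 19.7                          # ~19,700 physical SLOC (assumed)
          effort_months = 2.4 * kloc ** 1.05   # ~55 person-months
          person_years = effort_months / 12    # ~4.6, matching the 4.58 above
          cost = person_years * 56286 * 2.4    # salary x overhead, ~$619k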

      • pizlonator 14 hours ago
        I pasted a subset of the Fil-C source code into your tool and it says 6 person years. I just pasted the compiler pass and the obvious parts of the runtime.

        Note that I started the project in Nov 2023 and can only work on it maybe 1-2 hours a day because it's just a side project.

        So I think your tool either estimates based on very bad programmers, or it's just wrong. Or maybe 10x programmers are real and I am him

        • lifthrasiir 8 hours ago
          These metrics necessarily have to underestimate programmer skill, because skill is not directly controllable. If there is any sort of rigor in these metrics (and I don't know if COCOMO has any), they will probably assume, say, a mundane programmer whose performance is worse than 90/95/99% of all other programmers.
        • simonw 12 hours ago
          Here's more about the COCOMO model it uses: https://dwheeler.com/sloccount/sloccount.html#cocomo
          • pizlonator 12 hours ago
            Sounds like nonsensical pseudoscience
            • simonw 2 hours ago
              I don't take those results very seriously myself, but have you seen anything better?
              • pizlonator 8 minutes ago
                No

                To me this is a case where knowing that you don't have data is better than having data and pretending it means anything

  • jacquesm 1 day ago
    This post is far more interesting than many others on the same subject, not because of what is built but because of how it is built. There is a ton of noise on this subject, and most of it seems to focus on the thing - or even on the author - rather than on the process, the constraints, and the outcome.
    • embedding-shape 1 day ago
      Thanks, means a lot. As the author of one such article (that might have been the catalyst even), I'm guilty of this myself, and as I dove deeper into understanding what Cursor actually built, and what they think was the "success", the less sense everything made to me.

      That's why taking a step back and seeing what's actually hard in the process and bad with the output, felt like it made more sense to chase after, rather than anything else.

      • jacquesm 1 day ago
        I think the Cursor example is as bad as it gets and this is as good as it gets.

        FWIW I ran your binary and was pleasantly surprised, but my low expectations probably helped ;)

        • embedding-shape 23 hours ago
          I'm glad I could take people on a journey that first highlighted what absolutely sucks, to presenting something that seemingly people get pleasantly surprised by! Can't ask for more really :)
          • jacquesm 23 hours ago
            What is interesting is that yours is the first example of what this tech can do that resonates with me, the things I've seen posted so far do not pass the test for excitement, it's just slop and it tries to impress by being a large amount of slop. I've done some local experiments but the results were underwhelming (to put it mildly) even for tiny problems.

            The next challenge I think would be to prove that no reference implementation code leaked into the produced code. And finally, this being the work product of an AI process you can't claim copyright, but someone else could claim infringement so beware of that little loophole.

            • embedding-shape 22 hours ago
              Knowing you browse HN quite a lot (not that I'm not guilty of that too), that's some high praise! Thank you :)

              I think the focus with LLM-assisted coding for me has been just that, assisted coding, not trying to replace whole people. It's still me and my ideas driving (and my "Good Taste", explained here: https://emsh.cat/good-taste/); the LLM does all the things I find more boring.

              > prove that no reference implementation code leaked into the produced code

              Hmm, yeah, I'm not 100% sure how to approach this, open to ideas. Basic text comparison feels like it'd be too dumb; using an LLM for it might work, letting it reference the other codebases perhaps. Honestly, I don't know how I'd do that.

              > And finally, this being the work product of an AI process you can't claim copyright, but someone else could claim infringement so beware of that little loophole.

              Good point to be aware of, and I guess I by instinct didn't actually add any license to this project. I thought of adding MIT as I usually do, but I didn't actually make any of this so ended up not assigning any license. Worst case scenario, I guess most jurisdictions would deem either no copyright or that I (implicitly) hold copyright. Guess we'll take that if we get there :)

  • mwcampbell 22 hours ago
    Impressive work.

    I wonder if you've looked into what it would take to implement accessibility while maintaining your no-Rust-dependencies rule. On Windows and macOS, it's straightforward enough to implement UI Automation and the Cocoa NSAccessibility protocols respectively. On Unix/X11, as I see it, your options are:

    1. Implement AT-SPI with a new from-scratch D-Bus implementation.

    2. Implement AT-SPI with one of the D-Bus C libraries (GLib, libdbus, or sdbus).

    3. Use GTK, or maybe Qt.

  • sosodev 20 hours ago
    The browser works shockingly well considering it was created in 72 hours. It can render Wikipedia well enough to read and browse articles. With some basic form handling and browser standards (url bar, history, bookmarks, etc) it would be a viable way to consume text based content.
    • embedding-shape 20 hours ago
      I can't say my fingers (codex's fingers) haven't been itching to add some small features which would basically make it a viable browser for myself at least, for 90% of my browsing.

      But I think this is one of those experiments that I need to put a halt to sooner rather than later, because the scope can always grow, my mind really likes those sorts of projects, and I don't have the time for that right now :)

      • GaggiX 3 hours ago
        It would be really cool if it were able to render Wikipedia correctly. I really like the idea of a browser with minimal dependencies that can navigate most static websites; this one, for now, compiles instantly and is incredibly small.
        • embedding-shape 3 hours ago
          Yeah, my mind battled with what websites to use as examples for adding support, Wikipedia should have been an obvious one, that's on me!

          You're not the only one to say this; maybe there is value in a minimal HTML+CSS browser that still works with the modern (non-JS) web, although I'm not sure how much.

          Another idea I had was to pile another experiment on top of this one, more about "N humans + N agents = one browser", in a collaborative fashion. Let's see if that ends up happening :)

          • GaggiX 6 minutes ago
            Maybe you could divide the task into verifiable environments - for example, an HTML5 parser environment where an agent builds the parser, checks its progress against a test suite (https://github.com/html5lib/html5lib-tests in this case), and then writes the API into a .md file. The human's job at the beginning would be to create the various environments the agents build the components from (and to decide how much can be divided into standalone components).
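
            A minimal sketch of the scoring loop for one such environment (the .dat fixture parsing is simplified, and parse()/serialize() stand in for whatever API the agent ends up building):

                # Run the agent-built parser over html5lib-tests'
                # tree-construction fixtures and report a pass rate
                # the agent can optimize against between iterations.
                import pathlib

                def cases(dat_file):
                    text = pathlib.Path(dat_file).read_text()
                    for chunk in text.split("#data\n")[1:]:
                        html = chunk.split("#errors\n")[0].rstrip("\n")
                        tree = chunk.split("#document\n")[1].rstrip("\n")
                        yield html, tree

                def score(parse, serialize, dat_file):
                    results = [serialize(parse(html)) == tree
                               for html, tree in cases(dat_file)]
                    return sum(results) / len(results)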
  • dvrp 11 hours ago
    It's interesting to think that—independently of what you think of Cursor's browser implementation being truly "from scratch" or not—the fact that people are implementing browsers from scratch with agents happened because of Cursor's post. In other words, in a twisted and funny way, this browser exists because of Cursor's agent.

    This is how we should be thinking about AI safety!

    • embedding-shape 9 hours ago
      I mean, I wanted to demonstrate further how wrong and misleading I think their initial blog post was, so yeah, I made this because of what they said and marketed :)
  • rahimnathwani 1 day ago
    This is awesome. Would you be willing to share more about your prompts? I'm particularly interested in how you prompted it to get the first few things working.
    • embedding-shape 1 day ago
      Yes, I'm currently putting it all together and will make it public via the blog post. Just need to go through all of it first to ensure nothing secret/private leaks, will update once I've made it public.
  • hedgehog 20 hours ago
    This looks pretty solid. I think you can make this process more efficient by decomposing the problem into layers that are more easily testable, e.g. testing topological relationships of DOM elements after parse, then spatial after layout, then eventually pixels on things like ACID2 or whatever the modern equivalent is. The models can often come up with tests more accurately than they get the code right the first time. There are often also invariants that can be used to identify bugs without ground truth, e.g rendering the page with slightly different widths you can make some assertions about how far elements will move.
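
    E.g. something like this (hypothetical: `layout` stands in for a headless hook into the engine that returns element boxes, and the specific assertions are examples that won't hold for every page, e.g. with media queries):

        def check_width_invariants(layout, html):
            # Render the same page at two viewport widths and assert
            # properties that should hold without knowing ground truth.
            narrow = layout(html, viewport_width=800)   # {id: (x, y, w, h)}
            wide = layout(html, viewport_width=1000)
            for el, (x, y, w, h) in narrow.items():
                x2, y2, w2, h2 = wide[el]
                # nothing should overflow the viewport horizontally
                assert x + w <= 800 and x2 + w2 <= 1000
                # a wider viewport means fewer line wraps, so content
                # should never move further down the page
                assert y2 <= y
            # block elements should keep their vertical (document) order
            order = lambda boxes: sorted(boxes, key=lambda el: boxes[el][1])
            assert order(narrow) == order(wide)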
    • embedding-shape 20 hours ago
      > There are often also invariants that can be used to identify bugs without ground truth, e.g rendering the page with slightly different widths you can make some assertions about how far elements will move.

      That's really interesting and sounds useful! I'm wondering if there are general guidelines/requirements (not specific to browsers) that could kind of "trigger" those things in the agent, without explicitly telling it. I think generally that's how I try to approach prompting.

      • hedgehog 17 hours ago
        I think if you explain that general idea, the models can figure it out well enough to write it into an implementation plan, at least some of the time. Interesting problem though.
        • embedding-shape 16 hours ago
          > that general idea, the models can figure it out well enough to write it into an implementation plan

          I'm not having much luck with it, they get lost in their own designs/architectures all the time, even the best models (as far as I've tested stuff). But as long as I drive the design, things don't end up in a ball of spaghetti immediately.

          Still trying to figure out better ways of doing that, feels like we need to focus on tooling that lets us collaborate with LLMs better, rather than trying to replace things with LLMs.

          • hedgehog 15 hours ago
            Yeah, from what I can tell a lot of design ability is somewhere in the weights, but the models don't regurgitate it without some coaxing. It may be related to the pattern where, after generating some code, you can instruct a model to review it for correctness and it can find and fix many issues. Regarding tooling, there's a major philosophical divide between LLM maximalists who prefer the model to drive the "agentic" outer loop and what I'll call "traditionalists" who prefer control be run by algorithms more related to classical AI research. My personal suspicion is that the second branch is greatly under-exploited, but time will tell.
    • socalgal2 11 hours ago
      the modern equivalent is the Web Platform Tests

      https://web-platform-tests.org/

      • hedgehog 21 minutes ago
        Amazing. I think if I were taking on the build-a-browser project I would pair that with the WHATWG HTML spec to come up with a task list (based on the spec, line-by-line) linked to specific tests associated with each task. Then of course you'd need an overall architecture and behavioral spec for how the browser part behaves beyond just rendering. A developer steering the process full time might be able to get within 80% parity of existing browsers in a month. It would be an interesting experiment.
        • embedding-shape 13 minutes ago
          > I would pair that with the WHATWG HTML spec

          I placed some specifications + WPT into the repository the agent had access to! https://github.com/embedding-shapes/one-agent-one-browser/tr...

          But judging by the session logs, it doesn't seem like the agent saw them; I never pointed it there, and it seems none of the searches returned anything from there.

          I'm slightly curious about doing it from scratch again, but this time explicitly pointing it to the specifications, to see if it gets better or worse.

  • pulkas 21 hours ago
    The Mythical Man-Month, revisited
    • lelele 5 hours ago
      What do you mean?
      • embedding-shape 5 hours ago
        I think it means they'd like to have a baby with me, and the more agents we can add, the faster the baby can incubate. Usual stuff :)
  • avmich 21 hours ago
    The next thing would probably be an OS. With different APIs, the browser would not be constrained by existing standards. Then generate a good set of applications to make working in the OS convenient - starting with the GNU set? And then we can approach CPU architecture - again, without being constrained by existing languages or instruction sets. That should be interesting to play with.
    • forgotpwd16 21 hours ago
      Someone has already done this: https://github.com/viralcode/vib-OS

      Also, someone made a similar comment not too long ago, so people surely are curious whether this is possible. Kinda surprised this project's submission didn't get popular.

  • forgotpwd16 21 hours ago
    Impressive. Very nice. (Let's see Paul Allen's browser. /s) You could say it's Brooks's law in action: what one human and one agent can do in 3 days, one human and hundreds of agents can do in a few weeks. A modern retake on the old joke.

    >without using any 3rd party libraries

    Seems to be easier for coding agents to implement things from scratch than to use libraries.

  • barredo 20 hours ago
    The binaries are only around 1 MB for Linux, Mac and Windows. Very impressive https://github.com/embedding-shapes/one-agent-one-browser/re...
    • userbinator 16 hours ago
      "only around 1MB" is not particularly impressive in absolute terms... there are a few browsers which are the same or smaller, and yet more functional.

      https://tinyapps.org/network.html

      Of course, "AI-generated browser is 1MB" is neither here nor there.

      • embedding-shape 16 hours ago
        Tried building with some other arguments/configs, went from 1.2M on X11 to 664K, which seems to place it under Lynx (text-only, 714k) but above OffByOne (full HTML 3.2, 409k). Of course, my experiment barely implements anything from a "real" browser, so it's an unfair comparison really.

        Neat collection of apps nonetheless, some really impressive stuff in there.

      • simonw 16 hours ago
        Do any of those handle CSS and SVG?
    • embedding-shape 20 hours ago
      Fun fact: not until someone mentioned how small the binaries are did I notice! A fun little side-effect of the various constraints and requirements I set in the REQUIREMENTS.md, I suppose.
  • storystarling 21 hours ago
    How did you handle the context window for 20k lines? I assume you aren't feeding the whole codebase in every time given the API costs. I've struggled to keep agents coherent on larger projects without blowing the budget, so I'm curious if you used a specific scoping strategy here.
    • simonw 21 hours ago
      GPT-5.2 has a 400,000 token context window. Claude Opus 4.5 is just 200,000 tokens. To my surprise this doesn't seem to limit their ability to work with much larger codebases - the coding agent harnesses have got really good at grepping for just the code that they need to have in-context, similar to how a human engineer can make changes to a million lines of code without having to hold it all in their head at once.
      • storystarling 20 hours ago
        That explains the coherence, but I'm curious about the mechanics of the retrieval. Is it AST-based to map dependencies or are you just using vector search? I assume you still have to filter pretty aggressively to keep the token costs viable for a commercial tool.
        • simonw 19 hours ago
          No vector search, just grep.
    • embedding-shape 19 hours ago
      I didn't; Codex (tui/cli) did, it does it all by itself. I have one REQUIREMENTS.md which is specific to the project, an AGENTS.md that I reuse across most projects, then I give Codex (gpt-5.2 with reasoning effort set to xhigh) a prompt + screenshot, tell it to get the page working somewhat similarly, wait until it completes, review that it worked, then continue.

      Most of the time when I develop professionally, I restart the session after each successful change. For this project, I initially tried to let one session go as long as possible, but eventually I reverted to my old behavior of restarting from 0 after successful changes.

      To know which files it should read/write, it uses `ls`, `tree` and `ag` most commonly; there is no out-of-band indexing or anything, just a Unix shell controlled by an LLM via tool calls.

    • nurettin 21 hours ago
      You don't load the entire project into the context. You let the agent work on a few 600-800 line files one feature at a time.
      • storystarling 19 hours ago
        Right, but how does it know which files to pick? I'm curious if you're using a dependency graph or embeddings for that discovery step, since getting the agent to self-select the right scope is usually the main bottleneck.
        • embedding-shape 19 hours ago
          I gave you a more complete answer here: https://news.ycombinator.com/item?id=46787781

          > since getting the agent to self-select the right scope is usually the main bottleneck

          I haven't found this to ever be the bottleneck, what agent and model are you using?

        • nurettin 12 hours ago
          If you don't trigger the discovery agents, Claude CLI uses a search tool and greps 50-100 lines at a go. If discovery is triggered, Claude sends multiple agents into the code with different tasks, which return with overall architecture notes.
  • nenadg 10 hours ago
    >(no JS tho)

    this is a feature

  • rvz 22 hours ago
    > I'm going to upgrade my prediction for 2029: I think we're going to get a production-grade web browser built by a small team using AI assistance by then.

    That is Ladybird Browser if that was not already obvious.

  • madmaniak 11 hours ago
    But when I install Firefox or Chrome, it's much faster, much better, and also someone else's code - also copied and pasted by a machine. It's just that I don't claim it's mine.
  • tonyhart7 18 hours ago
    >one human

    >one agent

    >one browser

    >one million nvidia gpu

    • embedding-shape 17 hours ago
      Next time I'll do it on my GPU, then it'll be using just a 10K GPU, that's fine right?
  • deadbabe 18 hours ago
    This is not that impressive, there are numerous examples of browsers for training data to reference.
    • simonw 17 hours ago
      I don't buy this.

      It implies that the agents could only do this because they could regurgitate previous browsers from their training data.

      Anyone who's watched a coding agent work will see why that's unlikely to be what's happening. If that's all they were doing, why did it take three days and thousands of changes and tool calls to get to a working result?

      I also know that AI labs treat regurgitation of training data as a bug and invest a lot of effort into making it unlikely to happen.

      I recommend avoiding the temptation to look at things like this and say "yeah, that's not impressive, it saw that in the training data already". It's not a useful mental model to hold.

      • deadbabe 14 hours ago
        It took three days because... agents suck.

        But yes, with enough prodding they will eventually build you something that's been built before. Don't see why that's particularly impressive. It's in the training data.

        • simonw 13 hours ago
          Not a useful mental model.
          • deadbabe 13 hours ago
            It is useful. If you can whip up something complex fairly quickly with an AI agent, it’s likely because it’s already been done before.

            But if even the AI agent seems to struggle, you may be doing something unprecedented.

            • simonw 12 hours ago
              Except if you spend quality time with coding agents you realize that's not actually true.

              They're equally useful for novel tasks because they don't work by copying large scale patterns from their training data - the recent models can break down virtually any programming task to a bunch of functions and components and cobble together working code.

              If you can clearly define the task, they can work towards a solution with you.

              The main benefit of concepts already in the training data is that it lets you slack off on clearly defining the task. At that point it's not the model "cheating", it's you.

              • deadbabe 1 hour ago
                Good long lived software is not a bunch of functions and components cobbled together.

                You need to see the big picture and visions of the future state in order to ensure what is being built will be able to grow and breathe into that. This requires an engineer. An agent doesn’t think much about the future, they think about right now.

                This browser toy built by the agent, it has NO future. Once it has written the code, the story is over.

              • aix1 11 hours ago
                Simon, do you happen to have some concrete examples of a model doing a great job at a clearly novel, clearly non-trivial coding task?

                I'd find it very interesting to see some compelling examples along those line.

              • keybored 8 hours ago
                > Except if you spend quality time with coding agents you realize that's not actually true.

                Agent engineering seems to be (from the outside!) converging on quality lived experience. Compared to Stone Age manual coding it’s less about technical arguments and more about intuition.

                Vibes in short.

                You can’t explain sex to someone who has not had sex.

                Any interaction with tools is partly about intuition. It’s a difference of degree.

    • embedding-shape 17 hours ago
      Damn, ok, what should I attempt instead, that could impress even you?
      • anonymous908213 12 hours ago
        Actually good software that is suitable for mass adoption would go a long way to convincing a lot of people. This is just, yet another, proof-of-concept. Something which LLMs obviously can do, and which never seems to translate to real-world software people use. Parsing and rendering text is really not the hard part of building a browser, and there's no telling how closely the code mirrors existing open-source implementations if you aren't versed on the subject.

        That said, I think some credit is due. This is still a nice weekend project as far as LLMs go, and I respect that you had a specific goal in mind (showing a better approach than Cursor's nonsense, that gets better results in less time with less cost) and achieved it quickly and decisively. It has not really changed my priors on LLMs in any way, though. If anything it just confirms them, particularly that the "agent swarm" stuff is a complete non-starter and demonstrates how ridiculous that avenue of hype is.

        • embedding-shape 8 hours ago
          > Actually good software that is suitable for mass adoption would go a long way to convincing a lot of people.

          Yeah, that's obviously a lot harder, but doable. I've built such software for clients, since they pay me, but I haven't launched/made public something of my own where I could share the code. I guess that might be a useful next project now.

          > This is just, yet another, proof-of-concept.

          It's not even a PoC, it's a demonstration of how far off the mark Cursor are with their "experiment", where they were amazed by what "hundreds of agents" built over week(s).

          > there's no telling how closely the code mirrors existing open-source implementations if you aren't versed on the subject

          This is absolutely true, I tried to get some better answers on how one could even figure that out here: https://news.ycombinator.com/item?id=46784990

    • usef- 16 hours ago
      What would be impressive to you?
      • deadbabe 14 hours ago
        A browser so unique and strange it is literally unlike anything we've ever seen to date, using entirely new UI patterns and paradigms.
  • mdavid626 10 hours ago
    What’s the point of this?
    • embedding-shape 9 hours ago
      What's the point of anything really?

      A more real answer: Read the first 6 words of the submission article.

      • mdavid626 4 hours ago
        It feels like this mentality is taking over the world. Porn instead of sex, short videos instead of real life interactions, AI generated code instead of software engineering, sugar/chemicals instead of food and so on...

        All, just to have fun.

        Very sad.

        • embedding-shape 4 hours ago
          > AI generated code instead of software engineering

          This is exactly what's wrong with Cursor's approach, and why we need better tools for collaborating, so we don't lose the engineering part of software development.

          I, just like you, am fucking tired of all the slop being constantly pushed as something great. This + my previous blog entries are all about pushing back on the slop.

          • mdavid626 1 hour ago
            This is exactly the problem. People use AI to generate projects and expect that other people will celebrate and value them as they value similar human-written projects.

            They fail to see where the value really is. They try to cheat the system and get the admiration of others, but without putting in any of the value.

  • Imustaskforhelp 21 hours ago
    I feel like I have talked to embedding-shape on Hacker News enough that I recognize him. So it was a proud-feeling moment when I saw his Hacker News & GitHub comments in a YouTube video [0] about the recent Cursor thing.

    It's great to see him make this. I didn't know he had a blog, but it looks good to me. Bookmarked now.

    I feel like although Cursor burned $5 million, we got to see that experiment, and now we have embedding-shape's takeaway:

    > If one person with one agent can produce equal or better results than "hundreds of agents for weeks", then the answer to the question: "Can we scale autonomous coding by throwing more agents at a problem?", probably has a more pessimistic answer than some expected.

    Effectively, to me this feels like an answer to the question of what happens if we have thousands of AI agents building a complex project autonomously with no human. That idea seems dead now. Keeping humans in the loop gives much higher productivity and a better end result.

    I feel like the lure behind the Cursor project was to find out whether it's able to replace humans completely in an extremely large project, and the answer right now is no (and I have a feeling [bias?] that the answer's gonna stay that way).

    Emsh, I have a question though: can you tell me about your background, if possible? Have you been involved in browser development or any related endeavours, or was this a first for you? From what I can tell from having talked with you, I do feel like the answer is yes, that you have worked in the browser space, but I am still curious to know the answer.

    A question coming to my mind is: how big would the difference be between one expert human with one agent, one non-expert (say a junior dev) with one agent, and one completely non-expert (say a normal, less techie person) with one agent?

    What are you guys' predictions on it?

    How would the economics of becoming an "expert", or becoming a jack of all trades (junior dev), in a field fare with this new technology/toy that we've got?

    How much productivity gain could there be from non-expert -> junior dev, and the same question for junior -> senior dev, in this particular context?

    [0] Cursor Is Lying To Developers… : https://www.youtube.com/watch?v=U7s_CaI93Mo

    • simonw 21 hours ago
      I don't think the Cursor thing was about replacing humans entirely.

      (If it was that's bad news for them as a company that sells tools to human developers!)

      It was about scaling coding agents up to much larger projects by coordinating and running them in parallel. They chose a web browser for that not because they wanted to build a web browser, but because it seemed like the ideal example of a well specified but enormous (million line+) project which multiple parallel agents could take on where a single agent wouldn't be able to make progress.

      embedding-shape's project here disproves that last bit - that you need parallel agents to build a competent web renderer - by achieving a more impressive result with just one Codex agent in a few days.

      • Imustaskforhelp 21 hours ago
        > I don't think the Cursor thing was about replacing humans entirely.

        The way I saw things, Cursor was/is still targeted very heavily at vibe coding, in a similar fashion to bolt.dev or Lovable. I even saw some vibe-coder YouTubers try to compare them, and honestly, in the end Cursor had more preferable pricing than the other two. That's how I felt about Cursor.

        Of course, Cursor is for the more techie person as well, but I feel those users would shift more and more towards Claude Code or similar, which are subsidized by the provider (Anthropic) itself, something not possible for Cursor unless it burns big B's, which it already has done.

        So Cursor's growth was definitely towards the vibe-coder side.

        Now, coming to my main point: I had the feeling that what Cursor was trying to achieve wasn't replacing humans entirely, but removing humans from the loop, aka vibe coding. If the Cursor experiment had been successful, the idea (which people instantly sensed when it was first released) was that engineering itself would've been dead, and instead the jobs would've turned into management from a bird's-eye view (not managing agents individually, or being aware of what they did, or being in the loop in any capacity).

        I feel like this might've been what they were willing to burn $5 million for.

        If you could've convinced engineers, considering browsers are taken as the holy grail of hardness, that they are better off being managers, then a vibe-coding product like Cursor would be really lucrative.

        At least that's my understanding. I can be wrong, I usually am, and I don't have anything against Cursor. (I actually used Cursor earlier.)

        But the embedding-shapes project shows that engineering is very much still alive and a net benefit. He produced a better result at very minimal cost than the $5 million inference-cost project.

        > embedding-shape's project here disproves that last bit - that you need parallel agents to build a competent web renderer - by achieving a more impressive result with just one Codex agent in a few days.

        Simon, I think browsers got picked for this autonomous-agents idea partially because of your really famous post about how independent tests can lead to easier ports via agents. Browsers have a lot of independent tests.

        So Simon, perhaps I may have over-generalized, but do you know of any cases where the idea of parallel agents is actually good, now that browsers are off the table? Personally, after this project, I can't really think of any. When the Cursor thing first launched, or when I first heard of it recently, I thought browsers did make sense for some reason, but now that that's out the window, I am not sure if there are any other projects where massively parallel agents might even be net positive over one human + one agent like Emsh.

        • simonw 20 hours ago
          No, I'm still waiting to see concrete evidence that the "swarms of parallel agents" thing is worthwhile. I use sub-agents in Claude Code occasionally - for problems that are easily divided - and that works fine as a speed-up, but I'm still holding out for an example of a swarm of agents that's really compelling.

          The reason I got excited about the Cursor FastRender example was that it seemed like the first genuine example of thousands of agents achieving something that couldn't be achieved in another way... and then embedding-shapes went and undermined it with 20,000 lines of single-agent Rust!

          • Imustaskforhelp 19 hours ago
            Edit 2: looks like the project took literally the last tokens I had to create a big, buggy implementation in Golang, haha!

            I kind of left the agents to do what they wanted, just asking for a port.

            Your website does look rotated, and the image is the only thing visible in my Golang port.

            Let me open source it & I will probably try to hammer it some more after I wake up to see how good Kimi is in real world tasks.

            https://github.com/SerJaimeLannister/golang-browser

            I must admit that it's not working right now. At first it was able to display your website, though really glitchy and with the image zoomed in, but now it shows only white. Also, oops, looks like I forgot the 'i' in your name and wrote 'willson' instead of 'willison', as I wasn't wearing my specs. Sorry about that.

            Now let me see... yeah, now it's displaying something, which is extremely glitchy.

            https://github.com/SerJaimeLannister/golang-browser/blob/mai...

            I have a file to show how glitchy it is. If anything, I just want someone to tinker around with whether a Golang project can reasonably be made out of this Rust project.

            Simon, I see that you were also interested in Go vibe coding, haha; this project has independent tests too! Perhaps you can try this out as well and see how it goes! It would be interesting to see what happens then!

            Alright time for me to sleep now, good night!

          • Imustaskforhelp 20 hours ago
            Haha yeah, emsh and I were actually talking about it on Bluesky (which I saw after seeing your Bluesky; I didn't know both you and emsh were on bsky, haha).

            https://bsky.app/profile/emsh.cat/post/3mdgobfq4as2p

            But basically I got curious, and you can see from my other comments to you how much I love Golang, so I decided to port the project from Rust to Golang, and emsh predicts that the project's codebase could even shrink to 10k lines!

            (One point though: I don't have CC, so I am trying it out on the recently released Kimi k2.5 model and their Kimi Code tool, which I decided to use to see the real-world usefulness of an open-source model as well!)

            Edit: I had written this comment just 2 minutes before you wrote yours, but then I decided to go write the Golang project.

            I think I ate through all of my 200 queries in Kimi Code, and it now does display a (browser?). I had a shell script to load your website as the test, but it only opens up blank.

            I am gonna go sleep so that the 5-hour limits can recharge, and then I will continue this project.

            I think it will be really interesting to see this project in Golang; there must be a good reason for emsh to say the project could be ~10k lines in Golang.

            • embedding-shape 19 hours ago
              > I think it will be really interesting to see this project in Golang; there must be a good reason for emsh to say the project could be ~10k lines in Golang.

              Oh no, don't read too much into my wild guesses! Very hunch-based, and I'm only human after all.

  • TalkWithAI 16 hours ago
    [dead]
  • augusteo 21 hours ago
    [flagged]
    • simonw 21 hours ago
      (This relates to my note at the end of https://simonwillison.net/2026/Jan/27/one-human-one-agent-on... )

      The things that make me think this is still a huge project include:

      1. JavaScript and the DOM. There's a LOT there, especially making sure that when the DOM is updated the page layout reflows promptly and correctly (see the sketch after this list).

      2. Security. Browsers are an incredibly high-risk environment, especially once you start implementing JavaScript. There are a ton of complex specs involved here too, like CORS and CSP and iframe sandbox and so on. I want these to be airtight and I want solid demonstrations of how airtight they are.

      3. WebAssembly in its various flavors, plus WebGPU and WebGL.

      4. It has to be able to render the real Web - starting with huge and complex existing applications like Google Maps and Google Docs and then working through that long tail of weird old buggy websites that the other browsers have all managed to render.

      I expect that will keep people pretty busy for a while yet, no matter how many agents they throw at it.
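
      To make point 1 concrete, here's a rough sketch of the dirty-flag invalidation a renderer needs so DOM mutations actually trigger reflow - the types and names (Node, mark_dirty, reflow) are hypothetical illustrations, not anything from the actual project:

        // Rough sketch: a DOM mutation marks the node dirty, and the
        // next frame recomputes layout only for dirty subtrees.
        struct Node {
            children: Vec<Node>,
            layout_dirty: bool,
        }

        impl Node {
            // Every DOM mutation (setAttribute, appendChild, ...) calls this.
            fn mark_dirty(&mut self) {
                self.layout_dirty = true;
            }

            // Walked once per frame: recompute geometry where needed.
            fn reflow(&mut self) {
                if self.layout_dirty {
                    // ...recompute this node's box geometry here...
                    self.layout_dirty = false;
                }
                for child in &mut self.children {
                    child.reflow();
                }
            }
        }

      The hard part is everything a sketch like this leaves out: propagating dirtiness to ancestors whose sizes depend on the mutated node, and batching mutations between frames so it stays prompt.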

      • augusteo 20 hours ago
        simonw replied to me! Achievement unlocked! Big fan!

        And yes there's definitely still a lot to do. Security is def a big one.

        Very exciting time to be alive.

    • Yoric 21 hours ago
      Erm... security?
    • croisillon 21 hours ago
      are all your comments written by AI? jfc
      • layer8 21 hours ago
        From his profile, this is scary:

        > I lead AI & Engineering at Boon AI (Startup building AI for Construction).

      • augusteo 20 hours ago
        I work in AI and write code with AI every day. The robots haven't replaced me yet, but I'll let you know :)
      • penic 21 hours ago
        dude these people are deranged it's unreal