O1 isn't a chat model (and that's the point)

(latent.space)

165 points | by gmays 17 days ago

22 comments

  • geor9e 17 days ago
    Instead of learning the latest workarounds for the kinks and quirks of a beta AI product, I'm going to wait 3 weeks for the advice to become completely obsolete
    • gwern 17 days ago
      What people are discovering with the latest models is that often their errors are due to entirely reasonable choices and assumptions... which happen to be wrong in your specific case. They call a library you don't have installed, or something like that. Short of inventing either telepathy or spice which can allow LLMs to see the future, it will increasingly be the case that you cannot use the best models efficiently without giving them extensive context. Writing 'reports' where you dump in everything even tangentially relevant is the obvious way to do so, and I would expect future LLMs to reward this even more than o1-preview/pro do.
      • sheepscreek 17 days ago
        I get much better output from o1* models when I dump a lot of context and leave a detailed but tightly scoped prompt with minimal ambiguity. Sometimes I even add: don’t assume; ask me if you are unsure. What I get back is usually very, very high quality. To the point that I feel my 95th percentile coding skills have diminishing returns. I find that I am more productive researching and thinking about the what and leaving the how (implementation details) to the model, nudging it along.

        One last thing, anecdotally: I find that it’s often better to start a new chat after implementing a chunky bit of functionality.
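
        A minimal sketch of that workflow, assuming the official OpenAI Python SDK; the model name, file list, and task below are illustrative, not a recommendation:

            # Dump a lot of context, then give one tightly scoped instruction.
            from pathlib import Path
            from openai import OpenAI

            client = OpenAI()

            # Everything even tangentially relevant goes into the context block.
            context = "\n\n".join(
                f"### {p}\n{Path(p).read_text()}"
                for p in ["schema.sql", "billing/service.py", "billing/tests.py"]
            )

            task = (
                "Add proration support to the billing service. "
                "Keep the public API unchanged and update the tests. "
                "Don't assume; ask me if you are unsure about anything."
            )

            resp = client.chat.completions.create(
                model="o1",  # illustrative model name
                messages=[{"role": "user", "content": f"{context}\n\n---\n\n{task}"}],
            )
            print(resp.choices[0].message.content)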

        • gwern 15 days ago
          Yes, I've tried out both: ordering it to ask me questions upfront, and sometimes restarting with an edited 'report' and a prototype implementation for a 'clean start'. It feels like it sometimes helps... but I have no numbers or rigorous evidence on that.
      • bionhoward 16 days ago
        The economics of the deal over the long term are exponentially more critical than the performance in the short term, right?

        In that context, how is convincing intelligent people to pay OpenAI to help train their own replacements while agreeing not to compete with them anything but the biggest, dumbest, most successful nerd snipe in history?

        Dumping more context just implies getting brain raped even harder. Y’all are the horses paying to work at the glue factory. “Pro” users paying extra for that privilege, no thank you!

      • pizza 17 days ago
        Maximum likelihood training tinges, nay, corrupts, everything it touches. That’s before you pull apart the variously-typed maximum likelihood training processes that the artifice underwent.

        Your model attempts to give you a reasonably maximum-likelihood output (in terms of KL-ball-constrained preference distributions not too far from language), and it expects you to be the maximum-likelihood user (since its equilibration is intended for a world in which you, the user, are just like the people who ended up in the training corpus), for whom the prompt you gave would be a maximum-likelihood query (implying that there are times it’s better to ignore the you-specific contingencies in your prompt and instead re-envision your question as a noisily worded version of a more normal question).

        I think there are probably some ways to still use maximum likelihood but switch out the ‘what’ that is being assumed as likely, e.g. models that attenuate dominant response strategies as needed by the user, and easy UX affordances for the user to better and more fluidly align the model with their own dispositional needs.

        • cap11235 16 days ago
          MLE is a basic statistical technique. Feel free to go REEEE when Amazon recommends products.
          • pizza 15 days ago
            Exactly, it trivializes the actual structure of the real world problem setup. It’s ML’s spherical cow.
      • lumost 17 days ago
        Alternatively, we can standardize the environment. It takes humans weeks to adapt to a new interface or starting point. Why would AI be different?
    • raincole 17 days ago
      There was a debate over whether to integrate Stable Diffusion into the curriculum in a local art school here.

      Personally while I consider AI a useful tool, I think it's quite pointless to teach it in school, because whatever you learn will be obsolete next month.

      Of course some people might argue that the whole art school (it's already quite a "job-seeking" type, mostly digital painting/Adobe After Effects) will be obsolete anyway...

      • simonw 17 days ago
        The skill that's worth learning is how to investigate, experiment and think about these kinds of tools.

        A "Stable Diffusion" class might be a waste of time, but a "Generative art" class where students are challenged to explore what's available, share their own experiments and discuss under what circumstances these tools could be useful, harmful, productive, misleading etc feels like it would be very relevant to me, no matter where the technology goes next.

        • moritzwarhier 17 days ago
          Very true regarding the subjects of a hypothetical AI art class.

          What's also important is the teaching of how commercial art or art in general is conceptualized, in other words:

          What is important and why? Design thinking. I know that phrase might sound dated, but that's the work humans should fear being replaced on, and where they should foster their skills.

          That's also the line that at first seems to be blurred when using generative text-to-image AI, or LLMs in general.

          The seemingly magical connection between prompt and result appears to human users like the work of a creative entity distilling and developing an idea.

          That's the most important aspect of all creative work.

          If you read my reply: thanks Simon, your blog's an amazing companion in the boom of generative AI. Was a regular reader in 2022/2023, should revisit! I think you guided me through my first local Llama setup.

      • londons_explore 17 days ago
        All knowledge degrades with time. Medical books from the 1800s wouldn't be a lot of use today.

        There is just a different decay curve for different topics.

        Part of 'knowing' a field is to learn it and then keep up with the field.

      • dutchbookmaker 16 days ago
        I would say it really depends on the goal of art school.

        There is a creative purist idea that the best thing that can happen to an art student is to be thrown out of school early on, before it ruins their creativity.

        If you put that aside, a stable diffusion art school class just sounds really cool to me as an elective class, especially given the group that would be in it. The problem I find with these tools is that they are overwhelmed by the average person, the non-artist, making pictures of cats and Darth Vader, so it is very hard to find what real artists are doing in the space.

      • dyauspitr 17 days ago
        Integrating it into the curriculum is strange. They should do one time introductory lectures instead.
      • swyx 17 days ago
        > whatever you learn will be obsolete next month

        this is exactly the kind of attitude that turns university courses into dinosaurs with far less connection to the “real world” industry than ideal. frankly it’s an excuse for laziness and luddism at this point. much of what i learned about food groups and economics and politics and writing in school is obsolete at this point, should my teachers not have bothered at all? out of what? fear?

        the way stable diffusion works hasn’t really changed; in fact people have just built comfyui layers and workflows on top of it in the ensuing 3 years. the more you stick your head in the sand because you’ve already predetermined the outcome, the more you pile up debt that your students will have to work off on their own, because you were too insecure to make a call and trust that your students can adjust as needed

        • loktarogar 17 days ago
          The answer in formal education is probably somewhere in the middle. The stuff you learn shouldn't be obsolete by the time you graduate but at the same time they should be integrating new advancements sooner.

          The problem has also always been that those who know enough about cutting edge stuff are generally not interested in teaching for a fraction of what they can get doing the stuff.

    • thornewolf 17 days ago
      To be fair, the article basically says "ask the LLM for what you want in detail"
      • fullstackwife 17 days ago
        great advice, but difficult to apply given the very small context window of o1 models
    • jameslk 17 days ago
      The churn is real. I wonder if so much churn due to innovation in a space can suppress adoption enough that it actually ends up reducing innovation.
      • dartos 17 days ago
        It’s churn because every new model may or may not break strategies that worked before.

        Nobody is designing how to prompt models. It’s an emergent property of these models, so they could just change entirely from each generation of any model.

        • kyle_grove 17 days ago
          IMO the lack of real version control and the lack of reliable programmability have been significant impediments to impact and adoption. The control surfaces are more brittle than, say, regex, which isn’t a good place to be.

          I would quibble that there is a modicum of design in prompting; RLHF, DPO and ORPO are explicitly designing the models to be more promptable. But the methods don’t yet adequately scale to the variety of user inputs, especially in a customer-facing context.

          My preference would be for the field to put more emphasis on control over LLMs, but it seems like the momentum is again on training LLM-based AGIs. Perhaps the Bitter Lesson has struck again.

          • dartos 16 days ago
            I’d agree with your quibble.

            People are trying to design how to prompt, but it’s very different in both implementation and result than designing a programming language or a visual language, ofc.

      • miltonlost 17 days ago
        A constantly changing "API" coupled with an inherently unreliable output is not conducive to a stable business.
        • ithkuil 17 days ago
          It's interesting that, despite all these real issues you're pointing out, a lot of people are nevertheless drawn to interact with this technology.

          It looks as if it touches some deep psychological lever: having an assistant that can help carry out tasks so that you don't have to bother learning the boring details of a craft.

          Unfortunately lead cannot yet be turned into gold

          • dartos 17 days ago
            > a lot of people nevertheless are drawn to interact with this technology.

            To look at this statement cynically, a lot of people are drawn to anything with billions of dollars behind it… like literally anything.

            Not to mention the amount companies spend on marketing AI products.

            > It looks as if it touches some deep psychological lever: have an assistant that can help to carry out tasks

            That deep lever is “make value more cheaply and with less effort”

            From what I’ve seen, most of the professional interest in AI is based on cost cutting.

            There are a few (what I would call degenerate) groups who believe there is some consciousness behind these AIs, but they're a very small group.

            • pixl97 17 days ago
              When I was a kid I messed with computers because they were new, fun, and interesting. At the time I never realized they'd be my source of a living in the future.

              Current AI has brought back a lot of that wonder and interest for me, and I'm sure the same is true for a lot of other computer nerds.

              • dartos 16 days ago
                I’d consider myself one of those nerds. I’ve been in love with programming since I was 9 (20+ years ago now)

                I was mystified by LLMs a couple years ago. But after really understanding how they work and running into their limitations, a lot of that sheen was lost.

                There’s not a ton of interesting technology happening with LLMs, more a ton of interesting math. (Math, especially linalg, is not the part of computer science I, personally, fell in love with.)

                The output of LLMs, unlike that of programming languages, is pretty random and trial-and-error based. There's never any real skill or expertise being built by playing with these tools. My control over the output isn't as direct or understandable as with programming.

                There’s no joy of discovery, only joy of getting the slot machine to give me what I want once in a while.

                I’ve regained a lot of that wonder, recently, by doing graphics programming and learning lisp. Going against industry trends in my recreational programming has helped the field feel fresh to me.

                Regardless, I don’t think the extreme minority of people who are truly nerdily passionate about tech are the “a lot of people” OC or I was talking about.

        • bbarnett 17 days ago
          Unless your business is customer service reps, with no ability to do anything but read scripts, who have no real knowledge of how things actually work.

          Then current AI is basically the same, for cheap.

          • dartos 17 days ago
            Many service reps do have some expertise in the systems they support.

            Once you get past the tier 1 incoming calls, support is pretty specialized.

            • bbarnett 17 days ago
              > Many service reps do have some expertise in the systems they support.

              I said "Unless your business is customer service reps, with...". It's a conditional. It doesn't mean all service reps are clueless, or scripted.

              But I bet you've encountered the ones that are!

    • QuantumGood 17 days ago
      Great summary of how AI compresses the development (and hype) product cycle
    • AbstractH24 16 days ago
      Out of curiosity, where do you look for the advice?
    • icpmacdo 17 days ago
      Modern AI both shortens the useful lifespan of software and increases the importance of development speed. Waiting around doesn’t seem optimal right now.
  • goolulusaurs 17 days ago
    The reality is that o1 is a step away from general intelligence and back towards narrow AI. It is great for solving the kinds of math, coding and logic puzzles it has been designed for, but for many kinds of tasks, including chat and creative writing, it is actually worse than 4o. It is good at the specific kinds of reasoning tasks that it was built for, much like AlphaGo is great at playing Go, but that does not actually mean it is more generally intelligent.
    • madeofpalk 17 days ago
      LLMs will not give us "artificial general intelligence", whatever that means.
      • righthand 17 days ago
        AGI currently is an intentionally vague and undefined goal. This allows businesses to operate towards a goal, define the parameters, and relish in the “rocket launches”-esque hype without leaving the vague umbrella of AI. It allows businesses to claim a double pursuit. Not only are they building AGI but all their work will surely benefit AI as well. How noble. Right?

        Its vagueness is intentional and allows you to ignore the blind truth and fill in the gaps yourself. You just have to believe it’s right around the corner.

        • pzs 17 days ago
          "If the human brain were so simple that we could understand it, we would be so simple that we couldn’t." - without trying to defend such business practice, it appears very difficult to define what are necessary and sufficient properties that make AGI.
          • pizza 17 days ago
            What about if the human brain were so complex that we could be complex enough to understand it?
      • swalsh 17 days ago
        In my opinion it's probably closer to real AGI than not. I think the missing piece is learning after the pretraining phase.
      • UltraSane 17 days ago
        An AGI will be able to do any task any human can do. Or all tasks any human can do. An AGI will be able to get any college degree.
        • layer8 17 days ago
          > any task any humans can do

          That doesn’t seem accurate, even if you limit it to mental tasks. For example, do we expect an AGI to be able to meditate, or to mentally introspect itself like a human, or to describe its inner qualia, in order to constitute an AGI?

          Another thought: The way humans perform tasks is affected by involuntary aspects of the respective individual mind, in a way that the involuntariness is relevant (for example being repulsed by something, or something not crossing one’s mind). If it is involuntary for the AGI as well, then it can’t perform tasks in all the different ways that different humans would. And if it isn’t involuntary for the AGI, can it really reproduce the way (all the ways) individual humans would perform a task? To put it more concretely: For every individual, there is probably a task that they can’t perform (with a specific outcome) that however another individual can perform. If the same is true for an AGI, then by your definition it isn’t an AGI because it can’t perform all tasks. On the other hand, if we assume it can perform all tasks, then it would be unlike any individual human, which raises the question of whether this is (a) possible, and (b) conceptually coherent to begin with.

          • pixl97 17 days ago
            The biggest issue with AGI is how poorly we've described GI up until now.

            Moreover, I'd say an AI that can do any (intelligence) task a human can would be far beyond human capabilities, because even individual humans can't do everything.

            • UltraSane 16 days ago
              One AI being able to do every task every human can do would be superhuman. But it is much more likely that at least at first AIs would be customized to narrower skill sets like Mathematician or programmer or engineer due to resource limitations.
          • pizza 17 days ago
            > For example, do we expect an AGI to be able to meditate, or to mentally introspect itself like a human, or to describe its inner qualia, in order to constitute an AGI?

            Do you mind sharing the kinds of descriptive criteria for these behaviors that you are envisioning for which there is overlap with the general assumption of them occurring in a machine? I can foresee a sort of “featherless biped” scenario here without more details about the question.

          • philwelch 17 days ago
            > For example, do we expect an AGI to be able to meditate, or to mentally introspect itself like a human, or to describe its inner qualia, in order to constitute an AGI?

            How would you know if it could? How do you know that other human beings can? You don’t.

          • madeofpalk 17 days ago
            > For example, do we expect an AGI to be able to meditate, or to mentally introspect itself like a human, or to describe its inner qualia, in order to constitute an AGI?

            ...Yes. This is what I think 'most' people consider a real AI to be.

            • perfmode 17 days ago
              Meditation requires Beingness.
              • adrianN 17 days ago
                Neither of those words has a sufficiently precise meaning that you could tell whether a human has/does it or not.
                • perfmode 16 days ago
                  You’re right that ‘Beingness’ and ‘meditation’ are hard to define with precision, but the essence of meditation isn’t about external markers—it’s about an inner, subjective awareness of presence that can’t be fully reduced to objective measures.
        • nkrisc 17 days ago
          So it’s not an AGI if it can’t create an AGI?
          • UltraSane 17 days ago
            Humans might create AGI without fully understanding how.
            • ithkuil 17 days ago
              Thus a machine can solve tasks without "understanding" them
              • pixl97 17 days ago
                Humans do this all the time too.
                • ithkuil 16 days ago
                  Yes that was my point.
        • qup 16 days ago
          You can't do any task humans can do
      • Xmd5a 16 days ago
        No but they gave us GAI. The fact they flipped the frame problem(s) upside down is remarkable but not often discussed.
      • nurettin 17 days ago
        I think it means a self-sufficient mind, which LLMs inherently are not.
        • ben_w 16 days ago
          What is "self-sufficient" in this case?

          Lots of debate since ChatGPT and Stable Diffusion can be summarised as A: "AI cheated by copying humans, it just mixes the bits up really small like a collage" B: "So like humans learning from books and studying artists?" A: "That doesn't count, it's totally different"

          Even though I am quite happy to agree that differences exist, I have yet to see a clear answer as to what people even mean when asserting that AI learning from books is "cheating", given that it's *mandatory* for humans in most places.

          • nurettin 14 days ago
            I just think that language is a big part of the puzzle, but it is not the only one. Simply generating tokens may sometimes look like thought, but as you feed the output back into itself, it quickly devolves into repeating nonsense and looks nothing like introspection. A self-sufficient mind would reliably form new ideas and angles.
      • swyx 17 days ago
        it must be wonderful to live life with such supreme unfounded confidence. really, no sarcasm, i wonder what that is like. to be so sure of something when many smarter people are not, and when we don't know how our own intelligence fully works or evolved, and don't know if ANY lessons from our own intelligence even apply to artificial ones.

        and yet, so confident. so secure. interesting.

        • sandspar 17 days ago
          Social media doesn't punish people for overconfidence. In fact social media rewards people's controversial statements by giving them engagement - engagement like yours.
    • adrianN 17 days ago
      So-so general intelligence is a lot harder to sell than narrow competence.
    • kilroy123 17 days ago
      Yes, I don't understand their ridiculous AGI hype. I get it, you need to raise a lot of money.

      We need to crack the code for updating the base model on the fly or daily / weekly. Where is the regular learning by doing?

      Not over the course of a year, spending untold billions to do it.

      • tomohelix 17 days ago
        Technically, the models can already learn on the fly. Just that the knowledge it can learn is limited to the context length. It cannot, to use the trendy word, "grok" it and internally adjust the weights in its neural network yet.

        To change this you would either need to let the model retrain itself every time it receives new information, or to have such a great context length that there is no effective difference. I suspect even meat models like our brains still struggle to do this effectively and need a long rest cycle (i.e. sleep) to handle it. So the problem is inherently more difficult to solve than just "thinking". We may even need an entirely new architecture different from the neural network to achieve this.

        • chikere232 17 days ago
          > Technically, the models can already learn on the fly. Just that the knowledge it can learn is limited to the context length.

          Isn't that just improving the prompt to the non-learning model?

        • mike_hearn 17 days ago
          Google just published a paper on a new neural architecture that does exactly that, called Titans.
        • KuriousCat 17 days ago
          The only small problem is that the models are neither thinking nor understanding; I am not sure how this kind of wording is allowed with these models.
          • ben_w 16 days ago
            All words only gain meaning through common use: where two people mean different things by some word, we influence each other until we're in agreement.

            Words about private internal state don't get feedback about what they actually are on the inside, just about what they look like on the outside* — "thinking" and "understanding" map to what AI give the outward impression of, even if the inside is different in whatever ways you regard as important.

            * This is also how people with aphantasia keep reporting their surprise upon realising that scenes in films where a character is imagining something are not merely artistic license.

      • ninetyninenine 17 days ago
        I understand the hype. I think most humans understand why a machine responding to a query like never before in the history of mankind is amazing.

        What you’re going through is hype overdose. You’re numb to it. Like, I can get it if someone disagrees, but it’s a next-level lack of understanding of human behavior if you don’t get the hype at all.

        There exist living human beings, whether children or people with brain damage, with intelligence comparable to an LLM, and we classify those humans as conscious but we don't with LLMs.

        I'm not trying to say LLMs are conscious, but just that the creation of LLMs marks a significant turning point. We crossed a barrier 2 years ago somewhat equivalent to landing on the moon, and I am just dumbfounded that someone doesn't understand why there is hype around this.

        • bbarnett 17 days ago
          The first plane ever flies, and people think "we can fly to the moon soon!".

          Yet powered flight has nothing to do with space travel, no connection at all. Gliding in the air via low/high pressure doesn't mean you'll get near space, ever, with that tech. No matter how you try.

          AI and AGI are like this.

          • dTal 17 days ago
            And yet, the moon was reached a mere 66 years after the first powered flight. Perhaps it's a better heuristic than you are insinuating...

            In all honesty, there are lots of connections between powered flight and space travel. Two obvious ones are "light and strong metallurgy" and "a solid mathematical theory of thermodynamics". Once you can build lightweight and efficient combustion chambers, a lot becomes possible...

            Similarly, with LLMs, it's clear we've hit some kind of phase shift in what's possible - we now have enough compute, enough data, and enough know-how to be able to copy human symbolic thought by sheer brute-force. At the same time, through algorithms as "unconnected" as airplanes and spacecraft, computers can now synthesize plausible images, plausible music, plausible human speech, plausible anything you like really. Our capabilities have massively expanded in a short timespan - we have cracked something. Something big, like lightweight combustion chambers.

            The status quo ante is useless to predict what will happen next.

            • bbarnett 17 days ago
              By that metric, there are lots of connections between space flight and any other aspect of modern society.

              No plane, relying upon air pressure to fly, can ever use that method to get to the moon. Ever. Never ever.

              If you think it is, you're adding things to make a plane capable of space flight.

              • dTal 17 days ago
                >By that metric, there are lots of connections between space flight and any other aspect of modern society.

                Indeed. But there's a reason "aerospace" is a word.

                >No plane, relying upon air pressure to fly, can ever use that method to get to the moon

                No indeed. But if you want to build a moon rocket, the relevant skillsets are found in people who make airplanes. Who built Apollo? Boeing. Grumman. McDonnell Douglas. Lockheed.

              • mlyle 17 days ago
                I feel like aeronautics and astronautics are deeply connected. Both depend upon aerodynamics, 6dof control, and guidance in forward flight. Advancing aviation construction techniques were the basis of rockets, etc.

                Sure, rocketry to LEO asks more in strength of materials, and aviation doesn’t require liquid fueled propulsion or being able to control attitude in vacuum.

                These aren’t unconnected developments. Space travel grew straight out of aviation and military aviation. Indeed, look at the vertical takeoff aircraft from the 40s and 50s, evolving into missile systems with solid propulsion and then liquid propulsion.

                • bbarnett 17 days ago
                  AGI may use the same hardware, or same compute concepts.

                  But LLMs (like low/high pressure wing flight) will never result in AGI (you won't get to the moon with a wing).

                  You're making my point.

                  • mlyle 17 days ago
                    I thought your point about aerospace was terrible. And since you're insisting I follow you further into the analogy, I think it's terrible here.

                    LLMs may be a key building block for early AGI. The jury is still out. Will a LLM alone do it? No. You can't build a space vehicle from fins and fairings and control systems alone.

                    O1 can reach pretty far beyond past LLM capabilities by adding infrastructure for metacognition and goal seeking. Is O1 the pinnacle, or can we go further?

                    In either case, planes and rocket-planes did a lot to get us to space-- they weren't an unrelated evolutionary dead end.

                    > Yet powered flight has nothing to do with space travel, no connection at all.

                    Fully disagree.

                    • bbarnett 15 days ago
                      You're missing the point, I think.

                      The relationships you are describing are why airflight/spaceflight and AI/AGI are a good comparison.

                      We will never get AGI from an LLM. We will never fly to the moon via winged flight. These are examples of how one method of doing a thing, will never succeed in another.

                      Citing all the similarities between airflight and spaceflight makes my point! One may as well discuss how video games are on a computer platform, and LLMs are on a computer platform, and say "It's the same!", as say airflight and spaceflight are the same.

                      Note how I was very clear, and very specific, and referred to "winged flight" and "low/high pressure", which will never, ever, ever get one even to space. Nor allow anyone to navigate in space. There is no "lift" in space.

                      Unless you can describe to me how a fixed wing with low/high pressure is used to get to the moon, all the other similarities are inconsequential.

                      Good grief, people are blathering on about metallurgy. That's not a connection, it's just modern tech, has nothing to do with the method of flying (low/high pressure around the wing), and is used in every industry.

                      I love how incapable everyone has been in this thread of concept focus, incapable of separating the specific from the generic. It's why people think, generically, that LLMs will result in AGI, too. But they won't. Ever. No amount of compute will generate AGI via LLM methods.

                      LLMs don't think, they don't reason, they don't infer, they aren't creative, they come up with nothing new, it's easiest to just say "they don't".

                      One key aspect here is that knowledge has nothing to do with intelligence. A cat is more intelligent than any LLM that will ever exist. A mouse. Correlative fact regurgitation is not what intelligence is, any more than a book on a shelf is intelligence, or the results of Yahoo search 10 years ago were.

                      The most amusing is when people mistake shuffled up data output from an LLM as "signs of thought".

                  • qup 16 days ago
                    Your point is good enough regarding spaceflight, despite some quibbling from commenters.

                    But I haven't seen where you make a compelling argument why it's the same thing in AI/AGI.

                    In your old analogy, we're all still the guys on the ground saying it'll work. You're saying it won't. But nobody has "been to space" yet. You have no idea if LLMs will take us to AGI.

                    I personally think they'll be the engine on the spaceship.

                    • bbarnett 15 days ago
                      This is a fair response, thank you.

                      From another post:

                      > No amount of compute will generate AGI via LLM methods.

                      > LLMs don't think, they don't reason, they don't infer, they aren't creative, they come up with nothing new, it's easiest to just say "they don't".

                      > One key aspect here is that knowledge has nothing to do with intelligence. A cat is more intelligent than any LLM that will ever exist. A mouse. Correlative fact regurgitation is not what intelligence is, any more than a book on a shelf is intelligence, or the results of Yahoo search 10 years ago were.

                      > The most amusing is when people mistake shuffled up data output from an LLM as "signs of thought".

                      From where I sit, I don't even see LLMs as being some sort of memory store for AGIs. The knowledge isn't reliable enough. An AGI would need to ingress and then store knowledge in its own mind, not use an LLM as a reference.

                      Part of what makes intelligence, intelligent, is the ability to see information and learn on the spot. And further to learn via its own senses.

                      Let's look at bats. A bat is very close to humans, genetically. Yet if somehow we took "bat memories" and were able to implant them in humans, how on earth would that help? How would you make bat memories of using sound for navigation to "see" work for us? Of flying? Of social structure?

                      For example, we literally don't have the brain matter to see spatially the same way bats do. So when we accessed those memories, they would be so foreign that their usefulness would be greatly reduced. They'd be confusing, unhelpful.

                      Think of it. Ingress of data and information is sensorially derived. Our mental image of the world depends upon this data. Our core being is built upon this foundation. An AGI using an LLM as "memories" would be experiencing something just as foreign.

                      So even if LLMs were used to allow an AGI to query things, it wouldn't be used as "memory". And the type of memory store that LLMs exhibit, is most certainly not how intelligence as we know it stores memory.

                      We base our knowledge upon directly observed and verified fact, but further upon the senses we have. And all information derived from those senses is actually filtered and processed by specialized parts of our brains before we even "experience" it.

                      Our knowledge is so keyed in and tailored directly to our senses, and the processing of that data, that there is no way to separate the two. Our skill, experience, and capabilities are "whole body".

                      An LLM is none of this.

                      The only true way to create an AGI via LLMs would be to simulate a brain entirely, and then start scanning human brains during specific learning events. Use that data to LLM your way into an averaged and probabilistic mesh, and then use that output to at least provide full sense memory input to an AGI.

                      Even so, I suspect that may be best used to create a reliable substrate. Use that method to simulate and validate and modify that substrate so it is capable of using such data, thereby verifying that it stands solid as a model for an AGI's mind.

                      Then wipe and allow learning to begin entirely separately.

                      Yet to do even this, we'd need to ensure that sensor input at least to a degree enables the same sort of sense input. I think that Neuralink might be best in play to enable this, for as it works at creating an interface for, say, sight, and other senses... it could then use this same series of mapped inputs for a simulated human brain.

                      This of course works best with a physical form to also taste the environment around it, and who also is working on an actual android for day to day use?

                      You might say this focuses too much on creating a human style AGI, but frankly it's the only thing we can try to make and work into creating a true AGI. We have no other real world examples of intelligence to use, and every brain on the planet is part of the same evolutionary tree.

                      So best to work with something we know, something we're getting more and more apt at understanding, and with brain implants of the calibre and quality that Neuralink is devising, something we can at least understand in far more depth than ever before.

                      • mlyle 15 days ago
                        > The first plane ever flies, and people think "we can fly to the moon soon!". Yet powered flight has nothing to do with space travel, no connection at all.

                        You eventually said winged flight much later-- trying to make your point a little more defensible. That's why I started explaining to you the very big connections between powered flight and space travel ;)

                        I pretty much completely disagree with your wall of text, and it's not a very well reasoned defense of your prior handwaving. I'm going to move on now.

                        • bbarnett 15 days ago
                          > Yet powered flight has nothing to do with space travel, no connection at all. Gliding in the air via low/high pressure doesn't mean you'll get near space, ever, with that tech. No matter how you try.

                          Winged flight == "low/high pressure" flight, it's how an airplane wing works and provides lift.

                          • mlyle 15 days ago
                            Maybe you just said what you wanted to say extremely poorly. Like "wing technology doesn't get you closer to space." I mean, of course, fins and distribution of pressure are important, but a relatively small piece.

                            On the other hand, powered flight and the things we started building for powered flight got us to the moon. "Powered flight" got us to turbojets, and turbomachinery is the number one key space launch technology.

                            Anyways, bye.

                            • bbarnett 15 days ago
                              > Maybe you just said what you wanted to say extremely poorly.

                              Or maybe you didn't read closely? You claimed I didn't mention winged flight, yet I mentioned that and the method of winged flight. Typically, that means you say "Oh, sorry, I missed that" instead of blaming others.

                              I have refuted technology paths in prior posts. Refute those comments if you wish, but just restating your position without refuting mine doesn't seem like it will go anywhere.

                              And if you don't want a reply? Just stop talking. Don't play the "Oh, I'm going to say things, then say 'bye' to induce no response" game.

                              Just debate fairly.

                              • mlyle 14 days ago
                                You gave a big wall of text. You made statements that can't really be defended. If you'd been talking just about wings, you could have made that clear (and not in one possible reading of a sentence that follows an absolutist one).

                                > Just debate fairly.

                                The thing I felt like responding to, you were like "noooo, I didn't mean that at all."

                                > > > > > Yet powered flight has nothing to do with space travel, no connection at all.

                                Pretty absolute statement.

                                > > > > > Gliding in the air via low/high pressure doesn't mean you'll get near space, ever, with that tech.

                                Then, I guess you're saying this sentence is trying to restrict it to "airfoils aren't enough to go to space", and not talk about how powered flight led directly to space travel... Through direct evolution of propulsion (turbo-machinery), control, construction techniques, analysis methods, and yes, airfoils.

                                I guess we can stay here debating the semantics of what you originally said if you really want to keep talking. But since you're walking away from what I saw as your original point, I'm not sure what you see as productive to say.

          • Xmd5a 16 days ago
            The first plane to ever fly was in fact ignored by the general public for several years

            https://bigthink.com/the-past/wright-brothers-ignored/

          • ninetyninenine 17 days ago
            That’s not true. There was not endless hype about flying to the moon when the first plane flew.

            People are well aware of the limits of LLMs.

            As slow as the progress is, we now have metrics and measurable progress towards AGI, even when there are clear signs of limitations on LLMs. We never had this before and everyone is aware of this. No one is delusional about it.

            The delusion is more around people who think other people are making claims of going to the moon in a year or something. I can see it in 10 to 30 years.

            • bbarnett 17 days ago
              > That’s not true. There was not endless hype about flying to the moon when the first plane flew.

              I didn't say there was endless hype, I gave an example of how one technology would never result in another... even if to a layperson it seems connected.

              (The sky, and the moon, are "up")

              > People are well aware of the limits of LLMs.

              Surely you mean "Some people". Because the point in this thread is that there is a lot of hype, and FOMO, and "OMG AGI!" chatter running around LLMs. Which will never ever make AGI.

              • ninetyninenine 17 days ago
                You said you didn’t comprehend why there was hype and I explained why there was hype.

                Then you made an analogy and I said your analogy is irrelevant because nobody thinks LLMs are agi nor do they think agi is coming out of LLMs this coming year.

                • bbarnett 15 days ago
                  Actually, plenty of people think LLMs will result in AGI. That's what the hype is about, because those same people think "any day now". People are even running around saying that LLMs are showing signs of independent thought, absurd as it is.

                  And hype doesn't mean "this year" regardless.

                  Anyhow, I don't think we'll close this gap between our assessment.

          • pizza 17 days ago
            And yet, the overall path of unconcealment of science and technological understanding definitely traces a line that goes from the Wright brothers to Vostok 1. There is no reason to think a person from the time of the Wright brothers would find it to be a simple one easily predicted by the methods of their times, but I doubt that no person who worked on Vostok 1 would say that their efforts were epochally unrelated to the efforts of the Wright brothers.
    • golol 17 days ago
      This is kind of true. I feel like the reasoning power of o1 is really only truly available on the kinds of math/coding tasks it was trained on so heavily.
    • raincole 17 days ago
      Which sounds like... a very good thing?
  • samrolken 17 days ago
    I have a lot of luck using 4o to build and iterate on context and then carry that into o1. I’ll ask 4o to break down concepts, make outlines, identify missing information and think of more angles and options. Then at the end, switch on o1 which can use all that context.
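
    A rough sketch of that handoff, assuming the official OpenAI Python SDK; the model names and prompts are illustrative:

        from openai import OpenAI

        client = OpenAI()

        # Stage 1: iterate with a fast chat model to build outlines and surface
        # missing information and extra angles.
        prep = client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "user",
                "content": "Break down the concepts involved in <problem>, draft an "
                           "outline, and list missing information and alternative angles.",
            }],
        )
        outline = prep.choices[0].message.content

        # Stage 2: hand all of that accumulated context to the reasoning model in one shot.
        final = client.chat.completions.create(
            model="o1",
            messages=[{
                "role": "user",
                "content": f"Context and outline:\n{outline}\n\nNow produce the full solution.",
            }],
        )
        print(final.choices[0].message.content)
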
  • ttul 17 days ago
    FWIW: OpenAI provides advice on how to prompt o1 (https://platform.openai.com/docs/guides/reasoning/advice-on-...). Their first bit of advice is to, “Keep prompts simple and direct: The models excel at understanding and responding to brief, clear instructions without the need for extensive guidance.”
    • jmcdonald-ut 17 days ago
      The article links out to OpenAI's advice on prompting, but it also claims:

          OpenAI does publish advice on prompting o1, 
          but we find it incomplete, and in a sense you can
          view this article as a “Missing Manual” to lived
          experience using o1 and o1 pro in practice.
      
      To that end, the article does seem to contradict some of the advice OpenAI gives. E.g., the article recommends stuffing the model with as much context as possible... while OpenAI's docs note to include only the most relevant information to prevent the model from overcomplicating its response.

      I haven't used o1 enough to have my own opinion.

      • irthomasthomas 17 days ago
        Those are contradictory. OpenAI claims that you don't need a manual, since o1 performs best with simple prompts. The author claims it performs better with more complex prompts, but provides no evidence.
        • Terretta 16 days ago
          The claims are not contradictory.

          They are bimodal.

          Bottom 20% of users can't prompt because they don't understand what they're looking for or couldn't describe it well if they did. This model handles them asking briefly, then breaks it down, seeks implications, and prompts itself. OpenAI's How to Prompt is for them.

          Top 20% of users understand what they're looking for and how to frame and contextualize well. The article is for them.

          The middle 60%, YMMV. (But in practice, they're probably closer to bottom 20 in not knowing how to get the most from LLMs, so the bottom 20 guide saves typing.)

          • irthomasthomas 16 days ago
            I'm not saying it won't work. I'm just asking for evidence. You don't think it's strange that none of the authors or promoters of this idea provided any evals? Not even a small sample of prompt/response pairs that demonstrate the benefit of this method?
        • orf 17 days ago
          In case you missed it

              OpenAI does publish advice on prompting o1, 
              but we find it incomplete, and in a sense you can
              view this article as a “Missing Manual” to lived
              experience using o1 and o1 pro in practice.
          
          
          The last line is important
          • irthomasthomas 17 days ago
            But extraordinary claims require extraordinary proof. OpenAI tested the model for months and concluded that simple prompts are best. The author claims that complex prompts are best, but cites no evidence.
            • threatripper 16 days ago
              Requiring only simple prompts surely sells better. I would not assume the documentation provided by OpenAI is totally unbiased and independent of business goals.
            • orf 17 days ago
              I find it surprising that you think documentation issues are “extraordinary”.

              You have read literally any documentation before, right?

            • ttul 17 days ago
              I mean, OpenAI not only tested the model, they literally trained the model. Training a model involves developing evaluations for the model. It’s a gargantuan effort. I’m fairly certain that OpenAI is the authority on how to prompt o1.
    • yzydserd 17 days ago
      I think there is a distinction between “instructions”, “guidance” and “knowledge/context”. I tend to provide o1 pro with a LOT of knowledge/context, a simple instruction, and no guidance. I think TFA is advocating same.
    • chikere232 17 days ago
      So in a sense, being an early adopter for the previous models makes you worse at this one?
    • wahnfrieden 17 days ago
      The advice is wrong
    • 3abiton 17 days ago
      But the way they did their PR for o1 made it sound like it was the next step, while in reality it was a side step, a branching away from the current direction towards AGI.
  • isoprophlex 17 days ago
    People are agreeing and disagreeing about the central thesis of the article, which is fine because I enjoy the discussion...

    No matter where you stand in the specific o1/o3 discussion, the concept of "question entropy" is very enlightening.

    What is the question of theoretically minimal complexity that still gets your problem solved adequately? Or, for a specific model, are its users capable of supplying the minimum intellectual complexity the model needs?

    Would be interesting to quantify these two and see if our models are close to converging on certain task domains.

    • dutchbookmaker 16 days ago
      Good stuff.

      I am going to try to start measuring my prompts.

      Thinking about it, I am not sure what the entropy is for the above vs "start measuring prompts".

      • isoprophlex 15 days ago
        That's a tough one; I'm not sure how to get a quantifiable number on it except for painstakingly ablating a prompt until the answer you get becomes significantly degraded.

        But then still, how do you measure how much you have ablated your prompt? How do you measure objectively how badly the answer has degraded?
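
        One crude way to put a number on it, sketched with the official OpenAI Python SDK; the prompt file, model name, and similarity metric are placeholders, and scoring answer quality objectively is the genuinely hard part:

            from difflib import SequenceMatcher
            from openai import OpenAI

            client = OpenAI()

            def ask(prompt: str, model: str = "gpt-4o") -> str:
                r = client.chat.completions.create(
                    model=model, messages=[{"role": "user", "content": prompt}]
                )
                return r.choices[0].message.content

            full_prompt = open("prompt.txt").read()   # hypothetical prompt under study
            sentences = full_prompt.split(". ")
            baseline = ask(full_prompt)

            # Drop one sentence at a time and measure how far the answer drifts
            # from the full-prompt baseline (text similarity as a rough stand-in).
            for i in range(len(sentences)):
                ablated = ". ".join(sentences[:i] + sentences[i + 1:])
                drift = 1 - SequenceMatcher(None, baseline, ask(ablated)).ratio()
                print(f"without sentence {i}: drift={drift:.2f}")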

  • martythemaniak 17 days ago
    One thing I'd like to experiment with is "prompt to service". I want to take an existing microservice of about 3-5kloc and see if I can write a prompt to get o1 to generate the entire service, proper structure, all files, all tests, compiles and passes etc. o1 certainly has the context window to do this at 200k input and 100k output - code is ~10 tokens per line of code, so you'd need like 100k input and 50k output tokens.

    My approach would be:

    - take an exemplar service, dump it in the context

    - provide examples explaining specific things in the exemplar service

    - write a detailed formal spec

    - ask for the output in JSON to simplify writing the code - [{"filename":"./src/index.php", "contents":"<?php...."}]

    The first try would inevitably fail, so I'd provide errors and feedback, and ask for new code (ie complete service, not diffs or explanations), plus have o1 update and rewrite the spec based on my feedback and errors.

    Curious if anyone's tried something like this.
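
    For the JSON step, a sketch of the plumbing; the SDK call is real, but the file names, spec, and single-shot structure are illustrative, and in practice the response may need code-fence stripping before parsing:

        import json
        from pathlib import Path
        from openai import OpenAI

        client = OpenAI()

        exemplar = Path("exemplar_service.txt").read_text()  # dump of the reference service
        spec = Path("spec.md").read_text()                   # the detailed formal spec

        prompt = (
            "Here is an exemplar microservice showing our structure and conventions:\n"
            f"{exemplar}\n\n"
            "Generate a complete new service implementing this spec:\n"
            f"{spec}\n\n"
            'Respond with ONLY a JSON array like '
            '[{"filename": "./src/index.php", "contents": "..."}] covering every file.'
        )

        resp = client.chat.completions.create(
            model="o1", messages=[{"role": "user", "content": prompt}]
        )

        # Write each generated file to disk, creating directories as needed.
        for f in json.loads(resp.choices[0].message.content):
            out = Path(f["filename"])
            out.parent.mkdir(parents=True, exist_ok=True)
            out.write_text(f["contents"])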

  • swyx 17 days ago
    coauthor/editor here!

    we recorded a followup conversation after the surprise popularity of this article breaking down some more thoughts and behind the scenes: https://youtu.be/NkHcSpOOC60?si=3KvtpyMYpdIafK3U

    • cebert 17 days ago
      Thanks for sharing this video, swyx. I learned a lot from listening to it. I hadn’t considered checking prompts for a project into source control. This video has also changed my approach to prompting in the future.
      • swyx 17 days ago
        thanks for watching!

        “prompts in source control” is kinda like “configs in source control” for me. recommended for small projects, but at scale eventually you wanna abstract it out into some kind of prompt manager software for others to use and even for yourself to track and manage over time. git isn't the right database for everything.
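
        A minimal version of "prompts in source control", before reaching for prompt-manager software: prompts kept as plain text files in the repo and loaded by a tiny helper (a sketch; the directory layout and names here are made up):

            from pathlib import Path

            PROMPT_DIR = Path("prompts")  # e.g. prompts/summarize.txt, tracked in git

            def load_prompt(name: str, **kwargs: str) -> str:
                # Load a versioned prompt template and fill in its placeholders.
                template = (PROMPT_DIR / f"{name}.txt").read_text()
                return template.format(**kwargs)

            # usage: load_prompt("summarize", document=Path("report.md").read_text())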

    • dutchbookmaker 16 days ago
      Great stuff, thanks for this.
  • keizo 17 days ago
    I made a tool for manually collecting context. I use it when copying and pasting multiple files is cumbersome: https://pypi.org/project/ggrab/
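
    Roughly the chore such a tool automates, as a hand-rolled sketch (not ggrab's actual interface): concatenate the files you would otherwise copy and paste into one labelled blob ready to drop into a prompt.

        import sys
        from pathlib import Path

        def grab(paths: list[str]) -> str:
            # Label each file so the model can tell them apart in the context dump.
            return "\n\n".join(
                f"===== {p} =====\n{Path(p).read_text()}" for p in paths
            )

        if __name__ == "__main__":
            print(grab(sys.argv[1:]))  # e.g. python grab.py src/*.py README.md
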
    • franze 17 days ago
      I created thisismy.franzai.com for the same reason
  • patrickhogan1 17 days ago
    The buggy nature of o1 in ChatGPT is what prevents me from using it the most.

    Waiting is one thing, but waiting to return to a prompt that never completes is frustrating. It’s the same frustration you get from a long running ‘make/npm/brew/pip’ command that errors out right as it’s about to finish.

    One pattern that’s been effective is

    1. Use Claude Developer Prompt Generator to create a prompt for what I want.

    2. Run the prompt on o1 pro mode

  • swalsh 17 days ago
    Work with chat bots like a junior dev, work with o1 like a senior dev.
  • inciampati 17 days ago
    o1 appears to not be able to see its own reasoning traces. Or its own context is potentially being summarized to deal with the cost of giving access to all those chain-of-thought traces and the chat history. This breaks the computational expressivity of chain of thought, which supports universal (general) reasoning if you have reliable access to the things you've thought, and is a threshold circuit (TC0), or bounded parallel pattern matcher, when not.
    • PoignardAzur 17 days ago
      My understanding is that o1's chain-of-thought tokens are in its own internal embedding, and anything human-readable the UI shows you is a translation of these CoT tokens into natural language.
      • inciampati 15 days ago
        I found this documentation from openai that supports my hunch: https://platform.openai.com/docs/guides/reasoning/advice-on-...

        The reasoning tokens from each step are lost. And there is no indication that they are different tokens than regular tokens.

      • inciampati 17 days ago
        Where is that documented? Fwiw, interactive use suggests they are not available to later invocations of the model. Any evidence this isn't the case?
  • timewizard 17 days ago
    > To justify the $200/mo price tag, it just has to provide 1-2 Engineer hours a month

    > Give a ton of context. Whatever you think I mean by a “ton” — 10x that.

    One step forward. Two steps back.

  • adamgordonbell 17 days ago
    I'd love to see some examples, of good and bad prompting of o1

    I'll admit I'm probably not using O1 well, but I'd learn best from examples.

  • mediumsmart 17 days ago
    I agree with the article and found the non pro version very good at creating my local automation tool chain. It writes the scripts for every step and then you hand them all back to it and it links them up as a single dothiscomplicatedthing.sh
  • sklargh 17 days ago
    This echoes my experience. I often use ChatGPT to help with D&D module design, and I found that o1 did best when I told it exactly what I required, dumped in a large amount of info, and did not expect to use it to iterate multiple times.
  • irthomasthomas 17 days ago
    Can you provide prompt/response pairs? I'd like to test how other models perform using the same technique.
  • iovrthoughtthis 17 days ago
    this is hilarious
  • fpgaminer 17 days ago
    It does seem like individual prompting styles greatly affect the performance of these models. Which makes sense of course, but the disparity is a lot larger than I would have expected. As an example, I'd say I see far more people in the HN comments preferring Claude over everything else. This is in stark contrast to my experience, where ChatGPT has and continues to be my go-to for everything. And that's on a range of problems: general questions, coding tasks, visual understanding, and creative writing. I use these AIs all day, every day as part of my research, so my experience is quite extensive. Yet in all cases Claude has performed significantly worse for me. Perhaps it just comes down to the way that I prompt versus the average HN user? Very odd.

    But yeah, o1 has been a _huge_ leap in my experience. One huge thing, which OpenAI's announcement mentions as well, is that o1 is more _consistently_ strong. 4o is a great model, but sometimes you have to spin the wheel a few times. I much more rarely need to spin o1's wheel, which mostly makes up for its thinking time. (Which is much less these days compared to o1-preview). It also has much stronger knowledge. So far it has solved a number of troubleshooting tasks that there were _no_ fixes for online. One of them was an obscure bug in libjpeg.

    It's also better at just general questions, like wanting to know the best/most reputable store for something. 4o is too "everything is good! everything is happy!" to give helpful advice here. It'll say Temu is a "great store for affordable options." That kind of stuff. Whereas o1 will be more honest and thus helpful. o1 is also significantly better at following instructions overall, and inferring meaning behind instructions. 4o will be very literal about examples that you give it whereas o1 can more often extrapolate.

    One surprising thing that o1 does that 4o has never done is that it _pushes back_. It tells me when I'm wrong (and is often right!). Again, part of that is being less happy and compliant. I have had scenarios where it's wrong and it's harder to convince it otherwise, so it's a double-edged sword, but overall it has been an improvement in the bot's usefulness.

    I also find it interesting that o1 is less censored. It refuses far less than 4o, even without coaxing, despite its supposed ability to "reason" about its guidelines :P What's funny is that the "inner thoughts" that it shows says that it's refusing, but its response doesn't.

    Is it worth $200? I don't think it is, in general. It's not really an "engineer" replacement yet, in that if you don't have the knowledge to ask o1 the right questions it won't really be helpful. So you have to be an engineer for it to work at the level of one. Maybe $50/mo?

    I haven't found o1-pro to be useful for anything; it's never really given better responses than o1 for me.

    (As an aside, Gemini 2.0 Flash Experimental is _very_ good. It's been trading blows with even o1 for some tasks. It's a bit chaotic, since its training isn't done, but I rank it at about #2 between all SOTA models. A 2.0 Pro model would likely be tied with o1 if Google's trajectory here continues.)

  • miltonlost 17 days ago
    oh god using an LLM for medical advice? and maybe getting 3/5 right? Barely above a coin flip.

    And that Warning section? "Do not be wrong. Give the correct names." That this is necessary to include is an idiotic product "choice": the fact that it has to be said implies that, without it, the bot will be wrong and give wrong names. This is not engineering.

    • isoprophlex 17 days ago
      Not if you're selecting out of 10s or 100s of possible diagnoses
      • PollardsRho 17 days ago
        It's hard to characterize the entropy of the distribution of potential diseases given a presentation: even if there are in theory many potential diagnoses, in practice a few will be a lot more common.

        It doesn't really matter how much better the model is than random chance on a sample size of 5, though. There's a reason medicine is so heavily licensed: people die when they get uninformed advice. Asking o1 if you have skin cancer is gambling with your life.

        That's not to say AI can't be useful in medicine: everyone doesn't have a dermatologist friend, after all, and I'm sure for many underserved people basic advice is better than nothing. Tools could make the current medical system more efficient. But you would need to do so much more work than whatever this post did to ascertain whether that would do more good than harm. Can o1 properly direct people to a medical expert if there's a potentially urgent problem that can't be ruled out? Can it effectively disclaim its own advice when asked about something it doesn't know about, the way human doctors refer to specialists?

        • dutchbookmaker 16 days ago
          Iatrogenic harm, though, is something like 25-30% with the current system.

          It is really something these models should help a great deal with, but people want to believe their doctor has some kind of medical omniscience.

      • miltonlost 17 days ago
        ?????? What?

        > Just for fun, I started asking o1 in parallel. It’s usually shockingly close to the right answer — maybe 3/5 times. More useful for medical professionals — it almost always provides an extremely accurate differential diagnosis.

        THIS IS DANGEROUS TO TELL PEOPLE TO DO. OpenAI is not a medical professional. Stop using chatbots for medical diagnoses. 60% is not "almost always extremely accurate". This whole post, because of this bullet point, shows the author doesn't actually know the limitations of the product they're using and is instead passing along misinformation.

        Go to a doctor, not your chatbot.

        • simonw 17 days ago
          I honestly think trusting exclusively your own doctor is a dangerous thing to do as well. Doctors are not infallible.

          It's worth putting in some extra effort yourself, which may include consulting with LLMs provided you don't trust those blindly and are sensible about how you incorporate hints they give you into your own research.

          Nobody is as invested in your own health as you are.

  • refulgentis 17 days ago
    This is a bug, and a regression, not a feature.

    It's odd to see it recast as "you need to give better instructions [because it's different]" -- you could drop the "because it's different" part, and it'd apply to failure modes in all models.

    It also raises the question of how it's different, and that's where the rationale gets circular: you have to prompt it differently because it's different, because you have to prompt it differently.

    And where that really gets into trouble is the "and that's the point" part -- as the other comment notes, it's expressly against OpenAI's documentation and thus intent.

    I'm a yuge AI fan. Models like this are a clear step forward. But it does a disservice to readers to leave the impression that the same techniques don't apply to other models, and recasts a significant issue as design intent.

    • inciampati 17 days ago
      Looking at o1's behavior, it seems there's a key architectural limitation: while it can see chat history, it doesn't seem able to access its own reasoning steps after outputting them. This is particularly significant because it breaks the computational expressivity that made chain-of-thought prompting work in the first place—the ability to build up complex reasoning through iterative steps.

      This will only improve when o1's context windows grow large enough to maintain all its intermediate thinking steps, we're talking orders of magnitude beyond current limits. Until then, this isn't just a UX quirk, it's a fundamental constraint on the model's ability to develop thoughts over time.

      • skissane 17 days ago
        > This will only improve when o1's context windows grow large enough to maintain all its intermediate thinking steps, we're talking orders of magnitude beyond current limits.

        Rather than retaining all those steps, what about just retaining a summary of them? Or put them in a vector DB so on follow-up it can retrieve the subset of them most relevant to the follow-up question?
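
        Something like this hypothetical retrieval loop, for example (the embedding model and the idea of asking the model to emit short reasoning summaries in its visible output are assumptions, not an existing OpenAI feature):

          # Sketch: keep short summaries of earlier reasoning, embed them, and pull
          # only the most relevant ones back into a follow-up prompt.
          import numpy as np
          from openai import OpenAI

          client = OpenAI()

          def embed(text: str) -> np.ndarray:
              resp = client.embeddings.create(model="text-embedding-3-small", input=text)
              return np.array(resp.data[0].embedding)

          # Summaries of earlier steps, however you obtain them.
          summaries = [
              "Chose FastAPI over Flask because of async support.",
              "Decided to keep the existing Postgres schema unchanged.",
              "Ruled out a big-bang rewrite; migrate one blueprint at a time.",
          ]
          vectors = [embed(s) for s in summaries]

          def most_relevant(question: str, k: int = 2) -> list[str]:
              q = embed(question)
              scores = [float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v))) for v in vectors]
              return [s for _, s in sorted(zip(scores, summaries), reverse=True)[:k]]

          followup = "How should we handle background jobs during the migration?"
          context = "\n".join(most_relevant(followup))
          # Feed this to the reasoning model as a fresh, self-contained query.
          prompt = f"Earlier decisions:\n{context}\n\nQuestion: {followup}"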

        • throwup238 17 days ago
          That’s kind of what (R/C)NNs did before the Attention Is All You Need paper made attention the heart of the architecture. One of the breakthroughs that enabled GPT is giving each token direct, individually weighted access to every other token through self-attention, instead of letting earlier context get attenuated through some sort of summarization bottleneck.
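
          Roughly the contrast, as a toy numpy sketch (illustrative only, neither a real RNN nor a real Transformer): an RNN funnels all history through one fixed-size state, while attention scores every past token directly at each step.

            import numpy as np

            rng = np.random.default_rng(0)
            T, d = 6, 8                       # 6 tokens, 8-dim embeddings
            tokens = rng.normal(size=(T, d))

            # RNN-style: history is squeezed through one fixed-size hidden state,
            # so early tokens only survive via whatever h still encodes.
            W = rng.normal(size=(d, d)) * 0.1
            h = np.zeros(d)
            for x in tokens:
                h = np.tanh(W @ h + x)

            # Attention-style: the current query scores every past token directly.
            query = tokens[-1]
            scores = tokens @ query / np.sqrt(d)
            weights = np.exp(scores - scores.max())
            weights /= weights.sum()          # softmax over all positions
            context = weights @ tokens        # weighted mix with direct access to each token
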
      • refulgentis 17 days ago
        Is that relevant here? The post discussed writing a long prompt to get a good answer, not issues with, e.g., step #2 forgetting what was done in step #1.
        • inciampati 15 days ago
          https://platform.openai.com/docs/guides/reasoning/advice-on-... this explains the bug: o1 can't see its past thinking. That would seem to limit the expressivity of the chain of thought. Maybe within one step it's a UTM, but with the loss of memory, extra steps will be needed to make sure the right information is passed forward. The model is likely to forget key ideas it had but didn't write down in the output, which will tend to make it drift and focus more on its final statements and less (or not at all) on some of the things that led it to them.
        • inciampati 16 days ago
          Yes, it is: the post discussed this approach precisely because unrolling the actual chain of thought in interactive chat does not work.

          And it's doubly relevant because chain of thought lets transformers break out of TC0 complexity and act as a UTM. This matters because TC0 is pattern matching while a UTM is general intelligence. Forgetting what the model thought breaks this and (ironically) probably forces the model back into one-shot pattern matching. https://arxiv.org/abs/2310.07923

    • HarHarVeryFunny 17 days ago
      It's different because a chat model has been post-trained for chat, while o1/o3 have been post-trained for reasoning.

      Imagine trying to have a conversation with someone who's been told to interpret anything said to them as a problem they need to reason about and solve. I doubt you'd give them high marks for conversational skill.

      Ideally one model could do it all, but for now the tech is apparently being trained using reinforcement learning to steer the response towards a singular training goal (human feedback gaming, or successful reasoning).

      • refulgentis 17 days ago
        TFA, and my response, are about a de novo relationship between task completion and input prompt. Not conversational skill.
        • HarHarVeryFunny 16 days ago
          Yes, and the "de novo" explanation appears obvious, as indicated: the model was trained differently, with different reinforcement learning goals (reasoning vs. human feedback for chat). The need for different prompting follows from the different operational behavior of a model trained this way: self-evaluation based on the data present in the prompt, backtracking when it veers away from the goals established in the prompt, and the handful of other reasoning behaviors that have been baked into the model via RL.
    • torginus 17 days ago
      I wouldn't be so harsh - you could have a 4o-style LLM turn vague user queries into precise constraints for an o1-style AI; this is how a lot of Stable Diffusion image generators already work.
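
      For example, something like this hypothetical two-stage pipeline (model names and the intermediate spec format are assumptions):

        # A chat-tuned model rewrites a vague request into an explicit spec,
        # which is then handed to a reasoning model in a single shot.
        from openai import OpenAI

        client = OpenAI()
        vague = "my script is slow, make it faster"

        spec = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "Rewrite the user's request as an explicit "
                 "task spec: goal, constraints, inputs/outputs, and what success looks "
                 "like. Ask no questions; state assumptions instead."},
                {"role": "user", "content": vague},
            ],
        ).choices[0].message.content

        answer = client.chat.completions.create(
            model="o1",  # assumed reasoning-model name
            messages=[{"role": "user", "content": spec}],
        ).choices[0].message.content

        print(answer)
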
      • refulgentis 17 days ago
        Correct, you get it: it's turtles all the way down, not "it's different intentionally"