75 comments

  • samyok 5 hours ago
    Don’t let the “flash” name fool you, this is an amazing model.

    I have been playing with it for the past few weeks, and it's genuinely my new favorite; it's so fast and it has such vast world knowledge that it's more performant than Claude Opus 4.5 or GPT 5.2 extra high, for a fraction (basically an order of magnitude less!!) of the inference time and price

    • thecupisblue 5 hours ago
      Oh wow - I recently tried 3 Pro preview and it was too slow for me.

      After reading your comment I ran my product benchmark against 2.5 flash, 2.5 pro and 3.0 flash.

      The results are better AND the response times have stayed the same. What an insane gain - especially considering the price compared to 2.5 Pro. I'm about to get much better results for 1/3rd of the price. Not sure what magic Google did here, but I would love to hear a more technical deep dive comparing what they do differently in the Pro and Flash models to achieve such performance.

      Also wondering, how did you get early access? I'm using the Gemini API quite a lot and have quite a nice internal benchmark suite for it, so I would love to toy with the new ones as they come out.

    • mips_avatar 39 minutes ago
      OpenAI made a huge mistake neglecting fast inference models. Their strategy was GPT 5 for everything, which hasn't worked out at all. I'm really not sure what model OpenAI wants me to use for my applications that require lower latency. If I follow the advice in their API docs about which models to use for faster responses, I get told to either use GPT 5 low thinking, replace GPT 5 with GPT 4.1, or switch to the mini model. Now as a developer I'm doing evals on all three of these combinations. I'm running my evals on Gemini 3 Flash right now, and it's outperforming GPT 5 thinking without thinking. OpenAI should stop trying to come up with ads and make models that are useful.
      • danpalmer 1 minute ago
        Hardware is a factor here. GPUs are necessarily higher latency than TPUs for equivalent compute on equivalent data. There are lots of other factors here, but latency specifically favours TPUs.

        The only non-TPU fast models I'm aware of are things running on Cerebras, which can be much faster because of their wafer-scale chips, and Grok, which has a super fast mode, but with the cheat code of ignoring guardrails and making up its own world knowledge.

      • simonw 28 minutes ago
        Yeah, I'm surprised that they've been through GPT-5.1 and GPT-5.1-Codex and GPT-5.1-Codex-Max and now GPT-5.2, but their most recent mini model is still GPT-5-mini.
    • lambda 1 hour ago
      I'm a significant genAI skeptic.

      I periodically ask them questions about topics that are subtle or tricky, and somewhat niche, that I know a lot about, and find that they frequently provide extremely bad answers. There have been improvements on some topics, but there's one benchmark question that I have that just about every model I've tried has completely gotten wrong.

      Tried it on LMArena recently, got a comparison between Gemini 2.5 flash and a codenamed model that people believe was a preview of Gemini 3 flash. Gemini 2.5 flash got it completely wrong. Gemini 3 flash actually gave a reasonable answer; not quite up to the best human description, but it's the first model I've found that actually seems to mostly correctly answer the question.

      So, it's just one data point, but at least for my one fairly niche benchmark problem, Gemini 3 Flash has successfully answered a question that none of the others I've tried have (I haven't actually tried Gemini 3 Pro, but I've compared various Claude and ChatGPT models, and a few different open-weights models).

      So, I guess I need to put together some more benchmark problems to get a better sample than one, but it's at least now passing an "I can find the answer to this in the top 3 hits of a Google search for a niche topic" test better than any of the other models.

      Still a lot of things I'm skeptical about in all the LLM hype, but at least they are making some progress in being able to accurately answer a wider range of questions.

    • scrollop 2 hours ago
      Alright, so we have more benchmarks, including hallucination rate, and Flash doesn't do well on that front, though on overall knowledge it beats Gemini 3 Pro, GPT 5.1 Thinking, and GPT 5.2 Thinking xhigh (but then, Sonnet, Grok, Opus, Gemini, and 5.1 all beat 5.2 xhigh) - everything. Crazy.

      https://artificialanalysis.ai/evaluations/omniscience

    • giancarlostoro 2 hours ago
      I wonder at what point everyone who over-invested in OpenAI will regret their decision (except maybe Nvidia?). Maybe Microsoft doesn't need to care; they get to sell their models via Azure.
      • toomuchtodo 2 hours ago
        Amazon Set to Waste $10 Billion on OpenAI - https://finance.yahoo.com/news/amazon-set-waste-10-billion-1... - December 17th, 2025
      • guelo 3 minutes ago
        OpenAI's doom was written when Altman (and Nadella) got greedy, threw away the nonprofit mission, and caused the exodus of talent and funding that created Anthropic. If they had stayed nonprofit the rest of the industry could have consolidated their efforts against Google's juggernaut. I don't understand how they expected to sustain the advantage against Google's infinite money machine. With Waymo Google showed that they're willing to burn money for decades until they succeed.
      • outside1234 1 hour ago
        Very soon, because clearly OpenAI is in very serious trouble. They are scaled, have no business model, and face a competitor that is much better than them at almost everything (ads, hardware, cloud, consumer, scaling).
      • jack_riminton 19 minutes ago
        But you’re forgetting the Jonny Ive hardware device that totally isn’t like that laughable pin badge thing from Humane

        /s

    • mmaunder 4 hours ago
      Thanks, having it walk a hardcore SDR signal chain right now --- oh damn, it just finished. The blog post makes it clear this isn't just some 'lite' model - you get low latency and cognitive performance. Really appreciate you amplifying that.
    • esafak 5 hours ago
      What are you using it for and what were you using before?
    • epolanski 4 hours ago
      Gemini 2.0 Flash was already good for some tasks of mine a long time ago.
    • unsupp0rted 4 hours ago
      How good is it for coding, relative to recent frontier models like GPT 5.x, Sonnet 4.x, etc?
      • bovermyer 19 minutes ago
        In my own, very anecdotal, experience, Gemini 3 Pro and Flash are both more reliably accurate than GPT 5.x.

        I have not worked with Sonnet enough to give an opinion there.

    • tonymet 2 hours ago
      Can you be more specific on the tasks you've found exceptional?
    • dfsegoat 14 minutes ago
      > it’s more performant than Claude Opus 4.5 or GPT 5.2 extra high

      ...and all of that done without any GPUs as far as I know! [1]

      [1] - https://www.uncoveralpha.com/p/the-chip-made-for-the-ai-infe...

      (tldr: afaik Google trained Gemini 3 entirely on tensor processing units - TPUs)

    • freedomben 4 hours ago
      Cool! I've been using 2.5 flash and it is pretty bad. 1 out of 5 answers it gives will be a lie. Hopefully 3 is better
      • samyok 4 hours ago
        Did you try with the grounding tool? Turning it on solved this problem for me.
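
        For reference, turning it on in the Python SDK looks roughly like this (a sketch; the preview model id is my assumption):

            from google import genai
            from google.genai import types

            client = genai.Client()  # reads GEMINI_API_KEY from the environment

            response = client.models.generate_content(
                model="gemini-3-flash-preview",  # assumed preview model id
                contents="What changed in the latest Gemini release?",
                config=types.GenerateContentConfig(
                    # Ground the answer in Google Search results
                    tools=[types.Tool(google_search=types.GoogleSearch())],
                ),
            )
            print(response.text)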
        • Davidzheng 4 hours ago
          what if the lie is a logical deduction error, not a fact retrieval error?
          • rat9988 4 hours ago
            The error rate would still be improved overall and might make it a viable tool for the price, depending on the use case.
    • moffkalast 52 minutes ago
      Should I not let the "Gemini" name fool me either?
    • encroach 3 hours ago
      How did you get early access?
    • yunohn 44 minutes ago
      I love how every single LLM model release is accompanied by pre-release insiders proclaiming how it’s the best model yet…
    • tonyhart7 2 hours ago
      I think Google is the only one that still produces general-knowledge LLMs right now.

      Claude has been a coding model from the start, but GPT is more and more becoming a coding model too.

      • Workaccount2 1 hour ago
        Coding is basically an edge case for LLMs too.

        Pretty much every person in the first (and second) world is using AI now, and only a small fraction of those people are writing software. This is also reflected in OAI's report from a few months ago, which found programming to be only 4% of tokens.

        • aleph_minus_one 37 minutes ago
          > Pretty much every person in the first (and second) world is using AI now

          This sounds like you live in a huge echo chamber. :-(

      • Imustaskforhelp 2 hours ago
        I agree with this observation. Gemini does feel like code red for basically every AI company, ChatGPT, Claude, etc., in my opinion, if the underlying model is both fast and cheap and good enough.

        I hope open-source AI models catch up to Gemini 3 / Gemini 3 Flash. Or Google open-sources it, but let's be honest, Google isn't open-sourcing Gemini 3 Flash. I guess the best bets in open source nowadays are probably GLM or DeepSeek Terminus, or maybe Qwen/Kimi too.

        • leemoore 44 minutes ago
          Gemini isn't code red for Anthropic. Gemini threatens none of Anthropic's positioning in the market.
          • ralusek 38 minutes ago
            Yes it does. I never use Claude anymore outside of agentic tasks.
        • Uehreka 1 hour ago
          I would expect open weights models to always lag behind; training is resource-intensive and it’s much easier to finance if you can make money directly from the result. So in a year we may have a ~700B open weights model that competes with Gemini 3, but by then we’ll have Gemini 4, and other things we can’t predict now.
          • xbmcuser 1 hour ago
            There will be diminishing returns though; future models won't be that much better, and we will reach a point where the open-source model is good enough for most things and being on the latest model is no longer so important.

            For me the bigger concern, which I have mentioned on other AI-related topics, is that AI is eating all the production of computer hardware, so we should be worrying about hardware prices getting out of hand and making it harder for the general public to run open-source models. Hence I am rooting for China to reach parity on node size and crash PC hardware prices.

            • FuckButtons 28 minutes ago
              I had a similar opinion, that we were somewhere near the top of the sigmoid curve of model improvement that we could achieve in the near term. But given continued advancements, I’m less sure that prediction holds.
          • baq 23 minutes ago
            If Gemini 3 Flash is really confirmed close to Opus 4.5 at coding and a similarly capable model is open weights, I want to buy a box with a USB cable that has that thing loaded, because today that's enough to run out of engineering work for a small team.
        • Workaccount2 1 hour ago
          Open-source models are riding coattails; they are basically just distilling the giant SOTA models, hence perpetually being 4-6 months behind.
          • waffletower 43 minutes ago
            If this quantification of lag is anywhere near accurate (it may be larger and/or more complex to describe), soon open source models will be "simply good enough". Perhaps companies like Apple could be 2nd round AI growth companies -- where they market optimized private AI devices via already capable Macbooks or rumored appliances. While not obviating cloud AI, they could cheaply provide capable models without subscription while driving their revenue through increased device sales. If the cost of cloud AI increases to support its expense, this use case will act as a check on subscription prices.
          • Gigachad 1 hour ago
            So basically the proprietary models are devalued to almost 0 in about 4-6 months. Can they recover the training costs + profit margin every 4 months?
    • jauntywundrkind 5 hours ago
      Just to point this out: many of these frontier models' costs aren't that far from two orders of magnitude more than what DeepSeek charges. It doesn't compare the same, no, but with coaxing I find it to be a pretty capable, competent coding model, and capable of answering a lot of general queries pretty satisfactorily (but if it's a short session, why economize?). $0.28/M in, $0.42/M out. Opus 4.5 is $5/$25 (17x/60x).

      I've been playing around with other models recently (Kimi, GPT Codex, Qwen, others) to try to better appreciate the difference. I knew there was a big price difference, but watching myself feeding dollars into the machine rather than nickels has also instilled in me quite the reverse appreciation.

      I only assume "if you're not getting charged, you are the product" has to be somewhat in play here. But when working on open source code, I don't mind.

      • KoolKat23 6 minutes ago
        I struggle to see the incentive to do this; I have similar thoughts for locally run models. The only use case I can imagine is small jobs at scale, perhaps something like autocomplete integrated into your deployed application, or extreme privacy, honouring NDAs, etc.

        Otherwise, if it's a short prompt or answer, a SOTA (state of the art) model will be cheap anyway, and if it's a long prompt/answer, it's way more likely to be wrong and a lot more time is spent on checking/debugging any issue or hallucination, so again SOTA is better.

      • happyopossum 4 hours ago
        Two orders of magnitude would imply that these models cost $28/m in and $42/m out. Nothing is even close to that.
        • minraws 1 hour ago
          Gpt 5.2 pro is well beyond that iirc
        • jauntywundrkind 2 hours ago
          To me as an engineer, 60x for output (which is most of the cost I see, AFAICT) is not that significantly different from 100x.

          I tried to be quite clear with showing my work here. I agree that 17x is much closer to a single order of magnitude than two. But 60x is, to me, far enough along the way to 100x that I don't feel bad saying it's nearly two orders (it's 1.78 orders of magnitude). To me, your complaint feels rigid & ungenerous.

          My post is showing to me as -1, but I stand by it right now. Arguing over the technicalities here (is 1.78 close enough to 2 orders to count?) feels beside the point to me: DeepSeek is vastly more affordable than nearly everything else, putting even Gemini 3 Flash here to shame. And I don't think people are aware of that.

          I guess for my own reference, since I didn't do it the first time: at $0.50/$3.00 per M in/out, Gemini 3 Flash here is 1.8x and 7.1x (1e0.85) more expensive than DeepSeek.
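
          Showing the work in code this time, since the thread is arguing the arithmetic (list prices as quoted above):

              from math import log10

              deepseek = (0.28, 0.42)   # $/M tokens (input, output)
              flash3   = (0.50, 3.00)   # Gemini 3 Flash
              opus45   = (5.00, 25.00)  # Claude Opus 4.5

              for name, (inp, out) in (("flash3", flash3), ("opus45", opus45)):
                  print(f"{name}: {inp / deepseek[0]:.1f}x in, {out / deepseek[1]:.1f}x out, "
                        f"{log10(out / deepseek[1]):.2f} orders of magnitude (output)")

              # flash3: 1.8x in, 7.1x out, 0.85 orders of magnitude (output)
              # opus45: 17.9x in, 59.5x out, 1.77 orders of magnitude (output)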

    • poopiokaka 2 hours ago
      [dead]
    • Sincere6066 5 hours ago
      [flagged]
  • __jl__ 5 hours ago
    This is awesome. No preview release either, which is great for production.

    They are pushing the prices higher with each release though: API pricing is up to $0.5/M for input and $3/M for output

    For comparison:

    Gemini 3.0 Flash: $0.50/M for input and $3.00/M for output

    Gemini 2.5 Flash: $0.30/M for input and $2.50/M for output

    Gemini 2.0 Flash: $0.15/M for input and $0.60/M for output

    Gemini 1.5 Flash: $0.075/M for input and $0.30/M for output (after price drop)

    Gemini 3.0 Pro: $2.00/M for input and $12/M for output

    Gemini 2.5 Pro: $1.25/M for input and $10/M for output

    Gemini 1.5 Pro: $1.25/M for input and $5/M for output

    I think image input pricing went up even more.

    Correction: It is a preview model...

    • KoolKat23 3 minutes ago
      Token usage also needs to be factored in, specifically when thinking is enabled; these newer models find more difficult problems easier and use fewer tokens to solve them.
    • srameshc 5 hours ago
      Thanks, that was a great breakdown of cost. I just assumed before that the pricing was the same. The pricing probably comes from the confidence and the buzz around Gemini 3.0 as one of the best-performing models. But competition is hot in this area, and it's not far off that we get similarly performing models for a cheaper price.
    • mips_avatar 5 hours ago
      I'm more curious how Gemini 3 flash lite performs/is priced when it comes out. Because it may be that for most non coding tasks the distinction isn't between pro and flash but between flash and flash lite.
    • sunaookami 4 hours ago
      This is a preview release.
    • YetAnotherNick 4 hours ago
      For comparison, GPT-5 mini is $0.25/M for input and $2.00/M for output, so Gemini 3 Flash is double the price for input and 50% higher for output.
      • AuthError 4 hours ago
        flash is closer to sonnet than gpt minis though
    • uluyol 5 hours ago
      Are these the current prices or the prices at the time the models were released?
      • __jl__ 4 hours ago
        Mostly at the time of release except for 1.5 Flash which got a price drop in Aug 2024.

        Google has been discontinuing older models after several months of transition period so I would expect the same for the 2.5 models. But that process only starts when the release version of 3 models is out (pro and flash are in preview right now).

    • martythemaniak 3 hours ago
      The price increase sucks, but you really do get a whole lot more. They also have the "Flash Lite" series; 2.5 Flash Lite is $0.10/M, so hopefully we see something like a 3.0 Flash Lite for $0.20-0.25.
    • misiti3780 3 hours ago
      Is there a website where I can compare OpenAI, Anthropic, and Gemini models on cost/token?
      • jsnell 2 hours ago
        There are plenty. But it's not the comparison you want to be making. There is too much variability in the number of tokens used for a single response, especially once reasoning models became a thing. And it gets even worse when you put the models into a variable-length output loop.

        You really need to look at the cost per task. artificialanalysis.ai has a good composite score, measures the cost of running all the benchmarks, and has a 2D intelligence vs. cost graph.

  • RobinL 2 hours ago
    Feels like Google is really pulling ahead of the pack here. A model that is cheap, fast and good, combined with Android and gsuite integration, seems like such a powerful combination.

    Presumably a big motivation for them is to be first to get something good and cheap enough that they can serve it to every Android device, ahead of whatever the OpenAI/Jony Ive hardware project will be, and way ahead of Apple Intelligence. Speaking for myself, I would pay quite a lot for a truly 'AI first' phone that actually worked.

    • skerit 52 minutes ago
      Pulling ahead? Depends on the use case I guess. Three turns into a very basic Gemini-CLI session, Gemini 3 Pro had already messed up a simple `Edit` tool-call. And it's awfully slow. In 27 minutes it did 17 tool calls and only managed to modify 2 files. Meanwhile Claude-Code flies through the same task in 5 minutes.
      • RobinL 38 minutes ago
        Yeah - agree, Anthropic much better for coding. I'm more thinking about the 'average chat user' (the larger potential userbase), most of whom are on chatgpt.
    • exegete 1 hour ago
    • anukin 1 hour ago
      What will you use the AI in the phone to do for you? I can understand tablets and smart glasses being able to leverage smol AI much better than a phone, which is reliant on apps for most of the work.
      • Workaccount2 25 minutes ago
        I desperately want to be able to real-time dictate actions to take on my phone.

        Stuff like:

        "Open Chrome, new tab, search for xyz, scroll down, third result, copy the second paragraph, open whatsapp, hit back button, open group chat with friends, paste what we copied and send, send a follow-up laughing tears emoji, go back to chrome and close out that tab"

        All while being able to just quickly glance at my phone. There is already a tool like this, but I want the parsing/understanding of an LLM and super fast response times.

        • procaryote 6 minutes ago
          is that faster to say than do, or is it an accessibility or while-driving need?
  • Workaccount2 33 minutes ago
    So gemini 3 flash (non thinking) is now the first model to get 50% on my "count the dog legs" image test.

    Gemini 3 pro got 20%, and everyone else has gotten 0%. I saw benchmarks showing 3 flash is almost trading blows with 3 pro, so I decided to try it.

    Basically it is an image showing a dog with 5 legs, an extra one photoshopped onto its torso. Every model counts 4, and Gemini 3 Pro, while also counting 4, said the dog had a "large male anatomy". However, it failed a follow-up, saying 4 again.

    3 Flash counted 5 legs on the same image; however, I added a distinct "tattoo" to each leg as an assist. These tattoos didn't help 3 Pro or other models.

    So it is the first out of all the models I have tested to count 5 legs on the "tattooed legs" image. It still counted only 4 legs on the image without the tattoos. I'll give it 1/2 credit.

  • fariszr 5 hours ago
    These flash models keep getting more expensive with every release.

    Is there an OSS model that's better than 2.0 flash with similar pricing, speed and a 1m context window?

    Edit: this is not the typical flash model, it's actually an insane value if the benchmarks match real world usage.

    > Gemini 3 Flash achieves a score of 78%, outperforming not only the 2.5 series, but also Gemini 3 Pro. It strikes an ideal balance for agentic coding, production-ready systems and responsive interactive applications.

    The replacement for the old Flash models will probably be the 3.0 Flash Lite then.

    • sosodev 39 minutes ago
      Nvidia released Nemotron 3 nano recently and I think it fits your requirements for an OSS model: https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B...

      It's extremely fast on good hardware, quite smart, and can support up to 1m context with reasonable accuracy

    • thecupisblue 4 hours ago
      Yes, but the 3.0 Flash is cheaper, faster and better than 2.5 Pro.

      So if 2.5 Pro was good for your use case, you just got a better model for about 1/3rd of the price, but it might hurt the wallet a bit more if you currently use 2.5 Flash and want an upgrade - which is fair tbh.

    • aoeusnth1 5 hours ago
      I think it's good: they're raising the size (and price) of Flash a bit and trying to position Flash as an actually useful coding/reasoning model. There's always Lite for people who want dirt-cheap prices and don't care about quality at all.
    • scrollop 2 hours ago
      This one is more powerful than OpenAI models, including GPT 5.2 (which is worse on various benchmarks than 5.1, and that's with 5.2 using XHIGH whilst the others were on high, e.g.: https://youtu.be/4p73Uu_jZ10?si=x1gZopegCacznUDA&t=582 )

      https://epoch.ai/benchmarks/simplebench

    • mips_avatar 3 hours ago
      For my apps evals Gemini flash and grok 4 fast are the only ones worth using. I'd love for an open weights model to compete in this arena but I haven't found one.
    • fullstackwife 5 hours ago
      The cost of e2e task resolution should be lower: even if a single inference costs more, you need fewer loops to solve a problem now.
      • fariszr 5 hours ago
        Sure, but for simple tasks that require a large context window, aka the typical use case for 2.0 Flash, it's still significantly more expensive.
  • simonsarris 5 hours ago
    Even before this release the tools (for me: Claude Code and Gemini for other stuff) reached a "good enough" plateau that means any other company is going to have a hard time making me (I think soon most users) want to switch. Unless a new release from a different company has a real paradigm shift, they're simply sufficient. This was not true in 2023/2024 IMO.

    With this release the "good enough" and "cheap enough" intersect so hard that I wonder if this is an existential threat to those other companies.

    • bgirard 5 hours ago
      Why wouldn't you switch? The cost to switch is near zero for me. Some tools have built-in model selectors. Direct CLI/IDE plug-ins are practically the same UI.
      • azuanrb 5 hours ago
        Not OP, but I feel the same way. Cost is just one of the factors. I'm used to the Claude Code UX, and my CLAUDE.md works well with my workflow too. Unless there's a significant improvement, changing to new models every few months is going to hurt me more.
        • bgirard 4 hours ago
          I used to think this way. But I moved to AGENTS.md. Now I use the different UI as a mental context separation. Codex is working on Feature A, Gemini on feature B, Claude on Feature C. It has become a feature.
          • rolisz 2 hours ago
            You're assuming that different models need the same stuff in AGENTS.md

            In my experience, to get the best performance out of different models, they need slightly different prompting.

      • nevir 2 hours ago
        I think a big part of the switching cost is the cost of learning a different model's nuances. Having good intuition for what works/doesn't, how to write effective prompts, etc.

        Maybe someday future models will all behave similarly given the same prompt, but we're not quite there yet

    • theLiminator 5 hours ago
      For me, the last wave of models finally started delivering on their agentic coding promises.
      • orourke 5 hours ago
        This has been my experience exactly. Even over just the last few weeks I’ve noticed a dramatic drop in having to undo what the agents have done.
      • inquirerGeneral 5 hours ago
        [dead]
    • calflegal 5 hours ago
      I asked a similar question yesterday:

      https://news.ycombinator.com/item?id=46290797

    • nprateem 5 hours ago
      But for me, the previous models were routinely-wrong time wasters that added no overall speed increase once you take into account the lottery of whether they'd be correct.
    • catigula 4 hours ago
      Correct. Opus 4.5 'solved' software engineering. What more do I need? Businesses need uncapped intelligence, and that is a very high bar. Individuals often don't.
      • gaigalas 4 hours ago
        If Opus is one-size-fits-all, then why does Claude keep the other series? (rhetorical).

        Opus and Sonnet are slower than Haiku. For lots of less sophisticated tasks, you benefit from the speed.

        All vendors do this. You need smaller models that you can rapid-fire for lots of other reasons than vibe coding.

        Personally, I actually use more smaller models than the sophisticated ones. Lots of small automations.

    • alex1138 3 hours ago
      I just can't stop thinking, though, about the vulnerability of training data.

      You say good enough. Great, but what if I as a malicious person were to just make a bunch of internet pages containing things that are blatantly wrong, to trick LLMs?

      • calflegal 3 hours ago
        The internet has already tried this, for a few decades. The garbage is in the corpus; it gets weighted as such.
    • szundi 5 hours ago
      [dead]
  • caminanteblanco 5 hours ago
    Does anyone else understand what the difference is between Gemini 3 'Thinking' and 'Pro'? Thinking "Solves complex problems" and Pro "Thinks longer for advanced math & code".

    I assume that these are just different reasoning levels for Gemini 3, but I can't even find mention of there being 2 versions anywhere, and the API doesn't even mention the Thinking-Pro dichotomy.

    • flakiness 5 hours ago
      It seems:

         - "Thinking" is Gemini 3 Flash with higher "thinking_level"
         - Pro is Gemini 3 Pro. It doesn't mention "thinking_level" but I assume it is set to high-ish.
    • peheje 5 hours ago
      I think:

      Fast = Gemini 3 Flash without thinking (or very low thinking budget)

      Thinking = Gemini 3 flash with high thinking budget

      Pro = Gemini 3 Pro with thinking

    • lysace 4 hours ago
      Really stupid question: How is Gemini-like 'thinking' separate from artificial general intelligence (AGI)?

      When I ask Gemini 3 Flash this question, the answer is vague but agency comes up a lot. Gemini thinking is always triggered by a query.

      This seems like a higher-level programming issue to me. Turn it into a loop. Keep the context. Those two things make it costly for sure. But does it make it an AGI? Surely Google has tried this?

      • dcre 1 hour ago
        This is what every agentic coding tool does. You can try it yourself right now with the Gemini CLI, OpenCode, or 20 other tools.
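
        The core mechanism is just a loop that keeps the transcript and feeds tool output back to the model; a stubbed-out sketch (no real API calls, all names hypothetical):

            from dataclasses import dataclass, field

            @dataclass
            class Reply:
                text: str
                tool_calls: list = field(default_factory=list)

            def call_model(history):
                # Stub: a real agent would send `history` to an LLM API here
                # and parse tool calls out of the response.
                return Reply("done") if history else Reply("", ["ls"])

            def run_tool(name):
                return f"<output of {name}>"  # stub tool executor

            history = []  # the persistent context
            while True:
                reply = call_model(history)
                history.append(("assistant", reply.text))
                if not reply.tool_calls:  # nothing left to do: exit the loop
                    break
                for call in reply.tool_calls:
                    history.append(("tool", run_tool(call)))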
      • CamperBob2 1 hour ago
        I don't think we'll get genuine AGI without long-term memory, specifically in the form of weight adjustment rather than just LoRAs or longer and longer contexts. When the model gets something wrong and we tell it "That's wrong, here's the right answer," it needs to remember that.

        Which obviously opens up a can of worms regarding who should have authority to supply the "right answer," but still... lacking the core capability, AGI isn't something we can talk about yet.

        LLMs will be a part of AGI, I'm sure, but they are insufficient to get us there on their own. A big step forward but probably far from the last.

      • criley2 1 hour ago
        Advanced reasoning LLMs simulate many parts of AGI and feel really smart, but fall short in many critical ways.

        - An AGI wouldn't hallucinate, it would be consistent, reliable and aware of its own limitations

        - An AGI wouldn't need extensive re-training, human reinforced training, model updates. It would be capable of true self-learning / self-training in real time.

        - An AGI would demonstrate real genuine understanding and mental modeling, not pattern matching over correlations

        - It would demonstrate agency and motivation, not be purely reactive to prompting

        - It would have persistent integrated memory. LLMs are stateless and driven by the current context.

        - It should even demonstrate consciousness.

        And more. I agree that what we've designed is truly impressive and simulates intelligence at a really high level. But true AGI is far more advanced.

        • waffletower 28 minutes ago
          Humans can fail at some of these qualifications, often without guile: being consistent and knowing their limitations, for instance; people do not universally demonstrate effective understanding and mental modeling.

          I don't believe the "consciousness" qualification is at all appropriate, as I would argue that it is a projection of the human machine's experience onto an entirely different machine with a substantially different existential topology -- relationship to time and sensorium. I don't think artificial general intelligence is a binary label which is applied if a machine rigidly simulates human agency, memory, and sensing.

        • lysace 1 hour ago
          Thanks for humoring my stupid question with a great answer. I was kind of hoping for something like this :).
  • kingstnap 5 hours ago
    It has a SimpleQA score of 69%. That's a benchmark that tests knowledge of extremely niche facts, and 69% is actually ridiculously high (Gemini 2.5 *Pro* had 55%); it reflects either training on the test set or some sort of cracked way to pack a ton of parametric knowledge into a Flash model.

    I'm speculating, but Google might have figured out some training magic trick to balance out the information storage in model capacity. That, or this Flash model has a huge number of parameters or something.

    • scrollop 2 hours ago
      • albumen 39 minutes ago
        I’m amazed by how much Gemini 3 flash hallucinates; it performs poorly in that metric (along with lots of other models). In the Hallucination Rate vs. AA-Omniscience Index chart, it’s not in the most desirable quadrant; GPT-5.1 (high), opus 4.5 and 4.5 haiku are.

        Can someone explain how Gemini 3 Pro/Flash then do so well in the overall Omniscience: Knowledge and Hallucination Benchmark?

    • tanh 4 hours ago
      This will be fantastic for voice. I presume Apple will use it
    • leumon 3 hours ago
      Or could it be that it's using tool calls in reasoning (e.g. a google search)?
    • GaggiX 5 hours ago
      >or some sort of cracked way to pack a ton of parametric knowledge into a Flash Model.

      More experts with a lower percentage of active ones -> more sparsity.
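
      (Toy numbers, purely illustrative: a 64-expert MoE that activates 4 experts per token stores roughly 16x more parameters than its per-token compute touches, which is how a flash-latency model can still hold a lot of world knowledge.)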

  • xpil 2 hours ago
    My main issue with Gemini is that business accounts can't delete individual conversations. You can only enable or disable Gemini, or set a retention period (3 months minimum), but there's no way to delete specific chats. I'm a paying customer, prices keep going up, and yet this very basic feature is still missing.
    • testfrequency 1 hour ago
      This is the #1 thing that keeps me from going all in on Gemini.

      Their retention controls for both consumer and business suck. It’s the worst of any of the leaders.

  • bayarearefugee 11 minutes ago
    Gemini is so awful at any sort of graceful degradation whenever they are under heavy load.

    It's great that they have these new fast models, but the release hype has made Gemini Pro pretty much unusable for hours:

    "Sorry, something went wrong"

    random sign-outs

    random garbage replies, etc

  • simonw 5 hours ago
    Quick pricing comparison: https://www.llm-prices.com/#it=100000&ot=10000&sel=gemini-3-...

    It's 1/4 the price of Gemini 3 Pro ≤200k and 1/8 the price of Gemini 3 Pro >200k - notable that the new Flash model doesn’t have a price increase after that 200,000 token point.

    It’s also twice the price of GPT-5 Mini for input, half the price of Claude 4.5 Haiku.

  • primaprashant 5 hours ago
    Pricing is $0.5 / $3 per million input / output tokens. 2.5 Flash was $0.3 / $2.5. That's 66% increase in input tokens and 20% increase in output token pricing.

    For comparison, from 2.5 Pro ($1.25 / $10) to 3 Pro ($2 / $12), there was 60% increase in input tokens and 20% increase in output tokens pricing.

    • simonw 5 hours ago
      Calculating price increases is made more complex by the difference in token usage. From https://blog.google/products/gemini/gemini-3-flash/ :

      > Gemini 3 Flash is able to modulate how much it thinks. It may think longer for more complex use cases, but it also uses 30% fewer tokens on average than 2.5 Pro.

      • Tiberium 2 hours ago
        Yes, but also most of the increase in 3 Flash is in the input context price, which isn't affected by reasoning.
  • zurfer 3 hours ago
    It's a cool release, but if someone on the Google team reads this: Flash 2.5 is awesome in terms of latency and total response time without reasoning. In quick tests this model seems to be 2x slower. So for certain use cases, like quick one-token classification, Flash 2.5 is still the better model. Please don't stop optimizing for that!
    • edvinasbartkus 2 hours ago
      Did you try setting thinkingLevel to minimal?

      thinkingConfig: { thinkingLevel: "minimal" }

      More about it here https://ai.google.dev/gemini-api/docs/gemini-3#new_api_featu...
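
      For reference, the Python SDK equivalent is roughly this (a sketch; the model id is assumed, and I'm assuming "minimal" is accepted for Flash as the docs suggest):

          from google import genai
          from google.genai import types

          client = genai.Client()
          response = client.models.generate_content(
              model="gemini-3-flash-preview",  # assumed preview model id
              contents="Classify this ticket: ...",
              config=types.GenerateContentConfig(
                  # Lower thinking levels trade reasoning depth for latency
                  thinking_config=types.ThinkingConfig(thinking_level="minimal"),
              ),
          )
          print(response.text)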

      • zurfer 1 hour ago
        Yes, I tried it with minimal, and it's roughly 3 seconds for prompts that take 2.5 Flash 1 second.

        On that note it would be nice to get these benchmark numbers based on the different reasoning settings.

    • bobviolier 10 minutes ago
      This might also have to do with it being a preview, and only available in the global region?
    • Tiberium 2 hours ago
      You can still set thinking budget to 0 to completely disable reasoning, or set thinking level to minimal or low.
    • retropragma 2 hours ago
      That's more of a flash-lite thing now, I believe
  • zhyder 5 hours ago
    Glad to see the big improvement in the SimpleQA Verified benchmark (28->69%), which is meant to measure factuality (built-in, i.e. without adding grounding resources). That's one benchmark where all models seemed to have low scores until recently. Can't wait to see a model go over 90%... then it will be years till the competition is over the number of 9s in such a factuality benchmark, but that'd be glorious.
  • meetpateltech 5 hours ago
    • simonw 5 hours ago
      For anyone from the Gemini team reading this: these links should all be prominent in the announcement posts. I always have to hunt around for them!
      • meetpateltech 4 hours ago
        Google actually does something similar for major releases - they publish a dedicated collection page with all related links.

        For example, the Gemini 3 Pro collection: https://blog.google/products/gemini/gemini-3-collection/

        But having everything linked at the bottom of the announcement post itself would be really great too!

        • simonw 2 hours ago
          Sadly there's nothing about Gemini 3 Flash on that page yet.
    • minimaxir 5 hours ago
      Documentation for Gemini 3 Flash in particular: https://ai.google.dev/gemini-api/docs/gemini-3
  • outside2344 4 hours ago
    I don't want to say OpenAI is toast for general chat AI, but it sure looks like they are toast.
  • mmaunder 4 hours ago
    I think about what would be most terrifying to Anthropic and OpenAI, i.e. the absolute scariest thing that Google could do, and I think this is it: release low-latency, low-priced models with high cognitive performance and a big context window, especially in the coding space, because that is direct, immediate, very high ROI for the customer.

    Now, imagine for a moment they had also vertically integrated the hardware to do this.

    • JumpCrisscross 3 hours ago
      > think about what would be most terrifying to Anthropic and OpenAI

      The most terrifying thing would be Google expanding its free tiers.

    • avazhi 3 hours ago
      "Now, imagine for a moment they had also vertically integrated the hardware to do this."

      Then you realise you aren't imagining it.

      • iwontberude 3 hours ago
        “And then imagine Google designing silicon that doesn’t trail the industry. While you are there we may as well start to imagine Google figures out how to support a product lifecycle that isn’t AdSense”

        Google is great on the data science alone; everything else is an afterthought.

        • avazhi 3 hours ago
          https://blog.google/products/google-cloud/ironwood-google-tp...

          "And then imagine Google designing silicon that doesn’t trail the industry."

          I'm def not a Google stan generally, but uh, have you even been paying attention?

          https://en.wikipedia.org/wiki/Tensor_Processing_Unit

          • mmaunder 3 hours ago
            It's not funny when I have to explain the joke.
            • avazhi 2 hours ago
              Oh I got your joke, sir - but as you can see from the other comment, there are techies who still don't have even a rudimentary understanding of tensor cores, let alone the wider public and many investors. Over the next year or two the gap between Google and everybody else, even those they license their hardware to, is going to explode.
          • iwontberude 2 hours ago
            Exactly my point: they have bespoke offerings, but when they compete head-to-head for performance they get smoked. See: the Tensor processor they use in the beleaguered Pixel. They are in last place.

            TPUs, on the other hand, are ASICs; we are more than familiar with the limited application, high performance, and high barriers to entry associated with them. TPUs will be worthless as the AI bubble keeps deflating and excess capacity is everywhere.

            The people who don't have a rudimentary understanding are the Wall Street boosters who treat it like the primary threat to Nvidia or a moat for Google (hint: it is neither).

  • rohitpaulk 5 hours ago
    Wild how this beats 2.5 Pro in every single benchmark. Don't think this was true for Haiku 4.5 vs Sonnet 3.5.
    • FergusArgyll 5 hours ago
      Sonnet 3.5 might have been better than opus 3. That's my recollection anyhow
  • tootyskooty 5 hours ago
    Since it now includes 4 thinking levels (minimal-high), I'd really appreciate it if we got some benchmarks across the whole sweep (and not just what's presumably high).

    Flash is meant to be a model for lower-cost, latency-sensitive tasks. Long thinking times will make TTFT >> 10s (often unacceptable) and also won't really be that cheap.

    • happyopossum 3 hours ago
      Google appears to be changing what Flash is “meant for” with this release - its capability, along with the thinking budgets, makes it superior to previous Pro models in both outcome and speed. The likely-soon-coming Flash Lite will fit right in to where Flash used to be - cheap and fast.
  • SyrupThinker 5 hours ago
    I wonder if this suffers from the same issue as 3 Pro, that it frequently "thinks" for a long time about date incongruity, insisting that it is 2024, and that information it receives must be incorrect or hypothetical.

    Just avoiding/fixing that would probably speed up a good chunk of my own queries.

    • robrenaud 5 hours ago
      Omg, it was so frustrating to say:

      Summarize recent working arxiv url

      And then it tells me the date is from the future and it simply refuses to fetch the URL.

  • Obertr 4 hours ago
    At this point in time I start to believe OAI is very much behind in the models race, and it can't be reversed.

    The image model they have released is much worse than Nano Banana Pro; the Ghibli moment did not happen this time.

    Their GPT 5.2 is obviously overfit on benchmarks, per the consensus of many developers and friends I know. So Opus 4.5 is staying on top when it comes to coding.

    The weight of the ads money from Google, plus the general direction and founder sense of Brin, brought the massive Google giant back to life. None of my companies' workflows run on OAI GPT right now. Even though we love their agent SDK, after the Claude agent SDK it feels like peanuts.

    • avazhi 4 hours ago
      "At this point in time I start to believe OAI is very much behind on the models race and it can't be reversed"

      This has been true for at least 4 months and yeah, based on how these things scale and also Google's capital + in-house hardware advantages, it's probably insurmountable.

      • drawnwren 3 hours ago
        OAI also got talent-mined. Their top intellectual leaders left after fights with sama, then Meta took a bunch of their mid-senior talent. Google had the opposite: they brought Noam and Sergey back.
      • mmaunder 3 hours ago
        Yeah the only thing standing in Google's way is Google. And it's the easy stuff, like sensible billing models, easy to use docs and consoles that make sense and don't require 20 hours to learn/navigate, and then just the slew of bugs in Gemini CLI that are basic usability and model API interaction things. The only differentiator that OpenAI still has is polish.

        Edit: And just to add an example: openAI's Codex CLI billing is easy for me. I just sign up for the base package, and then add extra credits which I automatically use once I'm through my weekly allowance. With Gemini CLI I'm using my oauth account, and then having to rotate API keys once I've used that up.

        Also, Gemini CLI loves spewing out its own chain of thought when it gets into a weird state.

        Also Gemini CLI has an insane bias to action that is almost insurmountable. DO NOT START THE NEXT STAGE still has it starting the next stage.

        Also Gemini CLI has been terrible at visibility on what it's actually doing at each step - although that seems a bit improved with this new model today.

        • mips_avatar 3 hours ago
          I'd be curious how many people use openrouter byok just to avoid figuring out the cloud consoles for gcp/azure.
          • visarga 1 hour ago
            I do. Gave up using Gemini directly.
            • mips_avatar 49 minutes ago
              I mean I do too, had a really odd Gemini bug until I did byok on openrouter
          • mmaunder 3 hours ago
            Agreed. It's ridiculous.
    • GenerWork 3 hours ago
      I'm actually liking 5.2 in Codex. It's able to take my instructions, do a good job at planning out the implementation, and will ask me relevant questions around interactions and functionality. It also gives me more tokens than Claude for the same price. Now, I'm trying to white label something that I made in Figma so my use case is a lot different from the average person on this site, but so far it's my go to and I don't see any reason at this time to switch.
      • gpt5 3 hours ago
        I've noticed when it comes to evaluating AI models, most people simply don't ask difficult enough questions. So everything is good enough, and the preference comes down to speed and style.

        It's when it becomes difficult, like in the coding case that you mentioned, that we can see OpenAI still has the lead. The same is true for the image model; prompt adherence is significantly better than Nano Banana, especially on more complex queries.

        • GenerWork 49 minutes ago
          I'd argue that 5.2 just barely squeaks past Sonnet 4.5 at this point. Before this was released, 4.5 absolutely beat Codex 5.1 Medium and could pretty much oneshot UI items as long as I didn't try to create too many new things at once.
        • fellowniusmonk 2 hours ago
          I have a very complex set of logic puzzles I run through my own tests.

          My logic test, and trying to get an agent to develop a certain type of ** implementation (one that is published and thus the model is trained on to some limited extent), really stress test models; 5.2 is a complete failure of overfitting.

          Really really bad in an unrecoverable infinite loop way.

          It helps when you have existing working code that you know a model can't be trained on.

          It doesn't actually evaluate the working code; it just assumes it's wrong and starts trying to re-write it as a different type of **.

          Even linking it to the explanation and the git repo of the reference implementation it still persists in trying to force a different **.

          This is the worst model since pre o3. Just terrible.

    • int32_64 3 hours ago
      Is there a "good enough" endgame for LLMs and AI where benchmarks stop mattering because end users don't notice or care? In such a scenario brand would matter more than the best tech, and OpenAI is way out in front in brand recognition.
      • xbmcuser 2 hours ago
        Google's biggest advantage over time will be costs. They have their own hardware, which they can and will optimize for their LLMs. And Google has experience winning market share over time by giving better results, performance, or space, e.g. Gmail vs Hotmail/Yahoo, Chrome vs IE/Firefox. So don't discount them: if the quality is better, they will get ahead over time.
      • crazygringo 3 hours ago
        For average consumers, I think very much yes, and this is where OpenAI's brand recognition shines.

        But for anyone using LLMs to help speed up academic literature reviews where every detail matters, or coding where every detail matters, or anything technical where every detail matters -- the differences very much matter. And benchmarks serve just to confirm your personal experience anyway, as the differences between models become extremely apparent when you're working in a niche sub-subfield and one model is showing glaring informational or logical errors and another mostly gets it right.

        And then there's a strong possibility that as experts start to say "I always trust <LLM name> more", that halo effect spreads to ordinary consumers who can't tell the difference themselves but want to make sure they use "the best" -- at least for their homework. (For their AI boyfriends and girlfriends, other metrics are probably at play...)

        • smashed 3 hours ago
          I haven't seen any LLM tech shine "where every detail matters".

          In fact so far, they consistently fail in exactly these scenarios, glossing over random important details whenever you double-check results in depth.

          You might have found models, prompts or workflows that work for you though, I'm interested.

        • bitpush 3 hours ago
          > OpenAI's brand recognition shines.

          We've seen this movie before. Snapchat was the darling. In fact, it invented the entire category and was dominating the format for years. Then it ran out of time.

          Now very few people use Snapchat, and it has been reduced to a footnote in history.

          If you think I'm exaggerating, that just proves my point.

          • decimalenough 3 hours ago
            Not a great example: Snapchat made it through the slump, successfully captured the next generation of teenagers, and now has around 500M DAUs.
      • rfw300 3 hours ago
        That might be true for a narrow definition of chatbots, but they aren't going to survive on name recognition if their models are inferior in the medium term. Right now, "agents" are only really useful for coding, but when they start to be adopted for more mainstream tasks, people will migrate to the tools that actually work first.
      • fullstick 2 hours ago
        I doubt anyone I know who is using llms outside of work knows that there are benchmark tests for these models.
      • holler 3 hours ago
        This. I don't know any non-tech people who use anything other than chatgpt. On a similar note, I've wondered why Amazon doesn't make a chatgpt-like app with their latest Alexa+ makeover; it seems like a missed opportunity. The Alexa app has a feature to talk to the LLM in chat mode, but the overall app is geared towards managing devices.
        • macNchz 3 hours ago
          Google has great distribution to be able to just put Gemini in front of people who are already using their many other popular services. ChatGPT definitely came out of the gate with a big lead on name recognition, but I have been surprised to hear various non-techy friends talking about using Gemini recently, I think for many of them just because they have access at work through their Workspace accounts.
        • Obertr 3 hours ago
          Most of Europe is full of Gemini ads; my parents use Gemini because it is free and it popped up in a YouTube ad before the video.

          Just go outside the bubble, and include somewhat older people.

        • nimchimpsky 3 hours ago
          [dead]
      • jay_kyburz 3 hours ago
        This is why both google and microsoft are pushing Gemini and Copilot in everyone's face.
    • dieortin 4 hours ago
      Is there anything pointing to Brin having anything to do with Google’s turnaround in AI? I hear a lot of people saying this, but no one explaining why they do
      • novok 3 hours ago
        In organizations, everyone's existence and position is politically supported by their internal peers around their level. Even google's & microsoft's current CEOs are supported by their group of co-executives and other key players. The fact that both have agreeable personalities is not a mistake! They both need to keep that balance to stay in power, and that means not destroying or disrupting your peer's current positions. Everything is effectively decided by informal committee.

        Founders are special, because they are not beholden to this social support network to stay in power, and founders have a mythos that socially supports their actions beyond their pure power position. The only others they are beholden to are their co-founders, and in some cases major investor groups. This gives them the ability to disregard this social balance because they are not dependent on it to stay in power. Their power source is external to the organization, while everyone else's is internal to it.

        This gives them a very special "do something" ability that nobody else has. It can lead to failures (Zuck & Oculus, Snapchat Spectacles) or successes (Steve Jobs, Gemini AI), but either way, it allows them to actually "do something".

        • JumpCrisscross 2 hours ago
          > Founders are special, because they are not beholden to this social support network to stay in power

          Of course they are. Founders get fired all the time. As often as non-founder CEOs purge competition from their peers.

          > The only others they are beholden too are their co-founders, and in some cases major investor groups

          This describes very few successful executives. You can have your co-founders and investors on board, but if your talent and customers hate you, they’ll fuck off.

      • ryoshu 3 hours ago
        If he's having an impact it's because he can break through the bureaucracy. He's not trying to protect a fiefdom.
      • HarHarVeryFunny 2 hours ago
        I would say it more goes back to the Google Brain + DeepMind merger, creating Google DeepMind headed by Demis Hassabis.

        The merger happened in April 2023.

        Gemini 1.0 was released in Dec 2023, and the progress since then has been rapid and impressive.

    • raincole 3 hours ago
      That's a quite sensationalized view.

      Ghibli moment was only about half a year ago. At that moment, OpenAI was so far ahead in terms of image editing. Now it's behind for a few months and "it can't be reversed"?

      • Obertr 3 hours ago
        Check the size and budget of Google initiatives. It's unlimited.
      • BoredPositron 3 hours ago
        The Ghibli moment was an influencer fad not real advancement.
    • baq 3 hours ago
      GPT 5.2 is actually getting me better outputs than Opus 4.5 on very complex reviews (on high, I never use less) - but the speed makes Opus the default for 95% of use cases.
    • JumpCrisscross 3 hours ago
      > I start to believe OAI is very much behind

      Kara Swisher recently compared OpenAI to Netscape.

    • louiereederson 3 hours ago
      i think the most important part of google vs openai is slowing usage of consumer LLMs. people focus on gemini's growth, but overall LLM MAUs and time spent is stabilizing. in aggregate it looks like a complete s-curve. you can kind of see it in the table in the link below but more obvious when you have the sensortower data for both MAUs and time spent.

      the reason this matters is slowing velocity raises the risk of featurization, which undermines LLMs as a category in consumer. cost efficiency of the flash models reinforces this as google can embed LLM functionality into search (noting search-like is probably 50% of chatgpt usage per their july user study). i think model capability was saturated for the average consumer use case months ago, if not longer, so distribution is really what matters, and search dwarfs LLMs in this respect.

      https://techcrunch.com/2025/12/05/chatgpts-user-growth-has-s...

    • encroach 3 hours ago
      OAI's latest image model outperforms Google's in LMArena in both image generation and image editing. So even though some people may prefer nano banana pro in their own anecdotal tests, the average person prefers GPT image 1.5 in blind evaluations.

      https://lmarena.ai/leaderboard/text-to-image

      https://lmarena.ai/leaderboard/image-edit

      • Obertr 3 hours ago
        Add this to Gemini distribution, which is being advertised by Google in all of their products, and the average Joe will pick the sneakers at the shelf near the checkout rather than the healthier option in the back.
        • gdhkgdhkvff 3 hours ago
          Those darn sneakers are just too delicious!
        • encroach 3 hours ago
          That's not how the arena works. The evaluation is blind so Google's advertising/integration has no effect on the results.
          • Obertr 3 hours ago
            3 points, sure
            • encroach 3 hours ago
              Right, it only scores 3 points higher on image edit, which is within the margin of error. But on image generation, it scores a significant 29 points higher.
        • raincole 3 hours ago
          ...and what does this have to do with the comment you replied to? Did you reply to the wrong person, or were you just stating unrelated factoids?
    • yieldcrv 3 hours ago
      the trend I've seen is that none of these companies are behind in concept and theory, they are just spending longer intervals baking a more superior foundational model

      so they get lapped a few times and then drop a fantastic new model out of nowhere

      the same is going to happen to Google again, Anthropic again, OpenAI again, Meta again, etc

      they're all shuffling the same talent around, it's California, that's how it goes; the companies have the same institutional knowledge - at least regarding their consumer-facing options

    • random9749832 4 hours ago
      This is obviously trained on Pro 3 outputs for benchmaxxing.
      • CuriouslyC 3 hours ago
        Not trained on pro, distilled from it.
        • viraptor 2 hours ago
          What do you think distilled means...?
      • NitpickLawyer 3 hours ago
        > for benchmaxxing.

        Out of all the big4 labs, google is the last I'd suspect of benchmaxxing. Their models have generally underbenched and overdelivered in real world tasks, for me, ever since 2.5 pro came out.

    • nightski 3 hours ago
      Google has incredible tech. The problem is, and always has been, their products. Not only are they generally designed to be anti-consumer, they go out of their way to make things as hard as possible. The debacle with Antigravity exfiltrating data is just one of countless examples.
      • novok 3 hours ago
        The Antigravity case feels like a pure bug from rushing to market - they had a bunch of other bugs showing that. That's sloppiness, not being anti-consumer or deliberately making things difficult.
  • alooPotato 50 minutes ago
    I have a latency-sensitive application - anyone know of any tools that let you compare time to first token and total latency for a bunch of models at once, given a prompt? Ideally they'd run close to the DCs that serve the various models, so network latency is taken out of the benchmark.
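
    For reference, the kind of harness I mean - a minimal sketch, assuming an OpenAI-compatible streaming endpoint such as an aggregator (the base URL and model IDs below are placeholders, not verified names). Run from my own box this still includes network latency, which is exactly what I'd want a hosted tool to factor out:

      # Sketch: compare time-to-first-token (TTFT) and total latency per model.
      import time
      from openai import OpenAI

      client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="...")
      MODELS = ["google/gemini-3-flash", "openai/gpt-5-mini"]  # placeholder IDs
      PROMPT = "Summarize the plot of Hamlet in two sentences."

      for model in MODELS:
          start = time.perf_counter()
          ttft = None
          stream = client.chat.completions.create(
              model=model,
              messages=[{"role": "user", "content": PROMPT}],
              stream=True,
          )
          for chunk in stream:
              # The first chunk carrying actual content marks time-to-first-token.
              if ttft is None and chunk.choices and chunk.choices[0].delta.content:
                  ttft = time.perf_counter() - start
          total = time.perf_counter() - start
          print(f"{model}: ttft={ttft:.2f}s total={total:.2f}s")
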
  • whinvik 5 hours ago
    Ok, I was a bit addicted to Opus 4.5 and was starting to feel like there's nothing like it.

    Turns out Gemini 3 Flash is pretty close. The Gemini CLI is not as good but the model more than makes up for it.

    The weird part is that Gemini 3 Pro is nowhere near as good an experience. Maybe because it's just so slow.

    • scrollop 2 hours ago
      Yes! Gemini 3 Pro is significantly slower than Opus (surprisingly), and I prefer Opus's output.

      Might use Flash over Haiku for my MCP research/transcriber/minor-tasks model now, though (will test, of course)

    • __jl__ 4 hours ago
      I will have to try that. Cursor bill got pretty high with Opus 4.5. Never considered opus before the 4.5 price drop but now it's hard to change... :)
      • diamondfist25 3 hours ago
        $100 Claude max is the best subscription I’ve ever had.

        Well worth every penny now

  • acheong08 5 hours ago
    Thinking along the line of speed, I wonder if a model that can reason and use tools at 60fps would be able to control a robot with raw instructions and perform skilled physical work currently limited by the text-only output of LLMs. Also helps that the Gemini series is really good at multimodal processing with images and audio. Maybe they can also encode sensory inputs in a similar way.

    Pipe dream right now, but 50 years from now? Maybe

  • mmaunder 26 minutes ago
    Used the hell out of Gemini 3 Flash with some 3 Pro thrown in for the past 3 hours on CUDA/Rust/FFT code that is performance critical, and now have a gemini flavored cocaine hangover and have gone crawling back to Codex GPT 5.2 xhigh and am making slower progress but with higher quality code.

    Firstly, 3 Flash is wicked fast and seems to be very smart for a low latency model, and it's a rush just watching it work. Much like the YOLO mode that exists in Gemini CLI, Flash 3 seems to YOLO into solutions without fully understanding all the angles e.g. why something was intentionally designed in a way that at first glance may look wrong, but ended up this way through hard won experience. Codex gpt 5.2 xhigh on the other hand does consider more angles.

    It's a hard come-down off the high of using it for the first time because I really really really want these models to go that fast, and to have that much context window. But it ain't there. And turns out for my purposes the longer chain of thought that codex gpt 5.2 xhigh seems to engage in is a more effective approach in terms of outcomes.

    And I hate that reality because having to break a lift into 9 stages instead of just doing it in a single wicked fast run is just not as much fun!

  • zone411 1 hour ago
    Scores 92.0 on my Extended NYT Connections benchmark (https://github.com/lechmazur/nyt-connections/). Gemini 2.5 Flash scored 25.2, and Gemini 3 Pro scored 96.8.
  • Topfi 28 minutes ago
    By existing as part of Google results, AI Search makes them the least reliable search engine of all. To give an example I searched for organically today: I tried the same query on Kagi and on Google as a quick real-world test, looking for the exact 0-100 kph time of the Honda Pan European ST1100. Google gave me 12-13 seconds, which isn't even in the correct stratosphere (it's roughly around 4 seconds), nor does that figure appear anywhere in the linked sources the model claims to rely on: https://share.google/aimode/Ui8yap74zlHzmBL5W

    No matter the model, AI Overview/Results in Google are just hallucinated nonsense, only providing roughly equivalent information to what is in the linked sources as a coincidence, rather than due to actually relying on them.

    Whether DuckDuckGo, Kagi, Ecosia or anything else, they are all objectively and verifiably better search engines than Google as of today.

    This isn't new either, nor has it gotten better. AI Overview has been and continues to be a mess, and it makes clear to me that anyone claiming Google still has the "best" search results is lying to themselves. Anyone saying Google search in 2025 is good, or even usable, is objectively and verifiably wrong, and claiming DDG or Kagi offer less usable results is equally unfounded.

    Either finally fix your models so they adhere to and properly quote their sources, as your competitors somehow manage to, or, preferably, stop forcing this into search.

  • bearjaws 5 hours ago
    I've been using the preview flash model exclusively since it came out, the speed and quality of response is all I need at the moment. Although still using Claude Code w/ Opus 4.5 for dev work.

    Google keeps their models very "fresh", and I tend to get more correct answers when asking about Azure or O365 issues; ironically, Copilot will talk about now-deleted or deprecated features more often.

    • sv123 5 hours ago
      I've found copilot within the Azure portal to be basically useless for solving most problems.
      • djeastm 4 hours ago
        Me too. I don't understand why companies think we devs need a custom chat on their website when we all have access to a chat with much smarter models open in a different tab.
        • golem14 2 hours ago
          That's not what they are thinking. They are thinking: "We want to capture the dev and make them use our model – since it is easier to use it in our tab, it can afford to be inferior. This way we get lots of tasty, tasty user data."
  • jug 5 hours ago
    Looks like a good workhorse model, much as 2.5 Flash felt at its launch. I hope I can build confidence with it, because it'll be good for offloading Pro costs/limits, and of course the speed is always nice for more basic coding or queries. I'm impressed and curious about the recent extreme gains on ARC-AGI-2 from 3 Pro, GPT-5.1, and now even 3 Flash.
  • i_love_retros 36 minutes ago
    I'll take the hit to my 401k for this to all just go away. The comments here sound ridiculous.
  • xnx 5 hours ago
    OpenAI is pretty firmly in the rear-view mirror now.
    • walthamstow 5 hours ago
      Google Antigravity is a buggy mess at the moment, but I believe it will eventually eat Cursor as well. The £20/mo tier currently has the highest usage limits on the market, including Google models plus Sonnet and Opus 4.5.
      • tempaccount420 48 minutes ago
        It's not in Google's style, but they need a codex-like fine-tune. I don't think they have ever released fine-tunes like that though.

        The model is very hard to work with as is.

  • k8sToGo 4 hours ago
    I remember the preview price for 2.5 Flash being much cheaper, and then it got quite expensive when it went out of preview. I hope the same won't happen here.
    • Tiberium 2 hours ago
      For 2.5 Flash Preview, the price was specifically much cheaper for the no-reasoning mode. In this case the model reasons by default, so I don't think they'll increase the price even further.
  • Fiveplus 5 hours ago
    It is interesting to see the "DeepMind" branding completely vanish from the post. This feels like the final consolidation of the Google Brain merger. The technical report mentions a new "MoE-lite" architecture. Does anyone have details on the parameter count? If this is under 20B params active, the distillation techniques they are using are lightyears ahead of everyone else.
  • dandiep 3 hours ago
    For someone looking to switch over to Gemini from OpenAI, are there any gotchas one should be aware of? E.g. I heard some mention of API limits and approvals? Or in terms of prompt writing? What advice do people have?
    • scrollop 2 hours ago
      https://epoch.ai/benchmarks/simplebench

      Just do it.

      I use a service where I have access to all the SOTA models and many open-source models, so I can change models within a chat, using MCPs - e.g. start a chat with Opus making a search with the Perplexity and Grok deepsearch MCPs plus Google search, run the next query with GPT 5 Thinking xhigh, the next with Gemini 3 Pro, all in the same conversation. It's fantastic! I can't imagine being locked into one (or two) companies again. I have nothing to do with the guys who run it (the hosts of the podcast This Day in AI); if you're interested, have a look at the simtheory.ai Discord.

      I don't know how people who use one service can manage...

      • dandiep 2 hours ago
        99% of what I do is fine-tuned models, so there is a certain level of commitment I have to make around training and time to switch.
  • alach11 4 hours ago
    I really wish these models were available via AWS or Azure. I understand strategically that this might not make sense for Google, but at a non-software-focused F500 company it would sure make it a lot easier to use Gemini.
    • lbhdc 4 hours ago
      I feel like that is part of their cloud strategy. If your company wants to pump a huge amount of data through one of these you will pay a premium in network costs. Their sales people will use that as a lever for why you should migrate some or all of your fleet to their cloud.
      • jiggawatts 2 hours ago
        A few gigabytes of text is practically free to transfer even over the most exorbitant egress fee networks, but would cost “get finance approval” amounts of money to process even through a cheaper model.
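
        Back-of-envelope, assuming ~4 bytes per token, ~$0.09/GB egress, and the $0.50 per 1M input tokens quoted elsewhere in the thread:

          # Rough numbers: egress vs. inference cost for 5 GB of raw text.
          gb = 5
          tokens = gb * 1e9 / 4              # ~1.25 billion tokens at ~4 bytes/token

          egress = gb * 0.09                 # ~$0.45 to move it out of a cloud
          inference = tokens / 1e6 * 0.50    # ~$625 to push it through the model

          print(f"egress ${egress:.2f} vs inference ${inference:.2f}")
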
  • SubiculumCode 3 hours ago
    In Gemini Pro interface, I now have Fast, Thinking, and Pro options. I was a bit confused by that, but did find this: https://discuss.ai.google.dev/t/new-model-levels-fast-thinki...
  • jtrn 5 hours ago
    This is the first flash/mini model that doesn't make a complete ass of itself when I prompt for the following: "Tell me as much as possible about Skatval in Norway. Not general information. Only what is uniquely true for Skatval."

    Skatval is a small local area I live in, so I know when it's bullshitting. Usually, I get a long-winded answer that is PURE Barnum-statement, like "Skatval is a rural area known for its beautiful fields and mountains" and bla bla bla.

    Even with minimal thinking (it seems to do none), it gives an extremely good answer. I am really happy about this.

    I also noticed it had VERY good scores on tool-use, terminal, and agentic stuff. If that is TRUE, it might be awesome for coding.

    I'm tentatively optimistic about this.

    • amunozo 5 hours ago
      I tried the same with my father's little village (Zarza Capilla, in Spain), and it gave a surprisingly good answer in a couple of seconds. Amazing.
    • peterldowns 3 hours ago
      That's a really cool prompt idea, I just tried it with my neighborhood and it nailed it. Very impressive.
    • kingstnap 4 hours ago
      You're effectively describing SimpleQA, but with a single question instead of a comprehensive benchmark - and you can see the dramatic increase in performance there.
  • speedgoose 5 hours ago
    I’m wondering why Claude Opus 4.5 is missing from the benchmarks table.
    • anonym29 5 hours ago
      I wondered this, too. I think the emphasis here was on the faster, lower-cost models, but then Haiku 4.5 should have been the Anthropic entry in the table instead. They also didn't use the most powerful xAI model, opting for the fast one instead. Regardless, this new Gemini 3 Flash is good enough that Anthropic should be feeling pressure on both price and output quality no matter which of their models it's compared against, which is ultimately good for the consumer.
  • doomerhunter 5 hours ago
    Pretty stoked for this model. Building a lot with "mixture of agents" / mix of models and Gemini's smaller models do feel really versatile in my opinion.

    Hoping that the local ones keep pace too (the Gemma line)

  • bennydog224 5 hours ago
    From the article, speed & cost match 2.5 Flash. I'm working on a project where there's a huge gap between 2.5 Flash and 2.5 Flash Lite as far as performance and cost goes.

    -> 2.5 Flash Lite is super fast & cheap (~1-1.5s inference), but poor quality responses.

    -> 2.5 Flash gives high quality responses, but fairly expensive & slow (5-7s inference)

    I really just need an in-between for Flash and Flash Lite for cost and performance. Right now, users have to wait up to 7s for a quality response.

  • croemer 2 hours ago
    It's fast and good in Gemini CLI (even though Gemini CLI still lags far behind Claude as a harness).
  • user_7832 5 hours ago
    Two quick questions to Gemini/AI Studio users:

    1, has anyone actually found 3 Pro better than 2.5 (on non code tasks)? I struggle to find a difference beyond the quicker reasoning time and fewer tokens.

    2, has anyone found any non-thinking models better than 2.5 or 3 Pro? So far I find the thinking ones significantly ahead of non thinking models (of any company for that matter.)

    • Workaccount2 5 hours ago
      Gemini 3 is a step change up against 2.5 for electrical engineering R&D.
    • Davidzheng 5 hours ago
      I think it's probably actually better at math, though still not enough to be useful in my research in a substantial way. I suspect that will change suddenly at some point as the models move past a certain threshold. For now they're heavily limited by being very bad at not giving wrong proofs/counterexamples: even when the models succeed at a useful rate, the labor of sorting through a bunch of trash makes it hard to justify.
    • tmaly 5 hours ago
      Not for coding but for the design aspect, 3 outshines 2.5
  • elvin_d 3 hours ago
    Gemini 3 models are great but lacking a few things:

    - The app experience is atrocious, with poor UX all over the place. A few examples: the text jumps around when the model starts responding, and the slide-over view on iPad breaks an in-flight request, while Claude and ChatGPT handle it fine.

    - Google offers two choices: your data gets used for whatever they want, or, if you want privacy, the app experience gets even worse.
  • hubraumhugo 5 hours ago
    You can get your HN profile analyzed and roasted by it. It's pretty funny :) https://hn-wrapped.kadoa.com
    • onraglanroad 4 hours ago
      I didn't feel roasted at all. In fact I feel vindicated! https://hn-wrapped.kadoa.com/onraglanroad
    • SubiculumCode 3 hours ago
    • apparent 3 hours ago
      Pretty fucking hilarious, if completely off-topic.
    • WhereIsTheTruth 5 hours ago
      This is exactly why you keep your personal life off the internet
    • echelon 5 hours ago
      This is hilarious. The personalized pie charts and XKCD-style comics are great, and the roast-style humor is perfect.

      I do feel like it's not an entirely accurate caricature (recency bias? limited context?), but it's close enough.

      Good work!

      You should do a "show HN" if you're not worried about it costing you too much.

    • peheje 5 hours ago
      This is great. I literally "LOL'd".
  • FergusArgyll 5 hours ago
    So much for "Monopolies get lazy, they just rent seek and don't innovate"
    • NitpickLawyer 5 hours ago
      Also so much for the "wall, stagnation, no more data" folks. Womp womp.
    • jonathan_h 1 hour ago
      "Monopolies get lazy, they just rent seek and don't innovate"

      I think part of what enables a monopoly is the absence of meaningful competition, regardless of how that's achieved - a significant moat, law or regulation, etc.

      I don't know to what extent Google has been rent-seeking and not innovating, but Google doesn't have the luxury to rent-seek any longer.

    • deskamess 4 hours ago
      Monopolies and wanna-be monopolies on the AI-train are running for their lives. They have to innovate to be the last one standing (or second last) - in their mind.
    • concinds 5 hours ago
      The LLM market has no moats so no one "feels" like a monopoly, rightfully.
    • incrudible 4 hours ago
      LLMs are a big threat to their search engine revenue, so whatever monopoly Google may have had does not exist anymore.
  • agentifysh 2 hours ago
    so that's why logan posted 3 lightning emojis. at $0.50/M for input and $3.00/M for output, this will put serious pressure on OpenAI and Anthropic now

    it's almost as good as 5.2 and 4.5 but way faster and cheaper

  • Workaccount2 5 hours ago
    Really hoping this is used for real time chatting and video. The current model is decent, but when doing technical stuff (help me figure out how to assemble this furniture) it falls far short of 3 pro.
  • poplarsol 5 hours ago
    Will be interesting to see what their quota is. Gemini 3.0 Pro only gives you 250 / day until you spam them with enough BS requests to increase your total spend > $250.
  • Def_Os 3 hours ago
    Consolidating their lead. I'm getting really excited about the next Gemma release.
  • sunaookami 4 hours ago
    Sadly not available in the free tier...
    • raybb 42 minutes ago
      And they recently cut 2.5 Flash to 20 requests per day and removed 2.5 Pro altogether.
  • Tiberium 5 hours ago
    Yet again Flash receives a notable price hike: from $0.3/$2.5 for 2.5 Flash to $0.5/$3 (+66.7% input, +20% output) for 3 Flash. Also, as a reminder, 2 Flash used to be $0.1/$0.4.
    • BeetleB 5 hours ago
      Yes, but this Flash is a lot more powerful - beating Gemini 3 Pro on some benchmarks (and pretty close on others).

      I don't view this as a "new Flash" but as "a much cheaper Gemini 3 Pro/GPT-5.2"

      • jexe 4 hours ago
        Right, it depends on your use case. I was looking forward to the model as an upgrade to 2.5 Flash, but when you're processing hundreds of millions of tokens a day (not hard to do if you're dealing in documents or emails with even a few users), the economics fall apart.
      • Tiberium 5 hours ago
        I would be less salty if they gave us a 3 Flash Lite at the same price as 2.5 Flash, or cheaper with better capability, but they still focus on the pricier models :(
        • zzleeper 5 hours ago
          Same! I want to do some data stuff from documents and 2.0 pricing was amazing, but the constant increases go the wrong way for this task :/
  • walthamstow 5 hours ago
    I'm sure it's good, I thought the last one was too, but it seems like the backdoor way to increase prices is to release a new model
    • jeffbee 5 hours ago
      If the model is better in that it resolves the task in fewer iterations, then the I/O token pricing may be a wash or even lower overall.
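
      A toy illustration (per-1M-token prices from this thread; the iteration counts and token volumes are invented):

        # Toy math: a pricier-per-token model that finishes a task in fewer
        # agentic iterations can come out cheaper overall.
        def run_cost(in_price, out_price, iters, in_tok=20_000, out_tok=2_000):
            # Dollars per task, given per-1M-token prices.
            return iters * (in_tok * in_price + out_tok * out_price) / 1e6

        old = run_cost(0.30, 2.50, iters=4)  # 2.5 Flash pricing, more retries
        new = run_cost(0.50, 3.00, iters=2)  # 3 Flash pricing, fewer retries
        print(f"2.5 Flash x4 = ${old:.3f} vs 3 Flash x2 = ${new:.3f}")
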
  • tanh 5 hours ago
    Does this imply we don't need as much compute for models/agents? How can any other AI model compete against that?
  • timpera 4 hours ago
    Looks awesome on paper. However, after trying it on my usual tasks, it is still very bad at French, especially for creative writing. The gap between the Gemini 3 family and GPT-5 or Sonnet 4.5 is significant for my usage.

    Also, I hate that I cannot put the Google models into a "Thinking" mode like in ChatGPT. When I set GPT 5.1 Thinking on a legal task and tell it to check and cite all sources, it takes 10+ minutes to answer, but it does check everything and cites all its sources in the text; whereas the Gemini models, even 3 Pro, always answer after a few seconds and never cite their sources, making it impossible to click through and check the answer. That makes the whole model unusable for these tasks. (I have the $20 subscription for both.)

    • happyopossum 3 hours ago
      > whereas the Gemini models, even 3 Pro, always answer after a few seconds and never cite their sources

      Definitely has not been my experience using 3 Pro in Gemini Enterprise - in fact, just yesterday it took so long on a similar task that I thought something was broken. Nope, just re-checking a source.

      • timpera 3 hours ago
        Does Gemini Enterprise have more features?

        Just tried once again with the exact same prompt: GPT-5.1-Thinking took 12m46s and Gemini 3.0 Pro took about 20 seconds. The latter obviously has a dramatically worse answer as a result.

        (Also, the thinking trace is not in the correct language, and it doesn't seem to show which sources were read at which steps - there is only a "Sources" tab at the end of the answer.)

  • heliophobicdude 4 hours ago
    Any word on whether this is using their diffusion architecture?
  • JeremyHerrman 5 hours ago
    Disappointed to see continued price increases for 3 Flash (up from $0.30/$2.50 to $0.50/$3.00 per 1M input/output tokens).

    I'm more excited to see 3 Flash Lite. Gemini 2.5 Flash Lite needs a lot more steering than regular 2.5 Flash, but it is a very capable model and combined with the 50% batch mode discount it is CHEAP ($0.05/$0.20).

    • jeppebemad 5 hours ago
      Have you seen any indications that there will be a Lite version?
      • summerlight 4 hours ago
        I guess if they want to eventually deprecate the 2.5 family they will need to provide a substitute. And there are huge demands for cheap models.
  • nickvec 5 hours ago
    So is Gemini 3 Fast the same as Gemini 3 Flash?
  • prompt_god 2 hours ago
    it's better than Pro in a few evals. for anyone who has used it, how is it for coding?
  • retinaros 2 hours ago
    i might have missed the bandwagon on gemini but I never found the models to be reliable. now it seems they rank first in some hallucinations bench?

    I just always thought the taste of gpt or claude models was more interesting in the professional context and their end user chat experience more polished.

    are there obvious enterprise use cases where gemini models shine?

  • GaggiX 5 hours ago
    They went too far - now the Flash model is competing with their Pro version: better SWE-bench, better ARC-AGI-2 than 3.0 Pro. I imagine they're going to improve 3.0 Pro before it's out of preview.

    Also, I don't see it written in the blog post, but Flash supports more granular settings for reasoning: minimal, low, medium, high (like the OpenAI models), while Pro only has low and high.

    • minimaxir 5 hours ago
      "minimal" is a bit weird.

      > Matches the “no thinking” setting for most queries. The model may think very minimally for complex coding tasks. Minimizes latency for chat or high throughput applications.

      I'd prefer a hard "no thinking" rule over whatever this is.

      • GaggiX 5 hours ago
        It still supports the legacy mode of setting the thinking budget; you can set it to 0, which is equivalent to the "none" reasoning effort on GPT 5.1/5.2.
        • minimaxir 3 hours ago
          I can confirm this is the case via the API, but annoyingly AI Studio doesn't let you do so.
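
          For anyone hunting for the incantation, a minimal sketch with the google-genai Python SDK (the model ID is my assumption; check the docs for the actual preview name):

            # Sketch: disable thinking via the legacy budget knob.
            from google import genai
            from google.genai import types

            client = genai.Client()  # picks up the API key from the environment

            resp = client.models.generate_content(
                model="gemini-3-flash-preview",  # assumed model ID
                contents="Reply with exactly one word: pong",
                config=types.GenerateContentConfig(
                    # budget of 0 = no thinking, per the comment above
                    thinking_config=types.ThinkingConfig(thinking_budget=0),
                ),
            )
            print(resp.text)
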
    • skerit 5 hours ago
      > They went too far, now the Flash model is competing with their Pro version

      Wasn't this the case with the 2.5 Flash models too? I remember being very confused at that time.

      • JohnnyMarcone 4 hours ago
        This is similar to how Anthropic has treated sonnet/opus as well. At least pre opus 4.5.

        To me it seems like the big model has been "look what we can do", and the smaller model is "actually use this one though".

    • jug 5 hours ago
      I'm not sure how I'm going to live with this!
  • jijji 4 hours ago
    I tried Gemini CLI the other day: typed in two one-line requests, then it responded that it would not go further because I had run out of tokens. I've heard other people complain that it will rewrite your entire codebase from scratch and that you should make backups before starting any code-based work with the Gemini CLI. I understand they are trying to compete with Claude Code, but this is not ready for prime time IMHO.
  • jdthedisciple 2 hours ago
    To those saying "OpenAI is toast"

    ChatGPT still has 81% market share as of this very moment, vs Gemini's ~2%, and arguably still provides the best UX and branding.

    Everyone and their grandma knows "ChatGPT", who outside developers' bubble has even heard of Gemini Flash?

    Yea I don't think that dynamic is switching any time soon.

    • scrollop 2 hours ago
      Says the CEO of MySpace.
    • riku_iki 2 hours ago
      > ChatGPT still has 81% market share as of this very moment, vs Gemini's ~2%

      where did you get this from?

  • anonym29 5 hours ago
    I never have, do not, and conceivably never will use gemini models, or any other models that require me to perform inference on Alphabet/Google's servers (i.e. gemma models I can run locally or on other providers are fine), but kudos to the team over there for the work here, this does look really impressive. This kind of competition is good for everyone, even people like me who will probably never touch any gemini model.
    • oklahomasports 3 hours ago
      You don’t want Google to know that you are searching for like advice on how much a 61 yr old can contribute to a 401k. What are you hiding?
      • anonym29 3 hours ago
        Why do you close the bathroom stall door in public?

        You're not doing anything wrong. Everyone knows what you're doing. You have no secrets to hide.

        Yet you value your privacy anyway. Why?

        Also - I have no problem using Anthropic's cloud-hosted services. Being opposed to some cloud providers doesn't mean I'm opposed to all cloud providers.

  • andrepd 5 hours ago
    Is there a way to try this without a Google account?
    • mschulkind 5 hours ago
      Just use openrouter or a similar aggregator.
  • moralestapia 5 hours ago
    Not only is it fast, it is also quite cheap. Nice!
  • i_love_retros 40 minutes ago
    Oh wow another LLM update!
  • imvetri 5 hours ago
    this is why samsung is stopping flash production
    • Tepix 5 hours ago
      This is why they stopped The Flash after season 9 in 2023.