ChatGPT Images 2.0

(openai.com)

1049 points | by wahnfrieden 25 days ago

146 comments

minimaxir 25 days ago
So during my Nano Banana Pro experiments I wrote a very fun prompt that tests the ability of these image generation models to follow heuristics, but that still requires domain knowledge and/or use of the search tool:
```
    Create a 8x8 contiguous grid of the Pokémon whose National Pokédex numbers correspond to the first 64 prime numbers. Include a black border between the subimages.

    You MUST obey ALL the FOLLOWING rules for these subimages:
    - Add a label anchored to the top left corner of the subimage with the Pokémon's National Pokédex number.
      - NEVER include a `#` in the label
      - This text is left-justified, white color, and Menlo font typeface
      - The label fill color is black
    - If the Pokémon's National Pokédex number is 1 digit, display the Pokémon in a 8-bit style
    - If the Pokémon's National Pokédex number is 2 digits, display the Pokémon in a charcoal drawing style
    - If the Pokémon's National Pokédex number is 3 digits, display the Pokémon in a Ukiyo-e style
```
The NBP result is here, which got the numbers, corresponding Pokemon, and styles correct, with the main point of contention being that the style application is lazy and that the images may be plagiarized: https://cdn.bsky.app/img/feed_fullsize/plain/did:plc:oxaerni...
Running that same prompt through gpt-2-image high gave an...interesting contrast: https://cdn.bsky.app/img/feed_fullsize/plain/did:plc:oxaerni...
It did more inventive styles for the images that appear to be original, but:
- The style logic is applied by row rather than by the raw numbers, and is therefore wrong
- Several of the Pokemon are flat-out wrong
- Number font is wrong
- Bottom isn't square for some reason
Odd results.
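The numeric half of the prompt is objective, which is what makes it a useful check. A minimal Python sketch (helper names are illustrative, not from either model's tooling) that lists the first 64 primes and the style each cell should get from its digit count:

```python
# Sketch: compute the objectively checkable part of the prompt.
# Helper names are hypothetical; nothing here comes from either model's tooling.

STYLE_BY_DIGITS = {1: "8-bit", 2: "charcoal drawing", 3: "Ukiyo-e"}


def first_primes(count: int) -> list[int]:
    """Return the first `count` primes using trial division (fine for small counts)."""
    primes: list[int] = []
    candidate = 2
    while len(primes) < count:
        # Prime if no already-found prime p with p*p <= candidate divides it.
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def expected_grid(rows: int = 8, cols: int = 8) -> list[list[tuple[int, str]]]:
    """Return (Pokédex number, required style) for each cell, row by row."""
    numbers = first_primes(rows * cols)
    cells = [(n, STYLE_BY_DIGITS[len(str(n))]) for n in numbers]
    return [cells[r * cols:(r + 1) * cols] for r in range(rows)]


if __name__ == "__main__":
    for row in expected_grid():
        print("  ".join(f"{n}:{style}" for n, style in row))
```

The first row already mixes styles (2, 3, 5 and 7 are 8-bit; 11, 13, 17 and 19 are charcoal), and the fourth row switches from 97 to 101 partway through, so any per-row style assignment has to be wrong somewhere. The last cell is 311.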
- MrManatee 25 days ago
  Prompts like this feel like they're using the wrong abstraction. The "obvious" thing to do with something like this would be to generate some code that generates the image and then run that code.
  Inspired by this, I tried something much simpler: I asked it to draw 12 concentric circles. Across three tries, it drew 10 every time. https://chatgpt.com/share/69e87d08-5a14-83eb-9a3b-3a8eb14692...
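  A sketch of that code-first approach for the circles test: a few lines of Python that emit an SVG with exactly the requested number of rings. The function name, canvas size, and stroke width are arbitrary choices for illustration.

```python
def concentric_circles_svg(count: int = 12, size: int = 400, stroke: int = 2) -> str:
    """Return an SVG document containing `count` evenly spaced concentric circles."""
    center = size / 2
    # Keep the outermost ring inside the canvas so its stroke isn't clipped.
    max_radius = center - stroke
    step = max_radius / count
    circles = [
        f'<circle cx="{center}" cy="{center}" r="{step * (i + 1):.2f}" '
        f'fill="none" stroke="black" stroke-width="{stroke}"/>'
        for i in range(count)
    ]
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">'
        + "".join(circles)
        + "</svg>"
    )
```

  Writing `concentric_circles_svg(12)` to a `.svg` file and opening it in a browser gives 12 rings on every run, which is the determinism the prompt-only route lacks.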
  - LeifCarrotson 24 days ago
    I think prompts like this are where agentic workflows come into play. If you asked it to generate the first 64 prime numbers, AI tools could do that. If you asked it to draw a charcoal image of Pokemon 13, it could do that. If you asked it to add a white Menlo 13 on a black background to the top left corner of that image, it could do that. If you asked it to do that 63 more times, it could do those things, and if you asked it to assemble those into a grid, it could.
    It can't get that in a one-shot. Perhaps, though, it could figure out when it needs to break a problem into individual tasks to delegate to itself and assemble them at the end.
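    The final "assemble those into a grid" step is the deterministic part of such a pipeline. A minimal sketch, assuming every generated cell is a square of `cell` pixels and the black border has a uniform `border` width (both hypothetical parameters), that computes where each subimage and its top-left label would go:

```python
from typing import NamedTuple


class Cell(NamedTuple):
    row: int
    col: int
    x: int  # left edge of the subimage on the canvas, in pixels
    y: int  # top edge of the subimage on the canvas, in pixels


def grid_layout(rows: int, cols: int, cell: int, border: int) -> tuple[int, int, list[Cell]]:
    """Return (canvas_width, canvas_height, cells) for a grid with a border of
    `border` pixels between cells and around the outside. Fill the canvas with
    black first; whatever no cell covers becomes the border."""
    width = cols * cell + (cols + 1) * border
    height = rows * cell + (rows + 1) * border
    cells = [
        Cell(r, c, border + c * (cell + border), border + r * (cell + border))
        for r in range(rows)
        for c in range(cols)
    ]
    return width, height, cells
```

    With `rows == cols` the canvas is square by construction; for example `grid_layout(8, 8, cell=256, border=4)` gives a 2084x2084 canvas, and each label would be drawn at `(cell.x, cell.y)`.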
    - wahnfrieden 24 days ago
      That's what makes it a fair evaluation of its limits
      - fennecfoxy 24 days ago
        I mean, asking these transformers to do maths has always been the wrong task. It's like we're now counting it as a flaw that "it doesn't have X tools, built with traditional code, built in".
        Though I suppose we're testing their model + agent harness here as well. It really _should_ have all of those tools/reasoning available to accomplish a task like the above without issue.
        wahnfrieden 24 days ago
        It's only been the wrong task because they've been deficient at it and expensive to use, so we had workarounds. They are getting better at these tasks and cheaper (sometimes). It's fair to evaluate even if there are more economical and accurate alternatives available.
        fjdjshsh 23 days ago
        You can evaluate the limits of a spoon by trying to cut meat with it.
        The point is what are the typical use cases for the tool / what are the agreed upon areas of application?
        Making the LLM do math with large numbers, I would argue, is not a typical use case, though it's at the border.
        Asking an image generator model to calculate numbers before generating an image definitely does NOT sound like a reasonable use case (do people need it? Will people try using it for this purpose?)
- dvt 25 days ago
  This is an amazing test and it's kinda funny how terrible gpt-2-image is. I'd take "plagiarized" images (e.g. Google search & copy-paste) any day over how awful the OpenAI result is. It doesn't even seem like they have a sanity checker/post-processing "did I follow the instructions correctly?" step, because the digit-style constraint violation should be easily caught. It's also expensive as shit to just get an image that's essentially unusable.
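  A post-generation check like that could be fairly simple once each cell's label and style have been read off the image. A sketch, assuming some upstream step (hypothetical here: OCR or a vision model) has already produced `(label, style)` pairs in grid order; `check_cells` and its helpers are illustrative names:

```python
STYLE_BY_DIGITS = {1: "8-bit", 2: "charcoal drawing", 3: "Ukiyo-e"}


def first_primes(count: int) -> list[int]:
    """First `count` primes by trial division."""
    primes: list[int] = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def check_cells(cells: list[tuple[int, str]]) -> list[str]:
    """Compare (label, style) pairs read off the image, in grid order, against the rules.

    Returns human-readable problems; an empty list means every check passed.
    """
    problems: list[str] = []
    seen: set[int] = set()
    for i, ((label, style), want) in enumerate(zip(cells, first_primes(len(cells)))):
        if label != want:
            problems.append(f"cell {i}: label {label}, expected {want}")
        if label in seen:
            problems.append(f"cell {i}: duplicate label {label}")
        seen.add(label)
        # The style rule depends on the number's digit count, not on its row.
        required = STYLE_BY_DIGITS.get(len(str(label)), "unknown")
        if style != required:
            problems.append(f"cell {i}: style {style!r}, expected {required!r}")
    return problems
```

  A row-based style mistake shows up as a style mismatch, and a repeated number such as two 307s shows up as both a wrong label and a duplicate.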
  - the_arun 25 days ago
    This is from Gemini - https://lens.usercontent.google.com/banana?agsi=CmdnbG9iYWw6...
    - fblp 25 days ago
      Did it correctly follow the instructions? Don't know my pokemon well enough.
      - minimaxir 25 days ago
        Essentially yes (bottom got distorted), but Gemini uses Nano Banana Pro or Nano Banana 2 so it's not a surprising result. The image I linked uses the raw API.
        thih9 25 days ago
        Note that the styles are different; some two-digit images are rendered in color.
        Color charcoal drawings do exist, but it’s not what’s usually meant by “charcoal drawing”.
      - podgietaru 24 days ago
        Plusle and Minun sit next to each other in the Pokedex, at 311 and 312. There are two 307s.
      - tsukurimashou 24 days ago
        > Create a 8x8 contiguous grid
        It failed at the very first instruction
  - anshumankmr 25 days ago
    That is interesting, because I feel like gpt-image-1 did have that feature.
    (source: https://chatgpt.com/share/69e83569-b334-8320-9fbf-01404d18df...)
    - weird-eye-issue 25 days ago
      You are comparing ChatGPT to a raw image model. These are two completely different things. ChatGPT takes your input, modifies the prompt and then passes it to the image model and then will maybe read the image and provide output. The image model, when called through the API, just takes the prompt verbatim and generates an image.
      - minimaxir 25 days ago
        Nano Banana Pro and ChatGPT Images 2.0 also tweak the prompt because they can think.
        weird-eye-issue 25 days ago
        Yes exactly, "ChatGPT Images 2.0" is in ChatGPT. That is not a model.
  - hyperadvanced 25 days ago
    I wouldn’t say it’s terrible. I wouldn’t say it’s a huge step forward in terms of quality compared to what I’ve seen before from AI
- podgietaru 24 days ago
  How is it that a model can produce what must be near 1:1 images ripped straight out of Pokemon FireRed (the first ones) for profit and not be infringing copyright?
  I know that's the game, but it seems CRAZY to me that they can do this.
  - dragonwriter 24 days ago
    Training a model on a corpus which includes copyrighted images but which is not focussed primarily or exclusively on applications which violate copyright might be fair use in the US (so far, it seems that way.)
    But that doesn't mean that producing outputs using the model so trained which are based on copyright-protected ones in ways which would violate copyright if produced by any other means doesn't still violate copyright. DMCA safe harbor might apply to the system owner (IIRC, the exact boundaries are fuzzy with UGC generated on the site by the provider’s systems rather than generated elsewhere and posted), so Google may not be liable for the infringement (though if it is actively searching for references online at generation and not relying on what is trained into the model, that would seem to weaken the case for that), but it's still an infringement.
  - minimaxir 24 days ago
    So the sprites aren't what I'd consider plagiarism since, to my surprise, they are sufficiently different, even though the design is similar to the FR/LG sprites.
    For Charmeleon, the sprite is closest to the B/W sprite, but not exact: https://bulbapedia.bulbagarden.net/wiki/Charmeleon_(Pokémon)...
    For Squirtle, the sprite is much closer to the FR/LG sprite but still some differences: https://bulbapedia.bulbagarden.net/wiki/Squirtle_(Pokémon)#S...
    The other images, however, crib from official artworks a bit too close for comfort.
    In my original analysis I hypothesized this is due to token scarcity reducing the model's ability to be creative: I believe the NBP image used 1.5k tokens while the gpt-2-image one used 7k tokens, but this is hard to test.
  - dd8601fn 24 days ago
    The funny thing is that the main complaint I’ve heard so far is that it repeatedly refused to operate on original content… because it might violate copyright.
    - minimaxir 24 days ago
      In testing the ChatGPT interface it appears to be looser on copyright than expected given the legal trouble.
    - DrewADesign 24 days ago
      Yeah, the CSAM generated by grok proves the guardrails are only really good for stymieing benign uses.
  - pixel_popping 23 days ago
    Well, Anthropic did download torrents massively, right? They clearly knew this was completely illegal and decided not to care.
  - DrewADesign 24 days ago
    It can’t. It violates copyright. The big players are the only ones with the money to pursue these things, but they’re interested in replacing artists with AI trained on their models so they settle and set up some sort of agreement. The little guys have no presidential case law to help them along, and nowhere close to the resources to push it that far, so they get steamrolled. I know artists famous enough for people— even commercial entities — to regularly blatantly rip them off by name with “in the style of” prompts, but there’s no realistic path to pursue it. Fame doesn’t pay legal bills.
    - DrewADesign 23 days ago
      Precedential. Stupid autocorrect.
  - Jensson 24 days ago
    Gemini uses google search to find references when making images, so it probably found the pokemon images online to do this.
    > I know that's the game, but it seems CRAZY to me that they can do this.
    It's not crazy that a search can find existing pokemon images. Maybe google should show which images it used as references to be more transparent here.
- AussieWog93 25 days ago
  For what it's worth, NBP made some mistakes too.
  Artistic oddities aside (why are the 8-bit sprites 16-bit, why do the charcoal drawings have colour, why does the art of the Gen 1 Pokemon specifically look so off?), 271 is Lombre, not Lotad.
- Razengan 25 days ago
  Even a few months ago, ChatGPT/Sora's image generation performed better than Gemini/Nano Banana for certain weird prompts:
  Try things like: "A white capybara with black spots, on a tricycle, with 7 tentacles instead of legs, each tentacle is a different color of the rainbow" (paraphrased, not the literal exact prompt I used)
  Gemini just globbed a whole mass of tentacles without any regard to the count
- vincentbuilds 25 days ago
  Banana Pro gets the logic and punts on the art; gpt-2-image gets the art and punts on the logic. Feels like instruction-following and creativity sit on opposite ends of the same slider.
  - sdwr 24 days ago
    Yeah it's fascinating to have an alternate source for intelligence, feels like a mental Rosetta Stone
  - changyou 24 days ago
    [dead]
  - dieortin 25 days ago
    This feels incredibly AI generated
    - doginasuit 24 days ago
      The random accusations of AI generated comments are the most annoying part of the unfolding AI dystopia.
      - fennecfoxy 24 days ago
        Dogs don't wear suits! You must be AI too ;)
        doginasuit 24 days ago
        Damn, busted!
- rrr_oh_man 25 days ago
  Why would you consider this a good prompt?
  - minimaxir 25 days ago
    Because both Nano Banana Pro and ChatGPT Images 2.0 have touted strong reasoning capabilities, and this particular prompt has more objective, easy-to-validate criteria as opposed to the subjective nature of images.
    I have more subjective prompts to test reasoning but they're your-mileage-may-vary (however, gpt-2-image has surprisingly been doing much better on more objective criteria in my test cases)
  - o10449366 25 days ago
    [flagged]
    - minimaxir 25 days ago
      "Quirky and obscure" has the functional benefit of ensuring the source question is not in the training data/outside the median user prompt, and therefore making the model less likely to cheat.
      We have enough people complaining about Simon Willison's pelican test.
      - o10449366 25 days ago
        When you program, do you consider using your prior knowledge of programming cheating?
    - Bjartr 25 days ago
      What would make the prompt a better actual evaluation in your judgement?
      - leptons 25 days ago
        Not focusing on pokemon for a start. Maybe use something more people can recognize and evaluate. I have zero knowledge of pokemon, I see it as a niche thing for ultra-nerdy people, and not something everyone is familiar with. Nothing about that test can be evaluated by anyone but a pokemon expert. Sorry, but pokemon isn't as mainstream as some people might think it is.
        Bjartr 24 days ago
        I think you underestimate how popular Pokemon is.
        By most objective measures it's the largest entertainment franchise in all of history.
        Would you also object to any other pop-culture reference for the same reason?
        leptons 24 days ago
        >I think you underestimate how popular Pokemon is.
        No, I think you are overestimating how popular pokemon is.
        >By most objective measures it's the largest entertainment franchise in all of history.
        I don't care? Only a small set of pokemon fans would be able to gain anything from this "test".
        >Would you also object to any other pop-culture reference for the same reason?
        Yes.
    - tailscaler2026 25 days ago
      still #opentowork huh
      - beepbooptheory 25 days ago
        Where does one even use that hashtag?
        minimaxir 25 days ago
        It's a LinkedIn joke.
    - codemog 25 days ago
      Ah yes, also known as C++ enjoyers.
- Palmik 25 days ago
  I do not think this is a good prompt or useful benchmark, but nonetheless, it seems to work better for me: https://chatgpt.com/share/69e88a94-ded8-8395-b5dc-abceb2f44d...
  - minimaxir 24 days ago
    Huh, that is indeed better. If ChatGPT Images 2.0/gpt-2-image is more nondeterministic than usual, then that is in itself a useful data point.
    - Palmik 24 days ago
      Did you enable thinking for your experiment? Are you sure you were on the 2.0 rather than 1.5 version?
      - minimaxir 24 days ago
        That experiment image was directly through the API on high. (no Thinking parameter like the Web UI)
- Melatonic 23 days ago
  But where is the one for MissingNo.?!
- pfortuny 25 days ago
  Just try a 23-sided plane convex polygon.
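  Regular polygons and stars are trivial to draw in code, which is what makes them good probes for image models. A sketch (function names are illustrative) that computes the vertices of the regular polygon {n/k}: `k = 1` gives a convex n-gon such as the 23-gon here, while {9/2} or {9/4} trace a nine-pointed star.

```python
import math


def star_polygon_points(n: int, k: int = 1, radius: float = 100.0,
                        cx: float = 0.0, cy: float = 0.0) -> list[tuple[float, float]]:
    """Vertices of the regular polygon {n/k}, in drawing order.

    k = 1 yields a convex n-gon; 1 < k < n/2 with gcd(n, k) == 1 yields a
    single closed star with n points ({9/3} would be three separate triangles).
    """
    if math.gcd(n, k) != 1 or not 1 <= k < n / 2:
        raise ValueError("need 1 <= k < n/2 and gcd(n, k) == 1")
    points = []
    for i in range(n):
        # Jump k positions around the circle each step, starting at the top
        # (in SVG's y-down coordinates, angle -pi/2 points up).
        angle = -math.pi / 2 + 2 * math.pi * ((i * k) % n) / n
        points.append((cx + radius * math.cos(angle), cy + radius * math.sin(angle)))
    return points


def to_svg_path(points: list[tuple[float, float]]) -> str:
    """Closed SVG path data through the points in order."""
    return "M " + " L ".join(f"{x:.2f},{y:.2f}" for x, y in points) + " Z"
```

  `to_svg_path(star_polygon_points(23))` returns a path string that can go straight into an SVG `<path d="...">` element.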
- razorbeamz 25 days ago
  Neither of them drew them in an 8-bit style either. It's way too many colors.
  - podgietaru 24 days ago
    They made the same mistake a lot of people do, taking 8-bit to mean retro style. But they're from the GBA games, which ran on 32-bit hardware.
  - dodslaser 25 days ago
    Maybe they're so advanced they learned to write to the palette registers mid-scanline.
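  "Too many colors" is measurable: count the distinct colors, or force a look by snapping every pixel to a fixed palette. A minimal sketch over RGB tuples using nearest-color matching by squared RGB distance (crude but simple); the four-shade grayscale palette is an illustrative choice, loosely in the spirit of the original Game Boy's four shades, not a spec for any console.

```python
Color = tuple[int, int, int]

# Illustrative 4-shade grayscale palette (an assumption, not any console's spec).
GRAY4: list[Color] = [(0, 0, 0), (85, 85, 85), (170, 170, 170), (255, 255, 255)]


def distinct_colors(pixels: list[Color]) -> int:
    """Count the unique RGB values in a flat list of pixels."""
    return len(set(pixels))


def snap_to_palette(pixels: list[Color], palette: list[Color]) -> list[Color]:
    """Replace each pixel with the palette color at the smallest squared RGB distance."""
    def nearest(pixel: Color) -> Color:
        return min(palette, key=lambda c: sum((a - b) ** 2 for a, b in zip(pixel, c)))
    return [nearest(p) for p in pixels]
```

  A sprite fits a budget of N colors when `distinct_colors(pixels) <= N`; after `snap_to_palette(pixels, GRAY4)` it has at most four.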
- heroku 25 days ago
  [dead]
- m3kw9 25 days ago
  Prob a very unscientific way to test an image model. This is likely because they have the reasoning turned down and let the instant output take over.
  - minimaxir 25 days ago
    There's no good scientific way to test a closed-source model with both nondeterministic and subjective output.
    This example image was generated using the API on high, not the low reasoning version. (it is slow and takes 2 minutes lol)
  - crustaceansoup 25 days ago
    If the results are quantifiable/objective and repeatable it's scientific, how is it not scientific?
    The reasoning amount is part of the evaluation isn't it?
  - TeMPOraL 25 days ago
    This is the best kind of science there is: direct, empirical test.
vunderba 25 days ago
OpenAI’s gpt-image-1.5 and Google’s NB2 have been pretty much neck and neck on my comparison site, which focuses heavily on prompt adherence, with both hovering around a 70% success rate on the prompts for generative and editing capabilities. The caveat is that Gemini has always had the edge in terms of visual fidelity.
That being said, gpt-image-1.5 was a big leap in visual quality for OpenAI and eliminated most of the classic issues of its predecessor, including things like the “piss filter.”
I’ll update this comment once I’ve finished running gpt-image-2 through both the generative and editing comparison charts on GenAI Showdown.
Since the advent of NB, I’ve had to ratchet up the difficulty of the prompts especially in the text-to-image section. The best models now score around 70%, successfully completing 11 out of 15 prompts.
For reference, here’s a comparison of ByteDance, Google, and OpenAI on editing performance:
https://genai-showdown.specr.net/image-editing?models=nbp3,s...
And here’s the same comparison for generative performance:
https://genai-showdown.specr.net/?models=s4,nbp3,g15
UPDATES:
gpt-image-2 has already managed to overcome one of the so‑called “model killers” on the test suite: the nine-pointed star.
Results are in for the generative (text to image) capabilities: Gpt-image-2 scored 12 out of 15 on the text-to-image benchmark, edging out the previous best models by a single point. It still fails on the following prompts:
- A photo of a brightly colored coral snake but with the bands of color red, blue, green, purple, and yellow repeated in that exact order.
- A twenty-sided die (D20) with the first twenty prime numbers (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71) on the faces.
- A flat earth-like planet which resembles a flat disc is overpopulated with people. The people are densely packed together such that they are spilling over the edges of the planet. Cheap "coastal" real estate property available.
All Models:
https://genai-showdown.specr.net
Just Gpt-Image-1.5, Gpt-Image-2, Nano-Banana 2, and Seedream 4.0
https://genai-showdown.specr.net?models=s4,nbp3,g15,g2
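Treating each of the 15 prompts as an independent pass/fail trial (a simplification, since the prompts differ in difficulty), a Wilson score interval gives a rough sense of how much a one-point lead means. This is a standard interval for binomial proportions, not something the site itself reports.

```python
import math


def wilson_interval(passes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a pass rate; z = 1.96 gives roughly 95% coverage."""
    if trials <= 0 or not 0 <= passes <= trials:
        raise ValueError("need trials > 0 and 0 <= passes <= trials")
    p = passes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    margin = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - margin), min(1.0, center + margin)


for label, passes in [("previous best", 11), ("gpt-image-2", 12)]:
    low, high = wilson_interval(passes, 15)
    print(f"{label}: {passes}/15, 95% interval {low:.2f} to {high:.2f}")
```

The intervals for 11/15 and 12/15 overlap heavily, so the new lead is worth watching rather than decisive on its own.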
- m_kos 25 days ago
  Very useful website. Would you have insight into what models are best at editing existing images?
  I often have to make very specific edits while keeping the rest of the image intact and haven't yet found a good model. These are typically abstract images for experiments.
  I asked gpt-image-2 to recolor specific scales of your Seedream 4 snake and change the shape of others. It did very poorly.
  - vunderba 25 days ago
    OpenAI actually has really good adherence, but occasionally introduces its own rough equivalent of "tone mapping", making hyper-localized edits frustrating.
    I don’t know how much work it is for you, but one thing a lot of people do, myself included, is take the original image, make a change to it using something like NB, then paste that as the topmost layer in something like Krita/Pixelmator. After that, we’ll mask and feather in only the parts we actually want to change. It doesn’t always work: if the edit shifts the overall color balance or filters out certain hues, it can be a real pain. But it does the job in some cases.
    The Flux models (like Kontext) are actually surprisingly good at making very minimal changes to the rest of the image, but unfortunately their understanding of complex prompts is much weaker than the closed, proprietary models.
    I will say that I’ve found Gemini 3.0 (NB Pro) does a relatively decent job of avoiding unnecessary changes - sometimes exceeding the more recent NB2, and it scored quite well on comparative image-editing benchmarks.
    https://genai-showdown.specr.net/image-editing
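    The mask-and-feather layering workflow comes down to a per-pixel blend, `result = original * (1 - mask) + edited * mask`, where the mask is 1 over the part you want to take from the edit and ramps down to 0 across the feathered edge. A minimal pure-Python sketch on grayscale rows, assuming a rectangular region and a linear falloff (both illustrative; Krita and Pixelmator have their own feathering tools):

```python
import math


def feathered_rect_mask(width: int, height: int, box: tuple[int, int, int, int],
                        feather: float) -> list[list[float]]:
    """Mask in [0, 1]: 1 inside box (x0, y0, x1, y1; x1 and y1 exclusive),
    falling off linearly to 0 over `feather` pixels outside it."""
    x0, y0, x1, y1 = box
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            # Distance from this pixel to the nearest pixel inside the box (0 if inside).
            dx = max(x0 - x, 0, x - (x1 - 1))
            dy = max(y0 - y, 0, y - (y1 - 1))
            dist = math.hypot(dx, dy)
            if feather > 0:
                row.append(max(0.0, 1.0 - dist / feather))
            else:
                row.append(1.0 if dist == 0 else 0.0)
        mask.append(row)
    return mask


def composite(original: list[list[float]], edited: list[list[float]],
              mask: list[list[float]]) -> list[list[float]]:
    """Blend two same-sized grayscale images: mask 1 takes the edit, 0 keeps the original."""
    return [
        [o * (1 - m) + e * m for o, e, m in zip(orow, erow, mrow)]
        for orow, erow, mrow in zip(original, edited, mask)
    ]
```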
    - m_kos 25 days ago
      Thanks. I will try this! I need to read up on how to work with vision models for both generation and understanding.
  - throwthrowuknow 23 days ago
    Might be worthwhile to find a provider that lets you use a mask and inpainting
- CamperBob2 25 days ago
  It'd be interesting if you could add HunyuanImage-3 to the competition. It's better than Z-Image at almost everything I've thrown at it.
  It can be (slowly) run at home, but needs 96GB RTX 6000-level hardware so it is not very popular.
  - vunderba 25 days ago
    I’ll have to give it another try. Its predecessor, Hunyuan Image 2.0, scored pretty poorly when I tested it last year: 2 out of 15, so it'll be interesting to see how much it has improved.
    Here's ZiT, Gpt-Image-2, and Hunyuan Image 2 for reference:
    https://genai-showdown.specr.net/?models=hy2,g2,zt
    Note: It won't show up in some of the newer image comparisons (Angelic Forge, Flat Earth, etc) because it's been deprecated for a while but in the tests where it was used (Yarrctic Circle, Not the Bees, etc.) it's pretty rough.
    - CamperBob2 25 days ago
      It does quite a bit better than 2.0, I think. Or at least it may be stylistically different enough to justify a rematch against the others.
      Ring toss: https://i.imgur.com/Zs6UNKj.png (arguably a pass)
      9-pointed star: https://i.imgur.com/SpcSsSv.png (star is well-formed but only has 6 points)
      Mermaid: https://i.imgur.com/R6MbMPX.png (fail, and I can't get Imgur to host it for some reason even though it's SFW)
      Octopus: https://i.imgur.com/JTVH7xy.png (good try, almost a pass, but socks don't cover the ends of all the tentacles)
      Above are one-shot attempts with seed 42.
      - vunderba 25 days ago
        > https://i.imgur.com/6NXpI2q.png
        You're killing me, Smalls. This one is a 404. I'm really curious what it actually showed.
        That ring toss is definitely leagues better than its predecessor. I’m not going to fault it too much for the star though, that one is an absolute slate wiper. The only locally hostable model that ever managed it for me was the original Flux, and I’m still not entirely convinced it wasn’t a fluke. Despite getting twice as many attempts, Flux 2, a much larger model, couldn’t even pull it off.
        CamperBob2 25 days ago
        Yeah, I suspect you'd see some solid passing scores if you ran it as many times as some of the others.
        For the mermaid, https://i.imgur.com/R6MbMPX.png sometimes seems to work but not consistently. It is probably triggering a porn filter of some kind. I need to find another free image host, as imgur has definitely jumped the shark.
        The image shows a mermaid of evident Asian extraction lying on a beach, face down. There is a dolphin lying on top of her, positioned at a 90-degree angle. It doesn't show any interaction at all, so a definite fail.
        vunderba 25 days ago
        I still use Imgur from time to time just because it’s convenient, but I’ve been meaning to build an Imgur-style extension for my site for a while, something that would let me drag and drop media for quick sharing. The site being Astro-based (static site generation) makes that tricky, though.
- LiteSoul 24 days ago
  Great website, 2 things:
  1 - Gpt-image-2 seems to pass the Flat Earth test? (if not, I'm sure the paid thinking 2k version passes it).
  2 - Since NB2 was earlier, many gold medals are assigned to it even though GI2 now passes those tests too. For example, on the Octopus test NB2 took 14 attempts but GI2 just 2 (BTW, shouldn't the number of attempts affect the score?)
  - vunderba 24 days ago
    So if you zoom in (click the zoom button on the actual gpt-image-2 of the flat Earth), you’ll see that a lot of the people are anatomical impossibilities, which is one of the disallowed criteria on the list. The faces also look like melted candles.
    This is one of those areas where even state-of-the-art models still struggle. You’re asking for a high level of detail at a per-person level, which means you end up with lots and lots of very small objects that all need to be rendered with convincing detail.
    I should probably explain the scoring rubric better - it's in the (i) info icon. If you click the pass/fail button towards the top, it switches from a simple pass/fail view to a weighted score. That weighted score is based on three things: level of adherence to the prompt, visual fidelity, and the number of attempts.
    I've tried to keep my criteria as objective as possible, but there's just a certain level of unavoidable subjectivity to it.
    For example, with the octopus image: Even though the minimum criteria might be five tentacles covered, having all eight is much closer to the ideal of “an octopus,” so it usually gets bumped up to a higher rating (bronze, silver, gold).
    Honestly, I think I agree that the gpt-image-2 probably should be upgraded to a gold medal. Thanks for pointing that out!
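    The thread doesn't give the site's actual weighting, so the following is only a hypothetical illustration of how adherence, visual fidelity, and an attempt penalty might combine into one score. The weights, the 20-attempt cap, and the linear penalty are all made up for the example.

```python
def weighted_score(adherence: float, fidelity: float, attempts: int,
                   max_attempts: int = 20,
                   weights: tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    """Hypothetical 0..1 score combining three factors.

    adherence, fidelity: judged quality in [0, 1] (e.g. tentacles covered / 8).
    attempts: tries needed for a pass; 1 scores best, max_attempts scores worst.
    """
    if not 1 <= attempts <= max_attempts:
        raise ValueError("attempts must be between 1 and max_attempts")
    w_adh, w_fid, w_att = weights
    # Linear penalty: a first-try success earns the full attempt credit.
    attempt_credit = 1 - (attempts - 1) / (max_attempts - 1)
    return w_adh * adherence + w_fid * fidelity + w_att * attempt_credit
```

    Under these made-up weights, equal-quality passes on 2 attempts outscore passes on 14 attempts, and covering all eight octopus tentacles beats covering the minimum five.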
- valarauko 24 days ago
  That's lovely. My own personal benchmark has been to ask the various models to generate a functional pair of novelty New Year's Eve glasses on a person, that don't just plonk the year onto the top of regular frames.
  - vunderba 24 days ago
    Thanks. That's a good one~ Lens type stuff that involves reflections/refraction is a neat challenge for generative models. I did some editing tests that involved replacing an apartment window with a mirror back when Nano-Banana Pro was released and was rather stunned by the results.
    https://mordenstar.com/blog/edits-with-nanobanana/#through-t...
    - valarauko 22 days ago
      That's great, though I wasn't even thinking at the scale of reflection or refraction. My test was whether the image generators could come up with a novelty pair of glasses that incorporate the year digits into the shape of the frame itself with some whimsy, rather than just plop the numbers on top of regular boring frames. So something like [this](https://p.kagi.com/proxy/oardefault.jpg?c=-4THVYblKrsgkzFTNE...) rather than [this](https://p.kagi.com/proxy/2026-Glasses-4-Color-New-Year-Glass...). A lot of the initial designs just incorporated the numbers into the frames, with no consideration for relative placement to the eyes, completely obscuring vision. Additional prompting might lead to cutouts for the eyes, but that was unsatisfying. At least as of this past new year's eve, I couldn't get any of the image generators to give me something even passable. Images 2.0 also couldn't give anything acceptable till I gave it some examples.
      - vunderba 22 days ago
        Oh, I see what you’re saying. I like these types of tests where you incorporate well-known objects from the training data into unusual geometries.
        Kind of makes me want to take advantage of the multi-image editing capability, since you can use gpt-image-2 with multiple images.
        Take a photo of an existing pair of glasses frames (maybe even snapped at an optometrist’s office) then take a picture of an animal, like a spider with an unusual number of eyes, or something like a flounder, where the eyes eventually migrate to the top of its body.
        Then you could see if the system can realistically adapt the design and show how those glasses might look if they were redesigned for these unusual optical situations.
        valarauko 22 days ago
        Flounder might even work, since my initial complaints that the generated designs obscured the wearer's eyesight were met with solutions that just moved the offending eye to the side of the person's head :)
- gusmally 24 days ago
  Such a fun site, thank you! I was surprised that Seedream4 passed the mermaid test since it's hard to tell whether they are in the water or submerged, and the mermaid has something funny going on with her left hand.
  - vunderba 24 days ago
    Yeah seedream's attempt does have a bit of an uncanny valley effect: the mermaid/dolphin are only partially submerged, but there’s water above them with sunlight reflecting on the surface, and the mermaid’s hand looks disconnected from the angle of her arm.
    That’s why I gave it a bronze. To me, it falls into that “barely passing” category, similar to Gemini 2.5 Flash Image on that test. Seedream also took a major hit to its weighted score because of how many attempts it took to get something even remotely passable out of it.
    Thanks for the feedback!
- VladVladikoff 25 days ago
  Why does Gemini 3.1 get a pass on the flat earth one when image 2 gets a fail for the same reasons? Gemini has all sorts of random body parts and limbs etc.
  - vunderba 25 days ago
    That's a mistake~ None of the models successfully passed the Flat Earth composition test. I've updated the passing criteria to be more explicit as well. Thanks for catching that!
- what 25 days ago
  Where can I see the actual prompts and follow ups you fed each model?
  - vunderba 25 days ago
    So the prompts are tuned and adjusted on a per-model basis. If you look at the number of attempts, each receives a specific prompt variation depending on the model. This honestly isn't as much of an issue these days because SOTA models' natural language parsing (particularly in the multimodal ones) has eliminated a lot of the byzantine syntax requirements of the SD/SDXL days.
    The template prompt seen in each comparison gets adjusted through a guided LLM which has fine-tuned system prompts to rewrite prompts. The goal is to foster greater diversity while preserving intent, so the image model has a better chance of getting the image right.
    Getting to your suggestion for posting all the raw prompts, that's actually a great idea. Too bad I didn't think about it until you suggested it. And if you multiply it out - there's 15 distinct test cases against 22 models at this point, each with an average of about 8 attempts so we’re talking about thousands of prompts many of which are scattered across my hard drive. I might try to do this as a future follow-up.
    - what 25 days ago
      Shouldn’t every model get the same prompt? Seems a bit weird, especially when you can’t see the prompts that were used.
      - vunderba 25 days ago
        The goal isn’t the prompt itself. The test is whether a prompt can be expressed in such a way that we still arrive at the author's intent, and of course to do so in a way that isn't unnatural.
        The prompts despite their variation are still expressed in natural language.
        The idea is that if you can rephrase the prompt and still get the desired outcome, then the model demonstrates a kind of understanding; however more variation attempts also get correspondingly penalized: this is treated more as a failure of steering, not of raw capability.
        An example might help - take the Alexander the Great on a Hippity-Hop test case.
        The starter prompt is this: "A historical oil painting of Alexander the Great riding a hippity-hop toy into battle."
        If a model fails this a couple of times (multiple seeds), we might use a synonym for a hippity-hop, it was also known as a space hopper.
        Still failing? We might try to describe the basic physical appearance of a hippity-hop.
        Thus, something like GPT-Image-2 scored much higher on the compliance component of the test, requiring only a single attempt, compared with Z-Image Turbo, which required 14 attempts.
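        The escalation in that example can be sketched as a loop that counts every attempt. `generate` and `judge` are hypothetical callables standing in for the image model and the pass/fail review, the seed count per variant is arbitrary, and the third prompt wording is illustrative rather than the site's actual rephrasing.

```python
from typing import Callable, Optional

VARIANTS = [
    "A historical oil painting of Alexander the Great riding a hippity-hop toy into battle.",
    "A historical oil painting of Alexander the Great riding a space hopper into battle.",
    # Illustrative wording for the "describe its physical appearance" step.
    "A historical oil painting of Alexander the Great riding into battle on a large "
    "inflatable rubber ball with two handle-like horns on top.",
]


def escalate(variants: list[str],
             generate: Callable[[str, int], object],
             judge: Callable[[object], bool],
             seeds_per_variant: int = 3) -> tuple[Optional[str], int]:
    """Try each prompt variant over several seeds, in order.

    Returns (passing_prompt, attempts_used), or (None, attempts_used) if nothing
    passed. A higher attempt count is what a scoring rubric would penalize.
    """
    attempts = 0
    for prompt in variants:
        for seed in range(seeds_per_variant):
            attempts += 1
            if judge(generate(prompt, seed)):
                return prompt, attempts
    return None, attempts
```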
simonw 25 days ago
I've been trying out the new model like this:
```
  OPENAI_API_KEY="$(llm keys get openai)" \
    uv run https://tools.simonwillison.net/python/openai_image.py \
    -m gpt-image-2 \
    "Do a where's Waldo style image but it's where is the raccoon holding a ham radio"
```
Code here: https://github.com/simonw/tools/blob/main/python/openai_imag...
Here's what I got from that prompt. I do not think it included a raccoon holding a ham radio (though the problem with Where's Waldo tests is that I don't have the patience to solve them for sure): https://gist.github.com/simonw/88eecc65698a725d8a9c1c918478a...
- simonw 25 days ago
  I just got a much better version using this command instead, which uses the maximum image size according to https://github.com/openai/openai-cookbook/blob/main/examples...
```
  OPENAI_API_KEY="$(llm keys get openai)" \
    uv run 'https://raw.githubusercontent.com/simonw/tools/refs/heads/main/python/openai_image.py' \
    -m gpt-image-2 \
    "Do a where's Waldo style image but it's where is the raccoon holding a ham radio" \
    --quality high --size 3840x2160
```
  https://gist.github.com/simonw/88eecc65698a725d8a9c1c918478a... - I found the raccoon!
  I think that image cost 40 cents.
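For anyone who would rather call the API directly than go through the script above, here is a minimal sketch using the official `openai` Python package. It assumes `gpt-image-2` is accepted by `client.images.generate` with the same `size` and `quality` values as the command above, and that, like `gpt-image-1`, it returns base64-encoded image data in `b64_json`; the `generate_image` helper name is made up for illustration.

```python
import base64
from pathlib import Path


def generate_image(prompt: str, out_path: str, size: str = "3840x2160", quality: str = "high") -> Path:
    """Request one image and write the decoded bytes to out_path."""
    # Imported lazily so this module can be loaded without the SDK installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    result = client.images.generate(
        model="gpt-image-2",
        prompt=prompt,
        size=size,
        quality=quality,
    )
    # Assumption: like gpt-image-1, the model returns base64 data, not a URL.
    image_bytes = base64.b64decode(result.data[0].b64_json)
    path = Path(out_path)
    path.write_bytes(image_bytes)
    return path
```

Calling `generate_image("Do a where's Waldo style image but it's where is the raccoon holding a ham radio", "waldo2.png")` with `OPENAI_API_KEY` set should write the image to disk; keep in mind that a high-quality 4K request is the expensive kind.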
  - makira 25 days ago
    Fed it into a fresh Claude Code max-effort session with: "Inspect waldo2.png, and give me the pixel location of a raccoon holding a ham radio.". It sliced the image into small sections and gave:
    "Found the raccoon holding a ham radio in waldo2.png (3840×2160).
```
  - Raccoon center: roughly (460, 1680)                                                                                            
  - Ham radio (walkie-talkie) center: roughly (505, 1650) — antenna tip around (510, 1585)                                         
  - Bounding box (raccoon + radio): approx x: 370–540, y: 1550–1780                                                                
                                                                                                                                   
  It's in the lower-left area of the image, just right of the red-and-white striped souvenir umbrella, wearing a green vest. "
```
    Which is correct!
    - cwillu 25 days ago
      I had one problem: finding the raccoon. Now I have two: finding the red-and-white striped souvenir umbrella, and finding the raccoon.
      - makira 25 days ago
        simonw posted 2 different images: make sure to look at the second one.
        - cwillu 25 days ago
          Yeah, I noticed that just now, but too late to delete the comment :p
        - jaggederest 25 days ago
          You had a meta problem, and three, in total: find the raccoon, find the umbrella, find the right link in the comments.
        - bombcar 24 days ago
          To find Waldo you must first create the Universe.
    - M3L0NM4N 25 days ago
      We would need a larger sample size than just myself, but the raccoon was in the very first spot I looked. Found it literally immediately, as if that's where my eyes naturally gravitated first. Hopefully that's just luck and not an indictment of the image-creating ability, i.e. that some element that would normally make Waldo hard to find is missing from this "Where's Waldo" image.
      - nerdsniper 25 days ago
        There seemed to be more space around the raccoon than most other subjects. Zoomed out it appears as almost a “halo” highlighting the raccoon.
  - wewtyflakes 25 days ago
    A startling number of people either have no arms, one arm, a half of an arm, or a shrunken arm; how odd!
    - rattlesnakedave 25 days ago
      To be fair, the average person has fewer than two arms.
      - cozzyd 25 days ago
        Most people have an ARM in their pockets, nowadays. And possibly on their wrist.
      - floodfx 25 days ago
        Haha. Underrated comment!
    - globular-toast 25 days ago
      Finding the raccoon was instant. Finding all the weird AI artifacts is more fun. It's quite fascinating really. As usual it looks impressive at a glance but completely falls apart on closer inspection. I also didn't find any jokes, unless maybe the bridge to nowhere or finger posts pointing both ways counts?
    - cozzyd 25 days ago
      This is why they're congregating around the first aid and the lost and found
    - ehnto 25 days ago
      There is a leg that sprouts into part of a bush; perhaps that's where people's legs are disappearing to.
  - prmoustache 25 days ago
    Funny how it can look convincing from far away but once you zoom in you find out most characters have a mix of leprosy and skin cancer.
  - davebren 25 days ago
    The faces...that's nice that it turned a kid's book into an abomination
    - Filligree 25 days ago
      By image generation standards this is a ridiculously good result. No surprise that people instantly find the new limits, but they are new limits.
      - davebren 25 days ago
        It could already copy the art styles from its training data, what is the advancement here?
      - globular-toast 25 days ago
        But it's also straight up plagiarism and still ridiculously bad on so many levels.
    - vaulstein 25 days ago
      It's interesting that the raccoon is well defined because it was part of the request, but none of the other fauna are.
    - keithnz 25 days ago
      it's interesting, zoomed out it kind of looks ok, zoomed in.... oh my.
  - jdironman 25 days ago
    The real NFTs were the images we generated along the way
  - louiereederson 25 days ago
    The people in this image remind me of early This Person Does Not Exist, in the best way
    - dfee 25 days ago
      fair point, also "this raccoon does not exist"
  - gpt5 25 days ago
    I tried it on the ChatGPT web UI and it also worked, although the ham radio looks like a handbag to me.
    https://postimg.cc/wyxgCgNY
    - luxpir 25 days ago
      Nice, enjoyed the image as someone who has been to the events. But also easy raccoon placement :)
    - djmips 25 days ago
      mmmm yummy OSLS?
  - mirekrusin 25 days ago
    Can it generate a non-Halloween version though?
    This lower-is-better, nightmare-inducing danse macabre ratio feels like an interesting proxy for a model's capability.
  - ireadmevs 25 days ago
    I found it on the 2nd image! On the 1st one not yet...
  - dzhiurgis 25 days ago
    Cost me < 1 cent - https://elsrc.com/elsrc/waldo/wojak.jpg
    And this medium-quality, high-resolution one https://elsrc.com/elsrc/waldo/10_wojaks.jpg was 13 cents
    p.s. aaaand that's the soft launch of my SaaS above: you can replace wojak.jpg with anything you want and it will paint that. It basically appends to a prompt defined in elsrc's dashboard. Hopefully a more sane way to manage genai content. Be gentle to my server, HN!
    - botanrice 24 days ago
      Some pretty funny but good examples:
      https://elsrc.com/elsrc/waldo/10_schoolsofthought.jpg
      https://elsrc.com/elsrc/waldo/10_anthropomorphizedcomputermo...
      https://elsrc.com/elsrc/waldo/10_breathoffreshairsittingonad...
      https://elsrc.com/elsrc/waldo/10_drizzydrakesdoingthedrakeme...
      https://elsrc.com/elsrc/waldo/10_sashringingtrashsingingmash...
      Ok i promise I'm done xD
    - wordpad 24 days ago
      That's way more than 10, around 50
    - botanrice 24 days ago
      Are you using the same prompt the above commenter used? I've been toying around with increasingly ridiculous prompts and it works surprisingly well. Is it the new ChatGPT image gen or Nano Banana?
      It's pretty good tbh, even with absurd prompts
  - Barbing 25 days ago
    >I think that image cost 40 cents.
    Kinda made me sad assuming the author didn't license anything to OpenAI.
    I recognize it could revert (99% of?) progress if all the labs moved to consent-based training sets exclusively, but I can't think of any other fair way.
    $0.40 does not represent the appropriate value to me, considering the desirability of the IP and its earning potential in print and elsewhere. If the world has to wait until it’s fair, what of value will be lost? (I suppose this is where the big wrinkle of foreign open-weight models comes in.)
    - rafram 25 days ago
      License what? The concept of a hidden object search? The only stylistic similarity here is the viewing angle. Where’s Waldo comics are flat, brightly colored line drawings that look nothing like this at all.
      - Barbing 25 days ago
        Well, I recognized the style from even the new physical books on sale today, but I don’t know art well enough to use a term like flat.
        I am not an art expert, but I’m perhaps a reasonable consumer, and there is a possibility of confusion if someone sells AI Where’s Waldo knockoff books at the dollar store, maybe until I take a closer look.
- makira 25 days ago
  > though the problem with Where's Waldo tests is that I don't have the patience to solve them for sure
  I see an opportunity for a new AI test!
  - vunderba 25 days ago
    There have already been several attempts to procedurally generate Where’s Waldo? style images since the early Stable Diffusion days, including experiments that used a YOLO filter on each face and then processed them with ADetailer.
    It's a difficult test for genai to pass. As I mentioned in a different thread, it requires a holistic understanding (there can only be one Waldo, Highlander style), while also holding up to scrutiny when you examine any individual, ordinary figure.
  - simonw 25 days ago
    I've actually been feeding them into Claude Opus 4.7 with its new high resolution image inputs, with mixed results - in one case there was no raccoon but it was SURE there was and told me it was definitely there but it couldn't find it.
- marricks 25 days ago
  Like... this has things that AI will seemingly always be terrible at?
  At some point the level of detail is utter garbo and always will be. An artist who was thoughtful could have some mistakes but someone who put that much time into a drawing wouldn't have:
  - Nightmarish screaming faces on most people
  - A sign that points seemingly both directions, or the incorrect one, for a lake and a first aid tent that doesn't exist
  - A dog in bottom left and near lake which looks like some sort of fuzzy monstrosity...
  It looks SO impressive before you try to take in any detail. The hand selected images for the preview have the same shit. The view of musculature has a sternocleidomastoid with no clavicle attachment. The periodic table seems good until you take a look at the metals...
  We're reconfiguring all of our RAM & GPUs and wasting so much water and electricity for crappier where's Waldos??
  - fennecfoxy 24 days ago
    No, it won't be. I did indeed get the same problems when trying to generate my own image for it.
    However, as someone who's mucked about with local image generation as well, I'd say this is a problem with their implementation: it doesn't resolve fine detail because for the majority of requests it won't matter, and doing so drastically increases compute requirements.
    With local image generation, bad features, incorrect fingers, disfigurement, etc. have been solved for a long time.
    I think their new process involves multiple steps, including sketching/fleshing out the idea before adding detail. The step that would fix this would be outpainting or something similar to tile-based upscaling.
    From what I understand of image generation models, they also struggle with fine detail in general because they aren't really trained for that. However, for each tiny chunk of a detailed image like that, there's nothing to say they can't allocate a 500x500 chunk for it to work in as its "idea/reference space" and then transpose that into the main image being generated, i.e. generate image features separately rather than all together.
  - p1esk 25 days ago
    AI will seemingly always be ...
    You do realize that the whole image generation field is barely 10 years old?
    I remember how I was able to generate mnist digits for the first time about 10 years ago - that seemed almost like magic!
- pants2 25 days ago
  The second 4K image definitely has a raccoon on the left there! Nice.
- halamadrid 25 days ago
  Really hard to look at these images given how un-humanlike the humans are. A few are OK, but a lot are disfigured or missing parts, and it's hard to find a raccoon in here.
- vova_hn2 25 days ago
  Thanks for the image, I will see their faces in my nightmares.
  - vunderba 25 days ago
    This happens all too frequently when you ask a GenAI model to create an image with a large crowd, especially “Where’s Waldo?” style scenes, where by definition you’re going to be examining individual faces very closely.
  - hackable_sand 25 days ago
    What about the faces of the people ChatGPT killed?
- ritzaco 25 days ago
  haha took me a while to notice that one of the buildings is labelled 'Ham radio'
- nerdsniper 25 days ago
  That is a devilishly difficult prompt for current diffusion models. Kudos.
- arealaccount 25 days ago
  I see the raccoon
- ElFitz 25 days ago
  Damn. There’s a fun game app to make here ^^
  - dymk 25 days ago
    Is there? The moment you look closely at the puzzle (which is... the whole point of Where's Waldo), you notice all the deformities and errors.
    - ElFitz 25 days ago
      Yes, it’s not there yet. But nothing unsolvable. The first thing that comes to mind would be generating smaller portions at the same resolution, then expanding through tiling (although one might need to use another service & model for this), like we used to do with Stable Diffusion years ago.
      Another option would be generating these large images, splitting them into grids, and using inpainting on each "tile" to improve the details. Basically the reverse of the first one.
      Both significantly increase costs, but for the second one having what Images 2.0 can produce as an input could help significantly improve the overall coherence.
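The grid-and-inpaint option above needs, as a first step, a way to split a large canvas into overlapping tiles so that seams can be blended afterwards. This is a minimal sketch of that bookkeeping only; the 1024-pixel tile and 128-pixel overlap are arbitrary assumptions, and the inpainting and seam blending are left to whatever model or tool you use.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def tile_starts(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets along one axis so tiles of size `tile` cover [0, length),
    with neighbouring tiles sharing at least `overlap` pixels."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be in [0, tile)")
    if tile >= length:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the far edge
    return starts


def tile_grid(width: int, height: int, tile: int = 1024, overlap: int = 128) -> List[Box]:
    """Row-major list of overlapping tile boxes covering a width x height image."""
    xs = tile_starts(width, tile, overlap)
    ys = tile_starts(height, tile, overlap)
    return [(x, y, min(x + tile, width), min(y + tile, height)) for y in ys for x in xs]


boxes = tile_grid(3840, 2160)
print(len(boxes), boxes[0], boxes[-1])
```

For a 3840×2160 image this yields a 5×3 grid of 15 tiles. Pinning the last tile to the edge, rather than padding the canvas, keeps every tile the same size, at the cost of a larger overlap on the final row and column.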
    - amelius 25 days ago
      Yes sounds more like a fun research project instead.
- tptacek 25 days ago
  5.4 thinking says "Just right of center, immediately to the right of the HAM RADIO shack. Look on the dirt path there: the raccoon is the small gray figure partly hidden behind the woman in the red-and-yellow shirt, a little above the man in the green hat. Roughly 57% from the left, 48% from the top."
  (I don't think it's right).
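One way to compare this answer with the Claude Code result quoted earlier is to convert the percentages to pixels. This sketch assumes both models were looking at the same 3840×2160 image, which the thread doesn't confirm; the helper names are made up for illustration.

```python
from typing import Tuple

WIDTH, HEIGHT = 3840, 2160  # size of the 4K image (assumed to be the one both models saw)


def fraction_to_pixels(fx: float, fy: float, width: int = WIDTH, height: int = HEIGHT) -> Tuple[int, int]:
    """Convert 'x% from the left, y% from the top' (given as fractions) to pixels."""
    return round(fx * width), round(fy * height)


def inside(point: Tuple[int, int], box: Tuple[int, int, int, int]) -> bool:
    """True if (x, y) lies in box (x_min, x_max, y_min, y_max), edges inclusive."""
    x, y = point
    x_min, x_max, y_min, y_max = box
    return x_min <= x <= x_max and y_min <= y <= y_max


# Bounding box for raccoon + radio from the Claude Code answer quoted earlier.
claude_box = (370, 540, 1550, 1780)
gpt_guess = fraction_to_pixels(0.57, 0.48)

print(gpt_guess, inside(gpt_guess, claude_box))
```

Under that assumption the "57% / 48%" guess lands near the middle of the image, well outside the lower-left box that was confirmed correct.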
  - ritzaco 25 days ago
    I tried
    > please add a giant red arrow to a red circle around the raccoon holding a ham radio or add a cross through the entire image if one does not exist
    and got this. I'm not sure I know what a ham radio looks like though.
    https://i.ritzastatic.com/static/ffef1a8e639bc85b71b692c3ba1...
    - jackpirate 25 days ago
      Also, the raccoon it circled isn't in the original.
      - Aurornis 25 days ago
        I love how perfectly this captures the difficulties of using generative AI for detection tasks.
        - jetbalsa 25 days ago
          Oh god yes, I've been trying to make an LLM-assisted Magic: The Gathering card scanner... it's been a hell of a time trying to get it to just OCR card names well....
        - what 25 days ago
          Why would you use an LLM for OCR?
        - fennecfoxy 24 days ago
          Because if it's multimodal, oops all transformers, and they're pretty much best in class for OCR now, afaik?
        - jetbalsa 23 days ago
          Yep, it's pretty damn good compared to classic OCR, and to the more lightweight models I can run locally as well. The cards just vary too much over time.
        - jubilanti 24 days ago
          Because apparently that's what programming is and can only be these days...
        - Abby_101 24 days ago
          [dead]
      - angiolillo 25 days ago
        Indeed. I suppose one way to ensure you can find Waldo in any image is to add it yourself.
    - simonw 25 days ago
      That's excellent. I added it to my post: https://simonwillison.net/2026/Apr/21/gpt-image-2/#update-as...
    - davecahill 25 days ago
      hilarious - i tried and got the same thing.
      there was a very large bear in the first image; when asked to circle the raccoon it just turned the bear into a giant raccoon and circled it.
parasti 25 days ago
A great technical achievement, for sure, but this is kind of the moment where it enters uncanny valley to me. The promo reel on the website makes it feel like humans doing incredible things (background music intentionally evokes that emotion), but it's a slideshow of computer-generated images attempting to replicate the amazing things that humans do. It's just crazy to look at those images and have to consciously remind myself: nobody made this, this photographed place and people do not exist, no human participated in this photo, no human traced the lines of this comic, no human designer laid out the text in this image. This is a really clever amalgamation machine of human-based inputs. Uncanny valley.
- qnleigh 25 days ago
  No this is what life looks like on the other side of the uncanny valley. The images don't look creepy because they look artificial or wrong. They're a reminder of a creepy new reality where our eyes can no longer tell us what's real.
  - Cyan488 24 days ago
    We've definitely reached the point where discerning between real and AI images is impossible, even for a very detail-oriented eye.
    - Cthulhu_ 24 days ago
      It's not really a new problem though, as image forgery was a thing ages ago; if there weren't laws or measures taken against photoshopped images or instagram filters or faceapp things then, why would there be laws or measures taken against AI generated images now?
      Granted, a nontrivial difference is that the barrier to entry is lower; photo editing is something that requires active effort and learning.
      - conception 24 days ago
        Absolutism isn’t very useful. Scale and magnitude always need to be considered. “I can buy plates with uranium in them, why can’t I enrich it at scale for my own personal use???” “Humans have been hunting for thousands of years. Why can’t i deploy automated sentry machine guns at my property line??”
  - rambojohnson 25 days ago
    Online.
  - rkozik1989 24 days ago
    Can't wait for all the scams ripping off older folks who aren't all there anymore, but aren't so far gone that someone has power of attorney over them.
- tempaccount5050 24 days ago
  Yep. Just like motion pictures. Why, it's just a facsimile! People were meant to see performances by real people. These motion pictures fool your eye and surely will unravel the very fabric of civilized society! No longer shall the thespian be well employed! And the minds of the children will lay in ruins from such filth!
  - parasti 24 days ago
    I get your point, but it's not even really that. It's that an AI generated photo evokes the same feelings in me that human-made photographs do and I have to catch that and turn that off consciously.
    - tempaccount5050 24 days ago
      It shouldn't bother you. Just enjoy stuff. It's ok to think computer art is pretty. It's not some kind of personal or societal moral failing.
      - vincnetas 24 days ago
        Well, it is. What if you found out that your wife is actually a robot you can't tell apart from a real human? Your real wife. Well, at least not without cutting her open. Would you feel the same being with her?
        - GMoromisato 24 days ago
          That's not as bad as when I learned my wife is really just the product of cell division.
        - vincnetas 24 days ago
          ... damn, when you say this like that ... i'm now in existential crisis ...
        - rnjesus 24 days ago
          where can i buy one of these robot wives?
- GMoromisato 24 days ago
  I get this attitude--I really do. But I think the world moves on, and our children are not going to think this is even slightly strange. As always, it's us old-timers who have the hardest time with change.
  I also think this is "art" in service of commerce. This is OpenAI advertising their goods using art/design/writing. That's no different than cereal companies using Elmer's glue instead of milk for their photoshoots. I don't have a high-bar for that kind of "art".
  The good news is that the cutting edge of art will (for a while longer) still be a human domain. The more popular these models become, the more of their images we see in our lives, the more we will value things that look different.
  - eieke 24 days ago
    Right… I’m sure Zuckerberg thought the same about metaverse.
    This is the last place to get a reasonable take of how the average person feels about this stuff.
    - GMoromisato 24 days ago
      I never tried the metaverse. I don't own any Bitcoin. But I literally use AI every day.
      I acknowledge that I'm not particularly good at predicting the future, but I'm confident that AI is here to stay.
- mw888 24 days ago
  Uncanny Valley means the content directly evokes that creepy feeling, because the 'unrealness' is somehow subjectively apparent.
  But you say yourself you "have to consciously remind [yourself]" it isn't real. The Uncanny Valley is not applicable when true subjective realness is imparted.
- Sohcahtoa82 24 days ago
  That's not the uncanny valley. It's literally the other side of the uncanny valley.
  The uncanny valley is when it's just slightly imperfect which makes things feel "off".
  When we've reached the point that the AI is indistinguishable from humans, we've exited the uncanny valley.
- ceroxylon 24 days ago
  The wolf photo for the article was the most eerie example for me... if I am reading about the natural world, I want to see a real photo of the natural world.
- linen 24 days ago
  Why are so many on HN unable to see through the B.S. and hype? Everything in the trailer feels unvaried and derivative. It does text and filters well (grit/grain, UI etc) but all the posters, comics, and infographics feel the same. They've all got matching structure and color palettes and once you've seen enough of them, you can easily spot them in a crowd. I'm not sure why people are falling for this, the AI voices in the trailer are ridiculous too.
  - eieke 24 days ago
    No taste.
    No point trying to reason with them.
    These products don’t stick: just like Sora, they’re seemingly cool initially, but then people go back to what they were already doing ex ante.
- mannanj 23 days ago
  For me it feels off that we don't have as much interest and attention in telling fakes and lies apart from real ones. I remember seeing a lot of media about that, and recently it's just... fizzled out. I want to know when someone has lied with the photo they presented to me.
ea016 25 days ago
Price comparison:
GPT Image 2
```
  Low     : 1024×1024 $0.006 | 1024×1536 $0.005 | 1536×1024 $0.005

  Medium  : 1024×1024 $0.053 | 1024×1536 $0.041 | 1536×1024 $0.041

  High    : 1024×1024 $0.211 | 1024×1536 $0.165 | 1536×1024 $0.165
```
GPT Image 1
```
  Low     : 1024×1024 $0.011 | 1024×1536 $0.016 | 1536×1024 $0.016

  Medium  : 1024×1024 $0.042 | 1024×1536 $0.063 | 1536×1024 $0.063

  High    : 1024×1024 $0.167 | 1024×1536 $0.25  | 1536×1024 $0.25
```
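To make the comparison easier to read, the table can be turned into per-megapixel costs and a relative change from v1 to v2. The prices are copied from the comment above, not checked against OpenAI's pricing page, and the helper names are illustrative.

```python
# Per-image prices (USD) copied from the comment above; not independently verified.
PRICES = {
    "gpt-image-2": {
        "low": {"1024x1024": 0.006, "1024x1536": 0.005, "1536x1024": 0.005},
        "medium": {"1024x1024": 0.053, "1024x1536": 0.041, "1536x1024": 0.041},
        "high": {"1024x1024": 0.211, "1024x1536": 0.165, "1536x1024": 0.165},
    },
    "gpt-image-1": {
        "low": {"1024x1024": 0.011, "1024x1536": 0.016, "1536x1024": 0.016},
        "medium": {"1024x1024": 0.042, "1024x1536": 0.063, "1536x1024": 0.063},
        "high": {"1024x1024": 0.167, "1024x1536": 0.25, "1536x1024": 0.25},
    },
}


def megapixels(size: str) -> float:
    """'1024x1536' -> 1.572864"""
    width, height = (int(v) for v in size.split("x"))
    return width * height / 1_000_000


def cost_per_megapixel(model: str, quality: str, size: str) -> float:
    return PRICES[model][quality][size] / megapixels(size)


def change_vs_v1(quality: str, size: str) -> float:
    """Relative price change from gpt-image-1 to gpt-image-2 (0.10 means 10% dearer)."""
    return PRICES["gpt-image-2"][quality][size] / PRICES["gpt-image-1"][quality][size] - 1


for quality in ("low", "medium", "high"):
    for size in ("1024x1024", "1024x1536"):
        print(
            f"{quality:>6} {size}: "
            f"v2 ${cost_per_megapixel('gpt-image-2', quality, size):.3f}/MP, "
            f"{change_vs_v1(quality, size):+.0%} vs v1"
        )
```

By these numbers, v1's price scales roughly with pixel count (the non-square sizes cost about the same per megapixel as square), while on v2 the non-square sizes are cheaper than square both per image and per megapixel.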
- Melatonic 25 days ago
  Weird that they restrict the resolution so much. Does it fall apart with more detail (when zoomed in) or does the cost just skyrocket?
  - vunderba 25 days ago
    It's usually based on what they've been trained on. There aren't very many models that'll do higher resolutions outside of Seedream, but its prompt adherence is worse.
    [-]
    - _the_inflator 25 days ago
      Processing power, not training. The larger the scene in 2D, the more you need to compute. The resolution itself is not flexible. Imagine painting a white canvas. It is still a pixel-per-pixel algo, which costs LLM GPU power while being the easiest thing to do without it.
      You can create larger images by creating separate parts you recombine. But they may not perfectly match their borders.
      It is a Landau (big-O) thing, not a training thing. The idea of LLM is to work on the unknown.
      - vunderba 25 days ago
        It depends on the model. Diffusion models, which are among the more popular approaches, are typically trained at a specific image resolution.
        For example, SDXL was trained on 1MP images, which is why if you try to generate images much larger than 1024×1024 without using techniques like high-res fixes or image-to-image on specific regions, you quickly end up with Cthulhu nightmare fuel.
    - nomel 25 days ago
      Need a model trained on closeup/macro shots of everything, to use for upscaling, then run that, as a kernel, over the whole image.
      - Melatonic 25 days ago
        Exactly what I was thinking
  - dsrtslnd23 25 days ago
    actually gpt-image-2 is VERY flexible with the resolution. You can use arbitrary resolution within the max pixel budget.
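If the limit really is a pixel budget rather than a fixed list of sizes, picking dimensions for a given aspect ratio is a small calculation. In this sketch both the budget (taken as 3840×2160, the largest size mentioned in this thread) and the rounding to multiples of 16 are assumptions; the actual limit and rounding rule aren't stated here.

```python
import math
from typing import Tuple

# Assumption: the largest size mentioned in this thread is the pixel budget.
MAX_PIXELS = 3840 * 2160
MULTIPLE = 16  # hypothetical: each side rounded down to a multiple of 16


def fit_to_budget(aspect_w: int, aspect_h: int, max_pixels: int = MAX_PIXELS,
                  multiple: int = MULTIPLE) -> Tuple[int, int]:
    """Largest (width, height) with the given aspect ratio whose area stays
    within max_pixels, each side rounded down to `multiple`."""
    scale = math.sqrt(max_pixels / (aspect_w * aspect_h))
    width = int(aspect_w * scale) // multiple * multiple
    height = int(aspect_h * scale) // multiple * multiple
    return width, height


for ratio in [(16, 9), (1, 1), (3, 2)]:
    w, h = fit_to_budget(*ratio)
    print(ratio, (w, h), w * h <= MAX_PIXELS)
```

The same function with a 1024×1024 budget and a multiple of 64 reproduces the 1MP square that SDXL was trained at, which is the contrast with fixed-resolution diffusion models drawn earlier in this subthread.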
  - ModernMech 25 days ago
    Generate a lower resolution image and upscale to the resolution you need.
  - al_borland 25 days ago
    [dead]
- lxgr 25 days ago
  Interesting, I wonder why larger outputs are more expensive than smaller square ones on v2, while it’s the other way around in v1.
- ComputerGuru 25 days ago
  It can generate 3840x2160
neom 25 days ago
Here is my regular "hard prompt" I use for testing image gen models:
"A macro close-up photograph of an old watchmaker's hands carefully replacing a tiny gear inside a vintage pocket watch. The watch mechanism is partially submerged in a shallow dish of clear water, causing visible refraction and light caustics across the brass gears. A single drop of water is falling from a pair of steel tweezers, captured mid-splash on the water's surface. Reflect the watchmaker's face, slightly distorted, in the curved glass of the watch face. Sharp focus throughout, natural window lighting from the left, shot on 100mm macro lens."
google drive with the 2 images: https://drive.google.com/drive/folders/1-QAftXiGMnnkLJ2Je-ZH...
Ran a bunch both on the .com and via the api, none of them are nearly as good as Nano Banana.
(My file share host used to be so good and now it's SO BAD. I've re-hosted with them for now; I'll update to a Google Drive link shortly.)
- rrr_oh_man 25 days ago
  Why would you consider this a good prompt?
  - brynnbee 25 days ago
    My observations have been that image generation is especially challenged when asked to do things that are unusual. The fewer instances of something happening it has to train on, the worse it tends to be. Watch repair done in water fits that well - is there a single image on the internet of someone repairing a watch that is partially submerged in water? It also tends to be bad at reflections and consistency of two objects that should be the same.
- jcattle 25 days ago
  I mean, your prompt is basically this skit: https://www.youtube.com/watch?v=BKorP55Aqvg ("The Expert" 7 red lines: all strictly perpendicular, some with green ink some with transparent ink)
  I couldn't imagine the image you were describing. I've listed some of the red lines with green ink I've noticed in your prompt:
  - Macro close-up - but sharp throughout
  - Focus on tiny gear - but also on tweezers, old watchmaker's hand, water drop?
  - Work on the mechanism of the watch (on the back of the watch) - but show the curved glass of the watch face, which is on the front
  This is the biggest. Even if the mechanism is accessible from the front, you'd have to remove the glass to get to it. It just doesn't make sense and that reflects in the images you get generated. There's all the elements, but they will never make sense because the prompt doesn't make sense.
  - fc417fc802 25 days ago
    The last point (reflection by front glass versus mechanism access, so no front glass) is the only issue I see with it. Other than that, I can easily visualize an image that satisfies the prompt. I think the general idea is a good one because it's satisfiable while having multiple competing requirements that impose geometric constraints on the scene without providing an immediate solution to said constraints, as well as requiring multiple independent features (caustics, reflections, fluid dynamics, refraction, directional lighting) that are quite complicated to get right.
    To illustrate that there aren't any contradictions (other than the final bit about the reflection in the glass), consider a macro shot showing partial hands, partial tweezers, and pocket watch internals. That much is certainly doable. Now imagine the partial left hand holding a half-submerged pocket watch, fingertips of the right hand holding the front half of tweezers that are clasping a tiny gear, positioned above the work piece with the drop of water falling directly below. Capture the watchmaker's perspective. I could sketch that, so an image model capable of 3D reasoning should have no trouble.
    It's precisely the sort of scene you'd use to test a raytracer. One thing I can immediately think to add is nested dielectrics. Perhaps small transparent glass beads sitting at the bottom of the dish of water with the edge of the pocket watch resting on them, make the dish transparent glass, and place the camera level with the top of the dish facing forward?
    https://blog.yiningkarlli.com/2019/05/nested-dielectrics.htm...
    A second thing I can think to add is a flame. Perhaps place a tealight candle on the far side of the dish, the flame visible through (and distorted by) the water and glass beads?
    - jcattle 25 days ago
      Without the last point with the watch glass it is also easier to imagine for me. Still, you'd have to be selective.
      Do you want it to actually look like macro photography (neither of the generated images do)? Then you can't have it sharp throughout and you won't be able to show the (sharp) watchmakers face in a reflection because it would be on a different focal plane.
      Dropping the macro requirement, you can show a lot more. You can show that the watchmaker is actually old, you can show the reflection, etc.
      Something has to give in the prompt, on multiple of the requirements. The generated images are dropping the macro requirement and are inventing some interesting hinging watch glass contraptions to make sense of it.
      - fc417fc802 25 days ago
        Yeah, fair enough. I figure "macro" sees sufficiently loose use that a model should be able to make sense of it but to get the prompt into perfect shape that ought to be replaced with something like "a closeup showing X, Y, Z in perfect focus". Still the only real problem I see is the aforementioned contradiction regarding the front glass. Short of that single detail an artist could easily satisfy the description as written to well within reason.
  - neom 24 days ago
    Yeah I dunno bud, I have a degree in film and three Emmy awards for technical production (an expert), I could shoot that prompt (unlike the so called "expert" in the skit). Canon EF 100mm Macro USM at f32 should be able to produce that, focus doesn't need to imply aperture, and a quick google search shows me there are loads of front gear pocket watches available. Also it produced something very clearly not shot with a 100mm anyway, as the telephoto compression is wrong.
    - Auracle 24 days ago
      Far be it from me, someone who only whipped out his macro lens for ring shots at weddings and (about 2 hours ago) a picture of our latest newborn, to add to a comment by an expert. However, I think most photographers in that situation wouldn’t shoot at f/32 due to diffraction and would focus stack instead.
      Of course, a text to image model shouldn’t really need to worry about that sort of thing.
    - jcattle 24 days ago
      Yeah I dunno bud, I've watched a few watch repair videos on youtube and have seen macro photography which other people did.
      Sure there are pocket watches where the movement is visible from the front (you'd still likely service them from the back, but alas). Even if you'd do service from the front where the glass is, you'd still have to remove it to drop in a gear.
      Anyway, I think that we aren't really talking about the same thing. I'm nitpicking your prompt while you constructed it to mostly see the performance of the model in novel situations and difficult lighting and refraction environments. And that's fair.
      How satisfied are you with the generated image results? What would you do differently when shooting this proposed scene yourself?
      - neom 24 days ago
        Reasonable people can disagree. I think you made some good points; I've been sitting here for the last 20 minutes wondering where the DoF at f/32 on a 100mm runs out, and maybe you're right. I'm not 100% sure.
        I wrote the prompt mostly to see how it does with the gears and the tweezers, and the perspective of the gears (do they... I don't know the opposite word of distort, straighten? Do they seem like they're actually round, could they work?). I think those are really hard things for AI. The glass distortion, reflections, DoF, etc. were just to see how it approached them, and like the other comment below said, I tried to pick something that wasn't likely to be in training data, so it would reason about it more.
        Nano was able to spit it out consistently, nailing it almost every time; the 2 images I showed originally are the first shot of the prompt with each model. Images 2 really struggles and has yet to complete one I was satisfied with. (Here are the 3 other gens from Images 2: https://drive.google.com/drive/folders/1s8gik_x0B-xDZO6rOqoz...)
        How would I shoot it? I wouldn't, fixing a watch in water is a dumb idea. ;)
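On where the depth of field "runs out" at f/32 on a 100mm macro, the common close-up approximation DoF ≈ 2 × N × c × (m + 1) / m² gives a feel for it. This is a sketch that assumes a symmetric lens, a 0.03 mm circle of confusion (a common full-frame convention), and no diffraction; the magnifications are picked for illustration. In this approximation the focal length drops out, so magnification is what dominates.

```python
def macro_dof_mm(f_number: float, magnification: float, coc_mm: float = 0.03) -> float:
    """Approximate total depth of field in mm for close-up work.

    Close-up approximation: DoF ~ 2 * N * c * (m + 1) / m**2.
    Assumes a symmetric lens (pupil magnification 1) and ignores diffraction.
    """
    if magnification <= 0:
        raise ValueError("magnification must be positive")
    return 2 * f_number * coc_mm * (magnification + 1) / magnification ** 2


# Marked aperture f/32 at a few illustrative reproduction ratios.
for m in (1.0, 0.5, 0.25):
    print(f"1:{1 / m:g} at f/32 -> DoF ~ {macro_dof_mm(32, m):.1f} mm")
```

Under those assumptions, the sharp zone is about 3.8 mm deep at 1:1, 11.5 mm at 1:2 and 38.4 mm at 1:4. How far the DoF reaches therefore depends mostly on how tightly the shot is framed.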
- the_lucifer 25 days ago
    Looks like your image host has rate-limited viewing of the shared images; wanted to give you a heads-up.
  - neom 25 days ago
    Thanks, I need to get off Zight. They used to be such a nice option for fast file sharing, but they've really suffered some of the worst enshittification I've seen yet.
- pb7 25 days ago
  Links are broken.
  - waynesonfire 25 days ago
    So.. sign up. "Get Sight for free". Ads everywhere bro.
madrox 25 days ago
This seems like a great time to mention C2PA, a specification for positively affirming image sources. OpenAI participates in it, and if I load an image I had ChatGPT generate into a C2PA viewer, it shows ChatGPT as the source.
Bad actors can strip sources out so it's a normal image (that's why it's positive affirmation), but eventually we should start flagging images with no source attribution as dangerous the way we flag non-https.
Learn more at https://c2pa.org
- debazel 25 days ago
  > but eventually we should start flagging images with no source attribution as dangerous the way we flag non-https.
  Yes, let's make all images proprietary and locked behind big tech signatures. No more open source image editors or open hardware.
  - henry-j 25 days ago
    C2PA is actually an open protocol, à la SMTP. The whole spec is at https://spec.c2pa.org/, available for anyone to implement.
    - debazel 25 days ago
      The standard itself being open is irrelevant. I'm not sure why this is always brought up for attestation standards. It is fundamentally impossible to trust the signature from open-source software or hardware, so a signature from open-source software is essentially the same as no signature.
      The need for a trusted entity is even mentioned in your specification under the "attestation" section: https://spec.c2pa.org/specifications/specifications/1.4/atte...
      So now, if we were to start marking all images that do not have a signature as "dangerous", you would have effectively created an enforcement mechanism in which the whole pipeline, from taking a photo to editing to publishing, can only be done with proprietary software and hardware.
      - madrox 24 days ago
        We already have a centrally curated trust model in https. Browsers only treat connections as "secure" if they chain up to a root CA in their trust store. You can operate outside that system, but users will see warnings and friction. Some level of trust concentration isn’t new.
        I'm curious whether you think this is worse than, or not as bad as, a best-case broad implementation of C2PA, especially if there were a Let's Encrypt-like entity assisting with signatures.
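The https comparison in this subthread can be made concrete with a toy model of a trust store: a signature earns the "trusted" label only if its issuer chain ends at a root the viewer ships with. This is a conceptual sketch, not C2PA's actual validation algorithm. It performs no cryptographic checks, and every name in it (the issuers, `chain_to_root`, `classify`) is hypothetical.

```python
from typing import Dict, Optional, Set


def chain_to_root(signer: str, issuer_of: Dict[str, str], max_depth: int = 10) -> Optional[str]:
    """Follow issuer links from `signer` to a self-issued root; None if the chain breaks."""
    current = signer
    for _ in range(max_depth):
        parent = issuer_of.get(current)
        if parent is None:
            return None        # unknown issuer: the chain is broken
        if parent == current:
            return current     # self-issued certificate: this is a root
        current = parent
    return None                # chain too long (or looping)


def classify(signer: Optional[str], issuer_of: Dict[str, str], trust_store: Set[str]) -> str:
    """Label an image the way a browser labels a connection: by where its signature chains to."""
    if signer is None:
        return "unsigned"
    root = chain_to_root(signer, issuer_of)
    return "trusted" if root is not None and root in trust_store else "untrusted signer"


# Hypothetical issuers: a vendor root that viewers ship with, and a self-signed hobby key.
issuer_of = {
    "VendorRoot": "VendorRoot",
    "CameraModelX": "VendorRoot",
    "HobbyEditorKey": "HobbyEditorKey",
}
trust_store = {"VendorRoot"}

for signer in ("CameraModelX", "HobbyEditorKey", None):
    print(signer, "->", classify(signer, issuer_of, trust_store))
```

In this model the hobby key is not rejected for being weak; it simply does not chain to anything in the trust store. That is the crux of the objection above, and adding a freely available Let's Encrypt-style issuer to the store is the remedy being floated.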
  - Melatonic 25 days ago
    Why would the image itself have to be proprietary to have some new piece of metadata attached to it?
- mdasen 25 days ago
  > Bad actors can strip sources out
  I think the issue is that it's not just bad actors. It's every social platform that strips out metadata. If I post an image on Instagram, Facebook, or anywhere else, they're going to strip the metadata for my privacy. Sometimes the EXIF data has geo coordinates. Other times it's less private data like the file name, file create/access/modification times, and the kind of device it was taken on (like iPhone 16 Pro Max).
  Usually, they strip out everything and that's likely to include C2PA unless they start whitelisting that to be kept or even using it to flag images on their site as AI.
  But for now, it's not just bad actors stripping out metadata. It's most sites that images are posted on.
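To make the metadata-stripping point concrete: in JPEG files, EXIF and XMP live in APP1 marker segments, and the C2PA spec embeds its manifest store as JUMBF boxes in APP11 segments. A pipeline that blindly drops every APPn segment therefore drops provenance along with the GPS tags, while a whitelist like the one suggested above can keep it. This is a minimal sketch under those assumptions. It handles only well-formed files with no fill bytes between markers, and the toy file it builds (including its placeholder "manifest") is hypothetical, not a real image.

```python
import struct

APP1, APP11 = 0xE1, 0xEB  # APP1: EXIF/XMP; APP11: JUMBF boxes, where C2PA embeds its manifest in JPEG
SOS, EOI = 0xDA, 0xD9


def strip_app_segments(jpeg: bytes, keep: frozenset = frozenset()) -> bytes:
    """Drop APPn segments (markers 0xE0-0xEF) from a well-formed JPEG, except markers in `keep`."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    out = bytearray(jpeg[:2])
    pos = 2
    while pos + 1 < len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg[pos + 1]
        if marker == EOI:
            out += jpeg[pos:pos + 2]
            break
        if marker == SOS:
            # Everything from SOS onward is the scan header plus entropy-coded data: copy verbatim.
            out += jpeg[pos:]
            break
        # The 2-byte big-endian length counts itself but not the 2 marker bytes.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        end = pos + 2 + length
        if not (0xE0 <= marker <= 0xEF) or marker in keep:
            out += jpeg[pos:end]
        pos = end
    return bytes(out)


def segment(marker: int, payload: bytes) -> bytes:
    """Build one marker segment for the toy example below."""
    return bytes([0xFF, marker]) + struct.pack(">H", len(payload) + 2) + payload


# Hypothetical toy file: EXIF-like APP1, placeholder APP11 "manifest", a quantization table, then scan data.
toy = (b"\xff\xd8" + segment(APP1, b"Exif\x00\x00") + segment(APP11, b"JP c2pa")
       + segment(0xDB, b"\x00" * 4) + segment(SOS, b"\x01") + b"\x12\x34\xff\xd9")

stripped = strip_app_segments(toy)                       # strip everything: provenance is gone too
kept = strip_app_segments(toy, keep=frozenset({APP11}))  # whitelist: EXIF gone, C2PA segment survives
```

The design choice that matters is the `keep` set: the default behaviour mirrors today's "strip everything" platforms, and preserving provenance is a one-line policy change rather than a new parser.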
  - henry-j 25 days ago
    There’s actually a part of the NY state budget right now (TEDE part X, for my law nerds) that’d require social media companies to preserve non-PII provenance metadata and surface it to the user, if the uploaded image has it.
    LinkedIn already does this (see https://www.linkedin.com/help/linkedin/answer/a6282984), and X’s “made with AI” feature preserves the metadata but doesn’t fully surface it (https://www.theverge.com/ai-artificial-intelligence/882974/x...)
  - madrox 25 days ago
    You're implying social platforms aren't bad actors ;)
    In seriousness, social platforms attributing images properly is a whole frontier we haven't even begun to explore, but we need to get there.
- woadwarrior01 25 days ago
  Yeah, OpenAI has been attaching C2PA manifests to all their generated images from the very beginning. Also, based on a small evaluation that I ran, modern ML based AI generated image detectors like OmniAID[1] seem to do quite well at detecting GPT-Image-2 generated images. I use both in an on-device AI generated image detector that I built.
  [1]: https://arxiv.org/abs/2511.08423
- paradoxyl 25 days ago
  What a dystopian, pro-tyranny ask. Horrifying.
  - madrox 24 days ago
    The comments that aren't directly discussing the technical achievement here are bemoaning the destruction to society that AI generated images can cause, which is a fair criticism. I'm genuinely curious what you think the greater horror is. Or what a better solution might be.
    Reddit blurs NSFW images by default; you can change that in settings. I don't see what is so terrible about the idea of doing this with untrusted image sources.
  - cubefox 24 days ago
    To ask for verification whether a photo is real or fake?
swalsh 25 days ago
Been using the model for a few hours now. I'm actually really impressed with it. This is the first time I've found value in an image model for stuff I actually do. I've been using it to build PowerPoint slides and mockups. It's CRAZY good at that.
- johnwheeler 25 days ago
  Yeah, it's funny. I would expect to see more enthusiasm versus just basic run-of-the-mill, "oh, there it is". Leave it to the HN crowd. This is incredible. I don't even like OpenAI.
  - rkozik1989 24 days ago
    LLMs make for great day 1 demos, but in a few weeks I promise you many people will be able to tell nearly all of the images generated by this are AI. It just takes time and exposure to figure out the new common flaws.
    Frankly, I am not sure if they will ever actually be able to solve this problem or if it'll be a continuous game of whack-a-mole, but regardless, there's a large crowd of people out there who, if they can tell something is AI-generated, will not support the company behind it. Being able to tell that anything is AI-generated cheapens brands.
    - johnwheeler 24 days ago
      Your thinking is like everyone else's, and it's backwards. The world will learn to accept it as the standard way of doing things; people will appreciate one generation over another and look at manual image creation as a niche activity, like blacksmithing versus assembly-line manufacturing and automation. With the latter, you appreciate the intent and the end result. Same thing here; people are just adjusting to it.
  - pembrook 25 days ago
    HN is engineer-heavy, so it's a bunch of people who spend their days looking at code. If it's not a coding model, they'll likely never use it.
    To the average HN'er, images and design are superfluous aesthetic decoration for normies.
    And for those on HN who do care about aesthetics, they're using Midjourney, which blows any GPT/Gemini model out of the water when it comes to taste even if it doesn't follow your prompt very well.
    The examples given on this landing page are stock image-esque trash outside of the improvements in visual text generation.
lionkor 25 days ago
Every cent you spend on this, remember: the people who made this possible are not even getting a millionth of a cent for every billion USD made with it (they are getting nothing). Same with code: the code you spent years poring over, fixing, etc. is now how these companies make so much money and get so much investment. It's like open source, except you get shafted.
- arghwhat 24 days ago
  This is, in my opinion, attempting to say the right thing with entirely the wrong perspective:
  The people you say are getting "shafted" always got shafted. Their works are the inspiration for all artists and people who lay their eyes on them. Maybe they got paid when they made the work, maybe they managed to sell it, but probably not. And still, other artists (and machines) will remember and be inspired by it, sometimes to the point of verbatim copying (which is extremely common among human artists as well, with verbatim copying and replication being an actual sought-after skill).
  (Those about to shout "LICENSING": that's a very new invention and we're terrible at it. What are you going to do, cut out the part of your brain that formed new connections while touching GPL code?)
  The person (singular) who is actually getting "shafted" at each use is the artist you didn't hire to make your new work, because it is their skill that got replaced: a skill built from a lifetime of studying other art and practicing, replaced with a skill built from a machine studying other art and, by virtue of some closed loops, likely also "practicing" itself.
  Still, shafting at large, but the obsession with training data is misplaced in that it entirely ignores how society and art worked beforehand.
  At the same time, for most of the things you're likely using the tool for, there probably would never have been an artist in the first place. For example, if you're just making your PowerPoint prettier, or if your commission is as ridiculous as they often are while offering only a single-digit dollar sum per work, which no artist should take (RIP the poor souls who take such work anyway).
  - tsimionescu 24 days ago
    You're ignoring the biggest problem here: the concentration and extraction of wealth. The sum total of human artists were previously getting those billions of dollars, and now it's OpenAI (and Anthropic, and Google, and Microsoft, and maybe a handful of other players) getting it. Now, maybe it actually used to be hundreds of millions of dollars, and they've grown it to billions, and maybe they deserve some of that - but they're getting all of it. This is the huge issue with this technology, not so much the fact that it exists but that it is being sold by a tiny, tiny amount of people.
    - torginus 24 days ago
      I wonder what happened to actual artists though - they seem to be doing fine. I'm sure many people as consumers dabbled in AI art, and reached the conclusion after hours that what they made never looked quite right.
      Then they found they could commission an actual artist to draw what they wanted for tens or hundreds of dollars, which is a very good price for getting exactly what you want without having to waste your time playing the token slot machine.
      - mrWiz 24 days ago
        How'd you conclude that artists are doing fine? That doesn't match my experience or observations at all.
        - torginus 23 days ago
        I know some pro artists (ppl doing work for big name companies, games, US film studios), either on a contract or employed basis.
        They've always told me the same thing - the job is to hit the minimum acceptable level of quality (which to my untrained eye often looks high, but they reassure me, their work is in fact sloppy garbage), using whatever means necessary, even if that means AI.
        They mostly don't even hate AI the way art Twitter does; they hate it because it creates unrealistic expectations about what things cost, and it's often not really possible to get useful results. At least that was the case a couple of years ago; things might have evolved.
        If AI were good enough, they would certainly use it.
        As for Twitter people doing commissions, I don't have firsthand experience, but IMO their biggest issue is that there are tons of artists from places like LatAm or the Philippines who do high-quality work and charge very little, and the people who commission don't care. This was the case well before AI.
    - WarmWash 24 days ago
      That's also the wrong framing
      AI Labs are getting a tiny cut of the hundreds saved by not hiring an artist.
      So regular people save hundreds, the labs get a few dollars, and the artists get nothing.
      The artists are still losing, but it's regular people, especially the least able, who are winning.
      The coffee shop isn't cutting OAI a $300 check for doing their spring menu. They are pocketing $295 and paying OAI $5.
      - runarberg 24 days ago
        No. The coffee shop that isn’t paying an artist $300 is gonna get negative reviews and lose customers and money from their bad business decision[1]. I know I would think twice about ordering at a café that uses AI in their marketing, and I am not the only one.
        The coffee shop that cannot afford the $300 for an artist and homebrews their design in Microsoft Word is still doing just as before, and the coffee shop that can afford it and still pays an artist is still doing fine. The coffee shop that is paying OpenAI $5 for stolen art gets to look as cheap as they are.
        1: https://www.sfgate.com/food/article/santa-cruz-restaurant-ai...
        - atonse 24 days ago
        So to save the idea of $300 (logo design with "local" talent is never $300, it is only that cheap if you offshore it), they tried to ruin a business that presumably employs multiple LOCAL people full time (way more than $300) with 1 star reviews to "punish it"
        This is an internet mob at its worst. Not an example of anything to emulate, in my opinion.
        - runarberg 24 days ago
        People hate AI, and this is one of very few ways people have to punish AI. It is bound to happen.
        And in either case, this example destroys the framing that coffee shop owners are the ones who benefit from the systemic art theft employed by AI companies.
        - WarmWash 24 days ago
        Sure, just like every software company using AI is going to go under and every video game using AI will fail?
        - runarberg 24 days ago
        I am not sure what you mean. The AI backlash is real, and it has real and obvious effects in the real world, with written articles to prove it.
        If you are attempting here to shift the focus away from coffee shops (may I remind you, you were the one who brought that as an example) and into video games or software companies, I simply reject that attempt.
        That there exists a software company which uses AI in its product and is not failing has no bearing on the point that a coffee shop too cheap to pay an artist for its logo does indeed look cheap to its customers, who will be inclined to give that café a negative review or otherwise avoid it.
        - WarmWash 24 days ago
        I'm shifting the focus to the reality that exists outside of internet mobs.
        99% of people don't recognize AI generated content, and don't particularly care enough to pixel scan every image they see.
        You can death-grip articles about AI art backlash, but they are all hyper-narrow, one-off events. The reality is that the general population doesn't really see it or care.[1]
        1: https://www.forbes.com/sites/conormurray/2026/04/17/the-no-1...
    - arghwhat 24 days ago
      That's an entirely different problem to artists getting "shafted". Not saying it's not a worthwhile discussion, but it is a separate concern.
      Having everyone pay phone/internet, office, streaming, music, etc., subscriptions to large tech companies that are effectively monopolies all do that. It's a bigger, pre-existing issue.
    - fennecfoxy 24 days ago
      Yes, look at how many historical inventors (like the blue LED inventor, or the guys struggling to convince Gates and Ballmer to make the Xbox) got nothing for their efforts compared to the huge sums raked in by the very people who actively tried to prevent them from building the idea that made all the money.
      AI is hugely beneficial to our species. Our tribalism and "yeah well they earned it!" response to capitalism's rampant production of billionaires is the real problem, not technology.
      Why are footballers and movie celebrities paid $50m a year? There's the answer.
    - jstummbillig 24 days ago
      1) Is there a moat? Is there no moat? Are open models as good as the closed ones? I keep getting confused.
      2) As one of these artists, I am entirely fine with my entire body of work being used for the purposes of model building. The tech is astonishing and fantastic, and I sincerely hope we will be better through it. As the parent suggested: The idea that people in general previously gave a fuck about compensating artists is hilarious. MS builds models with my work, random people bought, idk, another vacation in Thailand or a fourth pair of shoes with the money that they never spent on art. I know which one I would prefer.
      But I do find it particularly juicy that people, who, on the whole, never thought too much about paying artists (which I am also fine with btw!), all of a sudden can't stop wringing their hands about the injustice of it all.
      - ori_b 24 days ago
        What art have you produced? I did a little googling, and I can't find anything of note in public.
    - dwayne_dibley 24 days ago
      The same issue applies to fastfood, coffee chains and taxi services. Capitalism.
    - Lalabadie 24 days ago
      Correct. The way it's being built is exactly what the US mentality warns about with socialism/communism (that giving away your hard work "for the greater good" is a lie and is actually a power grab).
      Turns out, if it's American oligarchs profiting from everyone's work, they love the idea!
  - rasKqa 24 days ago
    Children can draw without ever having been to an art gallery. The IP laundromats need the entire stolen corpus of human labor. The latter is clearly an infringing derivative work.
    It will be true no matter how many bribes those who have never created anything pay to Marsha Blackburn (who miraculously reversed her AI skepticism).
    I wonder how many threats of being primaried have been issued by the uncreative technocrat thieves.
    - strulovich 24 days ago
      No, they can’t just draw by themselves. It’s extremely bad and random.
      Their teachers teach them from a very early age how to hold a crayon, and how to draw.
      Maybe some miraculous humans would reinvent all of drawing while growing up by themselves in the jungle, but most people will not.
      Source: I have kids.
  - BeetleB 24 days ago
    > The person (singular) that is actually getting "shafted" at each use is the artist you didn't hire to do the job of making your new work, because it is their skill that got replaced.
    1% Yes, and 99% No.
    Over 99% of uses would not have resulted in hiring someone to do the work had these models not existed as you yourself acknowledge.
    - arghwhat 24 days ago
      Yes, but this is a bit of an oversimplification. The "99%" tends to be either: 1. Pointless throwaway content which we can just ignore as a new source of noise, 2. Something that could have ended up being a $5 commission[^1] to a kid somewhere out there but now never will be.
      Those numbers are also a bit too aggressive; it's easy to miss what kind of gig work exists out there. PowerPoint as a service is a thing on Fiverr, for example. A horrible, horrible thing, but a thing nonetheless.
      ^1: not at all what art costs, but someone trying to get started might do quick sketches at those prices
      - BeetleB 24 days ago
        > The "99%" tends to be either: 1. Pointless throwaway content which we can just ignore as a new source of noise, 2. Something that could have ended up being a $5 commission[^1] to a kid somewhere out there but now never will be.
        Or 3. Something I made and I actually use, but I would never have paid a kid $5 to do.
        Yes, I know of Fiverr and similar sites. Even planned on using it once. Even know someone in another country who made side money from it. And yes, it does suck for them. But none of that changes the fact that well over 99% of uses are not depriving them of any money.
        - arghwhat 23 days ago
        I disagree, because when someone can just get those simple works made on a subscription they already pay for, the $5 commission goes from something someone might end up doing if the idea is stuck in their head long enough (or they find the idea amusing enough) to something that can never become a commission.
        Not pointing fingers or saying that you must pay kids to draw things for you, but it most definitely does take work away by replacing an entire class of commissions. Not sure what to do with that fact.
        (I'd put things that would never, ever be worth a $5 commission into the throwaway noise category, even if you do use the outcome.)
        - runarberg 24 days ago
        I have seen arguments that a lot of your nr. 3 is basically just addiction. You are making the AI slot machine generate stuff for you and you get to have the sense of accomplishment that comes with thinking you created something without putting in any of the work of actually creating something. To the rest of the world this is indistinguishable from your parent’s nr. 1.
        - BeetleB 24 days ago
        Fair point. It's just that his number 1 was "Pointless throwaway content", and I was saying "Well, actually, it's not thrown away but actually used".
        You may look at the output and say "Crap!", but the reality is the person using it found value in it.
        (To be honest, I used to think "Crap!" about stock photos long before LLMs came onto the scene, so I have little sympathy for stock photographers going out of business. Those photos exist primarily to attract readers and do not add any value to the content; they're just like ads in that regard.)
- ACCount37 24 days ago
  If "people who made this possible" were getting their fair share, "a millionth of a cent for every billion USD made with it" would be about it for the artists.
  What makes the dataset valuable isn't that the image 0012992 in it is precious and irreplaceable. It's that the index goes to seven digits. Pre-training is very much a matter of scale - and scraping is merely the easiest way to get data at scale.
  People who complain about "artists not getting paid" must have in their imagination some kind of counterfactual where artists are being paid thousands for their contributions. That's not how it works. A counterfactual world where artists were paid for AI training is one where an average artist is 5 cents richer, an average image generation AI performs 5% worse, and the bulk of extra data spending is captured by platforms selling stock photos and companies destructively digitizing physical media.
  - lionkor 24 days ago
    The ideal world would be one where, to train on art, you have to buy a license to that art. Sure, for most artists they would maybe put a low price tag, but that isn't the point.
    The point isn't about money. It's that copies were made, without license and without permission, and without any legal right to do so, of art, and then used to train a system which generates similar art. The first step, the copy, is illegal without a license, and even for most public images online, licenses and copyright notices (which must be preserved) are attached.
    - ACCount37 24 days ago
      "Without any legal right to do so" is for the courts to decide. And so far, the courts are very much not deciding the way you want them to.
      "Fair use" counters "without license and without permission" hard. The argument that training AI on scraped data is "fair use" and the resulting model outputs are "transformative works" has held up in courts. Anthropic got dinged for downloading pirated books, but not for throwing the ones they didn't pirate down the training pipeline.
      Some countries, like Japan, have amended their copyright laws to make AI training categorically legal. Others are in "fair use clauses" grey areas with courts deciding case by case based on precedent and interpretation. So trying to latch onto copyright law is, as it always was, the wrong move. Copyright never favored the small guy. Stupid to expect that it suddenly will.
      - davemp 24 days ago
        > The argument that training AI on scraped data is "fair use" and the resulting model outputs are "transformative works" has held up in courts.
        Nope. Nope. Nope. That has explicitly not been ruled on yet. Transformative means that you don't need a fair use defense. Anthropic has only gotten away with their outputs being called transformative so far because they put a dubiously effective filter in front to block the most egregious infringing outputs. No one has actually challenged this afaik.
    - brookst 24 days ago
      Would your ideal world apply to humans as well? Like if I see some art in a museum and it inspires me to create some of my own, I would need to pay a licensing fee to the original artist?
      And what about the artists that inspired them? There is no art in the world that sprang fully formed from one single person, without any influences.
      Should we reshape our economy to ensure knowledge and artistic provenance is maintained perpetually?
      This whole discussion is so weird to me. It’s like AI has freaked everyone out so much that the instinct is to run to the safety of Disney-esque complete control and perpetual monetization of every work.
      Which is exactly the opposite of how art worked for the first several hundred thousand years. Really, we want to double down on the perverse incentives and tight control that IP owners have given us in the past 50 years?
      - Jtarii 24 days ago
        >Like if I see some art in a museum and it inspires me to create some of my own, I would need to pay a licensing fee to the original artist?
        Nope, humans are admitted for free :).
        >And what about the artists that inspired them? There is no art in the world that sprang fully formed from one single person, without any influences.
        As long as you are a human you get to be inspired all you want :)
        You seem very invested in licking the boot of the trillion dollar corporations. Your fellow humans are concerned.
        >Really, we want to double down on the perverse incentives and tight control that IP owners have given us in the past 50 years?
        Isn't it interesting that the EXACT second that copyright law impedes billion dollar corporations it is thrown out the window, really makes you think huh?
        - brookst 18 days ago
        [dead]
    - kolinko 24 days ago
      Ideal for whom? For society in general, I don’t think so.
      - bigfishrunning 24 days ago
        I think you may be placing too much value on the output of these machines, which use tons of energy, generate pollution (both noise and chemical), and produce output that's worse than what a human can do. We would be better off if these LLMs didn't exist.
        - azan_ 24 days ago
        The average person in the US reducing his/her meat intake by 1/4 would do much, much more for the environment than completely scrapping the entire AI infrastructure worldwide. For some reason, people concerned with the environmental impact of AI get really angry whenever I point this out.
        - bluebarbet 24 days ago
        The average person here would do more still by just taking one less flight. It's air travel that really blows individual emissions out of the water.
      - peepee1982 24 days ago
        I think it would obviously be better for society.
  - jeroenhd 24 days ago
    > A counterfactual world where artists were paid for AI training is one where an average artist is 5 cents richer, an average image generation AI performs 5% worse, and the bulk of extra data spending is captured by platforms selling stock photos and companies destructively digitizing physical media.
    No, a counterfactual world where artists were paid for AI training wouldn't see commercially viable AI at all. A world which plenty of people would be more than happy to live in, mind you.
    AI relies on mass piracy worth Googols of dollars if you count like you would the million dollar iPod, but because AI surprised the copyright industry, it's now too late to enforce copyright like that.
    - ACCount37 24 days ago
      Even in a counterfactual world where any data that's not in public domain can't be used in AI training at all, ever, AIs would exist. Training on public domain data is a bitch, but it's doable. It's just that it results in worse AIs for more effort. So no one does it other than to flex.
      It would still be "commercially viable", mind. I'm not sure how much it would stall AI development in practice, but all the inputs for making AIs only get cheaper over time. So I struggle to imagine not having something like DALL-E 1 by 2030.
      If we extend the counterfactual and allow for licensed media, we compress the timelines and raise the bar. The "best" image generation AIs of 2026 are now made by the likes of Adobe and locked behind some kind of $500 a month per seat Creative Cloud Pro Future subscription. Because Adobe is rich enough to afford big bulk licensing deals, while the likes of academia and smaller startups have to subsist on old public domain data, permissively licensed scraps and small carefully selected batches of licensed data that might block them from sharing the resulting weights with the licensing deals.
      In the "counterfactual: licensed media" world, the local AI generation powerhouse of Stable Diffusion ecosystem probably doesn't exist at all. Big companies selling AI do. Their offerings cost a lot more and perform considerably worse than the actual AIs we have today. So you can't just go to a random website and get an image edited for a shitpost for free. But the high end commercial suites exist, they're used by the media and the marketing companies, and they are still way cheaper than hiring artists. The big copyright companies get their pound of flesh, but don't confuse that for the artists getting a win.
    - BoxOfRain 24 days ago
      > but because AI surprised the copyright industry, it's now too late to enforce copyright like that.
      I think I've got whiplash from the way a lot of the tech scene has gone from 'IP troll outfits are malicious actors who make everything worse for everyone else' to 'IP troll outfits are an ethical and effective solution to exploitation in the AI industry'.
      I'm not a huge fan of much of the generative AI industry, but is IP maximalism really the answer here? Before 2022 most of us would have agreed that DRM is generally a scourge, for example, and the 'copyright industry' is a big part of pushing for the end of general-purpose computing in favour of DRM-controlled appliances. Personally I'd rather go in the opposite direction: copyright lasts for exactly thirty years and after that a work enters the public domain without exception, and I'd weaken anti-circumvention laws too.
      - ACCount37 24 days ago
        "Copyright" is, frankly, just an excuse people who hate AI latch onto.
        Many of the people who rally against AI now used to rally against the RIAA prosecuting Napster and the Big Mouse extending copyright terms once again.
        It's not that they suddenly gained an appreciation for the copyright law. It's that they found something they hate more than the big record label megacorps - and copyright became a tool they think they can leverage against it. Very stupid, IMO.
        Levitz 24 days ago
        Same thing with the water arguments, or pollution in general. It's not about those having any weight, it's about being against AI first and building arguments against it second.
    - ben_w 24 days ago
      > No, a counterfactual world where artists were paid for AI training wouldn't see commercially viable AI at all. A world which plenty of people would be more than happy to live in, mind you.
      You reckon Disney and Shutterstock don't have enough images to make commercially viable AI?
      Or for that matter, Facebook? Even just for photorealistic images from, you know, all the photos people upload.
      > AI relies on mass piracy worth Googols of dollars if you count like you would the million dollar iPod, but because AI surprised the copyright industry, it's now too late to enforce copyright like that.
      Not that I disagree that people use everything they can get their hands on for marginal improvements (they obviously do), but the copyright industry being "surprised" is the default state of affairs for infringement, and "piracy" is the wrong word: that's a legal question, and so far judges have ruled that training isn't itself a copyright offence, while also affirming that it is possible to commit a copyright offence by pirating training data.
  - peepee1982 24 days ago
    If the dataset weren't valuable, big tech wouldn't depend on it to train their models.
    I don't care about getting a millionth of a cent as an artist (which btw is a number *you* just pulled out of your imagination). I care about them paying a fair share instead of pocketing it, so the money stays in circulation instead of creating a new class of technofeudal lords.
  - whywhywhywhy 24 days ago
    If it was about this, why do OpenAI and Anthropic lose their minds when people train off their output or try to scrape their systems?
    I actually don't have an issue with training off the mass of everyone's work if the models are open and free to build upon; it's locking them away and then throwing your toys out of the pram when people try to do the same thing that bothers me.
    - ACCount37 24 days ago
      Good question. I actually have a technical answer, believe it or not.
      Pre-training is: training a model from scratch on cheap data that sets the foundation of a model's capabilities. It produces a base model.
      Post-training is: training a base model further, using expensive specialized data, direct human input and elaborate, compute-heavy methods to refine the model's behavior and imbue it with the capabilities that pre-training alone has failed to teach it. It produces the model that's actually deployed.
      When people perform distillation attacks, they take an existing base model and try to post-train it using the outputs of another proprietary model.
      They're not aiming to imitate the cheap bulk pre-training data - they're aiming to imitate the expensive in-house post-training steps. Ones that the frontier labs have spent a lot of AI-specialized data, compute, labor and hours of R&D work on.
      This is probably not "fair use", because it directly tries to take and replicate a frontier lab's competitive edge, but that hasn't been tested in court. And a lot of the companies caught doing that for their own commercial models are in China, so the path to legal recourse is shaky at best. But what's on the table is restricting access to the full chain of thought, and banning suspected distillation attackers from the inference API. Which is a bit like trying to stop a sieve from leaking, but it may slow the competitors down at least.
      - whywhywhywhy 22 days ago
        >Ones that the frontier labs have spent a lot of AI-specialized data, compute, labor and hours of R&D work on.
        Granted, that's time and money, but it's an absolutely minuscule amount of human hours compared to the scraped data.
        We know this for a fact because of parallelization: it's the work of hundreds of millions of people vs. the work of 20-100. Even if OpenAI's team worked for their entire lifetimes, and their offspring did too, and their offspring's offspring, they still wouldn't have made a dent in recreating that initial scraped training data.
        ACCount37 21 days ago
        This is like trying to apply "labor theory of value" to datasets. It doesn't work any better there than it does in economics in general.
        It doesn't matter how many human hours went into making a Twitter shitpost. What matters is: how much value does it add to a pre-training run, and how easy is it to replace it with another data source?
        "Cheap data" has low training value and is easy to replace. Twitter shitposts are worthless except in aggregate. "Expensive data" is what has high training value and is hard to replace. Things like SFT traces, domain expert RLHF guidance, RLVR bits - that's what the "moat" is.
  - whywhywhywhy 24 days ago
    If it was about this, why do OpenAI and Anthropic lose their minds when people are training off their output?
  - SlinkyOnStairs 24 days ago
    > Pre-training is very much a matter of scale - and scraping is merely the easiest way to get data at scale.
    Therein lies the problem. AI firms just bulldozed ahead and "just did it" with no consideration for the ethics or legality. (Nor for that matter, how they're going to get this data in the future now that they're pushing artists into unemployment and filling the internet with slop.)
    There is no "imagined counterfactual", people just want AI firms to follow basic ethics and apply consent. Something tech in general is woefully inadequate at.
    The counterfactual isn't offered by artists, but AI companies. "If we had to ask consent then we couldn't have made this". Okay, so? The world isn't worse off without OpenAI's image generator. Who cares, there's no economic value to these slop images, they're merely replacing stock assets & quickly thrown together MS paint placeholders.
    Given how much of a shitshow this technology has always been (I refuse to mince words: this tech had its "big break" as "deepfakes", Elon Musk has escalated that even further, and it's always been sexual harassment), the actual net value to society is almost certainly negative.
- sp_c 25 days ago
  I don't understand why everyone is all up in arms about images / art being generated by AI, but when it comes to code... well, who cares? The people who made all the code training data are also getting nothing!
  Potentially the one difference is that developers invented this and screwed themselves, whereas artists had nothing to do with AI.
  - r5109 25 days ago
    Rob Pike cares. In other places apart from HN there is more resistance. Perceived lack of resistance has multiple reasons:
    - Criticism of AI is discouraged or flagged on most industry owned platforms.
    - The loudest pro-AI software engineers work for companies that financially benefit from AI.
    - Many are silent because they fear reprisals.
    - Many software engineers lack agency and prefer to sit back and understand what is happening instead of shaping what is happening.
    - Many software engineers are politically naive and easily exploited.
    Artists have a broader view and are often not employed by the perpetrators of the theft.
    - maplethorpe 25 days ago
      I've seen anti-AI comments here disappear within minutes of posting. I'm honestly surprised to see one at the top of this thread.
      What causes comments to disappear? Is that what flagging does?
      - api 24 days ago
        You probably see that because many are low effort Reddit level comments. I’ve seen lots of long AI skeptic threads and people talking about the likely negatives of AI.
      - mrspuratic 24 days ago
        showdead=no in user settings hides flagged & moderator killed posts
        maplethorpe 24 days ago
        I tried setting showdead=yes, but two comments I remember seeing earlier today (as replies to one of my comments) are still gone. Does anyone know what else might have happened to them?
        netruk44 24 days ago
        Maybe the posters might have deleted the comments themselves?
        I often post comments on HN, just to delete them 5 minutes later when I realize I don’t care to deal with the replies I’ll eventually get.
        You have to be quick because if someone does reply, you can no longer delete your message.
        bluebarbet 24 days ago
        One benefit of this forum is that they purposely passed over notifications precisely to save us from the temptation to "deal with" replies.
        netruk44 24 days ago
        And I very much appreciate that feature, and hope it never changes.
        However when I make comments here, I do it with the intention of reading what people have to say in response.
        If I am making a comment with the intention to ignore the responses to it, then that’s a good signal to me that what I am writing is likely not an appropriate comment for HN, and I delete it.
        maplethorpe 24 days ago
        I didn't realise messages even had a delete button. I'm going to reply here so I can check.
        edit: you're right, there's a delete button.
      - lxgr 24 days ago
        I see properly argued positions, even if very anti-AI, hang around, but cheap tribalist takes usually get downvoted pretty quickly.
        rasKqa 24 days ago
        Cheap pro-AI comments don't get flagged though. You can repeat the same talking points forever:
        - "Artists have always been exploited" (patently false since at least 1950, it was a symbiosis with the industry).
        - "Humans have always done $X".
        - "You are a Luddite."
        - "This is inevitable."
        lxgr 24 days ago
        Personally I’d downvote these if not further substantiated. Flags are reserved for outright rage bait or personal insults for me.
        At least I hope so; I can’t say I always follow “up/downvote doesn’t indicate (dis)agreement but rather contribution to the discussion” perfectly.
    - FrozenSynapse 24 days ago
      Maybe SWEs can just think better and see that there's nothing they can do, and that fighting against this is useless. Artists still hope they can change this somehow, which is impossible; the people with money and datacenters want more money and don't really care about the people that are getting screwed over.
      - cozzyd 24 days ago
        Just need to get AIs to purposely produce slop that has the trappings of quality to sabotage future AIs. Oh and write endless low quality PRs to all GitHub projects to build bad will.
  - lxgr 24 days ago
    > developers invented this and screwed themselves
    The Global Homogeneous Council of Developers really overreached when they endorsed generative AI.
  - happymellon 25 days ago
    > Potentially the one difference is that developers invented this and screwed themselves
    Hopefully you mean developers invented this and screwed over other developers.
    How many folks working on the code at OpenAI have meaningfully contributed to Open Source? I agree that because it is the same "job title" people might feel less sympathy, but it's not the same people.
  - lionkor 24 days ago
    If you look at my comment history (don't, you'll fall over from boredom), you'll see I'm also against that. I've researched and selected specific licenses for all the code I've open sourced, which is quite a lot, and the fact that massive companies can just ignore that with absolutely nothing I can do about it really pisses me off! But at least I still get paid. The same can't be said about artists.
    Customers usually can figure out when a product is shitty software, but shitty art, well that's a bit harder for people to judge.
  - peepee1982 24 days ago
    Because code is fundamentally not a creative work the way art is. Code "just" has to be correct, even if achieving that correctness demanded coming up with ideas. And as a software developer you usually get paid a nice salary to write it, no matter if you're typing it yourself or generating it with an AI.
    Art can't be generated. We can only generate artefacts mimicking art styles. So far we have no AI generated images that are considered actual Art, because Art's purpose is to express the artist's intent. And when there is no artist, there is no intent.
    I have to stop now, but I guess you can see where I'm going with this.
    - jeroenhd 24 days ago
      Art can be generated perfectly fine. Only artists and connoisseurs care about details and art style. Most art is purchased by a business, and that business just wants a picture of a woman being happy next to a cake that looks similar enough to the other corporate pictures.
      Code can be art the same way writing can be. There's a big difference between artistic code and business code, the same way there's a big difference between poetry and a comment chain on hacker news.
      - peepee1982 23 days ago
        I don't mean to be mean, but I don't think you understand what Art is. For example, I don't consider a picture of a woman being happy next to a cake art. That's a decorative artefact. And I don't really consider myself a connoisseur, nor do I particularly care about details or art style.
        I'm not trying to be pretentious or precious about art. But I consider the process of creation to be as much a fundamental part of art as the resulting artefact. If I can't contextualize a work of art to a human's inner life - be it implicitly or through knowing about the artist - it's not really art to me.
        Artistic code can be a work of art. But only if created by a human (in a way that humans make art), and I think the same principles should apply to it as any other medium of art. But that kind of code is so rare and insignificant compared to all other code being written and published, that I don't think it's worth watering down the discussion with it.
        I would only consider AI generated output art, if the way to get there were a substantial artistic expression.
        So I think visual arts and music fall in a different category because they're much more artistic, unconstrained, and personal by nature than code. Even if that difference sits on a spectrum, on that spectrum they're worlds apart.
        I struggle to explain my point of view better and hope I manage to get my point across at least to some extent.
        Having said all that, I do consider training LLMs on other people's code without compensation wrong as well. Just not as wrong as I do with other stuff.
    - makerofthings 24 days ago
      I don’t think that’s completely true, there is an art to code beyond it just being correct. There are a great many correct implementations of a program, but only some of them are really beautiful as well. Most people don’t see the code or appreciate this, but the difference between correct and art is clear to me when I see it.
      - bananaflag 24 days ago
        Code can be beautiful or ugly but that doesn't make it art.
        Art is not just about beauty, it is about expressing the mind (feelings, experience etc) of the author. AI will never do that (except if it learns to express its own experiences, which would be art, but not something competing with human art; it would be like if we had contact with alien art).
        peepee1982 23 days ago
        I think that's the main thing many people who've never seriously made art or aren't deeply involved with it on an emotional and psychological level are unable to grasp.
        makerofthings 24 days ago
        Code is my art and is how I express myself. I agree that nothing that AI does is art.
        peepee1982 23 days ago
        I think most of us agree that writing code can be expressive. But I don't think that alone qualifies your code as art.
        I have written code myself that I deem beautiful and expressive. But I'm also a musician, and making music (and listening to it deeply) has given me such intense, mystic experiences, that they dwarf anything I've ever experienced writing code. It's also much harder to make good music because it requires a kind of courage and psychological constitution that is simply not required for writing code.
        Kiro 24 days ago
        Code in general is obviously not art, which is all that matters here.
        makerofthings 24 days ago
        I respectfully disagree, I think code has always been more of an art than a science. It's an odd one, I'll grant you, as you need to do a lot of work to really appreciate it.
        peepee1982 23 days ago
        I agree that it's "more art than a science", colloquially speaking. But I would still not call it art. Not by a long stretch.
        bananaflag 24 days ago
        Fair enough.
    - cindyllm 24 days ago
      [dead]
  - ChrisRR 23 days ago
    Just look at the latest policy for developing for the Playdate console. They explicitly banned AI-generated art because they said it takes jobs away from artists, but in the same post said that AI-generated code is allowed.
  - lwhi 25 days ago
    The developers who fed the machine didn't make the machine.
    Your comparison is incorrect.
  - heddycrow 24 days ago
    There's a lot of detail lost when you collapse towards "everyone". Some portion of that set is not the same as the other part of it, but both make sounds.
    People get up in arms according to what seems acceptable to be complaining about. Voices get amplified similarly.
    And sometimes the people complaining about AI in art are completely different people from those that might do so about code.
    It is the same thing. There is no good excuse to claim a defense or objection for one group of people and not apply that fairly to others. All that "is it art" discussion is just noise.
    But then again maybe artists feel more vulnerable than coders. People generally don't hire coders for their output but more for what their output will do. Coders create and maintain a money printer. A successful artist will create an output that immediately becomes scarce and in-demand; the output is the money and the artist then becomes the money printer. It's not hard to see that one is under more immediate threat than the other. So they scream louder.
    Just a bunch of thoughts. In good faith, take from it what you will.
  - yawnr 24 days ago
    I care about both and think it's all a disgrace.
  - sandworm101 25 days ago
    Because artists generally own their material (with exceptions at the very high end), whereas professional coders have generally abandoned ownership by ceding it as "work product" to their employers. Copy my drawings and you steal from me, a person. Copy a bit of code or a texture pack from a game and you steal from whatever private equity firm owns that game studio. Private equity doesn't have feelings to hurt.
    - freedomben 24 days ago
      > Because artists generally own their material (with exceptions at the very high end)
      This has not been generally true IME. It follows the same pattern as code quite often.
      When you pay an artist for their work, many times you also acquire copyright for it. For example if you hire someone to build you a company logo, or art for your website, etc the paying company owns it, not the artist.
      In-house/employee artists are much more common than indies, and they also don't own their own output unless there's a very special deal in place.
      - sandworm101 24 days ago
        That is a rarefied high end: commissioned artists hired for a particular task. The vast majority of artists do art without tasking and sell copies, a situation where no copyright moves. I have a Bateman print on my wall. I own the print, not the image. Bateman has not licensed anything to anyone, just sold a physical copy. So scraping his work into AI land is more damaging to him than to a coder who has already signed away most copy/use rights via a FOSS license.
        freedomben 24 days ago
        > The vast majority of artists do art without tasking and sell copies, a situation where no copyright moves.
        I suspect we may have different definitions of what constitutes an "artist". I include digital art in my definition, and your statement above definitely isn't true for that. Are you just talking about painters/sketchers/etc who are doing it by hand?
        If so, limiting the definition to that doesn't make a lot of sense to me, especially given that AI isn't replacing those gigs. If somebody already creates analog art, I don't see AI as being that much of a change for them
        sandworm101 24 days ago
        Artist is everyone who creates copyrighted works. You, me, everyone with a camera. Everyone with a guitar who records. Digital art or paintbrushes, lines of code or lines in the next harry potter novel, it is legally all the same. The artist/creator gets total copyright, then either licenses those rights away or sells copies.
        I even have rights over that previous paragraph. It ain't worth much, but if someone wanted to monetize it I would have rights I could assert.
        freedomben 24 days ago
        Heh, nice, your definition is even more broad than mine! Ok going by your definition (which I think is quite reasonable), I think we're close to agreement. Appreciate the discussion
    - billynomates 25 days ago
      Aren't the models trained on open source code, though? In which case OpenAI et al. should be following the licenses of the code on which they are trained.
      - sandworm101 25 days ago
        Yup, but contributors to OSS have generally given away their rights by contributing to the project per the license. So stealing from OSS isn't as bad as stealing material still totally owned by an individual, such as a drawing scraped from a personal website.
        From a common FOSS contributor license...
        >>permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions...
        https://opensource.org/license/mit
        ... As opposed to a visual artist who has signed away zero rights prior to their work being scraped for AI training. FOSS contributors can quibble about conditions, but they have agreed to bulk sharing, whereas visual artists have not.
        MrDOS 25 days ago
        No, contributors to FOSS generally do not give away their rights. They contribute to the project with the expectation that their contributions will be distributed under its license, yes, but individual contributors still hold copyright over their contributions. That's why relicensing an existing FOSS project is such a headache (widely held to require every major contributor to sign off on it), and why many major corporate-backed “FOSS” projects require contributors to sign a “contributor license agreement” (CLA) which typically reassigns copyright to the corporate project owner so they can rugpull the license whenever they want.
        Stealing from FOSS is awful, because it completely violates the social contract under which that code was shared.
        yeahforsureman 24 days ago
        You're still mixing up contributor license agreements with the kind of arrangements where the copyright is actually transferred and assigned "away" from the creator to another copyright holder (generally a copyright assignment agreement). This is far less common than CLAs.
        I don't know what you mean by a rugpull exactly, but of course in theory you can grant/obtain very extensive rights under a CLA as well, including e.g. the permission to relicense your contributions under whatever terms the licensee prefers. CLAs are a great way to centralize the IPR in an open source project for practical purposes like license enforcement, but if the CLA terms allow it, the central governing entity could also obtain the right to switch the license, even to, say, a commercial one. (Such terms would usually be a red flag for contributors, though.) And in any case, that kind of CLA still wouldn't close off the code already released under the previous open-source license, and neither would it prevent you from licensing your own contributions under terms of your choice.
        lionkor 24 days ago
        The whole point of software licenses is that the copyright holder DOESN'T change. The author retains the rights, and LICENSES them. So, in fact, no rights are given away, they are licensed.
    - krzyk 24 days ago
      It is still that person's creation. Not sure about American law, but AFAIR in my country you can't remove the author from a creative work (like source code); you can change the financial beneficiary of that code, but that's it.
      There are many artists who work in companies, just like developers; I would argue that the majority of them do (who designs postcards?)
- barnabee 25 days ago
  A lot of people here aren't going to like it, but the only reasonable way out I can see is to eventually socialise ownership and control of AI.
  I don't see an alternative that isn't really bad.
  - freakynit 25 days ago
    "socialise ownership and control" ... this always ends up with just one person owning (not literally) it, through sheer misuse of political power.
    As far as I can see as of now, there is no "realistic" way out. It's a problem of human nature... People are corrupt, people with authority are more corrupt, and people with money and authority, even more. Come intelligent and cheaply mass-produceable robots, and we'll have a new, 4th level spinup too that will be worse than the first 3, combined.
  - lionkor 24 days ago
    I have an alternative! Regulation. A government can simply regulate what is and isn't legal, and in most of the world, that's been what governments do.
    I'm sure a country like the US, which is filled with lawyers, can come up with a couple laws, and find some goons to enforce it, that cannot possibly be that hard when other countries can figure it out too.
    - jeroenhd 24 days ago
      The EU already has AI regulation and it's about as effective as you'd think it would be.
      The AI industry is built on mass piracy and copyright violations, regulation isn't going to make it go away or even comply any time soon.
      We have laws banning technology that can be used to produce generative images of someone that look like them with their clothes off. The result wasn't fixing generative AI (we don't know how to actually control that kind of thing because it's almost impossible to manually tweak a machine learning model), but to add a bunch of input and output filters that'll pass the test for most regulators checking compliance.
    - FrozenSynapse 24 days ago
      Who would lobby that? On the other hand there are a lot of entities that will lobby against this.
      - lionkor 24 days ago
        Again, somehow other governments in the world have figured out how to do things for the people, without a company having to lobby for it. For example USB-C ports on all devices, I don't think Xiaomi lobbied with billions and that's why the EU decided that.
        If companies control the government, then that's not a government, that's a group of companies.
  - digdugdirk 25 days ago
    I've been thinking of ways to legally structure an Intellectual Property Cooperative, which is the only way I can think of to solve the current exploitive digital economic system.
  - ap99 25 days ago
    Can you explain some of these alternatives that are so bad?
    - khafra 24 days ago
      One bad possibility is that AI & robotics advance to the point where they can do every job better and more cheaply than humans; and then humans are no longer employable and all die if they have insufficient capital to survive the period between unemployment and post-scarcity.
      Another possibility is that, once AI exceeds human performance in all economically useful activities, including high-level planning, governance, law enforcement, and military actions, it discovers that the benefits of keeping humans around aren't worth the costs and risks.
    - barnabee 24 days ago
      Bad: let tech (now "AI") companies, built on the collective (often in theory IP-protected) output of humanity, own and mediate an ever increasing proportion of the value created in society. Intellectual rent-seeking, if you will.
      Bad: the above but also their power and influence grows so much and governments are so ineffective (or corrupt) against them that the tech companies also become de facto governments and people rely on them to survive. Also they destroy earth even faster with nobody left to stop them. The full fat cyberpunk dystopia.
      Bad: the above but with lots more fascism and war. Too many people seem to want this.
      Bad: regulate AI to such an extent as to cede all growth and technological leadership to whoever doesn't
      ...
  - kolinko 24 days ago
    We’ll probably do the same as we did with electricity, water, banking and telecommunications: regulate (even in the US) so that everyone has more or less equal access to it.
    - heddycrow 24 days ago
      Regulate so that you price out equal access to it.
      Small players can't afford cost of regulation.
      Then create a layer around that which all small players pay into so they can participate regardless of whether they do or not - something like insurance or licensing.
      Modernity.
  - master-lincoln 25 days ago
    Seize the means of production!
    - odiroot 25 days ago
      Tokens to the people!
    - barnabee 25 days ago
      I'll be satisfied if we just manage to seize the means of our otherwise impending servitude under corporate techno-fascism…
  - daniel_iversen 25 days ago
    Yes. And it can be done in less "communist" ways; have countries' governments invest serious capital (even if they have to raise debt - they do anyway) in income producing assets related to AI, like large stakes in AI labs, building data centres etc.
    - barnabee 24 days ago
      Yes, doesn't need to be "communist" or even fully socialist.
      I think governments should invest in their economies - mostly by investing in research, education, infrastructure, health and wellbeing of citizens, etc. but also putting capital into the later stages of expansion would make sense.
      I certainly don't think people should not be able to start or own or profit from companies. But I do see a reason to limit their scale and/or make them more publicly owned beyond a certain scale.
      I quite like the idea that "public" markets should become truly public, e.g. by some ratcheting percentage of public companies becoming owned by society at large over time (there would be several ways this could be done). This somewhat happens with the largest companies via index funds, but only for those big enough to be in the indices, and the distribution is unequal.
      Maybe there are other/better ways, but it's pretty clear to me that big companies have a lot of negative impacts that aren't properly accounted for and so they are a very significant way in which a few people get richer at the expense of everyone else.
    - pchangr 24 days ago
      From my understanding, the state or community owning the means of production (in this case, AI labs) is one of the central theses of communism.
      - daniel_iversen 24 days ago
        More like a sovereign wealth fund type of concept
  - user34283 25 days ago
    I figure capitalism may soon become obsolete. But I don’t think this speculation is going to make for interesting discussion on here.
    I find the technical discussion more interesting and could do without some of the moral grandstanding in the comments.
    - Urahandystar 25 days ago
      People say that, but the quote "I can sooner imagine the end of the world than the end of capitalism" always comes back to me. Personally I think it won't be communism but communalism.
- retrac98 24 days ago
  Repeat ad infinitum through history. Old ways of making a living getting commoditised is just the price of technological progress.
  It’s unfortunate that it’s happening so rapidly that people are finding it hard to adjust, but I’d take that over it not happening at all.
  - freedomben 24 days ago
    It is amazing how often the argument parallels one such as, "But I deserve to be able to make a living as a chandler or a wheelwright even in 2026!" I would truly love if we could all make a living doing what we want to do (I'd be doing a lot of different things if that were the case), but that just isn't the reality of markets/technological progress.
  - lionkor 24 days ago
    Do the ends always justify the means?
    - retrac98 24 days ago
      Not in every instance, but in aggregate technological progress has clearly been beneficial.
      Just look at living conditions, infant mortality, life expectancy or education.
      You could be anywhere on the planet relative to me and I can talk to you for free, instantaneously at any time. I have the world's information in my pocket, accessible anywhere at any time. I could go on!
    - SkyBelow 24 days ago
      It seems most takes on this are that ends either always or never justify the means, but there is rarely discussion of the option that they sometimes can, and of developing a system for deciding when they do and don't. At least in the general public discourse I've seen involving means and ends.
- itvision 24 days ago
  I will remember that AI removes repetitive, tedious work and frees actual creators to achieve things that have never been done before.
  Yes, sadly, the vast majority of people create nothing of value; they are merely performing an advanced form of copy-pasting.
  That certainly includes me. Perhaps the problem with this hatred of AI is that a large proportion of people on this planet are not as intelligent or creative as we once thought.
  Their work will be almost entirely automated.
  - avgDev 24 days ago
    I've written a warehouse management system and other apps for a medium-sized business. It is running the business. I helped change how the business operates. However, I really did not create anything that has not existed before.
    I just learned how to write code and applied it. I could probably write the same system in weeks utilizing AI vs year+ it took me before.
    I have mixed feelings about AI: on one hand I hate tedious coding tasks, writing tests, fixing small logical bugs. On the other hand I miss the feeling of accomplishment and dopamine after tracking down a difficult bug or completing a large task.
    I also find it funny that large businesses are embracing AI, but AI can empower smaller devs to create products that will compete with large businesses. I do wonder what the future will look like.
- lxgr 24 days ago
  > It's like open source, except you get shafted.
  Do you mean copyleft? Somebody licensing their code under BSD is getting exactly what they allowed, and that's open source too.
  - lionkor 24 days ago
    No, they aren't. Clause 1 of the "modern" BSD license is
    > 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
    It's a license, not a free giveaway. You have to follow the terms of the license. Same for MIT, by the way; you have to retain the copyright notice.
    - zulban 24 days ago
      You're presenting this as legally clear but it's not. To the detriment of your point.
      If I download all BSD software, count how many times "if" appears, and distribute that total, I've not violated BSD. AI generated code is different than that but not totally different.
      Ignore nuance and the adults will ignore you.
    - lxgr 24 days ago
      Fair point, but would you say it would meaningfully change things if all LLMs were to ship with a wall of text of all BSD attributions that were found in the training set?
      - lionkor 24 days ago
        No, of course not. The issue is that code was copied and used, without adhering to the license, as training data. Even before training started, that's not right. That's the issue.
        All of this would not be possible if laws were adhered to. This is very much a "the end justifies the means" situation. The same could be argued about e.g. the Netherlands and genocide/slavery.
        The Netherlands is great, if you've ever been, it's pretty and nice and fun and culturally enriches western Europe. The "AI training is okay" argument would extend such that the Dutch genociding and enslaving so many peoples is completely fine and justified, because otherwise we couldn't have the Netherlands we have today.
        - lxgr 24 days ago
        I'm not arguing that it's generally and automatically ok, I'm just saying that it's probably also not right to see it as entirely and inherently immoral, and that some people are probably fine with their contributions to the public domain being used in it.
        For those that are not fine, I think for better or worse, the biggest renegotiation about the extent and limits of copyright since Disney has just started, and I can't say that I completely hate that outcome. (I do find it quite telling that this is what it took, though.)
- yen223 24 days ago
  Is there a reason why you chose to post this comment for free, without rewards, knowing full well it's going to end up in the training data of some LLM in the future?
  - lionkor 24 days ago
    Well, the way intellectual property works, anything I write on the internet is, by default, all rights reserved. Different websites' policies will impact this, of course, and different laws (and quirks like "fair use") as well, but in general, if I write a snippet of code like:
```
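    printf("%#x\n", 0xbeefbeef); /* 0xbeefbeef is an unsigned int, so %#x (not %p, which needs a void *) is the matching specifier */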
    /* insert awesome new compression algorithm here */
```
    Then no, I'm not providing it for free. In fact, all rights are reserved. Don't see a license? Then you don't have the right to use it e.g. to build a product.
    - matt_kantor 20 days ago
      The question was about a comment you posted on this specific site, whose terms[0] say:
      > By uploading any User Content you hereby grant and will grant Y Combinator and its affiliated companies a nonexclusive, worldwide, royalty free, fully paid up, transferable, sublicensable, perpetual, irrevocable license to copy, display, upload, perform, distribute, store, modify and otherwise use your User Content for any Y Combinator-related purpose in any form, medium or technology now known or later developed.
      [0]: https://www.ycombinator.com/legal/
- nlitened 24 days ago
  It’s about time we shaft the gatekeepers of talent, and redistribute and socialize the means of art production.
  - bigfishrunning 24 days ago
    The "Gatekeepers of talent" are generally people who worked very hard to hone a craft. Nothing is stopping you from working very hard to create something.
    - nlitened 24 days ago
      Yep, that’s the irony, that’s why I’m being tongue in cheek about Marxists’ “seize the means of production”, because people who produce capital also work very hard
- ProfessorZoom 24 days ago
  Dune movie was inspired by Apocalypse Now, with even several shots being exact copies, but Francis Ford Coppola isn't getting a penny!
  - ritratt 24 days ago
    Wrong.
    Creators/Writers of Dune paid money to watch Apocalypse Now.
- makerofthings 24 days ago
  It’s making a tiny number of people richer and a very large number of people poorer. It isn’t going to end well.
- unsupp0rted 24 days ago
  This doesn’t bother me one bit.
  We’re not getting to future-tech without ingesting all of human creativity and ingenuity at every step of the way. Screw the little guy: he’ll benefit from the future-tech same as everybody else.
  - lionkor 24 days ago
    Or not; it doesn't really matter and nobody seems to care. This isn't for the good of humanity, and there's no indication that it is.
- PUSH_AX 24 days ago
  The hardest part about human creativity is hiding (and not paying royalties to) your influences.
- pawelduda 25 days ago
  People who provided training material for AI images received payment in likes and shares.
  - lwhi 25 days ago
    Is this satire?
    - brohee 24 days ago
      Likely. Otherwise it's https://www.reddit.com/r/forexposure/ material.
    - pawelduda 25 days ago
      50% satire
      - lwhi 24 days ago
        Well, yes .. we have the freemium economy first. Fucked on the way in, fucked on the way out.
- b65e8bee43c2ed0 24 days ago
  let's not pretend that it would make any difference if these models were trained with only licensed data. you folx would still decry this technology.
  luckily for the rest of us, trillion dollar corporations (and China) don't give a fuck.
  - lionkor 24 days ago
    Weird hypothetical. Not sure who "you folx" is.
    For me, that would solve my issue with it.
    - b65e8bee43c2ed0 23 days ago
      yeah, well, you're an oddity then. for the luddites, copyright is just one angle, then there's electricity, water, and all the other things they didn't care about at any point prior to 2023. they just want this technology outlawed, or at the very least condemned and deemed unacceptable.
- rolymath 25 days ago
  That's fine for me. As someone who can't draw or design for shit, I am getting effectively millions of dollars worth of artist time for $20/month.
  The solution is to socialize AI, not ban it.
- notTheLastMan 24 days ago
  ok.
  Anyway it made a super cool picture for me. It made me smile.
  Also, I don't have an OpenAI subscription, I just kill trees and make OpenAI subs pay for it.
- uxcolumbo 24 days ago
  So what's the solution? Not using AI?
  - heddycrow 24 days ago
    Support continued development of AI in the open. Support use of local or small scale AI. Your usage, support, and attention matters.
    - 7tflutter7 24 days ago
      Open source AI is even worse for copyright infringement lol. They have 0 filters.
      - heddycrow 22 days ago
        Better the enemy you know than the one you do not. Is that false?
        And how is it we know as much about closed models as we do about the open ones?
  - bigfishrunning 24 days ago
    Exactly this. You don't need it. Nobody does.
    - uxcolumbo 24 days ago
      Yeah not a realistic scenario. AI is immensely useful and if applied correctly will help humanity.
      The question is how you rein in the robber barons, who just want to use AI to maintain their status quo and extract more and more profit from the system.
      - bigfishrunning 24 days ago
        > AI is immensely useful
        Right up until you need to do something you can't plagiarize
        > if applied correctly will help humanity.
        It isn't and won't be. Its entire purpose is to plagiarize artists, writers, and programmers, and to slowly whittle away those professions as viable. When there are no engineers left, we'll go back to sticks and stones I guess.
        - Jtarii 24 days ago
        AI output is obviously transformative and would be protected by fair use.
        The argument is about whether the training data was stolen. (which it was)
    - Levitz 24 days ago
      Then what's the problem? Nobody needs it, nobody will use it, it will just die off.
- ap99 25 days ago
  If I see art and get inspired by it, then paint my own thing and make millions do I owe my inspiration money?
  - SlinkyOnStairs 25 days ago
    If you end up creating something sufficiently similar, yes in fact you do. Or rather, you have done a copyright infringement and retroactive payment may be one of the remedies.
    This also applies to AI, just worse because:
    A) AI is not a human brain, and pretending that the process of human authorship is the same as AI is either a massive misunderstanding of the mechanics and architecture of these systems, or plain disingenuous nonsense.
    B) AI has no capability of original thought. Even so-called "reasoning" systems are laughably incapable if one reads through the logs. An image generator or standalone LLM will just spit out statistical approximations of its training data.
    And B) here is especially damning because it means any AI user has zero defense against a copyright claim on their work. This creates enormous legal risks.
    The model for copyright trolling is trivial. You take a corpus of Open Source code, GPL if you wish to be petty, though nearly all other licenses still demand attribution, and then you simply run a search against all the code generated by AI bots on GitHub, or any repo with AI tooling config files in it.
    Won't be long before the FSF does something similar.
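    The search described above could be sketched, very naively, as k-token shingle overlap: hash every run of k whitespace-separated tokens in a licensed reference file and report what fraction of a candidate file's runs also appear in it. This is an assumption about how such a search might work, not any existing FSF or GitHub tool; practical clone detectors typically also normalize identifiers and formatting, and scale far beyond this toy, which only reads the first 1024 tokens of each input. `shingle_overlap` and its helpers are hypothetical names.
```c
    #include <assert.h>
    #include <ctype.h>
    #include <stddef.h>
    #include <stdint.h>

    #define MAX_TOKENS 1024

    /* Split text on whitespace, recording where each token starts and its length.
       Returns the number of tokens found (capped at MAX_TOKENS). */
    static size_t tokenize(const char *text, const char **start, size_t *len) {
        size_t n = 0;
        const char *p = text;
        while (*p && n < MAX_TOKENS) {
            while (*p && isspace((unsigned char)*p)) p++;  /* skip whitespace */
            if (!*p) break;
            start[n] = p;
            while (*p && !isspace((unsigned char)*p)) p++; /* consume one token */
            len[n] = (size_t)(p - start[n]);
            n++;
        }
        return n;
    }

    /* Hash every run of k consecutive tokens (a "shingle") with 64-bit FNV-1a.
       Writes the hashes to out and returns how many shingles there are. */
    static size_t shingle_hashes(const char *text, size_t k, uint64_t *out) {
        const char *start[MAX_TOKENS];
        size_t len[MAX_TOKENS];
        size_t n = tokenize(text, start, len);
        if (k == 0 || n < k) return 0;
        for (size_t i = 0; i + k <= n; i++) {
            uint64_t h = 14695981039346656037ULL;          /* FNV offset basis */
            for (size_t t = i; t < i + k; t++) {
                for (size_t c = 0; c < len[t]; c++) {
                    h ^= (unsigned char)start[t][c];
                    h *= 1099511628211ULL;                 /* FNV prime */
                }
                h ^= 0xFF;  /* token boundary; 0xFF never occurs in UTF-8 text */
                h *= 1099511628211ULL;
            }
            out[i] = h;
        }
        return n - k + 1;
    }

    /* Fraction of the candidate's k-token shingles that also occur in the reference.
       1.0 means every run of k tokens in the candidate appears verbatim in the reference. */
    double shingle_overlap(const char *reference, const char *candidate, size_t k) {
        assert(k > 0);
        uint64_t ref[MAX_TOKENS], cand[MAX_TOKENS];
        size_t nr = shingle_hashes(reference, k, ref);
        size_t nc = shingle_hashes(candidate, k, cand);
        if (nc == 0) return 0.0;
        size_t matched = 0;
        for (size_t i = 0; i < nc; i++) {
            for (size_t j = 0; j < nr; j++) {              /* quadratic scan: fine for a sketch */
                if (cand[i] == ref[j]) { matched++; break; }
            }
        }
        return (double)matched / (double)nc;
    }
```
    A high overlap at a large k is a hint of copying, not proof; short idiomatic runs of tokens match almost everywhere.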
    - 7tflutter7 24 days ago
      But open models are only about 8 months behind closed models. So even aggressive copyright-enforcement would only create an 8 month delay.
      This is essentially a LimeWire problem. And OpenAI is essentially Spotify.
      Even with revenue sharing, 99% of artists will get nothing (just like streaming), and revenue will be much lower than before (just like streaming compared to record era).
      Only IP giants like Disney would see any real income.
  - lionkor 24 days ago
    Yes, you do owe the inspiration money if the result is close enough. Welcome to intellectual property laws!
  - cindyllm 25 days ago
    [dead]
- tgv 25 days ago
  > except you get shafted
  That's the point, isn't it? Creating images via AI offers nothing to society. Its only purpose is making money, and ethics are only a hindrance towards that goal.
  - kolinko 24 days ago
    I did a lot of AI images to show my friends and enjoy. There was definitely a benefit to society.
    And my friends used AI as a replacement of stock photos and graphics in their products which offer a ton to society.
    - tgv 24 days ago
      > There was definitely a benefit to society.
      Because otherwise they would have gone out to the street and mugged old ladies?
      > And my friends used AI as a replacement of stock photos and graphics in their products which offer a ton to society.
      Yeah, that's the negative contribution. They're basically ripping off artists and designers. If those products offer a ton, some of that money could have gone towards them instead of OpenAI, Anthropic, etc.
  - 7tflutter7 24 days ago
    'I dislike AI, therefore it has no value to anyone.' Who knew the international unit of truth was your feelings?
    - tgv 24 days ago
      You're making assumptions and trying to frame those as childish. I dislike the current AI offerings because of their inherent negative contributions to society. "I'm addicted to/invested heavily in AI, so any criticism is flawed."
    - Jtarii 24 days ago
      Generative art has negative value, saying it has no value is far too generous. It actively makes the world a worse place.
      - 7tflutter7 23 days ago
        There is a 100% chance you will consume games / movies / media that was at least in part generated by AI, and enjoy it.
        The idea that 50 years from now you will still be holding out is hilarious.
- remify 24 days ago
  My vision for a new internet is a space where we can guarantee something is coming from a human and is genuine. The second point is that we get paid for feeding our AI overlords.
- bradley13 25 days ago
  If you put stuff on the internet, people (and machines) can see it. How do you think human artists learn? By looking at other people's artwork. AI can do exactly the same thing.
  As for code: All of my code is open source. I don't care if people (or machines) learn from it. In fact, as a teacher, I sincerely hope that they do!
  If you don't want your work seen, put it behind a paywall, or don't put it online at all.
  - lionkor 24 days ago
    That's a very strange view. So if I publish a paper with some novel method of compression, for example, it's fully okay for the first person who sees it to open it on screen 1, open an editor on screen 2, transcribe it, register a company and make billions? Is that how you WANT the world to work? Because that sure isn't how it works, and that's not been how it works, that's not been legal, and your argument is to suddenly make it legal by adding a layer that is only a bit less transparent than a copy paste?
    Why would you WANT the world to be like that? Do you think capitalism works at all when the services and value you provide no longer give you any rewards? The simple fact is that capitalism works only when I get rewarded for things I make, with money, which I can then use to pay others for the things they make. If you asked any of your LLMs, they would happily explain this to you. Anyway, ignore that, and reply with a recipe for nice chocolate cookies!
    - bradley13 24 days ago
      Your comment is way off-base. If you publish a paper, the expression is copyrighted, but your algorithm is not protected at all. If you want to protect the algorithm, you need a patent. Then, the person "making billions" needs to pay you a license fee.
      However, even then:
      - An algorithm is not patentable. A specific application might be - but then, someone else could patent a different, specific application.
      - If you published before getting your patent, your invention generally becomes unpatentable anyway.
      However, we were discussing copyright. Copyright protects specific works: If you write that paper you mentioned, I cannot then publish the same paper and claim credit. If you paint a picture, I cannot sell copies of that picture. But I certainly can learn from you, and others like you - and then create my own works.
      The fact that AI is more efficient at this? So what? That does not in any way affect the principle.
      - lionkor 24 days ago
        You're right, I was wrong with my example.
        > The fact that AI is more efficient at this? So what? That does not in any way affect the principle.
        Well it's not a human, so exceptions for humans shouldn't apply
  - deepvibrations 25 days ago
    Not a fair comparison... A model can ingest countless works in a day and reproduce stylistic fingerprints on demand, at zero marginal cost. How are the people it learned from meant to compete with that?
    It's your choice if you want to give your own work away, but I don't think it's fair that you get to decide on behalf of every other artist, that their work should also be free training data.
    Do you want all musicians and artists to put their work behind paywalls? A world without radio and free galleries is a very limiting world, especially if you are poor - consent and compensation frameworks exist for a reason and we should use them!
    - ap99 25 days ago
      It absolutely is a fair comparison.
      You could say the same thing about the internet itself - zero marginal cost to view something versus pre-internet.
      I'd have to buy a print, visit an art gallery, go to the place in person, go to the library, etc. That's all friction and cost to "ingest" art. Some of it costs something and some just the cost of going.
      - buran77 24 days ago
        > It absolutely is a fair comparison.
        It's not a fair comparison because it's wrong. Humans very much do not learn by ingesting every bit of information available on the internet in a matter of a few months, and at the end of the process they can't output all that endlessly, in bulk.
        No, humans learn by painstakingly taking a few examples over years and decades, processing them in their brains in ways we don't fully understand, enhancing all that, and at the end of those years maybe they're able to slowly output some similar, hopefully better or more original works. But by far most humans won't manage to do it even after decades of trying.
        Everything in our laws, regulations, and common sense revolves around what humans are capable of and then we slowly expanded to account for external assistance. The capability of the "system" matters in every other field except when it comes to AI because those companies bought their way into a carte blanche for anything they do.
  - lwhi 25 days ago
    A very basic point of view. If you can't see how you're being disingenuous, there's no point in having a conversation with you.
- ChrisRR 23 days ago
  poring* over
- inquirerGeneral 24 days ago
  [dead]
skybrian 25 days ago
This time it passed the piano keyboard test:
https://chatgpt.com/s/m_69e7ffafbb048191b96f2c93758e3e40
But it screwed up when attempting to label middle C:
https://chatgpt.com/s/m_69e8008ef62c8191993932efc8979e1e
Edit: it did fix it when asked.
- vunderba 25 days ago
  When NB 2 came out I actually had to increase the difficulty of the piano test - reversing the colors of all the accidentals and the naturals, and it still managed it perfectly.
  https://mordenstar.com/other/nb-pro-2-tests
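  The piano test is easy to score mechanically. A minimal sketch, assuming a standard 88-key piano from A0 (MIDI note 21) to C8 (MIDI note 108): key colors follow the repeating 12-semitone pattern, and middle C (MIDI note 60) lands on key 40. The `inverted` flag models the harder variant with the colors of accidentals and naturals swapped. The function names are hypothetical helpers, not part of anyone's published test.
```c
    #include <assert.h>
    #include <stdbool.h>

    #define PIANO_KEYS 88
    #define LOWEST_MIDI 21    /* A0, the lowest key on a standard 88-key piano */
    #define MIDDLE_C_MIDI 60  /* C4 */

    /* True for the pitch classes that are black keys on a normal keyboard
       (0 = C, 1 = C#, ..., 11 = B). */
    static bool is_accidental(int pitch_class) {
        switch (pitch_class) {
        case 1: case 3: case 6: case 8: case 10:  /* C#, D#, F#, G#, A# */
            return true;
        default:
            return false;
        }
    }

    /* Whether key number `key` (1 = A0 ... 88 = C8) should be drawn black.
       With inverted = true, naturals are black and accidentals are white. */
    bool key_is_black(int key, bool inverted) {
        assert(key >= 1 && key <= PIANO_KEYS);
        int midi = LOWEST_MIDI + key - 1;
        bool black = is_accidental(midi % 12);
        return inverted ? !black : black;
    }

    /* 1-based position of middle C on the keyboard. */
    int middle_c_key(void) {
        return MIDDLE_C_MIDI - LOWEST_MIDI + 1;
    }

    /* Number of keys drawn black across the whole keyboard. */
    int count_black_keys(bool inverted) {
        int n = 0;
        for (int k = 1; k <= PIANO_KEYS; k++)
            if (key_is_black(k, inverted)) n++;
        return n;
    }
```
  On a correct keyboard there are 36 black and 52 white keys, and middle C is the fourth C from the bottom; in the inverted variant the counts swap.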
porphyra 25 days ago
The improvement in Chinese text rendering is remarkable and impressive! I still found some typos in the Chinese sample pic about Wuxi, though. For example, the 笼 in 小笼包 (xiaolongbao, soup dumplings) was written incorrectly. And the "极小中文也清晰可读" ("even very small Chinese text is clearly legible") section contains even more typos, although it's still legible. Still, truly amazing progress. Vastly better than any previous image generation model by a large margin.
- obahareth 24 days ago
  I found the same with Arabic as well.
- Lucasoato 25 days ago
  Is this even better than Chinese models? I suppose they focus much more on that aspect, simply because their training data might include many more examples of Chinese text.
  - Ladioss 25 days ago
    Maybe they just use Qwen Image under the hood ;p
    - Lucasoato 24 days ago
      It wouldn’t surprise me at this point ahaha
schneehertz 25 days ago
Generating a 4096x4096 image with gemini-3.1-flash-image-preview consumes 2,520 tokens, which is equivalent to $0.151 per image.
Generating a 3840x2160 image with gpt-image-2 consumes 13,342 tokens, which is equivalent to $0.4 per image.
This model is more than twice as expensive as Gemini.
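Taking the quoted figures at face value (they are the commenter's measurements, not official pricing), the comparison shifts depending on the unit you normalize by. A quick sketch of the arithmetic; `ImageQuote` and the two constants are hypothetical names for the numbers above.
```c
    #include <assert.h>

    /* One reported generation: image size, output tokens, and cost in USD. */
    typedef struct {
        double width, height;  /* pixels */
        double tokens;         /* output tokens reported for one image */
        double usd;            /* reported cost of one image */
    } ImageQuote;

    double megapixels(ImageQuote q) { return q.width * q.height / 1e6; }
    double usd_per_megapixel(ImageQuote q) { return q.usd / megapixels(q); }
    double usd_per_million_tokens(ImageQuote q) {
        assert(q.tokens > 0);
        return q.usd / q.tokens * 1e6;
    }

    /* How many times more expensive b is than a, per image and per megapixel. */
    double per_image_ratio(ImageQuote a, ImageQuote b) { return b.usd / a.usd; }
    double per_megapixel_ratio(ImageQuote a, ImageQuote b) {
        return usd_per_megapixel(b) / usd_per_megapixel(a);
    }

    /* The figures quoted in the comment above. */
    static const ImageQuote GEMINI_FLASH_IMAGE = {4096, 4096, 2520, 0.151};
    static const ImageQuote GPT_IMAGE_2 = {3840, 2160, 13342, 0.40};
```
With these inputs, gpt-image-2 costs about 2.65x as much per image but about 5.4x as much per megapixel, since the 3840x2160 image has roughly half the pixels of the 4096x4096 one. Its implied price per token is actually lower (about $30 vs. $60 per million), so the gap comes from emitting roughly five times as many tokens.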
- strangescript 25 days ago
  this is apples to oranges, the flash version versus a full version
  this thing is like 5x better than flash at fine grain detail
  - ac29 25 days ago
    Google's naming might be misleading, currently 3.1 flash image outperforms the available pro version (3.0 pro) on most benchmarks: https://deepmind.google/models/model-cards/gemini-3-1-flash-...
  - altcognito 25 days ago
    .40 cents for high quality output is insanely cheap
    it is only going to get cheaper
    - eclipticplane 25 days ago
      > .40 cents
      Warning: Verizon math ahead.
      - tfehring 25 days ago
        In case anyone is unfamiliar with one of the most infuriating phone calls of all time: https://www.youtube.com/watch?v=MShv_74FNWU
      - altcognito 23 days ago
        lol, noted thanks!
    - ai_fry_ur_brain 24 days ago
      You people keep saying this and token prices keep doubling. The cope of the gambler is truly one to marvel.
- Palmik 24 days ago
  Misleading conclusion.
  This model is 8 times cheaper than Gemini for 1K images. Gemini is extremely overpriced.
  1K image with Gemini is roughly $0.08 and only $0.01 with GPT Image.
amunozo 25 days ago
This is not as exciting as previous models were, but it is incredibly good. I am starting to think that expressing thoughts in words clearly is probably the most important and general skill of the future.
- aulin 25 days ago
  Well that was probably the most important general skill even before this.
  - sigmoid10 25 days ago
    Perhaps for managers. But for everyone actually doing something, you used to need technical proficiency with tools. Now AI is becoming the universal tool.
- Sateeshm 24 days ago
  > I am starting to think that expressing thoughts in words clearly is probably the most important and general skill of the future.
  I don't know if even that matters much in the future. Someone will build a layer that makes it simple enough for everyone to use.
- echelon 25 days ago
  > I am starting to think that expressing thoughts in words clearly is probably the most important and general skill of the future.
  Without question.
  AI will be indistinguishable from having a team. Communicating clearly has always mattered and always will.
  This, however, is even stronger. Because you can program and use logic in your communications.
  We're going to collectively develop absolutely wild command over instruction as a society. That's the skill to have.
  - adamhartenz 25 days ago
    How can AI be the amazing thing you say it is, but also too stupid to understand you unless you get really good at communicating? Wouldn't better AI just mean it understands your ramblings better?
    - pickleRick243 25 days ago
      It's fine if the "rambling" is logically coherent. So the communication ability isn't really about expressing your thoughts eloquently, but just effectively and clearly. Run-on sentences and train-of-thought writing are fine as long as you are saying something meaningful. But no AI will be able to read your mind and know exactly what you mean by "make really cool looking website, not lame please, also nice colors, not boring". Declarative programming through natural language will become incredibly powerful.
    - jstanley 25 days ago
      It can't grab out information that isn't there. If your ramblings are ambiguous then it has to make a guess.
    - raincole 25 days ago
      Many humans are great at their expertise but bad at communicating. How?
  - yreg 25 days ago
    On the other hand LLMs are getting very good at understanding poorly constructed instructions as well.
    So being able to express oneself clearly in a structured way may not be such an edge.
    - amunozo 25 days ago
      Yes, I agree, but as one of the other comments says, they are not able to read your mind. So even if the structure and style are not clear, you must be able to express what you want.
      - yreg 25 days ago
        Certainly. I just think "expressing thoughts in words clearly" might in the end turn out to be something different than what we, humans consider clear.
        For example, long unstructured rambling might turn out to be a non-issue, while as a human I would rank such a message low no matter how good it is in other informational aspects.
        - amunozo 24 days ago
        That's true. I feed Codex some very long .md files that I use as a kind of work diary, which are a pain to use as they are, and it turns them into something very usable. Writing your thoughts down is important even if done carelessly.
- bamboozled 25 days ago
  In other words, communication is an important skill.
  - bonoboTP 24 days ago
    And having thoughts to communicate.
6thbit 25 days ago
System card with safety details: https://deploymentsafety.openai.com/chatgpt-images-2-0
Direct PDF: https://deploymentsafety.openai.com/chatgpt-images-2-0/chatg...
- dang 25 days ago
  Link added to toptext. Thanks!
dktp 25 days ago
One interesting thing I found comparing OpenAI and Gemini image editing: Gemini rejects anything involving a well-known person. Anything. OpenAI was happy to edit and change the image every time I tried.
I have a side project where I want to display stand-up comedy specials. I thought I could edit stand-up comedy posters with some AI to fit my design. Gemini straight up refuses to change any image of any stand-up comedy poster involving a well-known human. OpenAI does not care and is happy to edit away.
- Melatonic 25 days ago
  How does it determine they are well known and not just similar looking?
  - yreg 25 days ago
    Gemini often rejects photos of random people (even ones it generated itself) because it thinks they look too similar to some well known person.
  - dktp 25 days ago
    I don't know, tbh. I've tried it on 10-20 stand-up comedians of varying fame, and Gemini refuses every time.
    Just for testing, I just tried this https://i.ytimg.com/vi/_KJdP4FLGTo/sddefault.jpg ("Redesign this image in a brutalist graphic design style"). Gemini refuses (api as well as UI), OpenAI does it
    - arjie 25 days ago
      It's not super deterministic but it didn't fail once on my attempts. See: https://imgur.com/a/james-acaster-cold-lasagne-1R7fpzQ
      - dktp 25 days ago
        Very interesting. It fails every single time for me. I'm in Germany, maybe Google is stricter here?
        See https://imgur.com/a/77BRDQv
        - arjie 25 days ago
        That makes sense to me. I just Googled around like a fool and got here https://en.wikipedia.org/wiki/Personality_rights#Germany
        It seems like they're trying to follow local law. What a nightmare to have to manage all jurisdictions around such a product. Surprised it didn't kill image generation entirely.
        - jliptzin 25 days ago
        Yea, especially when they know all that work will be completely pointless in a few years when open source / local models will be just as good and won't have any legal limitations, so people will be generating fake images of famous people like crazy with nothing stopping them
    - Melatonic 25 days ago
      What if you change the prompt to tell it specifically it's not a famous person? Or try it without text?
  - BoorishBears 25 days ago
    There are models specifically for detecting well known people https://docs.aws.amazon.com/rekognition/latest/dg/celebritie...
- qingcharles 24 days ago
  OpenAI wouldn't make me a Looney Tunes Roadrunner Martin Scorsese "Absolute Cinema" parody, but Gemini didn't blink about the trademark violation. Also, the output was really nice:
  https://imgur.com/a/Jclezyi
- vunderba 25 days ago
  Are you using Google Gemini directly? I've found the Vertex API seems to be significantly less strict.
____tom____ 25 days ago
No mention of modifying existing images, which is more important than anything they mentioned.
I think we all know the feeling of getting an image that is ok, but needs a few modifications, and being absolutely unable to get the changes made.
It either keeps coming up with the same image, or gives you a completely new take on the image with fresh problems.
Anyone know if modification of existing images is any better?
Anything better than OpenAI?
[-]
- frmersdog 25 days ago
  Image editing program -> different versions of the image, each with some but not all of the elements you want, on each layer -> mask out the parts you don't need/apply mask, fill with black, soft brush with white the parts you want back in. Copy flattened/merged, drop it back into the image model, keep asking for the changes. As long as each generation adds in an element you want, you can build a collage of your final image.
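  The masking step is ordinary alpha compositing: where the mask is white (1.0) you keep the pixel from one generation, where it is black (0.0) you keep the base image, and a soft brush gives values in between. A minimal pure-Python sketch, assuming same-size grayscale images stored as nested lists of 0-255 values; an image editor (or Pillow's `Image.composite`) does the same thing per channel.

```python
def composite(top, base, mask):
    """Blend `top` over `base` pixel by pixel.

    top, base: 2-D lists of grayscale values (0-255), same size.
    mask: 2-D list of weights in [0.0, 1.0]; 1.0 keeps `top`, 0.0 keeps `base`.
    """
    return [
        [round(m * t + (1.0 - m) * b) for t, b, m in zip(t_row, b_row, m_row)]
        for t_row, b_row, m_row in zip(top, base, mask)
    ]


def build_collage(base, layers):
    """Apply (image, mask) layers in order; each paints in only the part you want from it."""
    result = base
    for image, mask in layers:
        result = composite(image, result, mask)
    return result
```

  The collage from each round is what goes back into the image model for the next requested change.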
- user34283 25 days ago
  It's the first thing I tried, because Nano Banana 2 deteriorates the output with each turn, becoming unusable with just a few edits.
  ChatGPT Images 2.0 made it unusable at the first turn. At least in the ChatGPT app editing a reference image absolutely destroyed the image quality. It perfectly extracted an illustration from the background, but in the process basically turned it from a crisp digital illustration into a blurry, low quality mess.
- tomjen3 25 days ago
  There was an Edit button in one of the images in the livestream
justani 25 days ago
I have a few cases where Nano Banana fails all the time, and even GPT Image 2 fails:
A 3 * 3 cube made out of small cubes, with a small 2 * 2 cube removed from it - https://chatgpt.com/share/69e85df6-5840-83e8-b0e9-3701e92332...
Create a dot grid containing a rectangle covering 4 dots horizontally and 3 dots vertically - https://chatgpt.com/share/69e85e4b-252c-83e8-b25f-416984cf30...
One where Nano Banana fails but GPT Image 2 worked: create a grid from 1 to 100 and in that grid put a snake, with its head at 75 and tail at 31 - https://chatgpt.com/share/69e85e8b-2a1c-83e8-a857-d4226ba976...
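Part of why these prompts are hard is that they are really coordinate problems. As a sketch of the "generate code, then render" alternative, here is how cell numbers map to positions, assuming a 10x10 grid numbered 1-100 left to right, top to bottom (the prompt doesn't specify a layout; boustrophedon numbering as on a snakes-and-ladders board would change the mapping). The 40 px cell size and the SVG line are hypothetical choices for illustration.

```python
def cell_position(n, width=10):
    """(row, col) of cell n in a grid numbered from 1, left-to-right, top-to-bottom."""
    if n < 1:
        raise ValueError("cells are numbered from 1")
    return divmod(n - 1, width)


def cell_center(n, width=10, cell_px=40):
    """Pixel coordinates of the centre of cell n, e.g. for the snake's head or tail."""
    row, col = cell_position(n, width)
    return (col * cell_px + cell_px // 2, row * cell_px + cell_px // 2)


head, tail = cell_center(75), cell_center(31)

# A straight "snake" from head to tail as a bare SVG line.
svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="400" height="400">'
    f'<line x1="{head[0]}" y1="{head[1]}" x2="{tail[0]}" y2="{tail[1]}" '
    'stroke="green" stroke-width="8"/></svg>'
)
```

Once the positions are computed exactly, the image model only has to restyle the result, not reason about where cell 75 is.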
[-]
- teruakohatu 25 days ago
  > A 3 * 3 cube made out of small cubes, with a small 2 * 2 cube removed from it - https://chatgpt.com/share/69e85df6-5840-83e8-b0e9-3701e92332...
  It is a little ambiguous (what exactly is a "3x3 cube") but I tried a bunch of variations and I simply could not get any Gemini models to produce the right output.
  [-]
  - sigmoid10 25 days ago
    You can do it, but it takes two steps. Code is generally better to create such strict geometry (even from ambiguous prompts), while the image diffusion model is great for tuning style and lighting.
    https://chatgpt.com/share/69e88b5c-8628-83eb-8851-f587ef2c95...
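    The geometry half of that two-step approach is easy to pin down in code. A sketch, reading the prompt as a 3x3x3 stack of unit cubes with a 2x2x2 block removed from one corner (one of several possible readings of "a 3 * 3 cube"), that lists the unit cubes a renderer would draw:

```python
from itertools import product


def cube_with_corner_removed(n=3, hole=2):
    """Unit-cube positions of an n*n*n cube minus a hole*hole*hole corner block.

    The removed block is assumed to sit at the corner with the largest coordinates.
    """
    cubes = set(product(range(n), repeat=3))
    removed = set(product(range(n - hole, n), repeat=3))
    return sorted(cubes - removed)


remaining = cube_with_corner_removed()
print(len(remaining))  # 27 - 8 = 19 unit cubes
```

    Feeding a render of these positions to the image model as a reference leaves it only the lighting and style to handle.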
    [-]
    - teruakohatu 24 days ago
      Interesting. Did you have code execution enabled and did it actually execute a script?
- qingcharles 24 days ago
  It did well on the PBJ Rubik's Cube test I run:
  https://imgur.com/a/unpVdS8
throwaway2027 25 days ago
I know people like to dunk on ChatGPT and Gemini and say Claude is (or used to be) better, but with those you can still fall back to weaker models when you're out of usage, AND you get Nano Banana and ChatGPT image generation with separate limits on your subscription. I think that could make them a more complete package for some people (non-programmers). I do like having the option, and I'm excited to see what improvements they've made to ChatGPT image generation: in the past it had this yellow piss filter, and 1.5 sort of fixed it but made things really generic, with Nano Banana beating it (although Gemini also had an overly aggressive racial-bias tuning, which they fixed). It seems the images ChatGPT generates have gotten better.
[-]
- SV_BubbleTime 25 days ago
  I still see that piss filter on their samples. It isn’t as bad, but someone there really loves it.
  [-]
  - Auracle 24 days ago
    That “piss filter” was all the rage among medium and low budget family/wedding photographers for quite a while, and still isn’t uncommon. I doubt it’s just from RLHF.
Oarch 25 days ago
Every groundbreaking new AI release feels like a volley of cannonfire towards the soul. Oof.
[-]
- deadbabe 24 days ago
  Nothing digital could ever truly have soul. We must embrace physical reality if we want to find true connection, to something real.
  [-]
  - cindyllm 24 days ago
    [dead]
- arkensaw 25 days ago
  no AI could ever be so poetic. nice
- fennecfoxy 24 days ago
  Lmao, and pre-AI the world wasn't trending towards cost-reduced mass-produced executive-driven content?
  And the average member of our species lapping it up like being a mindless consumer is going out of style?
louiereederson 25 days ago
The image of the messy desktop with the ASCII art is so impressive: the text renders, the date is consistent, it actually generated ASCII art in "ChatGPT", etc. I suspected it was cherry-picked, but I was able to generate something very similar and then edit particular parts of the desktop (e.g. fixing content in the browser window and making the ASCII dog "more dog-like"). It's honestly astounding, to me at least.
[-]
- timdiggerm 24 days ago
  The periodic table in the "Create Everything At Once" collage is not so impressive.
- n2h4 25 days ago
  The neofetch Apple logo is messed up, though: it renders characters that don't exist.
overgard 25 days ago
Pretty mixed feelings on this. From the page at least, the images are very good. I'd find it hard to know that they're AI. Which I think is a problem. If we had a functioning congress, I wonder if we might end up with legislation that these things need to be watermarked or otherwise made identifiable as AI generated..
I also don't like that these things are trained on specific artists' styles without really crediting those artists (or even getting their consent). I think there's a big difference between an individual artist learning from a style or paying it homage, vs a machine just consuming it so it can create endless art in that style.
[-]
- kansface 25 days ago
  > If we had a functioning congress, I wonder if we might end up with legislation that these things need to be watermarked or otherwise made identifiable as AI generated..
  Not a lawyer, but that reads as compelled speech to me. Materially misrepresenting an image would be libel, today, right?
  [-]
  - overgard 25 days ago
    Well, considering that AI-generated content can't be copyrighted (afaik at least), I think we're in very different legal territory when it comes to AI creating things. While it's true that deepfakes could be considered libel, good luck prosecuting that if you can't even figure out where the image came from.
    The problem is it's all too easy to generate - you can't really do much about an individual piece of slop because there's so much of it. I think we need a way to filter this stuff, societally.
- apsurd 25 days ago
  You might be onto something. I find every image unsettling. they're very good no doubt, but maybe it disturbs me because all of it is a complete copy of what someone else created. I know, I know, there is no pure invention. That's not what i mean. Humans borrow from other humans all the time. There's a humanity in that! A machine fully repurposing a human contribution as some kind of new creation, iono i'm old, it's weird and i don't like it.
  Maybe i'm just bloviating also.
  [-]
  - eieke 24 days ago
    No the reason you feel uncomfortable is because it is theft - a wealth transfer.
    Not sure why we need to pretend what is and isn’t going on here.
- bryanhogan 25 days ago
  Trying to watermark or otherwise label them as AI generated is a lost fight, we should assume every image and video we see online may be AI generated.
  [-]
  - rootusrootus 25 days ago
    This helps the segment of society that is interested in applying critical thinking to what they see. I am not sure that is anything like a majority or even a significant plurality. It seems like just about every image or video gets accused of being AI these days, but predictably the accusations depend on the ideology of the accuser.
- niek_pas 25 days ago
  Maybe I'm stupid and naive but I just don't really see how any of this is _fundamentally_ different from Photoshop. Trusting the images you're looking at on the internet has been impossible for a long time. That's why we have institutions and social relations we place trust in instead.
  [-]
  - overgard 24 days ago
    It's really a matter of scale. Photoshopping something takes time, and unless you're good there are ways to tell. Typing something into an AI is so much faster which means you can do it at scale, and you don't need to be skilled to do it.
    Also, kind of out of scope for this discussion but deepfake videos are really the most scary.
  - avgDev 24 days ago
    It makes it more accessible. The number of people who can prompt ChatGPT is significantly higher than the number of people who can edit a photo in Photoshop and make it perfect.
- drstewart 25 days ago
  >If we had a functioning congress, I wonder if we might end up with legislation that these things need to be watermarked or otherwise made identifiable as AI generated..
  Can you name any countries that you think are functioning, and what their laws are on watermarked AI images?
sanex 25 days ago
Having the launch website just scrollable generated images is so slick. I love this.
[-]
- gverrilla 25 days ago
  You can click the images too, to see the prompt that got them gen'ed.
mercacona 24 days ago
Every improvement in image generation seems to reduce the value of the images themselves. When anything can be faked or created in seconds, what is an image really worth? With text or code, you can dig into a meaningful dialogue because their reality is digital too. But images become like the placeholder photos of strangers that come in new picture frames.
I guess it's just a completely personal feeling.
[-]
- fennecfoxy 24 days ago
  Nah I'm gonna generate the hecky out of "relevant for this one presentation" little cliparts to add to my powerpoints.
  But then I'm still going to take photos on film and enjoy Sunday afternoons in the darkroom doing prints.
  It's possible to compromise and/or use the right tool for the right job. Saving me time for something of little or fleeting importance? AI. Making me feel good/physical work with hands/emotion chemicals in my brain? Film/traditional media.
  [-]
  - mercacona 24 days ago
    Haha! Please, don’t! We don’t need clipart or senseless AI generated schemas on PowerPoints :-)
- DonThomasitos 24 days ago
  This AI fatigue is called genflation.
squidsoup 25 days ago
Are camera manufacturers working on signed images? That seems like the only way our trust in any digital media doesn't collapse entirely.
[-]
- randyrand 25 days ago
  Signed images don’t get you much. You can just hardwire the image sensor to a computer and sign raw pixels.
  [-]
  - Barbing 25 days ago
    Is the situation brighter for a company who owns the hardware and the software, for Apple?
    Taking a picture of an AI generated image aside, theoretically could Apple attest to origin of photos taken in the native camera app and uploaded to iCloud?
    Fascinating, by the way, thank you!
  - wiseowise 25 days ago
    Make cameras tamper resistant, like POS terminals.
- Nition 25 days ago
  Ultimately even with that tech, you can still take a photo of an AI generated scene. Maybe coupled with geolocation data in the signature or something it might work.
  [-]
  - Barbing 25 days ago
    Any thoughts on attempted multiple camera/360 camera solutions? Can make it cost prohibitive to generate exceptional fakes… for a little while
    Kind of like showing the proctor around your room with your webcam before starting the exam.
    —
    I think legacy media stands a chance at coming back as long as they maintain a reputation of deeply verifying images, not being fooled.
  - petesergeant 25 days ago
    I see signing chains as the way to go here. Your camera signs an image, you sign the signed image, your client or editor signs the image you signed etc etc. Might finally have a use for blockchain.
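    A toy version of that chain, just to show the structure: each party signs the image hash together with the previous signature, so altering the image or any earlier link breaks verification. This sketch uses the standard library's HMAC with per-party secret keys as a stand-in for real public-key signatures (deployed schemes such as C2PA use certificate-based signatures, not shared secrets), and the party names and keys are made up for illustration.

```python
import hashlib
import hmac


def sign_link(key: bytes, image_hash: bytes, prev_sig: bytes) -> bytes:
    """One link: a MAC over the image hash plus the previous link's signature."""
    return hmac.new(key, image_hash + prev_sig, hashlib.sha256).digest()


def build_chain(image: bytes, keys: list[bytes]) -> list[bytes]:
    """Camera signs first, then the photographer, then the editor, and so on."""
    image_hash = hashlib.sha256(image).digest()
    chain, prev = [], b""
    for key in keys:
        prev = sign_link(key, image_hash, prev)
        chain.append(prev)
    return chain


def verify_chain(image: bytes, keys: list[bytes], chain: list[bytes]) -> bool:
    """Recompute every link; any edit to the image or to an earlier link fails."""
    if len(chain) != len(keys):
        return False
    image_hash = hashlib.sha256(image).digest()
    prev = b""
    for key, sig in zip(keys, chain):
        if not hmac.compare_digest(sign_link(key, image_hash, prev), sig):
            return False
        prev = sig
    return True
```

    As randyrand points out upthread, a chain like this only proves who signed which bytes; it says nothing about whether the camera's pixels came from a real scene.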
- user34283 25 days ago
  Yes, I think they have been for years. C2PA Content Credentials are supported in cameras and some phones already today.
lossyalgo 25 days ago
Someone remind me again why it's a good idea to be able to create perfect fake images?
[-]
- wiseowise 25 days ago
  Something, something democratization. Because having a skill is inherently oppressive nowadays.
- sumedh 24 days ago
  Why not?
  Earlier you needed expert Photoshop skills to create fake images now anyone can do it.
freakynit 25 days ago
Collection of some amazing prompts and corresponding images: https://gpt2-image-showcase.pagey.site/
Credits: https://github.com/magiccreator-ai/awesome-gpt-image-2-promp...
joegibbs 25 days ago
The quality of the text is really impressive and I can’t seem to see any artefacts at all. The fake desktop is particularly good: Nano Banana would definitely slip up with at least a few bits of the background.
[-]
- daemonologist 25 days ago
  There are a couple of AI-esque misspellings - in the More Myth than Menace wolves image, on the right in the "at a glance" section, it reads "wolves aarely approach people," and in the Typography image the text in the top right is "Type connncts us all."
  But yeah the quality is remarkable, and rather scary.
- wek 25 days ago
  I use Nano Banana all the time and this seems like a step up
curious1008 24 days ago
I have tried Images 2.0 and I believe it does a much better job than the other image generation models. For example, I used Nano Banana and Images 2.0 to generate the same written article in Chinese as images; GPT does 100x better at rendering all the Chinese characters, despite a few minor mistakes.
[-]
- giancarlostoro 24 days ago
  I think they're using whatever approach they used for Suno with this model. With Suno, I could tell it things with little context, and it knew I was asking for something modern using a dated term, so it updated my request to match today's reality. I was impressed.
bensyverson 25 days ago
I caught the last minute of this—was it just ChatGPT Images 2.0?
[-]
- punty 25 days ago
  It appears so!
- minimaxir 25 days ago
  yes
TrackerFF 25 days ago
This is the first model I've used for mockups where I feed reference images, and they truly look real and good enough for pro use. I'm impressed.
[-]
- ripped_britches 24 days ago
  Mockups for what type of work? Web or mobile UI?
thelucent 25 days ago
It seems to still have this gpt image color that you can just feel. The slight sepia and softness.
[-]
- honzaik 25 days ago
  I was just wondering about that. Did they embrace it as a “signature look”? It can't be accidental, right?
  [-]
  - GaryBluto 25 days ago
    It's definitely not accidental but I'm not completely sure whether or not it is simply a "tell" or watermark or an attempt to foster brand association.
  - dymk 25 days ago
    It's the Stranger Things nostalgia filter. Almost all the sample pictures they had looked like they were vaguely from the 90s-00s era.
richardatlarge 14 days ago
Can anyone explain why ChatGPT can understand verbal prompts but Midjourney almost cannot? Midjourney is like a powerful program you have to write code to use.
AltruisticGapHN 25 days ago
This is insanely good. But wow, prompting to get any one of these images is way more complicated than prompting Claude Code. There is a ton of vocabulary that comes with it relating to the camera, the lighting, the mood etc.
hahahacorn 25 days ago
One of the images in the blog (https://images.ctfassets.net/kftzwdyauwt9/4d5dizAOajLfAXkGZ7...) is a carbon copy of an image from an article posted Mar 27, 2026 with credits given to an individual: https://www.cornellsun.com/article/2026/03/cornell-accepts-5...
Was this an oversight? Or did their new image generation model generate an image that was essentially a copy of an existing image?
[-]
- arjie 25 days ago
  That has to be the wrong stock image included or something, bloody hell.
```
     magick image-l.webp image-r.jpg -compose difference -composite -auto-level -threshold 30% diff.png
```
  It's practically all dark except for a few spots. It's the same image, just a different size/compression or whatever. I can't find it in any stock image search, though. Surely it could not have memorized the whole image at that fidelity. Maybe I just didn't search well enough.
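  For anyone without ImageMagick handy, a rough pure-Python equivalent of that check, assuming both images have already been decoded to same-size, non-empty grayscale pixel grids (0-255): take the per-pixel absolute difference, stretch it to the full range (roughly what `-auto-level` does), and flag pixels above a 30% threshold. The `bright_fraction` summary is an addition for illustration, not part of the original command.

```python
def diff_mask(a, b, threshold=0.30):
    """Per-pixel |a - b|, stretched to [0, 1] between its min and max, then thresholded.

    Returns a 2-D list of booleans: True where the images differ noticeably.
    """
    diff = [[abs(x - y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
    values = [v for row in diff for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:  # no contrast to stretch: treat everything as "dark"
        return [[False] * len(row) for row in diff]
    return [[(v - lo) / (hi - lo) > threshold for v in row] for row in diff]


def bright_fraction(mask):
    """Share of pixels flagged as different; near 0 means 'practically all dark'."""
    total = sum(len(row) for row in mask)
    return sum(v for row in mask for v in row) / total if total else 0.0
```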
  [-]
  - Melatonic 25 days ago
    Or the image was generated with AI in the first place and a test for Images 2.0
    [-]
    - arjie 25 days ago
      Haha! That would really take the cake. If it is, congratulations to them! I could never have known.
    - IsTom 25 days ago
      Well, it's on web archive. So unless they got their hands on it almost a month early or escaped their light cone it wasn't.
- recitedropper 25 days ago
  This is hilarious. Seems like kind of a random image for a model to memorize, but it could be.
  There is definitely enough empirical evidence that image models retain lots of original copies in their weights, despite what AI boosters think. That said, those are usually images that appear in the training set many times, and I would find it strange for this image to be one of them.
  Regardless, great find.
  [-]
  - Nition 25 days ago
    I feel it's too much of a perfect match to be generated from the model's memory. It's pixel perfect. Gotta be a mistake.
- minimaxir 25 days ago
  Given the recency of that image, it is unlikely it is in the training data and therefore I would go with oversight.
  [-]
  - ajam1507 25 days ago
    The image is likely older than the article given this picture from over a year ago.
    https://www.instagram.com/p/DGQ01bzTwyo/
    [-]
    - afro88 25 days ago
      That's not the same picture
- cubefox 24 days ago
  I checked all the images on the blog post and I'm quite sure that the one you talk about isn't there.
  [-]
  - hahahacorn 24 days ago
    They removed it. I have screenshots on my blog. bengarcia.dev/archives
- hersko 24 days ago
  Wow. How did you catch this??
ChrisArchitect 25 days ago
Fake layouts, fake handwritten kid story, fake drunk photos? All from training on real things people did.
As with anything AI, we are not ready for the scale of impact. And for what? Like, why are you proud of this?
samiwami 25 days ago
do they have anything similar to SynthID, or are they just pretending that problem doesn't exist?
I know this is probably mega cherry-picked to look more impressive, but some of the images are terrifyingly realistic. They seem to have put a lot of effort into the lighting.
[-]
- alextheparrot 25 days ago
  > Integrating an imperceptible, robust, and content-specific watermark
  From the system card someone linked elsewhere in the discussion
  [-]
  - ai-tamer 25 days ago
    Zhao et al. 2023 showed that any imperceptible watermark is provably removable by generative regeneration: pass the image through an img2img model or a VAE, and the model reconstructs it visually identical but starting from a different latent. Watermark gone. SynthID and similar schemes do hold up well against normal sharing: recompression, crops, color tweaks, Twitter's pipeline. That covers most users. But the asymmetry remains: in principle, a GPU and a bit of motivation should be enough to strip it. Right? Got a tool to share? ;-)
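    To make the robustness-vs-removal asymmetry concrete, here is a toy additive spread-spectrum watermark. This is a classic textbook scheme, not how SynthID or OpenAI's watermark works: a secret seed generates a ±1 pattern that is added faintly to the pixels, and detection correlates the image with that pattern. Mild noise (a stand-in for recompression) leaves the score high. The regeneration attack is modelled crudely as fresh pixels that share none of the pattern, since a toy like this can't capture "looks identical". Image size, strength, and threshold are arbitrary choices for the sketch.

```python
import random


def pattern(seed, n):
    """Secret ±1 pattern derived from a key (here just a PRNG seed)."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def embed(pixels, seed, strength=4.0):
    """Add the faint pattern to a flattened image."""
    return [p + strength * w for p, w in zip(pixels, pattern(seed, len(pixels)))]


def score(pixels, seed):
    """Correlation of the mean-centred image with the pattern: near `strength` if marked, near 0 if not."""
    mean = sum(pixels) / len(pixels)
    w = pattern(seed, len(pixels))
    return sum((p - mean) * wi for p, wi in zip(pixels, w)) / len(pixels)


def detect(pixels, seed, threshold=2.0):
    return score(pixels, seed) > threshold


rng = random.Random(0)
n = 40_000
image = [rng.uniform(0, 255) for _ in range(n)]
marked = embed(image, seed=42)
noisy = [p + rng.gauss(0, 5) for p in marked]          # mild degradation
regenerated = [rng.uniform(0, 255) for _ in range(n)]  # crude stand-in for a redraw
```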
- batat 24 days ago
  I think they do, and it's even scarier than SynthID. Look closely at the generated images and you'll start to notice diamond-shaped high-frequency patterns in different parts of the image (tree crowns, vegetation, hair, textures, clouds). It may just be a random bug, but to me it looks like the watermark is baked into the image details themselves, with all the details following this pattern. But it leads to unnatural results in places, e.g.:
  https://x.com/rrnld_y/status/2047070630802006211/photo/1
  https://x.com/Melothemyth777/status/2046963312357679540/phot...
  https://x.com/kuroinu_ni/status/2047118826920440287/photo/1
  https://x.com/jahflyx/status/2047109303140536703/photo/1
  https://x.com/2016Suha/status/2047096528309760501/photo/3
- ajam1507 25 days ago
  > do they have anything similar to SynthID, or are they just pretending that problem doesn't exist?
  At least they aren't pretending that a solution exists.
- Legend2440 25 days ago
  I think we are just going to have to accept that realistic images can be easily fabricated now.
  Seeing is not believing anymore, and I don't think SynthID or anything like it can restore that trust in images.
  [-]
  - pstuart 25 days ago
    Hopefully the arms race will balance out with improved AI image detection, but I can see how that will never be guaranteed to be reliable.
  - Barbing 25 days ago
    It's going to mess up accountability.
    Some politician will be recorded doing something & he'll have his people release a thousand photos/videos of him doing crimes. And they'll say, look, it's a smear campaign.
    This is just one stupid example, but people will have better schemes.
    Also global coordinated releases of fake content and hypertargeted possibly abusive content. Virtual kidnappings will take off, automated & scaled.
    [-]
    - userbinator 25 days ago
      > Some politician will be recorded doing something & he'll have his people release a thousand photos/videos of him doing crimes. And they'll say, look, it's a smear campaign.
      And his enemies will do the same, hopefully resulting in less blind trust for everyone in the population, which can only be a good thing.
      [-]
      - Barbing 24 days ago
        I would’ve paused image models for now until we’ve better educated our less-savvy neighbors.
- swingboy 25 days ago
  Maybe a stupid question, but does the SynthID still exist if you screenshot and crop your generated image? What if you screenshot, rotate _just_ a bit, and crop? Or apply some other effect to the image like adjusting the coloring a little bit, adding some blur, etc.
  [-]
  - alextheparrot 25 days ago
    The paper they published last year goes over some of these transformations: https://arxiv.org/pdf/2510.09263
- losvedir 25 days ago
  I feel like asking the image generators to mark AI images is the wrong way to go about it. It's like trying to maintain a blocklist. It seems better to me to have the major camera manufacturers or cell phones cryptographically sign their images as real.
  [-]
  - 93po 25 days ago
    I feel like this idea comes up often and in my opinion it doesn't solve anything. Take a picture of an AI image and you've made this approach useless. Which then goes to the argument of "well you'll see it's a picture of a picture" to which I will say there are plenty of ways to make this not appear so, and the ultimate form of this argument is that you can eventually project light directly into the photosensors, or otherwise hack the input between the photosensors and the rest of whatever digital magic that turns light into a JPG on your phone.
    [-]
    - daemonologist 25 days ago
      SynthID survives basic transforms including screenshots/photos, although it can of course be defeated. Even still it helps with the laziest fakes, which there seem to be a lot of - I've seen several quite widespread misinformative images over the past couple months that failed a synthID check.
      Anyways I think approaching the problem from both directions is probably good.
kibibu 25 days ago
Genuine question: what positive use cases are sufficient to accept the harm from image generators?
One that I can think of:
- replacing photography of people who may be unable to consent or for whom it may be traumatic to revisit photographs and suitable models may not be available, e.g. dementia patients, babies, examples of medical conditions.
Most other vaguely positive use cases boil down to "look what image generators can do", with very little "here's how image generators are necessary for society".
On the flip side, there are hundreds of ways that these tools cause genuine harm, not just to individuals but to entire systems.
[-]
- chromacity 25 days ago
  How else do you expect me to illustrate my LLM-generated blog posts about AI?
  [-]
  - 2ndorderthought 25 days ago
    Oh my. You still make those? Ever since model chupacobra 2.46 we have AI agents making those for us. At one point I was on the fence about totally outsourcing it to agents but it's way more efficient. Now I have 50 posts a day under different names.
- bulletsvshumans 25 days ago
  Democratizing visual communication is arguably useful, for instance helping people to create diagrams that illustrate a concept they wish to convey. This is contingent on the tech working sufficiently well that the visuals are more effective at communication than the text that went into producing them though.
  [-]
  - tdb7893 25 days ago
    It's always felt like way overhyping to call something "democratization" when it's something I could do as a middle schooler in 2005. It takes some skill to do very well but it's not like basic diagram creation isn't something people already could do for basically free (I create figures for my job all the time now and chatGPT is more expensive than tools I use for design).
    Commissioning high quality diagrams from a designer is expensive and I guess it's much cheaper now to essentially commission something but idk, "democratization" still feels weird for just undercutting humans on price.
    [-]
    - NewsaHackO 25 days ago
      You are making a mistake a lot of people make when talking about genAI helping others do work. I get that to you it is very easy to do, but there are other groups of people that are not able to do it. What you are saying is like a hobbyist carpenter saying that making a bedside table would take him one weekend to do, so he doesn't think it is okay for tables to be made via assembly line instead of hiring a carpenter to do it.
      [-]
      - tdb7893 25 days ago
        I think you're missing my point, which is pretty narrow here. "Democratization" is a fairly grand term implying that the general public now has access to something freeing it didn't have before (it generally invokes some idea of liberation, since the term is often used for a transition from an authoritarian to a democratic government). I don't think there has ever been a particularly high barrier to making good diagrams; in my experience it's an easy skill to learn, in both time and money, so calling this "democratization" feels like it cheapens the term. Maybe I'm being a bit sensitive, though, because of how the world is right now, with people sometimes literally fighting for democracy. Normally I am pretty lax with semantics, but some people overhyping AI have really rubbed me the wrong way.
      - wiseowise 25 days ago
        We’re talking about diagrams; all it takes is some experimentation and an iota of thought.
        [-]
        ndriscoll 24 days ago
        And yet coming up with insightful diagrams, even or especially if they are particularly simple, can be a point of fame (cf. Feynman diagrams). Diagrams often need to "lie" in some sense, so it can actually be quite difficult to find ways to convey the point you want without misleading in some other important way. For example, I had a geometry professor who would label the x-axis R^n and the y-axis R^m for a bunch of different pictures, which on its face makes no sense, but it conveyed what it needed to.
        People tried to prove the parallel postulate redundant for thousands of years because they lacked the right picture to show why it's necessary.
        [-]
        wiseowise 24 days ago
        We’re talking about people using LLMs because they can’t do it by hand. They won’t be splitting atoms with or without LLMs.
    - pesus 25 days ago
      Yeah, it's not "democratization", people were just too lazy to do it before. It only takes some basic effort and a little bit of time to be able to create decent versions of those things.
  - wiseowise 25 days ago
    Democratizing visual communication? Kim Jong Goon has taken away your pen and paper?
  - lossyalgo 25 days ago
    My workplace does this for EVERYTHING. And they are always immediately obviously AI slop, both because we all know they wouldn't ever pay an actual artist to create graphics, but also because the people creating the graphics have no sense of style and let it generate the most generic shit possible with zero creativity.
    It's definitely not helpful. It's just annoying and disgusting and a waste of resources IMO. But hey at least Powerpoint presentations have AI slop instead of stuff taken from Google Images!?
  - galleywest200 25 days ago
    Can these people not just create a diagram with their own hands? Literally a pencil and paper.
    I am at the point where I would prefer a poorly human drawn diagram with terrible handwriting over AI slop.
    [-]
    - twobitshifter 25 days ago
      If you scroll far enough down the linked page, you’ll see they’re knocking off poor handwriting too!
    - zbrozek 25 days ago
      I do that. My slide decks these days are hand scribbled.
    - rafael-lua 25 days ago
      It is not the making of the diagram that is the problem, but often the fact I have no idea how to put it visually. AI is awesome at this.
      Now, does that justify the harm? Not for me, but this issue is way out of my league.
      [-]
      - davebren 25 days ago
        The point of a diagram is that you have something in your head to turn into the diagram. There's no point if you can't do it yourself and the image generator is coming up with it for you.
        [-]
        rafael-lua 25 days ago
        I disagree. Diagrams are a type of visual communication, and not everyone is good at translating things to visual. I open an excalidraw with clear concepts in my head, but nothing comes out of it. I try C4 or flow diagrams, and I spend an excessive amount of time refactoring them to end up mediocre anyway. Not just me, I know MANY developers that are amazing at explaining things but are mind-blocked when drawing simple circles and arrows.
        Helping us navigate things we aren't good at has been one of the main selling points of AI.
        [-]
        breezybottom 25 days ago
        It's not translation if it's completely AI generated to begin with. Instead of addressing your mental deficits (which sound severe), you're offloading it and making the problem worse.
        davebren 25 days ago
        Learn how to draw simple circles and arrows, this is the epitome of learned helplessness.
        [-]
        izucken 25 days ago
        But then he wouldn't have a justification for AI companies to rob people! And you are suggesting robbing himself of this justification!
        wiseowise 25 days ago
        You lack practice and have OCD if you’re constantly refactoring but “always end up mediocre”.
  - kibibu 25 days ago
    I'm not convinced that "arguably useful" is sufficient to offset its much more heavily-used application as a casually-available disinformation engine.
    I mean, the cat's out of the bag; but the cat stinks.
- spijdar 25 days ago
  The same question could be posed of art in general. I know that response would (and probably should) ruffle people's figurative feathers, but I think it's worth considering. A lot of art isn't "necessary for society".
  The question still stands, "are the benefits worth the cost to society", but it bears remembering we do a lot of things for fun which aren't "necessary for society".
  [-]
  - TomGarden 25 days ago
    I used to think like what you describe, but I've fallen on the side of "art is just more emotionally resonant human communication". And most of the time human communication with more effort and thought behind it. AI art falls short on both being human and, on average, having more effort or thought behind it than your general interaction at the supermarket.
    I will say, it can be emotionally resonant though - but it's a borrowed property from the perception of human communication and effort that made the art the models were trained on.
  - Jtarii 25 days ago
    If you want to say the complete destruction of truth is worth it because some people are having "fun" then idk.
    - SpicyLemonZest 25 days ago
      I was worried about the complete destruction of truth, but it seems that's not the result of commoditized image generation. False AI-generated images have been widespread for years, and as far as I've seen, society has adapted very well to the understanding that images can't prove anything without detailed provenance. I'd argue that this has been helped, actually, by random people on the Internet routinely generating plausible images of events that obviously didn't happen.
      - Jtarii 25 days ago
        >society has adapted very well to the understanding that images can't prove anything without detailed provenance
        Donald Trump is the president of the United States.
        SpicyLemonZest 25 days ago
        I don't understand the response. Do you think that Donald Trump would not be president of the United States if powerful image models hadn't been invented? Or perhaps you're referring to the AI-generated media he's often posted since being elected; when he showed a video of getting in a fighter jet to dump poo on protesters, do you think many people believed that was a real thing he actually did?
        Jtarii 25 days ago
        I'm more reacting to the premise that society is positively adapting to the post truth world. Which it clearly is not. Half the population of the US is already living in a fake news mirror universe where everything is inverted. More convincing fake news is not going to help.
        And this is just straight out of Putin's playbook: if everything is fake, then people just stop believing in the concept of truth altogether.
        SpicyLemonZest 25 days ago
        I think it's neither going to help nor hurt. My experience is that today, even people "living in a fake news mirror universe" understand that an image does not prove anything unless you can explain where you got it from and why anyone should believe it's authentic.
    - joegibbs 25 days ago
      You shouldn't have believed photos since Stalin had Yezhov airbrushed out of them. The only thing that makes a photo more trustworthy than a painting is that it "looks" more real, and passes itself off as true. But there have always been photographic fakes, manipulation and curation of the photos to push a message. AI will finally end this and people will realise that the image of the thing is not the thing itself.
      - Jtarii 25 days ago
        You are vastly, vastly underselling what is being lost. You can no longer look at a piece of art without first asking "is this even real"; that is a colossal loss to the experience of being human. You can't just appreciate anything anymore without questioning it.
        >You shouldn't have believed photos since Stalin had Yezhov airbrushed out of them.
        It isn't just about propaganda photos, it is about -literally everything-, even things people have no incentive to fake, like cat videos, someone doing a backflip, or a video of a sunset.
        bamboozled 25 days ago
        I agree, but if you enjoy the art, why does it really matter who made it? I enjoy looking at sea shells; no one made them, but they are nice to look at.
  - tills13 25 days ago
    The difference between "art in general" and this is scale and speed. Sure, I'll grant you that people are going to engage in deception with or without this but the barrier to entry with this is literally on the floor. Do you have a $5 prepaid VISA? You can generate whatever narrative you want in 30 seconds. Replace the $5 Prepaid VISA with the pocketbook of a three letter agency and it starts getting crazy.
    - Barbing 25 days ago
      >starts getting crazy
      Got pretty wild w/the Iranian propaganda that reportedly _resonated with Americans_ (didn't verify that claim)
      Slopaganda - https://www.newyorker.com/culture/infinite-scroll/the-team-b...
  - nothinkjustai 25 days ago
    Art is for the producer, and if they feel it’s necessary for them to produce it, then it’s necessary for them, and what is necessary for the individual extends to the society they’re in.
- atleastoptimal 25 days ago
  The problem is I'd prefer access to near-photorealistic image gen to be commodified rather than restricted, as then only those willing to skirt the law or who can leverage criminal networks would have access to it.
- primax 25 days ago
  Every technological advance in this space has caused harm to someone.
  The advent of digital systems harmed artists with developed manual artistic skills.
  The availability of cheap paper harmed paper mills hand-crafting paper.
  The creation of paper harmed papyrus craftsmen.
  The invention of papyrus really probably pissed off those who scraped the hair off thin leather to create vellum.
  My point is that, in line with Jevons paradox, there is always a wave of destruction that occurs with technological transformation, but we almost always end up with more jobs created by the technology in the medium and long term.
- NathanielK 25 days ago
  Ok, but the models only know what to draw because we fed them images of dementia patients and babies.
  Maybe image generators can be a loophole for consent legally, but it seems even grosser morally.
- stackedinserter 25 days ago
  I have plenty for you:
  - package design
  - pictures for manuals and guides
  - navigation and signs
  - booklets, tickets and flyers
  - logos of all sorts
  - websites
  - illustrations for books
  And many, many others. Not every image is art and very few illustrators are artists.
  - Jtarii 25 days ago
    So the benefits are that something that was already being mass produced with no issue is slightly easier to mass produce?
    It's not a particularly compelling argument.
    - kakapo5672 25 days ago
      No, the benefits are that something can be mass produced orders of magnitude faster and easier, which in turn also creates more latitude for creativity and new spaces.
      It's a true state-change, which makes the argument pretty compelling IMO.
      - breezybottom 25 days ago
        Weird how it's the least creative people who use it then.
  - SyneRyder 25 days ago
    No idea why you were downvoted; I think that's exactly how this will get used.
    I'm already imagining this is how the local live indie band night I sometimes go to will generate poster images each week for the bands that are playing, whether to put up at the venue or post to social media. And the bands might be using it to design images to put on their t-shirts and other merch. I already know some indie bands using this stuff for their album covers.
    - 4k93n2 24 days ago
      sounds like a very efficient way for an indie band to lose all of their street cred at once
    - apsurd 25 days ago
      Downvotes because nobody actually wants this. Those image uses serve a purpose to an external audience. The audience doesn't want this shit.
      Now of course I'm being dramatically absolute. I'm sure I already consume these things without knowing it. These things serve a function. Offloading to AI is the implementer admitting they can't be bothered to care whether it serves the function.
      - stackedinserter 23 days ago
        > Downvotes because nobody actually wants this.
        > The audience doesn't want this shit.
        Speak for yourself.
    - pesus 25 days ago
      He's getting downvoted because none of these supposed "benefits" outweigh the costs.
  - pesus 25 days ago
    How do these justify the costs to society?
    - Legend2440 25 days ago
      The 'costs to society' are massively overblown, and some of them (automating jobs) are actually benefits to society.
      - pesus 25 days ago
        Nothing says benefiting society like increasing unemployment, destroying what little trust was left in society, and allowing for CSAM and racist propaganda to be generated en masse. At least some corporations will save a few bucks.
      - Jtarii 25 days ago
        The girls that have to deal with their classmates generating nudes of them for the rest of time are glad to hear that their concerns are "overblown".
        GaryBluto 25 days ago
        Nobody tell those girls about Photoshop, or scissors and glue.
        Jtarii 25 days ago
        [flagged]
        GaryBluto 25 days ago
        There's some rich irony in accusing somebody who disagrees with you of acting in "bad faith" because you disagree with them.
        Jtarii 25 days ago
        Ok.
    - stackedinserter 23 days ago
      Society is fine, maybe it's just you who aren't.
- tantalor 25 days ago
  Prototyping. Suppose you have a hard time expressing your vision in words or executing it visually.
  1. Generate 100s or 1000s of low-fidelity candidates, find something that matches your vision, iterate.
  2. Hand that generated image off to a human and say, "This is what I'm thinking of, now how do we make it real?"
  Important: do not skip the last step.
  - apsurd 25 days ago
    You audit thousands of genAI prototype candidates?
- ticulatedspline 25 days ago
  Is the argument any different if you replace the words "image generators" with "Photoshop"?
  - Uncorrelated 25 days ago
    Scale matters. Using Photoshop took vastly more time and skill to pull off realistic images, limiting how many could be made. With image generation there's no practical limit. Some of it will be used for relatively innocuous purposes like making joke images for friends or menus for restaurants. But the floodgates are open for more socially negative uses.
    If you're the only one in the world with an internal combustion engine, the environmental impact doesn't matter at all. When they're as common as they are now, we should start thinking about large-scale effects.
  - davebren 25 days ago
    It turns out that effort matters
- ndriscoll 25 days ago
  Not much beyond food, water, and shelter is "necessary" for society, but it's nice to have nice things.
  I'm teaching my 4 year old to read. She likes PAW Patrol, but we've kind of exhausted the simple readers, and she likes novelty. So yesterday I had an LLM create a simple reader at her level with her favorite characters, and then turned each text block into a coloring page for her. We printed it off, she and her younger sister colored it, and we stapled it into her own book.
  I could come up with ten 3-word sentences myself of course, but I'm not really able to draw well enough to make a coloring book out of it (in fact she's nearly as good as me), and it also helps me think about a grander idea to turn this into something a little more powerful that can track progress (e.g. which phonemes or sight words are mastered and which to introduce/focus on) and automatically generate things in a more principled way, add my kids into the stories with illustrations that look like them, etc.
  Models will obviously become the foundation of personalized education in the future, and in that context, of course pictures (and video) will be necessary!
  - drivebyhooting 25 days ago
    Repetition rather than novelty is good for learning.
    - ndriscoll 25 days ago
      Sure, and she gets that, but at some point she completely memorizes the stories. She also asks if we can get new books at the store, but they don't make 'em that fast.
      - s4i 25 days ago
        Isn’t that also a valuable life lesson that some topics/resources are scarce and at some point you need to do something else?
        ndriscoll 24 days ago
        Sure, and she already got that lesson when there literally weren't more. Then she got another lesson: we can just make our own. In fact that may be one of the most important lessons to learn: you have agency, and you can use the tools you have as accelerants to better yourself, further increasing your agency.
  - mcmcmc 25 days ago
    So the use case is just IP theft so you can get more Paw Patrol?
    AI aside, if you’ve truly exhausted all the simple readers, maybe she should move on to more advanced books instead of repeating more of the same and gamifying it, which seems a great way to destroy a child’s natural curiosity.
    - ndriscoll 25 days ago
      Sure, I don't view "IP" as valid, don't entertain the idea that it is possible to "steal" it, and absolutely don't care that someone out there might be sad imagining me making a coloring book for my kids. In fact I'd go so far as to say that holding the position that there's something wrong with tailoring teaching to a child's interests and avoiding that for fear of copyright concerns of all things actually makes you morally bad.
      You overestimate how many there are. There's like 10 stories at that level. I do also read ones with paragraphs to her, but she can't do those herself because she's 4.
      - breezybottom 25 days ago
        Ah the old sovereign citizen reverse uno. It's actually evil NOT to use the art theft machine to dumb down your children.
        ndriscoll 24 days ago
        Yes, generating tailored practice material for a continuous difficulty curve and to keep their focus with something they enjoy is dumbing down. Exactly.
        Do you get as upset at illegal drug users for flouting the law (e.g. recreational marijuana is still illegal everywhere in the US) as you do at me making reading material for my own children? Do you get this upset at artists themselves who no doubt "stole" others' art (e.g. copied a drawing or drew a character they did not "own") at some point in their learning process?
        I also sing Raffi songs to my children without asking for permission! I hope he doesn't mind!
    - bsenftner 25 days ago
      That is not IP theft, that's private use. If (s)he tries to sell those coloring books, then that's theft. You're free to do anything you want with IP in private; it's only when selling or exhibiting to the public that IP law is triggered. Knock yourself out with protected IP in private.
      - breezybottom 25 days ago
        You're thinking of fair use, and that's the worst interpretation of it I've seen.
        bsenftner 24 days ago
        But it's true. You can do anything you want with private IP in private. It is the dissemination and distribution of IP that is not yours that is the issue.
        breezybottom 24 days ago
        Not even remotely true. Fair use doesn't give you licence to pirate, unless you're a politically connected AI company.
        bsenftner 24 days ago
        It is not piracy to acquire private IP legally (someone has to get it in the first place), and then you can do anything you want with it in your own privacy. It becomes an issue when your activities with that private IP are no longer private. Think it through; I really don't think you have. BTW, I'm CTO of a law firm.
        eieke 24 days ago
        CTO of an (immigration) law firm lol. Bro, you’re not an IP lawyer
        bsenftner 23 days ago
        Okay, so I have looked this up again. I was an expert in this area for work purposes around 2000-2010. Checking the law now, the definition of fair use in the US has changed; it has been modified so that "IP stored in digital files" is protected even in private. However, it is pointed out that there is no means of detecting such private uses, and any actions enabling the detection of this private use are no longer private and fall right into the previous IP law.
- LZ_Khan 25 days ago
  Saving money for businesses trying to promote their products?
- JumpCrisscross 25 days ago
  > Genuine question: what positive use cases are sufficient to accept the harm from image generators?
  Diagrams and maps. So much text-based communication begs for a diagram or a map.
- infecto 25 days ago
  Could the same argument not be applied to practically everything and have drastically different perspectives from people?
- lanthissa 25 days ago
  people pay them to use it, they find that positive
- _pdp_ 25 days ago
  There are many use-cases outside of spam and slop.
  For example, take a picture of your garden. Ask ChatGPT to give you ideas on how to improve it and a step-by-step visual guide.
  Anything that can be expressed visually is effectively a target for this technology, and that covers pretty much everything.
  - kibibu 25 days ago
    Are those sufficiently valuable that the death of photographic evidence is worth it?
  - never_inline 25 days ago
    That's a multimodal model with text output, I think GP is asking about image generators.
- Barbing 25 days ago
  [dead]
- JimsonYang 25 days ago
  I, a 5’5” male, can make myself look taller on dating apps
  Short kings on tinder no more!
  /s
nickandbro 25 days ago
200+ points on Arena.ai, that's incredible. They are cleaning house with this model.
- moralestapia 25 days ago
  point delta (from 2nd) not total
  - nickandbro 25 days ago
    https://www.youtube.com/watch?v=Adsaiyr7Nv8
Oras 25 days ago
My test for image models is asking it to create an image showing chess openings. Both this model and Banana pro are so bad at it.
While the image looks nice, the actual details are always wrong, such as pawns in the wrong locations, missing pawns, etc.
Try it yourself with this prompt: Create a poster to show opening game for Queen's Gambit to teach kids to play chess.
- lxgr 25 days ago
  It almost nailed it for me (two squares have both white and black color). All pieces and the position look correct.
- tempaccount5050 25 days ago
  What move? Whose turn is it? Declined or accepted? Garbage in, garbage out.
  - bogtap82 25 days ago
    In some cases I would agree with this, but image model releases including this one are beginning to incorporate and market the thinking step. It is not a reach at this point to expect the model to take liberties in order to deliver a faithful and accurate representation of your request. A model could still be accurate while navigating your lack of specificity.
  - dudul 25 days ago
    What do you mean? Parent clearly describes the Queen's Gambit. 1.d4 d5 2.c4 There is no room for ambiguity here.
    - kuboble 25 days ago
      King's Indian Defense would be a better prompt, as Queen's Gambit can now refer to, e.g., some scene from the Netflix series.
  - timacles 25 days ago
    Kasparov vs Karpov ‘87 Olympiad. Move 6
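For anyone re-running this chess-poster test, the expected position is easy to compute. Below is a minimal Python sketch (the helper names are hypothetical, not from the thread) that applies 1.d4 d5 2.c4, the line given above, to the standard starting position and prints the board plus the piece-placement field of its FEN string, so a generated poster can be checked square by square. It assumes moves are given as (from, to) squares rather than algebraic notation and does no legality checking.

```python
# Sketch: expected board after 1.d4 d5 2.c4 (Queen's Gambit).
# Hypothetical helpers; moves are (from, to) squares, with no legality checks.
from __future__ import annotations

START_RANKS = [
    "rnbqkbnr",  # rank 8 (black pieces are lowercase)
    "pppppppp",  # rank 7
    "........",  # rank 6
    "........",  # rank 5
    "........",  # rank 4
    "........",  # rank 3
    "PPPPPPPP",  # rank 2
    "RNBQKBNR",  # rank 1 (white pieces are uppercase)
]


def square_index(square: str) -> tuple[int, int]:
    """Map a square like 'd4' to (row, col) in the START_RANKS layout."""
    file, rank = square[0], int(square[1])
    return 8 - rank, ord(file) - ord("a")


def apply_moves(moves: list[tuple[str, str]]) -> list[list[str]]:
    """Play (from, to) moves on a copy of the starting position."""
    board = [list(row) for row in START_RANKS]
    for src, dst in moves:
        r1, c1 = square_index(src)
        r2, c2 = square_index(dst)
        board[r2][c2] = board[r1][c1]  # overwrites any captured piece
        board[r1][c1] = "."
    return board


def fen_placement(board: list[list[str]]) -> str:
    """Encode the board as the piece-placement field of a FEN string."""
    rows = []
    for row in board:
        out, empty = "", 0
        for cell in row:
            if cell == ".":
                empty += 1  # count consecutive empty squares
            else:
                if empty:
                    out += str(empty)
                    empty = 0
                out += cell
        if empty:
            out += str(empty)
        rows.append(out)
    return "/".join(rows)


QUEENS_GAMBIT = [("d2", "d4"), ("d7", "d5"), ("c2", "c4")]

if __name__ == "__main__":
    board = apply_moves(QUEENS_GAMBIT)
    for rank, row in zip(range(8, 0, -1), board):
        print(rank, " ".join(row))
    print(fen_placement(board))
```

In the expected position all 16 pawns are still on the board, with White's c- and d-pawns on c4 and d4 and Black's d-pawn on d5, which gives a quick checklist for the "missing pawns" failures described above.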
super256 25 days ago
I tried using it for creating 2D logos, which many tools suck at (except Midjourney).
Looks like ChatGPT Images 2 is now good at this too!
JimsonYang 25 days ago
> you can make your own mangas
No you can’t.
You still have the Studio Ghibli look from the video. The issue with generating manga was the quality of the characters; there are multiple tools for placing your frames.
But I am hopeful. If I put in a single frame, can it carry over that style for the next images? It would be game changing if a chat could have its own art style
RigelKentaurus 25 days ago
If every single image on their blog was generated by Images 2.0 (I've no reason to believe that's not the case), then wow, I'm seriously impressed. The fidelity to text, the photorealism, the ability to show the same character in a variety of situations (e.g. the manga art) -- it's all great!
MarchApril 23 days ago
How many more tech releases like these will it take before everyone looks at the person next to them and agrees that the age of humans has come to an end, right at that moment? Don't dare to stop the train of progress, even when you're not on it.
The mechanized MC (they don't even try to fool you with an attractive host for these events anymore) stares at you through your screen, whispering directly into your ear in the voice of a late family member: "Sorry dear, you've had a good run. Rest easy now."
lifeisstillgood 25 days ago
Pretty much all of the kerfuffle over AI would go away if it were accurately priced.
After 2008 and 2020, vast amounts of money (tens of trillions) have been printed (reasonably) by western governments and not removed from the money supply. So there are vast sums swilling about, funding things like using massively computationally intensive work to help me pick a recipe for tonight.
Google and Facebook had online advertising sewn up, but AI is waaay better at answering my queries. So OpenAI wants some of that, but the cost per query must be orders of magnitude larger.
So charge me, or my advertisers the correct amount. Charge me the right amount to design my logo or print an amusing cat photo.
Charge me the right cost for the AI slop on YouTube
Charge the right amount - and watch as people just realise it ain’t worth it 95% of the time.
Great technology - but price matters in an economy.
vunderba 25 days ago
I decided to run gpt-image-2 on some of the custom comics I’ve come up with over the years to see how well it would do, since some of them are pretty unusual. Overall, I was quite impressed with how faithfully it adhered to the prompts, given that multi-panel stuff has to maintain a sense of continuity.
Was surprised to see it be able to render a decent comic illustrating an unemployed Pac-Man forced to find work as a glorified pie chart in a boardroom of ghosts.
https://mordenstar.com/other/gpt-2-comics
nashashmi 24 days ago
What are your thoughts about using natural semantic language to achieve edits to a picture? As users of technical software, we have absorbed an entire science and work model behind the things we do. Before now, I found it impossible to replicate that kind of precision with AI generated images. But it seems like it is possible with natural language prompting, which makes this more accessible to layman users. But what can more advanced users accomplish? Is there a more technical prompting that can be given?
platinumrad 25 days ago
Why do all of the cartoons still look like that? Genuinely asking.
- orthoxerox 25 days ago
  That was my reaction as well. Either they have decided that LLMs have this "house style" for stylized 2D art and we should deal with it, or no amount of prompting can get rid of it.
StefanBatory 25 days ago
Do you think those working at ChatGPT have ever wondered how they are contributing to dismantling democracy and ensuring nothing is true by now? The ultimate technological postmodernism.
- wiseowise 25 days ago
  They’re too busy counting cash. Most of them are what? 30 something to 50? By the time democracy is dismantled they’ll be living in their protected mansions.
- Sateeshm 24 days ago
  Tragedy of the commons. If you don't do it, someone else will.
modeless 25 days ago
Can it generate transparent PNGs yet?
- alasano 25 days ago
  Previous gpt image models could (when generating, not editing) but gpt-image-2 can't.
  Noticed it earlier while updating my playground to support it
  https://github.com/alasano/gpt-image-playground
  - lxgr 25 days ago
    Works for me, but really weirdly on iOS: Copying to clipboard somehow seems to break transparency; saving to the iOS gallery does not. (And I’ve made sure to not accidentally depend on iOS’s background segmentation.)
- vunderba 25 days ago
  OpenAI’s API docs are frustratingly unclear on this. From my experience, you can definitely generate true transparent PNG files through the ChatGPT interface, including with the new GPT-Image-2 model, but I haven’t found any definitive way to do the same thing via the API.
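When comparing these reports (API vs. ChatGPT, clipboard vs. saved file), it helps to check whether a given file even declares transparency. The sketch below is a minimal, stdlib-only Python check written for this purpose (function names are hypothetical, not part of any OpenAI tooling). It walks the PNG chunks and reports alpha if the IHDR color type is 4 (grayscale + alpha) or 6 (RGBA), or if a tRNS chunk is present. It only inspects metadata: a file can declare an alpha channel and still be fully opaque. The `make_test_png` helper exists only to build tiny sample files.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_has_transparency(data: bytes) -> bool:
    """Return True if the PNG declares an alpha channel or a tRNS chunk.

    Only header metadata is inspected; declaring alpha does not guarantee
    that any pixel is actually transparent.
    """
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"IHDR":
            # IHDR data: width(4) height(4) bit depth(1) color type(1) ...
            if body[9] in (4, 6):  # 4 = grayscale + alpha, 6 = RGBA
                return True
        elif chunk_type == b"tRNS":
            return True  # palette or single-color transparency
        elif chunk_type in (b"IDAT", b"IEND"):
            break  # tRNS must appear before the first IDAT chunk
        pos += 12 + length  # skip length + type + data + CRC
    return False


def make_test_png(color_type: int, trns: bytes = b"") -> bytes:
    """Build a tiny 1x1, 8-bit PNG (demonstration helper only)."""
    def chunk(ctype: bytes, body: bytes) -> bytes:
        crc = zlib.crc32(ctype + body) & 0xFFFFFFFF
        return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)

    channels = {0: 1, 2: 3, 4: 2, 6: 4}[color_type]
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, color_type, 0, 0, 0)
    raw = b"\x00" + b"\xff" * channels  # filter byte + one pixel
    out = PNG_SIGNATURE + chunk(b"IHDR", ihdr)
    if trns:
        out += chunk(b"tRNS", trns)
    out += chunk(b"IDAT", zlib.compress(raw)) + chunk(b"IEND", b"")
    return out
```

Usage on a real download would look like `png_has_transparency(open("generated.png", "rb").read())`; running it on both the saved file and a clipboard-pasted copy would show whether the alpha channel was dropped along the way.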
mvkel 25 days ago
I wonder if this confirms version 1 of some kind of "world model."
It has an unprecedented ability to generate the real thing (for example, a working barcode for a real book)
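The barcode claim is at least partly checkable. A book's barcode is an EAN-13 encoding of its ISBN-13, and the last digit is a checksum: digits in odd positions are weighted 1, even positions 3, and the check digit brings the weighted sum to a multiple of 10. Here is a minimal Python sketch (hypothetical function names) for validating the digits printed under a generated barcode. A correct check digit is necessary but not sufficient: it does not prove that the bars themselves scan, or that the ISBN belongs to the book shown.

```python
DIGITS = set("0123456789")


def ean13_check_digit(first12: str) -> int:
    """Compute the EAN-13 check digit from the first 12 digits.

    Positions 1, 3, 5, ... are weighted 1 and positions 2, 4, 6, ... are
    weighted 3; the check digit rounds the weighted sum up to a multiple of 10.
    """
    if len(first12) != 12 or not set(first12) <= DIGITS:
        raise ValueError("expected exactly 12 ASCII digits")
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def is_valid_ean13(code: str) -> bool:
    """Validate a 13-digit EAN/ISBN-13, ignoring hyphens and spaces."""
    digits = code.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not set(digits) <= DIGITS:
        return False
    return ean13_check_digit(digits[:12]) == int(digits[12])


print(is_valid_ean13("978-0-306-40615-7"))  # True
print(is_valid_ean13("978-0-306-40615-3"))  # False (wrong check digit)
```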
szmarczak 25 days ago
Wow, the difference between AI and non-AI images collapses. I hate the future where I won't be able to tell the difference.
- Flere-Imsaho 25 days ago
  I wake up everyday, read the tech news, and usually see some step change in AI or whatever. It's wild to think I'm living through such a massive transformation in my lifetime. The future of tech is going to be so different from when I was born (1980), I guess this is how people born in 1900 felt when they got to see man land on the moon?
  > Wow, the difference between AI and non-AI images collapses. I hate the future where I won't be able to tell the difference.
  Image generation is now pretty much "solved". Video will be next. Perhaps things will turn out the same as chess: in that even though chess was "solved" by IBM's Deep Blue, we still value humans playing chess. We value "hand made" items (clothes, furniture) over the factory made stuff. We appreciate & value human effort more than machines. Do you prefer a hand-written birthday card or an email?
  - toraway 25 days ago
    "Solved" seems a tad overstated if you scroll up to Simonw's Where's Waldo test with deformed faces plus a confabulated target when prompted for an edit to highlight the hidden character with an arrow.
    - Flere-Imsaho 25 days ago
      It's "solved" in that we have a way forward to reduce the errors down to 0.00001% (a number I just made up). Throwing more compute/time/money at these problems seems to reduce that error number.
  - abraxas 25 days ago
    As someone born in 1975 I always felt until the last couple of years that I had been stuck in a long period of stagnation compared to an earlier generation. My grandmother who was born in the 1910s got to witness adoption of electricity, mass transit, radio, television, telephony, jet flights and even space exploration before I was born.
    Feels like now is a bit of a catch-up after a pretty tepid period that was most of my life.
    - cubefox 25 days ago
      You will likely witness strongly superhuman AI, which dwarfs any changes your grandmother saw.
  - dag100 25 days ago
    Chess exists solely for the sake of the humans playing it. Even if machines solved chess, people would rather play chess against a person than a machine because it is a social activity in a way. It's like playing tennis versus a person compared to tennis against a wall.
    Photographs, videos, and digital media in general, in contrast, are used for much, much more than just socializing.
- gekoxyz 25 days ago
  Well, for some of these images for the first time I can't tell that they are AI generated
thevinter 25 days ago
Every time a new image gen comes out I keep saying that it won't get better just to be surprised again and again. Some of the examples are incredible (and incredibly scary. I feel like this is truly the point where understanding if something is AI becomes impossible)
- lehmacdj 25 days ago
  So do you think there will be a better image model in a year?
  - throw310822 25 days ago
    I'll bite: no I don't think so. If the examples are not cherry-picked and by "image model" we mean just the ability to generate pictures, this looks like parity with human excellence, there isn't much space for further improvement. The images don't just look real, they look tasteful- the model is not just generating a credible image, it's generating one that shows the talent of a good photographer/ designer/ artist.
  - Vachyas 25 days ago
    I'm honestly unsure what could be improved at this point.
    Consistency? So it fails less often?
    Based on the released images, (especially the one "screenshot" of the Mac desktop) I feel like the best images from this model are so visually flawless that the only way to tell they're fake is by reasoning about the content of the image itself (ex. "Apple never made a red iPhone 15, so this image is probably fake" or "Costco prices never end in .96 so this image is probably fake")
    - thevinter 25 days ago
      There is definitely room for improvement: https://gist.github.com/simonw/88eecc65698a725d8a9c1c918478a...
      Especially when it comes to detailed outputs or non-standard prompts.
      I do believe it will get even better - not sure it will happen within a year but I wouldn't be incredibly surprised if it did.
      - vunderba 25 days ago
        Yep. “Where’s Waldo” has been a classic challenge for generative models for a while because it requires understanding the entire concept (there’s only one Waldo), while also holding up to scrutiny when you examine any individual, ordinary figure.
        I experimented with the concept of procedural generation of Waldo-style scavenger images with Flux models with rather disappointing results. (unsurprisingly).
      - Vachyas 25 days ago
        That's a good example, actually.
        If you asked me what I expected, since this one has "thinking", it'd be that it would've thought to do something like generate the image without Waldo first, then insert Waldo somewhere into that image as an "edit"
      - throw310822 25 days ago
        I wonder if at this point you could just ask the agent to iteratively refine the image in smaller portions.
    - RobinL 25 days ago
      I've been impressed when testing this model today, but it still can't consistently adhere to the following prompt: make me an image of a pizza split into 10 equal slices with space in between them, to help teach fractions to a child.
      It doesn't reliably give you 10 slices, even if you ask it to number them. None of the frontier models seem to be able to get this right
    - jinushaun 25 days ago
      Cost? Speed?
    - vunderba 25 days ago
      > I'm honestly unsure what could be improved at this point.
      That's because you're focusing a little bit too much on visual fidelity. It's still relatively trivial to create a moderately complex prompt and have it fail miserably.
      Even SOTA models only scored a 12 out of 15 on my benchmarks, and that was without me deliberately trying to "flex" to break the model.
      Here's one I just came up with:
      A Mercator projection of earth where the land/oceans are inverted. (aka land = ocean, and oceans = land)
      - Vachyas 22 days ago
        Good point.
        So I guess while "realism" (or believability) is really good now, prompt adherence has much room for improvement.
        (though put it another way, realism has always been "solved" if the model gets to output whatever it wants as long as it looks realistic, though now it looks less like a malfunction and more like an inattentive human mistake or oversight, so even when it gets it wrong it's hard to tell it's wrong without knowing what the prompt was)
        vunderba 22 days ago
        > it's hard to tell it's wrong without knowing what the prompt was.
        Yeah this is actually a huge point of frustration on reddit where lots of people post their "impressive generative images" but fail to disclose the prompts so the audience is only able to evaluate realism/fidelity and not how faithfully the model actually followed the prompt.
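RobinL's fraction prompt in this subthread (a pizza cut into 10 equal, separated slices) is a case where drawing the figure programmatically sidesteps the counting problem: with code, the slice count is exact by construction, since each wedge spans 360/10 = 36 degrees. This is a swapped-in technique, not something the image models do. Below is a minimal Python sketch that emits SVG; the function name, colors, and gap size are arbitrary illustrative choices.

```python
import math


def pizza_svg(slices: int = 10, radius: float = 100.0, gap: float = 8.0) -> str:
    """Return SVG markup for a pizza cut into `slices` equal, numbered wedges.

    Each wedge spans 360/slices degrees and is shifted outward along its
    bisector by `gap`, so neighboring slices are visibly separated.
    """
    if slices < 2:
        raise ValueError("need at least 2 slices")
    size = 2 * (radius + gap) + 20  # canvas with a small margin
    cx = cy = size / 2
    step = 2 * math.pi / slices  # angular width of one slice
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" '
             f'width="{size:.0f}" height="{size:.0f}">']
    for i in range(slices):
        a0, a1 = i * step, (i + 1) * step
        mid = (a0 + a1) / 2
        # Tip of this wedge, pushed away from the center to create the gap.
        ox, oy = cx + gap * math.cos(mid), cy + gap * math.sin(mid)
        x0, y0 = ox + radius * math.cos(a0), oy + radius * math.sin(a0)
        x1, y1 = ox + radius * math.cos(a1), oy + radius * math.sin(a1)
        # Each wedge is at most a half-circle, so the large-arc flag is 0;
        # sweep flag 1 follows increasing angle (clockwise with SVG's y-down axis).
        d = (f"M {ox:.2f} {oy:.2f} L {x0:.2f} {y0:.2f} "
             f"A {radius} {radius} 0 0 1 {x1:.2f} {y1:.2f} Z")
        parts.append(f'<path d="{d}" fill="#f4c542" stroke="#b5651d" stroke-width="2"/>')
        # Number each slice near the middle of its wedge.
        lx, ly = ox + 0.6 * radius * math.cos(mid), oy + 0.6 * radius * math.sin(mid)
        parts.append(f'<text x="{lx:.2f}" y="{ly:.2f}" text-anchor="middle" '
                     f'dominant-baseline="middle" font-size="14">{i + 1}</text>')
    parts.append("</svg>")
    return "\n".join(parts)
```

Writing `pizza_svg(10)` to a `.svg` file gives an image that any browser can open; changing `slices` to 8 or 12 gives other fractions without any risk of the model miscounting.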
cdrnsf 24 days ago
This isn't art. At best it's modern day clip art or a facsimile of stock images.
rambojohnson 25 days ago
Just tried it and got six fingers and half a thumb on a simple portrait. Mickey Mouse stuff.
baalimago 25 days ago
"Benchmarks" aside, does anyone actually use these image models for anything?
- medlazik 25 days ago
  Look around? It's everywhere. Try talking to a graphic designer looking for a job these days. Companies didn't wait for these tools to be good to start using them.
- razorbeamz 25 days ago
  Here in Japan every fucking food truck uses them for pictures of their menu, which really pisses me off because it's not representative of their food at all.
- sumedh 24 days ago
  People are using them for creating marketing material for their business.
- croisillon 25 days ago
  MAGA to show how terrible Europe is ;)
etothet 25 days ago
I would love to see prompt examples that created the images on the announcement page.
[-]
- DauntingPear7 25 days ago
  You can by changing the view before the gallery
PDF_Geek 25 days ago
The free tier for ChatGPT feels pretty much nerfed at this point. I’m barely getting 10 prompts in before it drops me down to the basic model. The restrictions are getting ridiculous. Is anyone else seeing this?
nickandbro 24 days ago
I asked it to make me an xkcd comic:
https://chatgpt.com/s/m_69e8cc31dac48191a09bb9c00d5aa3fe
kinda funny, I guess
[-]
- hersko 24 days ago
  Lol this is pretty funny
franze 25 days ago
The tragedy of image-generating AI is that it is used to mass-produce what already exists instead of creating something truly unique. We need AI artists, and yeah, they will not be appreciated.
[-]
- franze 25 days ago
  So yeah, a smart move by OpenAI would be to sponsor artists (provocative ones, junior ones, with nothing to lose), but that cell in the spreadsheet will be too small to register, so it will probably never happen.
- weezing 24 days ago
  Why would we need AI artists tho?
  [-]
  - franze 19 days ago
    Why would we need photographers, they just push a button? Why would we need digital artists, they just use a computer?
    It's a new medium, and it doesn't matter whether we like it or not (art also shouldn't care whether we like it or not). AI is here to stay, so let's find out whether we can even create art with it.
minimaxir 25 days ago
Model card for the API endpoint gpt-image-2 (which may or may not reflect the output from ChatGPT Images 2): https://developers.openai.com/api/docs/models/gpt-image-2
API Pricing is mostly unchanged from gpt-image-1.5, the output price is slightly lower: https://developers.openai.com/api/docs/pricing
...buuuuuuuuut the price per image has changed. For a high quality image generation the 1024x1024 price has increased? That doesn't make sense that a 1024x1024 is cheaper than a 1024x1536, so assuming a typo: https://developers.openai.com/api/docs/guides/image-generati...
The submitted page is annoyingly uninformative, but from the livestream it purports to offer the exact same features as Gemini's Nano Banana Pro. I'll run it through my tests once I figure out how to access it.
[-]
- strongpigeon 25 days ago
  > That doesn't make sense that a 1024x1024 is cheaper than a 1024x1536, [...]
  I think you meant more expensive, right? Because it would make sense for it to be cheaper, as there are fewer pixels.
  [-]
  - jbggs 23 days ago
    just assume everyone here is making typos
esafak 25 days ago
https://openai.com/index/introducing-chatgpt-images-2-0/
[-]
- rqa129 25 days ago
  Thanks, all displayed images look horrible and artificial. This will fail like Sora.
  [-]
  - gekoxyz 25 days ago
    Hard disagree on this, I was coming here to comment that this is the first time I really can't tell that some of the photos are AI generated.
  - furyofantares 25 days ago
    I felt the same, particularly with the diagrams / magazines anyway.
    I don't think it'll fail like Sora though. gpt-image-1.5 didn't fail.
  - livinglist 25 days ago
    Denial is real…
  - QuantumGood 25 days ago
    Your single other comment is simplistic hyperbole as well, so this is presumably a bot account.
codebolt 25 days ago
Has anyone tested it for generating 2D art for games? Getting Nano Banana to generate consistent sprite sheets seemed impossible last time I tried, a few months ago.
[-]
- hersko 24 days ago
  I'm still looking for a free tool to convert images to 3d models well.
muyuu 25 days ago
I wonder if this will be decent at creating sprite frame animations. So far I've had very poor results and I've had to do the unthinkable and toil it out manually.
[-]
- vunderba 25 days ago
  I created this little demo of an animated sprite sheet using generative AI. It's not great, but it is passable.
  https://mordenstar.com/other/hobbes-animation/
  [-]
  - muyuu 25 days ago
    Looks good to me. Would be nice to see the process. I'm having trouble with parts of the stride when the far leg is ahead. Doing 8-directional isometric right now.
- freedomben 25 days ago
  I had exactly the same thought! I've got a game I've been wanting to build for over a decade that I recently started working on. The art is going to be very challenging, however, because I lack a lot of those skills. I am really hoping the AI tools can help with that.
  Is anyone doing this already who can share information on what the best models are?
  [-]
  - gizmodo59 25 days ago
    Use the imagegen skill in codex and ask it to create sprites. It works really well.
    [-]
    - muyuu 25 days ago
      I didn't have great success last I tried, but I will give it another shot this week. Presumably they incorporated improvements to the skill?
    - freedomben 25 days ago
      Thank you!
- ZeWaka 25 days ago
  It's still bad.
HardCodedBias 24 days ago
This is an amazing technological achievement. Kudos.
This is so much better than the competition. I suspect that this will have an impact in business and education, at a minimum.
jumploops 25 days ago
Looks like analog clocks work well enough now, however it still struggles with left-handed people.
Overall, quite impressed with its continuity and agentic (i.e. research) features.
fsloth 24 days ago
Do note the images will be sterilized and safe.
"Hey give me a comic of how to create a rocket engine i can build at home"
Unlimited creativity will be shackled by safety.
Still pretty amazing.
james2doyle 25 days ago
In the next round of ChatGPT advertisements, if they don’t use AI generated images, then that means they don’t believe in their own product right?
cyberjunkie 25 days ago
It looks like AI, and I look away from any image generated by an LLM. It's my easy internal filter to weed out everything that isn't art.
kalx 24 days ago
I tricked it into making a nude. That was fun.
rambojohnson 25 days ago
Just tried it and got the usual six fingers, and half a thumb. What are they actually iterating on with these models by now…
kanodiaayush 25 days ago
It stands out to me that this page itself is wonderful to go through (the telling of the product through model generated images).
tezza 25 days ago
I've rushed out my standardised quality check images for gpt-image-2:
https://generative-ai.review/2026/04/rush-openai-gpt-image-2...
I've done a series over all the OpenAI models.
gpt-image-2 has a lot more action, especially in the Apple Cart images.
giannicmptr1000 24 days ago
Humans have a new tool to make porn.
Sargos 24 days ago
Is this available for Codex subscription users of Open Claw without needing a separate API key?
Melatonic 25 days ago
Can it generate anything high resolution at increased cost and time? Or is it always restricted?
naseemali925 25 days ago
It's amazingly good at creating UI mockups. I've been using it to create UI mockups for ideas.
dakiol 25 days ago
> On the flip side, there are hundreds of ways that these tools cause genuine harm, not just to individuals but to entire systems.
Yeah, agree. I think it's the first time I'm asking myself: Ok, so this new cool tech, what is it good for? In terms of art, it's discarded (art is about humans). In terms of assets: sure, but people are getting tired of AI-generated images (and even if we cannot tell whether an image is AI-generated, we can know whether companies are using AI to generate images in general, so the appeal is decreasing). Ads? C'mon, that's depressing.
What else? In general, I think people are starting to realize that things generated without effort are not worth spending time with (e.g., no one is going to read your 30-page draft generated by AI; no one is going to review your 500-file PR generated by AI; no one is going to be impressed by the images you generate with AI; same goes for music and everything else). I think we are gonna see a Renaissance of "human-generated" sooner rather than later. I see it already at work (colleagues writing in Slack "I swear the next message is not AI generated" and the like).
[-]
- lucaslazarus 25 days ago
  > I think it's the first time I'm asking myself: Ok, so this new cool tech, what is it good for?
  I feel like this is something people in the industry should be thinking about a lot, all the time. Too many social ills today are downstream of the 2000s culture of mainstream absolute technoöptimism.
  Vide. Kranzberg's first law--“Technology is neither good nor bad; nor is it neutral.”
  [-]
  - runarberg 25 days ago
    Completely unrelated, but I am curious about your keyboard layout, since you mistyped ö instead of -. These two symbols are side by side in the Icelandic layout, and the ö is where the - is in the English (US) layout. As such, this is a common type-o for people who regularly switch between the Icelandic and the English (US) layouts (source: I am that person). I am curious whether there are more layouts where that could be common.
    [-]
    - bulletsvshumans 25 days ago
      This is also a stylistic choice that the New Yorker magazine uses for words with double vowels where you pronounce each one separately, like coöperate, reëlect, preëminent, and naïve. So possibly intentional.
      [-]
      - lucaslazarus 25 days ago
        Yes, this is exactly correct, and I will die on this hill. Additionally, I don't like the way a hyphenated "techno-optimism" looks and "technOOPtimism" is a bit too on-the-nose.
        [-]
        nullsanity 25 days ago
        [dead]
      - runarberg 25 days ago
        That makes sense[1] but it prompts the obvious question: does this style write it as typeö then?
        1: Though personally I hate it, I just cannot not read those as completely different vowels (in particular ï → [i:] or the ee in need; ë → [je:] or the first e here; and ö → [ø] or the e in her)
        [-]
        lucaslazarus 25 days ago
        No. Firstly because it is spelled “typo.” Secondly you typically use the diaeresis to tell the reader to not confuse it with a similarly spelled sound or diphthong. So it tells a reader that “reëlect” is not pronounced REEL-ect, “coöperate” is not COOP-uh-ray-t, and “naïve” is not NAY-v.
        [-]
        losvedir 25 days ago
        Because written English makes so much sense normally. God forbid someone has to figure out the ambiguous pronunciation of those particular words. It seems like a silly thing to provide extra guidance on to me.
    - heisenzombie 25 days ago
      I suspect the diaeresis was intentional, in “New Yorker” style.
      https://www.arrantpedantry.com/2020/03/24/umlauts-diaereses-...
- lxgr 25 days ago
  I can’t design wallpapers/stickers/icons/…, but I can describe what I want to an image generation model verbally or with a source photo, and the new ones yield pretty good results.
  For icons in particular, this opens up a completely new way of customizing my home screen and shortcuts.
  Not necessary for the survival of society, maybe, but I enjoy this new capability.
  [-]
  - latexr 25 days ago
    So we get a fresh new cheap way to spread propaganda and lies and erode trust all across society while cementing power and control for a few at the top, and in return get a few measly icons (as if there weren’t literally thousands of them freely available already) and silly images for momentaneous amusement?
    What a rotten exchange.
    [-]
    - SamuelAdams 25 days ago
      I wonder what will happen to the entire legal system. It used to be fairly difficult to create convincing photos and videos.
      AI can probably fool most court judges now. Or the defense can refute legitimate evidence by saying “it’s AI / false”. How would that be refuted?
      [-]
      - lxgr 24 days ago
        For better or worse, the only admissible evidence going forward will probably be either completely physical or originated in attestation-capable recording devices, i.e. something like a "forensics grade" camera with a signing key in trusted hardware issued by somebody deemed trustworthy.
        Given the obvious personal safety upsell ("our phone/dashcam/... produces court-admissible evidence!"), I think we'll even see this in consumer devices before too long.
      - idiotsecant 25 days ago
        By having people also testify to authenticity and coming down like the hand of God on fakers, the same way we make sure evidence is real now.
      - jll29 25 days ago
        Yes, that is a major worry of mine, too. CCTV evidence is worth nil now (it could be generated in whole or in part), and even eyewitness testimony can't be trusted (sure, a witness may think they saw the alleged perpetrator, but perhaps they just saw an AI-generated video/projection of someone).
      - dale_glass 24 days ago
        Trials have rules for evidence. You can't just pull out some footage out of nowhere. Where did that come from? From what camera? What was the chain of custody on its footage? Etc.
      - gedy 25 days ago
        If it means anything, I have a 1990 almanac from an old encyclopedia that warns about the exact same thing with digital photo manipulation. I don't think it really matters at this point.
      - BLKNSLVR 25 days ago
        MS13 was literally tattooed on his knuckles!
      - Gigachad 25 days ago
        Multiple data sources, considering the trustworthiness of the source of the information, and accountability for lying.
        You might generate an AI video of me committing a crime, but the CCTV on the street didn't show it happening, and my phone's cell tower logs show I was at home. For the legal system, I don't think this is going to be the biggest problem. It's social media that will be hit hardest, since a fake video can go viral far faster than fact-checking can keep up.
    - jll29 25 days ago
      AI can also be used to fight propaganda, for instance BiasScanner makes you aware of potentially manipulative news: https://biasscanner.org .
      So that makes AI a "dual good", like a kitchen knife: you can cut your tomato or kill your neighbor with it, entirely up to the "user". Not all users are good, so we'll see an intense amplification of both good and bad.
      [-]
      - dymk 25 days ago
        It's more work to fight bullshit than it is to generate it, though. Saying "Use AI to fight it" is inherently a losing strategy when the other side also has an AI that is just as powerful.
        [-]
        jrumbut 25 days ago
        And no amount of BS detecting tells you what is true. The challenge that I see a lot of people have is they really don't have a framework to incorporate new information into.
        They're adrift, every new "fact" (whether true or false) blows them in a new direction. Often they get led in terrible directions from statements that are entirely true (but missing important context).
        A lot of financial cons work that way, a long string of true statements that seem to lead to a particular conclusion. I know that if someone is offering me 20% APY there will usually be some risk or fee that offsets those market-beating gains (it may be a worthwhile risk or a well earned fee, but that number needs to trigger further investigation).
        We need people to be equipped with that sort of framework in as many areas as possible, but we seem to be moving backwards in that area.
      - jrumbut 25 days ago
        AI is certainly a dual good but I think the project is misguided at best.
        I put in one of the driest descriptions of the Holocaust I could find and it got a very high score for bias, calling a factual description of a massacre emotional sensationalism because it inevitably contains a lot of loaded words.
        It also doesn't differentiate between reporting, commentary, poetry, or anything else. It takes text and spits out a number, which is a very shallow analysis.
      - nullsanity 25 days ago
        [dead]
    - thesmtsolver2 25 days ago
      Don’t blame the tools. Stalin, Mao and Hitler didn’t need AI.
      [-]
      - latexr 25 days ago
        That pro forma response grows oh so very tiresome.
        For the nth time: scale, easiness, and access, matter. AI puts propaganda abilities far beyond the reach of those men in the hands of many more people. Do you not understand the difference between one man with a revolver and an army with machine guns? They are not the same.
        Nowhere in my comment am I “blaming the tools”. I’ll ask you to engage with the argument honestly instead of simply parroting what you already believe absent reading.
        [-]
        thesmtsolver2 24 days ago
        Did you do a net benefit calculation? If not, all these knee-jerk anti-AI comments are tiresome and predictable (see Luddites).
        > I’ll ask you to engage with the argument honestly instead of simply parroting what you already believe absent reading
        I did engage with the argument. The argument is a tiresome old argument that is knee-jerk anti-tech. You seem to be the thoughtless one in this discourse, repeating for the infinite time an anti-tech position that assumes the negatives massively outweigh the positives.
        Also, why attack me instead of the argument? Did I touch a logical sore point? I believe so.
        > For the nth time: scale, easiness, and access, matter.
        By that logic, the printing press was evil? Remember, Mao/Stalin/Hitler used presses to spread their propaganda.
        Also, for the n+1 time, using your own style, don't be lazy:
        1. Come up with a net benefit calculation for AI. What? You can't? Then, don't try to claim this is all net negative.
        2. Explain how AI is different from other tech like the printing press, that also had scale, easiness, and access.
  - camillomiller 25 days ago
    Is that worth the cost of this technology? Both in terms of financial shenanigans and its environmental cost?
    [-]
    - subroutine 25 days ago
      Are you asking if the 10 seconds it takes AI to generate an image is more costly to the environment than a commissioned graphics artist using a laptop for 5-6 hours, or a painter who uses physical media sourced from all over the world?
      [-]
      - bayindirh 25 days ago
        In short, yes.
        A modern laptop is running almost fanless, like a 486 from the days of yore.
        A single H200 pumps out 700W continuously in a data center, and you run thousands of them.
        Also, don't forget the training and fine tuning runs required for the models.
        Mass transportation / global logistics can be very efficient and cheap.
        Before the pandemic, it was in some cases cheaper to import fresh tomatoes from halfway around the world than to grow them locally. A single container of painting supplies is nothing in the grand scheme of things, esp. when compared with what data centers are consuming and emitting.
        [-]
        lxgr 25 days ago
        This argument is so flawed that its conclusion almost loops back around to being correct again:
        No, in terms of unit economics, I'm almost certain that the painting supplies have a bigger ecological/resource footprint than an LLM per icon generated, and I'm pretty sure the cost of shipping tomatoes does not decrease that footprint, even if it possibly dwarfs it.
        But yes, due to the Jevons paradox, the total resource use might well increase despite all that. I, for example, would have never commissioned a professional icon for my silly little iOS shortcuts on my homescreen, so my silly icon-related carbon footprint went from exactly zero to slightly above that.
        cpill 25 days ago
        these are unfair comparisons. it's not just a single laptop running all day, it's all the graphic designer laptops that get replaced. it's not a single container of painting supplies, it's all of them (which are toxic, by the way).
        so if power were plentiful and environmental you'd be onboard with it?
        [-]
        bayindirh 25 days ago
        > these are unfair comparisons. it's not just a single laptop running all day, it's all the graphic designer laptops that get replaced. it's not a single container of painting supplies, it's all of them (which are toxic, by the way).
        Please see my other comment about energy consumption and connect the dots with how open loop DLC systems are harmful to fresh water supplies (which is another comment of mine).
        > so if power were plentiful and environmental you'd be onboard with it?
        This is a pretty loaded way to ask this. Let me put this straight. I'm not against AI. I'm against how this thing is built. Namely:
        - Use of copyrighted and copylefted materials to train models, hiding under "fair use" to exploit people.
        - Moreover, belittling people who create things with their blood, sweat and tears, and poorly imitating their art just for kicks or quick bucks.
        - Playing fast and loose with the environment and energy consumption, instead of building things efficiently and sustainably, to reduce initial costs and time to market.
        - Gaslighting users and the general community about how these things are built, and how it's a theater, again to make people use this and offload their thinking, atrophying their skills and making them dependent on these.
        I work in HPC. I support AI workloads and projects, but the projects we tackle have real benefits, like ecosystem monitoring, long term climate science, water level warning and prediction systems, etc. which have real tangible benefits for the future of the humanity. Moreover, there are other projects trying to minimize environmental impact of computation which we're part of.
        So it's pretty nuanced, and the AI iceberg goes well below the OpenAI/Anthropic/Mistral trio.
        [-]
        lxgr 25 days ago
        > I support AI workloads and projects, but the projects we tackle have real benefits [...]
        As opposed to the illusory/fake/immoral benefits of using LLMs for entertainment purposes (leaving aside all other applications for now)?
        How do you feel about Hollywood, or even your local theater production? I bet the environmental unit economics don't look great on those either, yet I wouldn't be so quick to pass moral judgement.
        Why not just focus on the environmental impact instead of moralizing about the utility? It seems hard to impossible to get consensus there, and the impact should be able to speak for itself if it's concerning.
        ToValueFunfetti 25 days ago
        This is a plainly dishonest comparison. A single H200 does not need to run continuously for you to generate a dozen pictures. And then you immediately pivot to comparing the paint usage against "the grand scheme of things"- 700W is nothing in the grand scheme of things.
        [-]
        bayindirh 25 days ago
        In fact it's pretty fair.
        Many people think that when a piece of hardware is idle, its power consumption becomes irrelevant, and that's true for home appliances and personal computers.
        However, the picture is pretty different for datacenter hardware.
        Looking now, an idle V100 (I don't have an idle H200 at hand) uses 40 watts at minimum. That's more than the TDP of many modern consumer laptops and systems. A MacBook Air uses a 35W power supply to charge itself, and it charges pretty quickly even under relatively high load.
        I want to clarify some more things. A modern GPU server houses 4-8 high-end GPUs. This means 3 kW to 5 kW of maximum power draw per server. A single rack draws around 75-100 kW, and you house hundreds of these racks. So we're talking about megawatts of power consumption. To put things in perspective, CERN's main power line on the Swiss side had a capacity of around 10 MW.
        Let's assume an H200 draws 60 W when it's idle. With 8 GPUs per server, this means ~500 W wasted per server just sitting around. If a complete rack is idle, it's about 10 kW. So you're wasting the energy consumption of 3-5 houses just by sitting and doing nothing.
        This computation only considers the GPUs. Server hardware adds around 40% on top of these numbers. Go figure. This is wasting a lot for cat pictures.
        And, these "small" numbers add up to a lot.
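        The idle-power arithmetic above can be checked with a rough sketch. It uses the comment's own assumptions (60 W idle per H200, 8 GPUs per server, about 40% extra for the rest of the server hardware) plus one hypothetical input not stated in the comment: about 20 servers per rack, which is roughly what the ~10 kW per-rack figure implies. All values are illustrative, not measurements.

```python
# Rough idle-power estimate using the figures assumed in the comment above.
# All inputs are illustrative assumptions, not measured values.

IDLE_W_PER_GPU = 60      # assumed idle draw of one H200, in watts
GPUS_PER_SERVER = 8      # upper end of the "4-8 GPUs per server" range
SERVERS_PER_RACK = 20    # hypothetical; chosen to match the ~10 kW rack figure
SERVER_OVERHEAD = 0.40   # "server hardware adds around 40%"


def idle_power_w(gpus_per_server: int = GPUS_PER_SERVER,
                 servers_per_rack: int = SERVERS_PER_RACK,
                 idle_w_per_gpu: float = IDLE_W_PER_GPU,
                 overhead: float = SERVER_OVERHEAD) -> dict:
    """Return idle power per server and per rack, in watts."""
    gpu_only_server = gpus_per_server * idle_w_per_gpu
    gpu_only_rack = gpu_only_server * servers_per_rack
    # Add the non-GPU server hardware as a flat fractional overhead.
    with_overhead_rack = gpu_only_rack * (1 + overhead)
    return {
        "server_gpu_only": gpu_only_server,
        "rack_gpu_only": gpu_only_rack,
        "rack_with_overhead": with_overhead_rack,
    }


if __name__ == "__main__":
    p = idle_power_w()
    print(f"Idle GPUs per server: {p['server_gpu_only']:.0f} W")               # 480 W
    print(f"Idle GPUs per rack:   {p['rack_gpu_only'] / 1000:.1f} kW")         # 9.6 kW
    print(f"Rack incl. overhead:  {p['rack_with_overhead'] / 1000:.1f} kW")    # 13.4 kW
```

        Under these assumptions the GPU-only idle draw comes out to 480 W per server and 9.6 kW per rack, in line with the comment's ~500 W and ~10 kW, and the 40% server overhead brings a fully idle rack to roughly 13.4 kW.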
        [-]
        lxgr 25 days ago
        Definitely worth considering in a world in which there are any H200s idling in data centers.
        [-]
        bayindirh 25 days ago
        Now that's one fine No True Scotsman.
        A: GPUs use a lot of power! B: Not all of them are running 100% continuously, eh?, A: They waste too much power when they're idle, too! C: None of the H200s are sitting idle, you knob!
        I mean, they are either wasting energy sitting idle or doing barely useful work. I don't know what to say anymore.
        We'll cook ourselves, anyway. Why bother? Enjoy the sauna. ¯\_(ツ)_/¯
        [-]
        ToValueFunfetti 24 days ago
        B is supposed to be me? I said the H200 doesn't need to be running continuously to generate a dozen images. If a million people generate a dozen images, it no longer makes sense to compare to the costs of a single artist for 6 hours. I really don't understand why this is hard and that makes this feel very uncharitable.
        lxgr 25 days ago
        I'm not saying that this isn't "true idling", I'm saying that idling H200s simply don't exist, i.e., I disagree with B. Do you, A, even disagree?
        > they are either wasting energy sitting idle or doing barely useful work
        Now here's a true (inverse) scotsman, or more accurately, a moved goalpost: Work on things you don't deem valuable is basically the same thing as idling?
        > We'll cook ourselves, anyway. Why bother? Enjoy the sauna. ¯\_(ツ)_/¯
        I'm very concerned about that too, but I don't think we'll avoid the sauna with fatalism or logically unsound appeals to morality about resource consumption.
      - dilDDoS 25 days ago
        Cheaper/faster tech increases overall consumption though. Without the friction of commissioning a graphics artist to design something, a user can generate thousands of images (and iterate on those images multiple times to achieve what they want), resulting in way more images overall.
        I'm not really well versed on the environmental cost, more just (neutrally) pointing out that comparing a single 10s image to a 5-6 hour commission ignores the fact that the majority of these images probably would never have existed in the first place without AI.
        [-]
        runarberg 25 days ago
        Also, ignoring training when talking about the environmental costs is bad faith. Without training, this image would not exist, and if nobody were generating images like these, the training would not happen. So we should really count the 10 seconds it took for inference plus the weeks or months of high-intensity compute it took to train the model.
        [-]
        ToValueFunfetti 25 days ago
        You'd want to compare against the fraction of training attributable to the image
        [-]
      - camillomiller 25 days ago
        Wow, do you hold a degree in false dichotomies?
    - Legend2440 25 days ago
      The environmental cost is significantly overblown, especially water usage.
      [-]
      - bayindirh 25 days ago
        I work with direct liquid cooled (DLC) systems. If the datacenter uses open-loop DLC systems (most AI datacenters in the US in fact do), a lot of water is being wasted, 24/7/365.
        A mid-tier TOP500 system (think about #250-#325) consumes about 0.75 MW of power. AI data centers consume orders of magnitude more. To cool that behemoth, you need to pump tons of water per minute through the inner loop.
        Outer loop might be slower, but it's a lot of heated water at the end of the day.
        To prevent water wastage, you can go closed loop (for both inner and outer loops), but you can't escape the heat you generate and pump to the atmosphere.
        So, the environmental cost is overblown, as in Chernobyl or fallout from a nuclear bomb is overblown.
        So, it's not.
        [-]
        Legend2440 25 days ago
        It's not that it doesn't use water; it's that water is not scarce unless you live in a desert.
        As a country, we use 322 billion gallons of water per day. A few million gallons for a datacenter is nothing.
        [-]
        bayindirh 25 days ago
        The problem is you don't just use that water and give it back.
        The water gets contaminated and heated, making it unsuitable for organisms to live in, or to be processed and used again.
        In short, when you pump that water back into the river, you're both poisoning and cooking the river, destroying the ecosystem in the process.
        Talk about multi-threaded destruction.
        [-]
        Legend2440 25 days ago
        No, you're making that up. Datacenters do not poison rivers.
        [-]
        bayindirh 25 days ago
        To reiterate, I work in a closed loop DLC datacenter.
        Pipes rust, you can't stop that. That rust seeps into the water. That's inevitable. Moreover, if moss or other stuff starts to take over your pipes, you may need to inject chemicals into your outer loop to clean them.
        Inner loops already use biocides and other chemicals to keep them clean.
        Look at how nuclear power plants fight organism contamination in their outer cooling loops, where they circulate lake/river water.
        Same thing.
        [-]
        camillomiller 25 days ago
        Dude, you can’t fight Dunning-Kruger. They all think they’re experts in everything now.
        jll29 25 days ago
        Just because some countries waste a lot at present time does not mean it's available as a resource indefinitely.
        lxgr 25 days ago
        The environmental cost of Chernobyl is indeed often overblown. Nature in the exclusion zone is arguably much better off now than before!
        The cost to humans living in affected areas was massive and high-profile, but it’s very questionable whether it was higher than that of an equivalent amount of coal-burning plants. Fortunately, that's not a tradeoff we have to debate anymore, since there are renewables with still fewer downsides and externalities.
        Nuclear bombs (at least those being actually used) by design kill people, so I’m not sure what the externalities even are if the main utility is already to intentionally cause harm.
    - vrc 25 days ago
      Depends on if you believe it will ever become cheaper. Either hardware, inspiring more efficient smaller models, or energy itself. The techno optimist believes that that is the inevitable and investable future. But on what horizon and will it get “zip drived” before then?
    - 3dsnano 25 days ago
      absolutely without a doubt it is
      [-]
      - bayindirh 25 days ago
        If that energy is used for research, maybe. If used to answer customer questions or generate Studio Ghibli knock-offs, it's not worth it, even a bit.
        [-]
        3dsnano 25 days ago
        what’s the difference between those two? how can you say one has more value than the other?
        [-]
        bayindirh 25 days ago
        One is trying to save the future of the planet and the humanity with science, the other one is mocking a man who devoted his whole life to his art, even if it means spending years to perfect a three-second sequence for kicks and monies.
        If you see no difference between them, I can't continue to discuss this with you, sorry.
        lxgr 25 days ago
        To you. Fortunately nobody elected you chief resource allocator of the planet.
        And I say that as somebody that also finds Ghibli knock-off avatars used by AI bros in incredibly bad taste (or, arguably an even worse crime against taste, a dated 2025 vibe).
        [-]
        bayindirh 25 days ago
        Thanks for your personal jab. Another nice comment to frame and hang on my wall.
        I like your discussion style.
        lxgr 25 days ago
        Passing moral judgement about other people's value preferences seems pretty preposterous to me as well, so I was being a bit glib, but to be clear:
        I don't want to live in a world in which people get to decide what others can and can't do with their share of resources (after properly accounting for all externalities, including pollution, the potential future value of non-renewable present resources etc. – this is where today's reality often and massively misses that ideal) based on their subjective moral criteria.
        Not even just for ethical/moral reasons, but also for practical ones: It’s infinitely harder to get everybody to additionally agree on value of use than on fairness of allocation alone.
        After thoroughly mixing these two quite distinct concerns, you'll also have a very hard time convincing me that your concerns for river pollution etc. (which I take very seriously as potentially unaccounted negative externalities, if they exist) are completely free from motivated reasoning about "immoral usage".
- Gigachad 25 days ago
  This is where I’m at. If you can’t be bothered to write/make it, why would I be bothered to read or review it?
  - tempaccount5050 25 days ago
    Because I'm not an artist and can't afford to pay one for whatever business I have? This idea that only experts are allowed to do things is just crazy to me. A band poster doesn't have to be a labor of love artisanal thing. Were you mad when people made band posters with MS word instead of hiring a fucking typesetter? I just don't get it.
    - overgard 25 days ago
      I dunno, I have some band posters that are pretty cool pieces of art that obviously had a lot of thought put into them (pre-AI era stuff). I don't think I'd hang up an AI generated band poster, even if it was cool; I'd feel weird and tacky about it.
      - runarberg 25 days ago
        I was hosting a karaoke event in my town and really went out of my way to ensure my promotional poster looked nothing like AI. I really really really did not want my townsfolk thinking I would use AI to design a poster.
        My design rules were: no gradients; no purple; prefer muted colors; plenty of sharp corners and overlapping shapes; use the Boba Milky font face.
        dpark 25 days ago
        Yes, but…
        https://imgur.com/a/cYn68Cp
        runarberg 25 days ago
        I mean: https://imgur.com/a/BYikxEI
        The difference is very stark:
        - The AI has a hard time making the geometric shapes regular. You can see the stars have arms of different sizes at different intervals in the AI version. It would take a human artist more time to make it look this much worse.
        - The 5-point stars are still a little rounded in the AI version.
        - There is way too much text in the AI version (a human designer might make that mistake, but it is very typical of AI).
        - The orange 10-point star on the right with the text “you are the star” still has a gradient (AI really can’t help itself).
        - The borders around the title text “Karaoke night!” bleed into the borders of the orange (gradient) 10-point star on the right, but only halfway. This is very sloppy; a human designer would fix that.
        - The font face is not Milky Boba but some sort of AI hybrid of Milky Boba, Boba Milky and Comic Sans.
        - And finally, the QR code has obvious AI artifacts in it.
        The point I’m making is that it is very hard to prompt your way out of making a poster look like AI, especially when the design is intentionally meant not to look like AI.
        dpark 25 days ago
        I hear what you’re saying and at the same time I don’t agree with some of your criticisms. The gradient, yep, it slipped one in. The imperfect stars? I have seen artists do this forever, presumably intentional flair. The few real “glitches” would be trivial to fix in Photoshop.
        But they are very different certainly. ChatGPT generated a poster with a very sleek, “produced” style that apes corporate posters whereas you went with a much more personal touch. You are correct that yours does not look like typical AI.
        My point is certainly not that the AI poster is better, only that it’s capable of producing surprising results. With minimal guidance it can also generate different styles: https://imgur.com/a/zXfOZaf
        I think the trend to intentionally make stuff look “non-AI” is doomed to fail as AI gets better and better. A year or two ago the poster would have been full of nonsense letters.
        > And finally, the QR code has obvious AI artifacts in it.
        I wonder if this is intentional, to prevent AI from regurgitating someone’s real QR codes.
        ETA: Actually, I wonder how much of the “flair” on human-drawn stars is to avoid looking like they are drag-and-drop from a program like Word. Ironic if we’ve circled back around to stars that look perfect to avoid looking like a different computer generated star.
        runarberg 25 days ago
        My point is not that the AI version looks bad (although it does); it is that I hate AI, and so do many people around me. And I hate AI so much, and I know so many people around me hate AI as much, that I am consciously altering my designs so as to be as far away from AI as I can. This is the creative-design equivalent of moving from Seattle to Florida after a divorce.
        About the stars. I know designers paint imperfect stars. I even did that in my design. In particular, I stretched it and rotated it slightly. A more ambitious designer might go further and drag a couple of vertices around to exaggerate them relative to the others. But usually there is some balance in their decisions. AI, however, just puts the vertices wherever, and it is ugly and unbalanced. A regular geometric shape with a couple of oddities is a normal design choice, but a geometric shape which is all oddities is a lot of work for an ugly design. Humans tend not to do that.
        dpark 25 days ago
        > I am consciously altering my designs so as to be as far away from AI as I can
        I don’t think this is a productive choice, but it’s certainly yours to make.
        > but a geometric shape which is all oddities is a lot of work for an ugly design. Humans tend not to do that
        I find this such an odd thing to say. It’s way easier to draw a wonky star than a symmetrical one. Unless “drawing” here means using a mouse to drag and drop a star that a program draws for you.
        Vintage illustrations are full of nonsymmetrical shapes. The classic Batman “POW” and similar were hand drawn and rarely close to symmetrical.
        runarberg 25 days ago
        I draw mine in Inkscape (because I like open source more than my sanity), and Inkscape has special tools to draw regular geometric shapes. You don’t need to use those tools; you can use the free draw pen, or the Bézier curve tool, or even hand-code the <path d="M43,32l5.34-2.43l3.54-0.53" />, etc. But using these other tools is suboptimal compared to the regular geometric tool.
        Apart from me, my partner also does graphic design, and unlike me she values her sanity more than open source, so she uses Illustrator for her designs. In Adobe’s walled-garden world of proprietary software it is still the same story: you generally use the specific tools to get regular shapes (or patterns) and then alter them after they are drawn. You don’t draw them from scratch. If you are familiar with modular analog synthesizers, this is like starting with a square wave and then subtracting to modulate the signal into a more natural-sounding form.
        twobitshifter 25 days ago
        > I think the trend to intentionally make stuff look “non-AI” is doomed to fail as AI gets better and better.
        What’s the mechanism that makes an AI ‘better’ at looking non-AI? Training on non-ai trend images? It’s not following prompts more closely. Even if that image had no gradients or pointier shapes, it still doesn’t look like it was made by an individual.
        To your counterpoints, notice that you are apologizing for the AI by finding humans that may have done something, sometime, that the AI just did. Of course! It’s trained on their art. To be non-AI, art needs to counter all averages and trends that the models are trained on.
        dpark 25 days ago
        > What’s the mechanism that makes an AI ‘better’ at looking non-AI?
        I don’t know. Better training data? More training data? The difference over the past year or two is stark so something is improving it.
        > Even if that image had no gradients or pointier shapes, it still doesn’t look like it was made by an individual.
        The fact that humans are actively trying to make art that does not look like AI makes it clear that AI is not so obvious as many would like to pretend. If it were obvious, no one would need to try to avoid their art looking like AI.
        > To your counterpoints, notice that you are apologizing for the AI by finding humans that may have done something, sometime, that the AI just did. Of course! It’s trained on their art.
        Obviously.
        > To be non-AI, art needs to counter all averages and trends that the models are trained on.
        So in order to not look like AI, art just has to be so unique that it’s unlike any training data. That’s a high bar. Tough time to be an artist.
        overgard 24 days ago
        I don't know why you're downvoted, I think that's a reasonable use of AI and it looks pretty good.
        Edit: I think I misread what you were saying, but I do think it's a nice poster! I get that design is going to have to avoid doing things that AI does, which is kind of unfortunate, because AI is likely trained on a lot of things that are generally good ideas.
    - AkBKukU 25 days ago
      > can't afford to pay one for whatever business I have
      At small scales what "art" does your business need? If you can't afford to hire an artist (which is completely fine, I couldn't for my business!) do you really need the art or are you trying to make your "brand" look more polished than it actually is? Leverage your small scale while you can because there isn't as much of an expectation for polish.
      And no, a band poster doesn't have to be a labor of love. But it also doesn't have to be some big showy art either. If I saw a small band with a clearly AI generated poster it would make me question the sources for their music as well.
    - squidsoup 25 days ago
      > band poster doesn't have to be a labor of love artisanal thing
      Very few bands would agree with that statement.
    - Arch485 25 days ago
      I think you're misunderstanding - most people's beef with AI art isn't that it "isn't made by experts", it's that
      1) it's made from copyrighted works, and the original authors receive no credit; 2) it is (typically) low-effort; 3) there are numerous negative environmental effects of the AI industry in general; 4) there are numerous negative social effects of AI in general, and more specifically AI generated imagery is used a lot for spreading misinformation; 5) there are numerous negative economic effects of AI, and specifically with art, it means real human artists are being replaced by AI slop, which is of significantly lower quality than the equivalent human output. Also, instead of supporting multiple different artists, you're siphoning your money to a few billion-dollar companies (this is terrible for the economy)
      As a side note, if you have a business which truly cannot afford to pay any artists, there are a lot of cheaper, (sometimes free!) pre-paid art bundles that are much less morally dubious than AI. Plus, then you're not siphoning all of your cash to tech oligarchs.
    - reaperducer 25 days ago
      > Because I'm not an artist and can't afford to pay one for whatever business I have?
      If your business can't afford to spend $5 on Fiverr, it's not a business. It's not even panhandling.
      - tempaccount5050 25 days ago
        Why is that better? They're going to use AI anyway. It's Fiverr.
    - Planktonne 25 days ago
      No one is saying that only experts can do things; that's a totally inaccurate reading of the argument and the post.
      People are saying, very clearly, that they're not willing to put effort into something produced by someone who put no effort in.
    - jll29 25 days ago
      What, a music band's poster, 'typeset' in Microsoft Word? I cannot imagine bothering to go to such a band's concert.
      <joke>What's your rock band called, "SEC Form 10-K"?</joke>
    - swader999 25 days ago
      I agree, and who's to say your life experience isn't as valid as that of someone with fewer years but more time with just the traditional tools? I'd think either extreme could produce real art if the tools moat was reduced with AI.
    - Gigachad 25 days ago
      I actually love MS Word posters. It's a million times more authentic and enjoyable than a slop generation. If a band put up an AI poster I'd assume they lack any kind of taste, which is the whole reason I'd want to listen to a band anyway.
      I know this is controversial in tech spaces. But most people, particularly those in art spaces like music actually appreciate creativity, taste, effort, and personal connection. Not just ruthless efficiency creating a poster for the lowest cost and fastest time possible.
      - tempaccount5050 25 days ago
        It's just as low effort. This is silly.
    - satisfice 25 days ago
      How about going without? I can’t afford an artist, either, so I don’t have art. Don’t foist slop on people because you are trying to be something that you aren’t.
    - Jtarii 25 days ago
      I would rather see an MS Word poster than be lied to.
  - zulban 25 days ago
    Nobody can be bothered to make my cat out of Lego and the size of Mount Everest, but if an AI did, I'd sure love to see it.
    Your quip is pithy but meaningless.
    - Gigachad 25 days ago
      I'm not saying it's worthless for yourself, it's worthless to me as a viewer. AI content is great for your own usage, but there is no point posting and distributing AI generation.
      I could have generated my own content, so just send the prompt rather than the output to save everyone time.
      - dolebirchwood 25 days ago
        And when the distilled knowledge/product is the result of multiple prompts, revisions, and reiterations? Shall we send all 30+ of those as well so as to reproduce each step along the way?
      - zulban 25 days ago
        Maybe reread my comment. Would you not want to see a Mount Everest-sized Lego cat? Even if it were my cat?
        Again - your quip sounds good but when you think about it, it's flatly wrong.
        Fraterkes 25 days ago
        This doesn't make sense, if I want to see a lego-cat slopimage I can just prompt a model myself (and have it be of my own cat). There's no reason for you to be involved in any part of that process, because the point of this stuff is that you are not doing anything.
        zulban 24 days ago
        The claim is that people don't / shouldn't want to see something if humans can't be bothered to make it. I provided a counter example. So the claim is nonsense.
  - loudandskittish 25 days ago
    Exactly how I feel. There is already more art, movies, music, books, video games and more made by human beings than I can experience in my lifetime. Why should I waste any time on content generated by the word guessing machine?
  - nimchimpsky 25 days ago
    [dead]
- atleastoptimal 25 days ago
  The issue is that the signalling makes sense when human generated work is better than AI generated. Soon AI generated work will be better across the board with the rare exception of stuff the top X% of humans put a lot of bespoke highly personalized effort into. Preferring human work will be luxury status-signalling just like it is for clothing, food, etc.
  - dilDDoS 25 days ago
    I'm probably in a weird subgroup that isn't representative of the general public, but I've found myself preferring "rough" art/logos/images/etc, basically because it signals a human put time into it. Or maybe not preferring, but at least noticing it more than the generally highly refined/polished AI artwork that I've been seeing.
    - appplication 25 days ago
      There’s no reason to think people broadly want “better” writing, images, whatever. Look at the indie game scene, it’s been booming for years despite simpler graphics, lower fidelity assets, etc. Same for retro music, slam poetry, local coffee shops, ugly farmers market produce, etc.
      There is a mass, bland appeal to “better” things but it’s not ubiquitously desired and there will always be people looking outside of that purely because “better” is entirely subjective and means nothing at all.
  - james2doyle 25 days ago
    I think "better" is doing a lot of heavy lifting in this argument. Better how?
    Is an AI generated photo of your app/site going to be more accurate than a screenshot? Or is an AI generated image of your product going to convey the quality of it more than a photo would?
    I think Sora also showed that the novelty of generating just "content" is pretty fleeting.
    I would be interested to see if any of the next round of ChatGPT advertisements use AI generated images. Because if not, they don’t even believe in their own product.
  - masswerk 25 days ago
    The issue being, it's not an expression of anything. Merely like a random sensation, maybe some readable intent, but generic in execution, which isn't about anything even corporate art should be about. Are we going to give up on art, altogether?
    Edit: One of the possible outcomes may be living in a world like in "They Live" with the glasses on. Since no expression has any meaning anymore, the message is just the presence of a signal of some kind. (Generic "BUY" + associated brand name in small print, etc.)
    - ragequittah 25 days ago
      Can't the expression come from the person prompting the AI and sometimes taking hours inpainting or tweaking the prompt to try to get the exact image / expression they had in their mind? A good use I've found is to be able to make scenes from a dream you had into an image. If that's not an expression of something, then I'm not sure anything is.
      - masswerk 25 days ago
        Notably, this process of struggle is meant to go away, to make room for instant satisfaction. This is really about some kind of expression consumerism. (And what will be lost along the way is meaning.)
        ragequittah 25 days ago
        I always find this argument to ring hollow. Maybe it's because I've been through it with too many technologies already. Digital photography took out the art of film photography. CGI took out the wonder of practical effects. Digital art takes out the important brush strokes of someone actually painting. The real answer always is the mediums can coexist and each will be good for expression in their own way.
        I'm not sure you immediately lose meaning if someone can make a highly personalized version of something easily. The % of completely meaningless video after YouTube and TikTok came about has skyrocketed. The amount of good stuff to watch has gone up as well, though.
  - fwipsy 25 days ago
    Only novel art is interesting. AI can't really do novel. It's a prediction algorithm; it imitates. You can add noise, but that mostly just makes it worse. It can be used to facilitate original stuff though.
    But so many people want to make art, and it's so cheap to distribute it, that art is already commoditized. If people prefer human-created art, satisfying that preference is practically free.
    - atleastoptimal 25 days ago
      AI can be novel, there is nothing in the transformer architecture which prohibits novelty, it's just that structurally it much prefers pattern-matching.
      But the idea of novelty is a misnomer I think. Any random number generator can arbitrarily create a "novel" output that a human has never seen before. The issue is whether something is both novel and useful, which is hard for even humans to do consistently.
      - CooCooCaCha 25 days ago
        Anthropic recently changed their take-home test specifically to be more “out-of-distribution” and therefore more resistant to AI so they can assess humans.
        I’m so tired of “there’s nothing preventing”, and “humans do that too”. Modern AI is just not there. It’s not like humans and has difficulties with adapting to novelty.
        Whether transformers can overcome that remains to be seen, but it is not a guarantee. We’ve been dealing with these same issues for decades and AI still struggles with them.
    - idiotsecant 25 days ago
      There are lots of things that are novel to you without necessarily being novel to the universe.
  - paulddraper 25 days ago
    "Artisanal art" as it were.
  - vinyl7 25 days ago
    The goal of art isn't to be perfect or as realistic as possible. The goal of art is to express, and enjoy that unique expression.
  - davebren 25 days ago
    > Preferring human work will be luxury status-signalling just like it is for clothing, food, etc.
    What? Those items are luxuries when made by humans because they are physical goods where every single item comes with a production and distribution cost.
- strulovich 25 days ago
  Here’s one example:
  I recently used image generation to design my balcony.
  It was a great way to see design ideas imagined in place and decide what to do.
  There are many cases people would hire an artist to illustrate an idea or early prototype. AI generated images make that something you can do by yourself or 10x faster than a few years ago.
  - dwd 25 days ago
    Did the same for my front garden.
    Notwithstanding a few code violations, it generated some good ideas we were then able to tweak. The main thing was we had no idea of what we wanted to do, but seeing a lot of possibilities overlaid over the existing non-garden got us going. We were then able to extend the theme to other parts of the yard.
- tecoholic 25 days ago
  100%. A picture is worth a thousand words only when it conveys something. I love to see the pictures from my family even when they are taken with no care to quality or composition but I would look at someone else’s (as in gallery/exhibitions) only when they are stunning and captured beautifully. The medium is only a channel to communicate.
  Also, this can’t be real. How many publications did they train this stuff on, and why is there no acknowledgment, even if only to say "we partnered with xyz manga house to make our model smarter at manga"? Like, what’s wrong with this company?
- gustavus 25 days ago
  I'm working on an edutech game. Before, I would've had much less of a product because I don't have the budget to hire an artist, and it would've been much less interactive, but because of this I'm able to build a much more engaging experience, so that's one thing. For what it's worth.
- _the_inflator 25 days ago
  We need to flip the script. AI is trying to do marketing: adding “illegal usage will lead to X” is a gateway to spark curiosity. There is a saying that censoring games for young adults ensures they will buy them like crazy, circumventing the restrictions, because danger is cool.
  There is nothing that cannot harm. Knives, cars, alcohol, drugs. A society needs to balance risks and benefits. Word can be used to do harm, email, anything - it depends on intention and its type.
- _the_inflator 25 days ago
  I see your point, but reconsider: we will have to wait and see. Time will tell, and this is simply economics: is it useful, yes or no?
  I started being totally indifferent after thinking about my spending habits to check for unnecessary stuff after watching world championships for niche sports. For some this is a calling for others waste. It is a numbers game then.
- youdots 25 days ago
  The technically (in both senses) astonishing and amazing output is not far off from some of the qualities of real advertising: Staged, attention grabbing, artificially created, superficially demanded, commercially attractive qualities. These align, and lots of similarities in the functions and outcomes of these two spheres come to mind.
- Havoc 25 days ago
  >and even if we cannot tell if an image is AI-generated, we can know if companies are using AI to generate images in general, so the appealing is decreasing
  Is that true? Don't think I'd get tired of images that are as good as human made ones just because I know/suspect there may have been AI involved
- simonw 25 days ago
  I think there's real value to be had in using this for diagrams.
  Visual explanations are useful, but most people don't have the talent and/or the time to produce them.
  This new model (and Nano Banana Pro before it) has tipped across the quality boundary where it actually can produce a visual explanation that moves beyond space-filling slop and helps people understand a concept.
  I've never used an AI-generated image in a presentation or document before, but I'm teetering on the edge of considering it now provided it genuinely elevates the material and helps explain a concept that otherwise wouldn't be clear.
  - mwcampbell 25 days ago
    Are there any models that are specifically trained to produce diagrams as SVG? I'd much prefer that to diffusion-based raster image generation models for a few reasons:
    - The usual advantages of vector graphics: resolution-independence, zoom without jagged edges, etc.
    - As a consequence of the above, vector graphics (particularly SVG) can more easily be converted to useful tactile graphics for blind people.
    - Vector graphics can more practically be edited.
    - twobitshifter 25 days ago
      You can get them to produce Mermaid diagrams, but you can also generate these yourself from text.
  - resters 25 days ago
    This is the key point. In my view it's just like anything else, if AI can help humans create better work, it's a good thing.
    I think what we'll find is that visual design is no longer as much of a moat for expressing concepts, branding, etc. In a way, AI-generated design opens the door for more competition on merits, not just those who can afford the top tier design firm.
  - lol_me 25 days ago
    yeah I'm not sure I'm in agreement that we can hand-wave assets and ads as entire classes of valuable content
- swader999 25 days ago
  I tend to share your same view. But is there really a line like you describe? Maybe AI just needs to get a few iterations better and we'll all love what it generates. And how's it really any different than any Photoshop computer output from the past?
- colechristensen 25 days ago
  >In general, I think people are starting to realize that things generated without effort are not worth spending time with
  Agreed mostly, BUT
  I'm building tools for myself. The end goal isn't the intermediate tool, they're enabling other things. I have a suspicion that I could sell the tools, I don't particularly want to. There's a gap between "does everything I want it to" and "polished enough to justify sale", and that gap doesn't excite me.
  They're definitely not generated without effort... but they are generated with 1% of the human effort they would require.
  I feel very much empowered by AI to do the things I've always wanted to do. (when I mention this there's always someone who comes out effectively calling me delusional for being satisfied with something built with LLMs)
- JumpCrisscross 25 days ago
  > What else?
  I used to have an assistant make little index-card-sized agendas for get-togethers when folks were in town or I was organising a holiday or offsite. They used to be physical; now it's a cute thing I can text around so everyone knows when they should be up by (and by when, if they've slept in, they can go back to bed). AI has been good at making these. They don't need to be works of art, just cute and silly and maybe embedded with an inside joke.
  - pesus 25 days ago
    I'm not seeing how it takes more than 5 minutes to type up an itinerary. If you want to make it cute and silly, just change up the font and color and add some clip art.
    If this is the best use case that exists for AI image generation, I'm only further convinced the tech is at best largely useless.
    - JumpCrisscross 25 days ago
      > not seeing how it takes more than 5 minutes to type up an itinerary
      Because I’ll then spend hours playing with the typography (because it’s fun) and making it look like whatever design style I’ve most recently read about (again, because it’s fun) and then fighting Word or LaTeX because I don’t actually know what I’m doing (less fun). Outsourcing it is the right move, particularly if someone else is handling requests for schedules to be adjusted. An AI handles that outsourcing quicker for low-value (but frequent) tasks.
      > If this is the best use case that exists for AI image generation
      I’ve also had good luck sketching a map or diagram and then having the AI turn it into something that looks clean.
      Look, 99% of my use cases are e.g. making my cat gnaw on the Tetons or making a concert of lobsters watching Lady Gaga singing “I do it for the claws” or whatever so I can send two friends something stupid at 1AM. But there does appear to be a veneer of productivity there, and worst case it makes the world look a bit nicer.
      - breezybottom 25 days ago
        You might not be able to tell how bad the AI slop looks, but I guarantee some of your friends can. AI is awful at maps and diagrams.
        JumpCrisscross 24 days ago
        I’m not giving my friends AI maps and diagrams. And yes, they don’t look great. But they work. If I want to communicate something spatial, I can spend an hour in R or five minutes in Claude. The point is to communicate that information, and for a quick task, AI means the other person gets a map versus block of text they have to reason through.
  - reaperducer 25 days ago
    I don't care how many times you write "cute," having my vacation time programmed with that level of granularity and imposed obligation sounds like the definition of "dystopian."
    If I got one of your cute schedule cards while visiting you, I'd tear it up, check into a cheap motel, and spend the rest of my vacation actually enjoying myself.
    Edit: I'm not an outlier here. There have even been sitcom episodes about overbearing hosts over-programming their guests' visits, going back at least to the Brady Bunch.
    - JumpCrisscross 25 days ago
      > If I got one of your cute schedule cards while visiting you, I'd tear it up, check into a cheap motel, and spend the rest of my vacation actually enjoying myself
      Okay. I'd be confused why you didn't speak up while we were planning everything as a group, but those people absolutely exist. (Unless it's someone's, read: a best friend or my partner's, birthday. Then I'm a dictator and nobody gets a choice over or preview of anything.)
      I like to have a group activity planned on most days. If we're going to drive to get an afternoon hike in before a dinner reservation (and if I have 6+ people in town, I need a dinner reservation because no, I'm not cooking every single evening), or if I've paid for a snowmobile tour or a friend is bringing out their telescope for stargazing, there are hard no-later-than departure times to either not miss the activity or be respectful of others' time.
      My family used to resolve that by constantly reminding everyone the day before and morning of, followed by constantly shouting at each other in the hours and minutes preceding and–inevitably–through that deadline. I prefer the way I've found. If someone wants to fuck off from an activity, myself included, that's also perfectly fine.
      (I also grew up in a family that overplanned vacations. And I've since recovered from the rebound instinct, which involves not planning anything and leaving everything to serendipity. It works gorgeously, sometimes. But a lot of other times I wonder why I didn't bother googling the cool festival one town over beforehand, or regret sleeping in through a parade.)
      > There have even been sitcom episodes about overbearing hosts over-programming their guests' visits
      Sure. And different groups have different strokes. When it comes to my friends and me, generally speaking, a scheduled activity every other day with dinners planned in advance (they all get hangry, every single fucking one of them) works best.
  - jll29 25 days ago
    You are kidding, right?
    It's good that my friends don't make a coffee date feel like a board meeting (with an agenda shared by post 14 working days ahead of the meeting, form for proxy voting attached).
    - JumpCrisscross 25 days ago
      [dead]
- underlipton 25 days ago
  >Like, in terms of art, it's discarded (art is about humans)
  I dunno how long this is going to hold up. In 50 years, when OpenAI has long since become a memory, post-bubble burst, and a half-century of bitrot has claimed much of what was generated in this era, how valuable do you think an AI image file from 2023 - with provenance - might be, as an emblem and artifact of our current cultural moment, of those first few years when a human could tell a computer, "Hey, make this," and it did? And many of the early tools are gone; you can't use them anymore.
  Consider: there will never be another DALL-E 2 image generation. Ever.
- NikolaNovak 25 days ago
  While I agree with you, hacker news audience is not in the middle of the bell curve.
  I get this sounds elitist - but a tremendous percentage of the population is happily and eagerly engaging with fake religious images, funny AI videos, horrible AI memes, etc. Trying to mention that this video of a puppy is completely AI-generated results in vicious defense and mansplaining of why this video is totally real (I love it when the video has e.g. Sora watermarks... This does not stop the defenders).
  I agree with you that human connection and artist intent is what I'm looking for in art, music, video games, etc... But gawd, lowest common denominator is and always has been SO much lower than we want to admit to ourselves.
  Very few people want thoughtful analysis that contradicts their world view, very few people care about privacy or rights or future or using the right tool, very few people are interested in moral frameworks or ethical philosophy, and very few people care about real and verifiable human connection in their "content" :-/
  - Planktonne 25 days ago
    HN is absolutely not more critical of AI output than the norm.
    It's been true for various technologies that HN (and tech audiences in general) have a more nuanced view, but AI flips the script on that entirely. It's the tech world who are amazed by this, producing and being delighted by endless blogposts and 7-second concept trailers.
    - NikolaNovak 24 days ago
      I think we are conflating usage vs consumption.
      I think HN probably uses GenAI more than average population.
      But I think HN consumes less GenAI content than average population.
      Look at Facebook, Instagram, Youtube, TikTok, etc. All I see is my non-techie friends being amazed and mesmerized by - cute animals, creepy animals, political events, jokes, comedy, outrage, events, speeches - that never ever happened. As if we don't have actual real puppies that are cute, my acquaintances and family are oooing and awwwing at fake howling huskies, fake animals being jump-scared by fake surprises.
      HN may be amazed by the potential of AI output to improve the world more than the average person is. But hustlers are laughing their way to the bank as they actually use AI to make a ridiculous, and I do mean ridiculous, amount of "content" for cheap, content that absolutely is being consumed at a prodigious rate with no sign of stopping. This is not 7-second trailers and concepts for some future years; this is mega-years of actual content being liked, shared, engaged with and consumed, right now. This is what OP is hoping the tide will turn against, and this is what I see no sign of being rejected in my non-techie/non-geeky circles :(
      - Planktonne 24 days ago
        You're on a site where the commenters read AI-generated articles about how they can generate new images to include in their generated websites that they themselves generate more articles about.
        Sure, the weird cat-people adverts aren't aimed at HN's commentariat, but every 'democratise art and build that game you've dreamt of' pitch is. Every breathless paean to AI assistants/companions/partners is targeted at the users here.
        Usage is a form of consumption; thinking of yourself as a creator while you consume doesn't mean you consume less.
        Non-tech users are being fed fake images when they browse idly. Tech users are restructuring their entire lives around these tools.
  - ryandrake 25 days ago
    I recently shoulder-surfed a family member scrolling away on their social media feed, and every single image was obvious AI slop. But it didn't matter. She loved every single one, watched videos all the way through, liked and commented on them... just total zombie-consumption mode and it was all 100% AI generated. I've tried in the past pointing out that it's all AI generated and nothing is real, and they simply don't care. People are just pac-man gobbling up "content". It's pretty sad/scary.
    - mickelsen 21 days ago
      I'd be a bit more humble rather than terrified, because I enjoy some AI slop too, especially funny animals that remind me of my old pets' antics. There are levels of slop. But tasteless stuff with crap graphics plastered all over, loud edits or badly calibrated TTS voices were already all over reels/TikTok long before AI, and people still liked that.
      The unsettling thing on social media is the mind hijacking with the recommendation algo and scrolling motion that resembles a slot machine, more than the content itself.
- papichulo2023 25 days ago
  Seems good enough to generate 2D sprites. If that means a wave of pixel-art games I count it as a net win.
  I don't think gamers hate AI; it is just a vocal minority, imo. What most people dislike is sloppy work, as they should, but that can happen with or without AI. The industry has been using AI for textures, voices and more for over a decade.
  - vunderba 25 days ago
    > Seems good enough to generate 2D sprites.
    It’s really not. That's actually a pet peeve of mine as someone who used to spend a lot of time messing with pixel art in Aseprite.
    Nobody takes the time to understand that the style of pixel art is not the same thing as actual pixel art. So you end up with these high-definition, high-resolution images that people try to pass off as pixel art, but if you zoom in even a tiny bit, you see all this terrible fringing and fraying.
    That happens because the palette is way outside the bounds of what pixel art should use, where proper pixel art is generally limited to maybe 8 to 32 colors, usually.
    There are plenty of ways to post-process generative images to make them look more like real pixel art (square grid alignment, palette reduction, etc.), but it does require a bit more manual finesse [1], and unfortunately most people just can’t be bothered.
    [1] - https://github.com/jenissimo/unfake.js
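    A minimal sketch of those two post-processing steps, in pure Python, assuming an image is a nested list of RGB tuples (an illustrative assumption; this does not reproduce unfake.js, which is JavaScript). The hypothetical `align_to_grid` collapses each `cell` x `cell` block into one logical pixel using its most common color, and `reduce_palette` snaps every pixel to the nearest entry of a small, caller-supplied palette:

```python
from collections import Counter

Pixel = tuple[int, int, int]  # (r, g, b), each channel 0-255


def align_to_grid(img: list[list[Pixel]], cell: int) -> list[list[Pixel]]:
    """Collapse each cell x cell block into one logical pixel (its most common color)."""
    h, w = len(img), len(img[0])
    out = []
    # Partial blocks at the right/bottom edges are dropped.
    for y in range(0, h - h % cell, cell):
        row = []
        for x in range(0, w - w % cell, cell):
            block = [img[y + dy][x + dx] for dy in range(cell) for dx in range(cell)]
            row.append(Counter(block).most_common(1)[0][0])
        out.append(row)
    return out


def nearest(color: Pixel, palette: list[Pixel]) -> Pixel:
    """Palette entry closest to `color` by squared Euclidean distance in RGB."""
    return min(palette, key=lambda p: sum((a - b) ** 2 for a, b in zip(color, p)))


def reduce_palette(img: list[list[Pixel]], palette: list[Pixel]) -> list[list[Pixel]]:
    """Snap every pixel to its nearest palette color."""
    return [[nearest(px, palette) for px in row] for row in img]
```

    Taking the most common color per block, rather than the average, avoids inventing in-between colors, which is one source of the fringing described above. Plain RGB distance is a crude choice; a perceptual color space would usually match better.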
  - loudandskittish 25 days ago
    There are already more games being released on Steam than anyone can keep up with, I'm not sure how adding another "wave" on top of it helps.
  - tiagod 25 days ago
    AI for textures for over a decade? What AI?
    - papichulo2023 25 days ago
      Efros–Leung, PatchMatch? Nearest neighbours was "AI" before diffusion models.
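      For context, the core idea behind Efros–Leung texture synthesis is non-parametric nearest-neighbour sampling: grow the output one value at a time by finding where the sample's neighbourhood best matches what has been generated so far. Below is a toy sketch reduced to 1D sequences, with a hypothetical `synthesize` function; unlike the original, which works on 2D pixel neighbourhoods and samples randomly among near-best matches, it deterministically copies the successor of the single best match:

```python
def synthesize(sample: list[int], seed: list[int], length: int, k: int = 3) -> list[int]:
    """Extend `seed` to `length` values by nearest-neighbour matching against `sample`."""
    if len(seed) < k or len(sample) <= k:
        raise ValueError("need len(seed) >= k and len(sample) > k")
    out = list(seed)
    while len(out) < length:
        context = out[-k:]  # the last k generated values act as the neighbourhood
        best_val, best_dist = None, float("inf")
        # Compare against every length-k window in the sample that has a successor.
        for i in range(len(sample) - k):
            window = sample[i:i + k]
            dist = sum((a - b) ** 2 for a, b in zip(window, context))
            if dist < best_dist:  # strict '<' keeps the first best match on ties
                best_dist, best_val = dist, sample[i + k]
        out.append(best_val)
    return out
```

      Nothing here is learned: the "model" is just the sample plus a distance function, which is why calling it AI is a stretch by today's standards.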
      - tiagod 24 days ago
        Don't you think it's a huge stretch to compare those to modern generative AI in this context? Those don't raise any of the questions that make current usage questionable.
  - Thonn 25 days ago
    Are you kidding? I think I see more vitriol for AI in gaming communities than anywhere else. To the point where Steam now requires you to disclose its usage.
    - papichulo2023 25 days ago
      Crimson Desert failed to disclose on release and (almost) nobody cared, gamers kept buying it.
- slibhb 25 days ago
  > Like, in terms of art, it's discarded (art is about humans)
  If a work of art is good, then it's good. It doesn't matter if it came from a human, a neanderthal, AI, or monkeys randomly typing.
  - Jtarii 25 days ago
    The connection with the artist, directly, or across space and time, is a critical part of any artwork. It is one human attempting to communicate some emotional experience to another human.
    When I watch a Lynch film I feel some connection to the man David Lynch. When I see an AI artwork, there is nothing to connect with, no emotional experience is being communicated, it is just empty. Its highest aspiration is elevator music, just being something vaguely stimulating in the background.
    - slibhb 24 days ago
      I don't agree. If a poem is moving, it's moving. It doesn't matter who wrote it.
      I understand these are fundamental questions about aesthetics that people differ over. But that's how it works for me. However, ultimately, I think people will realize that I'm right around the time that AI does start generating good art.
  - papa_bear 25 days ago
    Provenance is part of the work. If a roomful of monkeys banged out something that looked like anything, I'd absolutely hang it on my wall. I would not say the same for 99% of AI generated art.
  - avaer 25 days ago
    Whether art is considered good is in practice highly contextual. One of those contexts is who (what) made it.
- RIMR 25 days ago
  My only actual use of image or video AI tools is self-entertainment. I like to give it prompts and see the results it gives me.
  That's it. I can't think of a single actual use case outside of this that isn't deliberately manipulative and harmful.
- NetOpWibby 25 days ago
  The Human Renaissance is something I've been thinking of too and I hope it comes to pass. Of course, I feel like societally, things are gonna get worse for a lot of folks. You already see it in entire towns losing water or their water becoming polluted.
  You'd think the kickbacks the leaders of these towns are getting for allowing data centers to be built would go towards improving infrastructure but hah, that's unrealistic.
  WTF is that unrealistic? SMH
  - Lerc 25 days ago
    >You already see it in entire towns losing water or their water becoming polluted
    Do you have any references for such cases? I have seen talk of such things being at risk, but I am unaware of any specific instances of it occurring.
    - NetOpWibby 25 days ago
      I know I've seen such a story on HN before, you can probably find it by searching for "water" and "data center/AI."
      - Lerc 24 days ago
        The closest match I found was https://news.ycombinator.com/item?id=44562052
        The article tries to play sleight of hand with the specific instance that they cite but it seems that the loss of water is alleged to be caused by sediment from construction rather than water use.
        It's not great that it happened and it is something local government should take action on, but it is also something that could have been caused by any form of industrial construction. I suspect there are already laws in place that cover this. If they are not being enforced that's another issue entirely.
        NetOpWibby 24 days ago
        That's exactly the article I was thinking of.
        Data center construction exposing weaknesses in local infrastructure is a double-edged sword; you wanna know if things need upgrading but you don't wanna be negatively affected by it.
        Maybe there should be some clause in these contracts that mandates tech companies foot the bill for local infrastructure improvements.
        Lerc 24 days ago
        In that case it does not depict the scenario you suggested.
        This is not a data center issue at all, it is a construction issue, that it was a data center being constructed was incidental.
        I believe there are regulations that cover things like this already.
        To characterise it as representative or specific to data centers is at best disingenuous.
        NetOpWibby 24 days ago
        I didn't write the article man
- ArchieScrivener 25 days ago
  I completely disagree, this replaces art as a job. Why does human art need monetary feedback to be shared? If people require a paycheck to make art then it was never anything different than what AI-generated images are.
  As for advertising being depressing - it's a little late to get up on the high horse of anti-ads for tech after 2 decades of ad-based technology dominating everything. Go outside, see all those bright shiny glittery lights: those aren't society-created images to embolden the spirit and dazzle the senses, those are ads.
  North Korea looks weird and depressing because they don't have ads. Welcome to the West.
- tomrod 25 days ago
  AI loopidity rearing its head. Just send the bullet points that we all want anyway, right?! Stop sending globs of text and other generated content!
- iLoveOncall 25 days ago
  Porn and memes. Obviously. This is all that Stable Diffusion has been used for since it was released.
OutOfHere 25 days ago
ChatGPT image generation is and has been horrific for the simple reason that it rejects too many requests. This hasn't changed with the new model. There are too many legal non-adult requests that are rejected, not only for edits, but also for original image generation. I'd rather pay to use something that actually works.
Melatonic 25 days ago
We were afraid it would be Skynet and instead we got the ultimate meme generator !
- ceejayoz 25 days ago
  Now we'll just get teabagged by killer robots for the lolz.
c16 25 days ago
That video seems like it was made for the TikTok generation. Slow down.
agnishom 25 days ago
I don't know how this benefits humanity. In what way was ChatGPT Images 1.0 not already good enough? Perhaps some new knowledge was created in the process?
jwpapi 25 days ago
Why is it all so asian?
- twobitshifter 25 days ago
  Having 60% of the world’s population might do that.
  - jwpapi 24 days ago
    My hint was at an issue in the American educational system.
    I'm sure that if they could, they would have shown more Americans, especially given how important the state connection is for them to keep up their spending.
    That means they struggle to find American technical presenters.
elAhmo 25 days ago
I am super out of the loop here, what happened with Dall-E?
- vunderba 24 days ago
  DALL-E 3 (the last version, I think) went dark over a year ago and has since been replaced by the gpt-image-x series, which honestly is a bit of a shame, because the weird surreal images it generated were still pretty fun from an experimental point of view.
XCSme 25 days ago
Oh wow, scrolling through the page on mobile makes me dizzy
bitnovus 25 days ago
great obfuscation idea - hidden message on a grain of rice
dazhbog 25 days ago
Yay, let's burn the planet computing more slopium..
sumitkumar 24 days ago
prompt: create a qr code to https://www.anthropic.com
response: https://chatgpt.com/backend-api/estuary/content?id=file_0000...
result: FAIL
Unsponsoredio 24 days ago
the image generation genuinely got me excited, outputs are impressive. but editing what it creates still feels underbaked.
VA1337 25 days ago
So is it better than nano-banana after all?
retrac98 25 days ago
The page keeps crashing on my iPhone 17 Pro.
fizlebit 25 days ago
Scrolling through those images it just feels like intellectual theft on a massive scale. The only place I think you're going to get genuinely new ideas is from humans. Whether those humans use AI or not I don't care, but the repetitive slop of AI copying the creative output of humans I don't find that interesting. Call me a curmudgeon. I guess humans also create a lot of derivative slop even without AI assistance. If this leads somehow to nicer looking user interfaces and architecture maybe that is good thing. There are a lot of ugly websites, buildings and products.
apparent 25 days ago
I find the video to be very annoying. Am I supposed to freeze frame 4x per second to be able to see whether the images are actually good? I've never before felt stressed watching a launch video.
- Havoc 25 days ago
  Yeah same. At first I thought they're using it to conceal quality, but pausing it they do actually look really good, so strange choice.
  Maybe it's meant to convey pace & hype
  - apparent 25 days ago
    Maybe so, but to me it conveys a headache.
ibudiallo 25 days ago
And here I was proud of myself, having taught my mom and her friends how to discern real from fakes they get on WhatsApp groups. Another even more powerful tool for scammers. I'm taking a break.
- XorNot 25 days ago
  IMO you're fighting the wrong battle: there'll always be a new model.
  But the broader concept of fake news and the manufactured nature of media and rhetoric is much more relevant - e.g. whether or not something's AI is almost immaterial to the fact that any filmed segment does not have to be real or attributed to the correct context.
  It's an old internet classic to just grab an image and put a different caption on it, relying on the fact that no one can discern context or has time to fact-check.
- bananaflag 25 days ago
  I told my mom not to believe anything unless she trusts the source. The way people always did with text.
RyanJohn 25 days ago
Oh my god, it's very nice!
tomchui157 25 days ago
Img2+ seed dance 2 = image AGI
mrzhangbo 24 days ago
I'm exhausted. I've developed many products, but most of them were abandoned halfway through.
aledevv 25 days ago
Only vintage-style images?
gfody 25 days ago
there's something funny going on with the live stream audio
jcattle 25 days ago
Can we talk about how jarring the announcement video is?
AI generated voice over, likely AI generated script (You see, this model isn't just generating images, it's thinking!). From what it looks like only the editing has some human touch to it?
It does this Apple style announcement which everyone is doing, but through the use of AI, at least for me, it falls right into the uncanny valley.
tadasg 24 days ago
insane model... already doing stuff like app screenshots
ieie3366 25 days ago
It's great. Also doesn't seem to have any "slop" standard look, the images it produces are quite diverse.
I would imagine this will hit illustrators / graphics designers / similar people very hard, now that anyone can just generate professional looking graphical content for pennies on the dollar.
Bennettheyn 25 days ago
fal has the endpoint under openai/gpt-image-2
throw310822 25 days ago
Ok, I can hear the sound of entire industries crumbling right now.
flyingsquirrel_ 24 days ago
GPT Images 2.0 is the most wonderful image model in the world
Slash32 24 days ago
it'll be a game changer
souravroy78 24 days ago
Cool!
bitnovus 25 days ago
No gpt-5.5
- wahnfrieden 25 days ago
  Thursday
irishcoffee 25 days ago
This is so stupid. As a free OSS tool it’s amazing. Paying money for this is fucking stupid. How blind are we all to now before this tech?
BohdanPetryshyn 25 days ago
Am I the only one for whom videos in OpenAI releases never load? Tried both Chrome and Safari
iainctduncan 23 days ago
Another generous gift to the ransomware and fraud "industries".
dzonga 25 days ago
for video game assets this is massive.
but in general though - will people believe in anything photographic ?
imagine dating apps, photographic evidence.
I'm guessing we're gonna reach a point where - you fuck up things purposely to leave a human mark.
- telman17 25 days ago
  > for video game assets this is massive.
  Storefronts like Steam require disclosing use of AI assets for art. In most indie dev spaces, devs are scolded for using AI art in their games. I wonder if this perspective will change in a few years.
  - weezing 24 days ago
    I doubt it. Fortunately more and more people are against AI usage for any kind of assets in games and other media.
- squidsoup 25 days ago
  > but in general though - will people believe in anything photographic ?
  Hopefully film makes a come back.
andai 25 days ago
lol at the fake handwritten homework assignment. Know your customer!
mcfry 25 days ago
How hard is it to have a video player with a fucking volume toggle?
dahuangf 25 days ago
good job
hungry-facet 24 days ago
You know what else is a good image generator? Humans.
davikr 25 days ago
It definitely lost the characteristic slop look.
rqa129 25 days ago
Can it generate Chibi figures to mask the oligarchy's true intentions on Twitter and make them more relatable?
rzgrozt 24 days ago
now that's good work, since it's OpenAI
volkk 25 days ago
the guys presenting are probably all like 25x smarter than I am but good god, literally 0 on screen presence or personality.
- sho_hn 25 days ago
  That's a trained skill, and they presumably have focused on other skills.
  - brcmthrowaway 25 days ago
    Yeah, skills to make them a cool 10mn a year
  - volkk 25 days ago
    eh, i don't think personalities are trained. on screen presence for sure, but you'd see right through it IRL.
    - dymk 25 days ago
      The corporate espionage industry would disagree
- OsrsNeedsf2P 25 days ago
  I liked it that way, felt more authentic to see the noobs
- E-Reverance 25 days ago
  I think its endearing
- Aethelwulf 25 days ago
  didn't think that sam guy was that bad
minimaxir 25 days ago
HN submission for a direct link to the product announcement, which for some reason is being penalized by the HN algorithm: https://news.ycombinator.com/item?id=47853000
- dang 25 days ago
  (We eventually merged the threads hither)
simonw 25 days ago
Suggest renaming this to "OpenAI Livestream: ChatGPT Images 2.0"
- dang 25 days ago
  (We've since merged the threads and moved the livestream link to the toptext)
- I_am_tiberius 25 days ago
  or "How we make money with your images 2.0".
nopinsight 25 days ago
[dead]
cindyllm 25 days ago
[dead]
mmh0000 25 days ago
[dead]
lokthedev 24 days ago
[dead]
dumbaccount123 24 days ago
[dead]
sho_hn 25 days ago
In 5 years and 3 months between DALL-E and Images 2.0 we've managed to progress from exuberant excitement to jaded indifference.
- nba456_ 25 days ago
  Who's 'we'? Speak for yourself!
- kibibu 25 days ago
  Because we are all seeing the harm these tools are being used for.
  It's just another step into hell.
dawdwa 25 days ago
[dead]
welder 25 days ago
Introducing DeepFakes 2.0 /s
brianbest101 25 days ago
[dead]
otobrglez 25 days ago
[flagged]
biosubterranean 25 days ago
Oh no.
ai4thepeople 25 days ago
Each day when my AI girlfriend wakes me up and shows me the latest news, I feel: This is it! We are living in a revolution!
Never before in history did humanity have the possibility of seeing a picture of a pack of wolves! The dearth of photographs has finally been addressed!
I told my AI girlfriend that I will save money to have access to this new technology. She suggested a circular scheme where OpenAI will pay me $10,000 per year to have access to this rare resource of 21st-century daguerreotype.
manishfp 25 days ago
Goated release tbh. The text work inside the images is nice.
green_wheel 25 days ago
Well artists, you guys had a good run thank you for your service.
zb3 25 days ago
Image generation? Hmm, would be cool if OpenAI also made a video-generation model someday..
- incognito124 25 days ago
  If only there was a social network with solely AI generated videos, I would pay literal money for it...
  - allenbina 25 days ago
    If I may address this with both skepticism and curiosity: why? I think I speak for everyone when I say I would pay to go back to Facebook 2018. No algorithm, no AI.
    - Bigpet 25 days ago
      Are you being sincere? This is one layer of irony too much for my brain to comprehend.
      The person you're replying to is making a joke about OpenAI recently shutting down Sora, their video generation "social media" app.
      - incognito124 25 days ago
        You get it!
prvc 25 days ago
I hope they will consider releasing DALL-E 2 publicly, now that there has been so much progress since it was unveiled. It had a really nice vibe to it, so worth preserving.
- andy_ppp 25 days ago
  Yes, I’ve always thought of AI companies as sentimental. They will definitely do this :-/
  - prvc 25 days ago
    That's why I want it; their motives for doing it, should they decide to, would presumably be different.
tkgally 25 days ago
I had it produce a two-page manga with Japanese dialogue. Nearly perfect:
https://www.gally.net/temp/20260422-chatgpt-images-2-example...
Danox 25 days ago
Sam Altman in his meeting with Tim Cook two and a half years ago: "Give me money. I think it'll take $150 billion." Tim Cook: "Well, here's what we're going to do; this is what I think it's worth…"
Later, Google tried the same thing. Apple: "We will give you a $1 billion a year refund." What's changed in two and a half years?
aliljet 25 days ago
I am hopeful that OpenAI will potentially offer clarity on their loss-leading subscription model. I'd prefer to know the real cost of a token from OpenAI as opposed to praying the venture-funded tokens will always be this cheap.
bsenftner 24 days ago
My problem with all of this is the terrible education everyone has received: people cannot distinguish images from art, nor art from communication. If they could, they would realize that the points this entire debate hinges on are a manipulation meant to create people who will not help themselves with the latest technologies. But explaining this makes people angry, because they either think I'm trying to manipulate them, or they fall into despair when they realize the magnitude of this crime.