FLUX is fast and it's open source

(replicate.com)

258 points | by smusamashah 86 days ago

21 comments

  • sorenjan 85 days ago
    Text-to-image models feel inefficient to me. I wonder if it would be possible, and better, to do it in separate steps: text to scene graph, scene graph to semantically segmented image, segmented image to final image. That way each step could be trained separately and be modular, and the image would be easier to edit instead of being completely replaced by the output of a new prompt. It should then be much easier to generate stuff like "object x next to object y, with the text foo on it", and the art style or level of realism would depend on the final rendering model, which would be separate from the prompt adherence.

    Kind of like those video2video (or img2img on each frame I guess) models where they enhance the image outputs from video games:

    https://www.theverge.com/2021/5/12/22432945/intel-gta-v-real... https://www.reddit.com/r/aivideo/comments/1fx6zdr/gta_iv_wit...
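    A minimal sketch of the intermediate layer this describes. It assumes a hypothetical scene representation: `SceneObject`, `segment`, `add_attribute` and `dirty_segments` are illustrative names, not an existing library. The sketch only shows why an explicit semantic layer keeps edits local. Changing one object's description leaves the layout untouched and flags only that segment for re-rendering. The learned stages (text to graph, and the final RGB renderer) are left out.

```python
from dataclasses import dataclass, replace

# Hypothetical intermediate representation for the "text -> scene graph ->
# segmentation -> final image" idea. Names and fields are illustrative only.


@dataclass(frozen=True)
class SceneObject:
    name: str
    box: tuple[int, int, int, int]    # (x0, y0, x1, y1) on a coarse grid, exclusive ends
    attributes: tuple[str, ...] = ()  # free-text descriptors a renderer would consume


def segment(objects: list[SceneObject], width: int, height: int) -> list[list[str]]:
    """Stage-2 stand-in: rasterize boxes into a label map (later objects draw on top)."""
    labels = [["background"] * width for _ in range(height)]
    for obj in objects:
        x0, y0, x1, y1 = obj.box
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                labels[y][x] = obj.name
    return labels


def add_attribute(objects: list[SceneObject], name: str, attr: str) -> list[SceneObject]:
    """Edit the semantic layer instead of rewriting the whole prompt."""
    return [replace(o, attributes=o.attributes + (attr,)) if o.name == name else o
            for o in objects]


def dirty_segments(old: list[SceneObject], new: list[SceneObject]) -> set[str]:
    """Names whose description changed: the only regions a final renderer would redraw."""
    before = {o.name: o for o in old}
    return {o.name for o in new if before.get(o.name) != o}


scene = [SceneObject("person", (1, 0, 3, 4)), SceneObject("dog", (3, 2, 5, 4))]
edited = add_attribute(scene, "person", "yellow hat")
print(dirty_segments(scene, edited))                # {'person'}
print(segment(scene, 5, 4) == segment(edited, 5, 4))  # True: layout is unchanged
```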

    • miki123211 85 days ago
      In general, it has been shown time and time again that this approach fails for neural network based models.

      If you can train a neural network that goes from a to b and a network that goes from b to c, you can usually replace that combination with a simpler network that goes from a to c directly.

      This makes sense, as there might be information in a that we lose by a conversion to b. A single neural network will ensure that all relevant information from a that we need to generate c will be passed to the upper layers.

      • sorenjan 85 days ago
        Yes, this is true: you do lose some information between the layers, and this increased expressiveness is the big benefit of using ML instead of classic feature engineering. However, I think the gain would be worth it for some use cases. You could for instance take an existing image, run that through a semantic segmentation model, and then edit the underlying image description. You could add a yellow hat to a person without regenerating any other part of the image, you could edit existing text, change a person's pose, you could probably more easily convert images to 3D, etc.

        It's probably not a viable idea; I just wish for more composable modules that let us understand the models' representations better and change certain aspects of them, instead of these massive black boxes that mix all these tasks into one.

        I would also like to add that text2image models already have multiple interfaces between different parts. There's the text encoder, the latent-to-pixel-space VAE decoder, ControlNets, and sometimes a separate img2img style transfer at the end. Transformers already process images patchwise, but why do those patches have to be square instead of semantically coherent areas?

      • smrtinsert 85 days ago
        It's my understanding that an a-to-c network will usually be bigger parameter-wise and more costly to train.
    • kqr 85 days ago
      Isn't this essentially the approach to image recognition etc. that failed for ages until we brute-forced it with bigger and deeper matrices?

      It seems sensible to extract features and reason about things the way a human would, but it turns out it's easier to scale pattern matching done purely by computer.

      • WithinReason 85 days ago
        • selvan 85 days ago
          From the PDF - "One thing that should be learned from the bitter lesson is the great power of general purpose methods, of methods that continue to scale with increased computation even as the available computation becomes very great. The two methods that seem to scale arbitrarily in this way are "search" and "learning".

          The second general point to be learned from the bitter lesson is that the actual contents of minds are tremendously, irredeemably complex; we should stop trying to find simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries. All these are part of the arbitrary, intrinsically-complex, outside world. They are not what should be built in, as their complexity is endless; instead we should build in only the meta-methods that can find and capture this arbitrary complexity. Essential to these methods is that they can find good approximations, but the search for them should be by our methods, not by us. We want AI agents that can discover like we can, not which contain what we have discovered. Building in our discoveries only makes it harder to see how the discovering process can be done."

        • nuancebydefault 85 days ago
          If I took the lesson literally, we should not even study text to image. We should study how a machine with limitless CPU cycles would make our eyes see something we are currently thinking of.

          My point being: optimizing, or splitting the problem up into sub-problems before handing it over to the machine, makes sense.

          • stoniejohnson 85 days ago
            I think the bitter lesson implies that if we could study/implement "how a machine with limitless cpu cycles would make our eyes see something we are currently thinking of" then it would likely lead to a better result than us using hominid heuristics to split things into sub-problems that we hand over to the machine.
            • nuancebydefault 85 days ago
              The technology to probe brains and vision-related neurons exists today. With limitless CPU cycles we would surely be able to make ourselves see whatever we think about.
              • stoniejohnson 85 days ago
                I'm not really familiar with that technology space, but if you take that as true, is your argument something like:

                - We don't have limitless CPU cycles

                - Thus we need to split things into sub-problems

                If so that might still be amenable to the bitter lesson, where Sutton is saying human heuristics will always lose out to computational methods at scale.

                Meaning something like:

                - We split up the thought to vision problem into N sub-problems based on some heuristic.

                - We develop a method which works with our CPU cycle constraint (it isn't some probe -> CPU interface). Perhaps it uses our voice or something as a proxy for our thoughts, and some composition of models.

                Sutton would say:

                Yeah that's fine, but if we had the limitless CPU cycles/adequate technology, the solution of probe -> CPU would be better than what we develop.

                • nuancebydefault 85 days ago
                  I think Sutton is right that if we had limitless cpu, any human split up would be inferior. So indeed since we are far away from limitless cpu, we divide and compose.

                  But I think we're onto something!

                  Voice to image indeed might give better results than text to image, since voice has some vibe to it (intonation, tone, color, stress on certain words, speed and probably even traits we don't know yet) that will color or even drastically influence the image output.

      • nuancebydefault 85 days ago
        One problem with image recognition I can think of is that any crude categorization of the image, which is millions of pixels, will make it less accurate.

        With image generation on the other hand, which starts from a handful of words, we can first do some text processing into categories, such as objects vs people, color vs brightness, environment vs main object, etc.

      • nerdponx 85 days ago
        You could imagine doing it with 2 specialized NNs, but then you have to figure out a huge labeled dataset of scene graphs. The problem fundamentally is that any "manual" feature engineering is not going to be supervised and fitted on a huge corpus, the way the self-learned features are.
    • teh_infallible 85 days ago
      I am hoping that AI art tends towards a modular approach, where generating a character, setting, style, and camera movement each happens in its own step. It doesn’t make sense to describe everything at once and hope you like what you get.
      • sorenjan 85 days ago
        Definitely, that would make much more sense given how content is produced by people. Adjust the technology to how people want to use it instead of forcing artists to become prompt engineers and settle for something close enough to what they want.

        At the very least image generators should output layers, I think the style component is already possible with the img2img models.

      • portaouflop 85 days ago
        You can already do that with comfyui - it’s just not easy to set up
    • spencerchubb 85 days ago
      That's essentially what diffusion does, except it doesn't have clear boundaries between "scene graph" and "full image". It starts out noisy and adds more detail gradually
      • WithinReason 85 days ago
        That's true; the inefficiency comes from using pixel-to-pixel attention at each stage. In the beginning low resolution would be enough, and even at the end high resolution is only needed in each pixel's neighborhood.
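        A rough count of what full attention costs compared with neighbourhood-only attention. The 16-pixels-per-token figure and the 7x7 window are assumptions for illustration, not FLUX's documented configuration.

```python
# Back-of-the-envelope count of query/key pairs for self-attention over image
# tokens. Assumption: one token covers a 16x16-pixel area (e.g. an 8x VAE
# downsample followed by 2x2 patches); illustrative, not a spec.


def num_tokens(image_px: int, px_per_token: int = 16) -> int:
    side = image_px // px_per_token
    return side * side


def global_pairs(n_tokens: int) -> int:
    # every token attends to every token: O(N^2)
    return n_tokens * n_tokens


def local_pairs(n_tokens: int, window: int = 7) -> int:
    # every token attends only to a window x window neighbourhood: O(N * k^2)
    # (border effects ignored)
    return n_tokens * window * window


for px in (256, 1024):
    n = num_tokens(px)
    print(px, n, global_pairs(n), local_pairs(n))
# 256 256 65536 12544
# 1024 4096 16777216 200704
```

Going from 256 px to 1024 px multiplies the token count by 16 and the global pair count by 256. The windowed count grows only linearly with the number of tokens.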
    • ZoomZoomZoom 85 days ago
      The issue with this is there's a false assumption that an image is a collection of objects. It's not (necessarily).

      I want a picture of frozen cyan peach fuzz.

      • llm_trw 85 days ago
        https://imgur.com/ayAWSKr

        Prompt: frozen cyan peach fuzz, with default settings on a first generation SD model.

        People _seriously_ do not understand how good these tools have been for nearly two years already.

        • ZoomZoomZoom 85 days ago
          If by people you mean me, then I wasn't clear enough in my comment. The example given implied an image without any objects the GP was talking about, just a uniform texture.
        • thomashop 85 days ago
          • corn13read2 85 days ago
            can do this with any image generation model.

            Disclaimer: I'm not behind any

        • sorenjan 85 days ago
          Running that image through Segment Anything you get this: https://imgur.com/a/XzCanxx

          Imagine if instead of generating the RGB image directly the model would generate something like that, but with richer descriptive embeddings on each segment, and then having a separate model generating the final RGB image. Then it would be easy to change the background, rotate the peach, change color, add other fruits, etc, by editing this semantic representation of the image instead of wrestling with the prompt to try to do small changes without regenerating the entire image from scratch.

    • xucian 78 days ago
      I guess the inefficiency is obvious to many; it's just a matter of time until something like this comes out. And yeah, as others said, you might lose info a-to-b that's needed for b-to-c, but you gain more in predictability/customization.
    • Zambyte 85 days ago
      You seem to be describing ComfyUI to me. You can definitely do this kind of workflow with ComfyUI.
    • tylerchilds 85 days ago
      disney’s multiplane camera but for ai

      compositing.

      do this with ai today where each layer you want has just the artifact on top of a green background.

      layer them in the order you want, then chroma key them out like you’re a 70s public broadcasting station producing reading rainbow.

      the ai workflow becomes a singular, recursive step until your disney frame is complete. animate each layer over time and you have a film.
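      A toy sketch of the keying and stacking step. It assumes each generated layer is a plain list of RGB rows with a pure-green background. The tolerance of 60 and the hard (on/off) key are arbitrary choices. Real compositing would use an image library and a soft matte.

```python
# Toy version of "generate each layer on green, key it out, stack it".
# Images are lists of rows of (r, g, b) tuples; hard per-pixel key, no soft matte.

Pixel = tuple[int, int, int]
Image = list[list[Pixel]]

KEY_GREEN: Pixel = (0, 255, 0)


def is_key(p: Pixel, key: Pixel = KEY_GREEN, tolerance: int = 60) -> bool:
    # Manhattan distance in RGB; pixels close enough to the key colour are transparent
    return sum(abs(a - b) for a, b in zip(p, key)) <= tolerance


def composite(layers: list[Image]) -> Image:
    """layers[0] is the opaque background; later layers are stacked on top in order."""
    out = [row[:] for row in layers[0]]
    for layer in layers[1:]:
        for y, row in enumerate(layer):
            for x, p in enumerate(row):
                if not is_key(p):
                    out[y][x] = p
    return out


sky = [[(0, 0, 255), (0, 0, 255)]]
sign = [[(10, 240, 10), (255, 0, 0)]]  # near-green pixel is keyed out
print(composite([sky, sign]))          # [[(0, 0, 255), (255, 0, 0)]]
```

The usual chroma-key caveat applies: anything green in the subject itself gets keyed out too.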

    • seydor 85 days ago
      Neural networks will gradually be compressed to their minimum optimal size (once we know how to do that)
  • trickstra 85 days ago
    Non-commercial is not open-source, because if the original copyright holder stops maintaining it, nobody else can continue (or has to work like a slave for free). Open-source is about what happens if the original author stops working on it. Open-source gives everyone the license to continue developing it, which obviously means also the ability to get paid. Don't call it open-source if this aspect is missing.

    Only the FLUX.1 [schnell] is open-source (Apache2), FLUX.1 [dev] is non-commercial.

    • uxhacker 85 days ago
      There is OpenFLUX.1 which is a fine tune of the FLUX.1-schnell model that has had the distillation trained out of it. OpenFLUX.1 is licensed Apache 2.0. https://huggingface.co/ostris/OpenFLUX.1/
    • starfezzy 85 days ago
      Doesn’t open source mean the source is viewable/inspectable? I don’t know any closed source apps that let you view the source.
      • dredmorbius 85 days ago
        "Open Source" has a specific definition, created by the Open Source Initiative:

        <https://opensource.org/osd>

        Certain usages may be covered by trademark protection, as an "OSI Approved License":

        <https://opensource.org/trademark-guidelines>

        It's based on the Debian Free Software Guidelines (DFSG), which were adopted by the Debian Project to determine what software does, and does not, qualify to be incorporated into the core distribution. (There is a non-free section, but it is not considered part of the core distribution.)

        <https://www.debian.org/social_contract#guidelines>

        Both definitions owe much to the Free Software Foundation's "Free Software" definition and the four freedoms protected by the GNU GPL:

        - the freedom to use the software for any purpose,

        - the freedom to change the software to suit your needs,

        - the freedom to share the software with your friends and neighbors, and

        - the freedom to share the changes you make.

        <https://www.gnu.org/licenses/quick-guide-gplv3>

        <https://www.gnu.org/philosophy/free-sw.html>

      • miki123211 85 days ago
        > Doesn’t open source mean the source is viewable/inspectable?

        According to the OSI definition, you also need a right to modify the source and/or distribute patches.

        > I don’t know any closed source apps that let you view the source.

        A lot of them do, especially in the open-core space. The model is called source-available.

        If you're selling to enterprises and not gamers, that model makes sense. What stops large enterprises from pirating software is their own lawyers, not DRM.

        This is why you can put a lot of strange provisions into enterprise software licenses, even if you have little to no way to enforce these provisions on a purely technical level.

      • havaker 85 days ago
        Open source usually means that you are able to modify and redistribute the software in question freely. However between open and closed, there is another class - source-available software. From its wikipedia page:

        > Any software is source-available in the broad sense as long as its source code is distributed along with it, even if the user has no legal rights to use, share, modify or even compile it.

      • trickstra 84 days ago
        As I said above: Open-source is about what happens if the original author stops working on it. Having the code viewable/inspectable is a side effect of that - can't sustain a project if all you have are blobs. Famously, Richard Stallman started GNU because he wanted to fix a printer: "Particular incidents that motivated this include a case where an annoying printer couldn't be fixed because the source code was withheld from users." https://en.wikipedia.org/wiki/History_of_free_and_open-sourc...
      • aqme28 85 days ago
        Website frontends are always source viewable, but that is not OSS.
  • thomashop 85 days ago
    If you want to play with FLUX.schnell easily, type the prompt into a Pollinations URL:

    https://pollinations.ai/p/a_donkey_holding_a_sign_with_flux_...

    https://pollinations.ai/p/a_donkey_holding_a_sign_with_flux_...

    https://pollinations.ai/p/Minimalist%20and%20conceptual%20ar...

    It's incredible how fast it is. We generate 8000 images every 30 minutes for our users using only three L40S GPUs. Disclaimer: I'm behind Pollinations
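    Two small illustrations of this comment: building a prompt URL, and working out what "8000 images every 30 minutes on three L40S GPUs" means per GPU. The `https://pollinations.ai/p/<prompt>` pattern is inferred from the links above (the last one percent-encodes spaces). Whether the service accepts further parameters is not covered here.

```python
from urllib.parse import quote

BASE = "https://pollinations.ai/p/"  # pattern inferred from the links in the comment


def prompt_url(prompt: str) -> str:
    # Percent-encode spaces and punctuation (including "/") so the prompt is one path segment
    return BASE + quote(prompt, safe="")


def throughput(images: int, minutes: float, gpus: int) -> tuple[float, float]:
    """Return (images per second per GPU, average seconds per image per GPU)."""
    per_gpu_per_s = images / (minutes * 60) / gpus
    return per_gpu_per_s, 1 / per_gpu_per_s


print(prompt_url("a donkey holding a sign"))  # https://pollinations.ai/p/a%20donkey%20holding%20a%20sign
print(throughput(8000, 30, 3))
```

That works out to about 1.5 images per second per GPU, or roughly 0.68 s per image. This is an average over the 30 minutes and assumes the load is spread evenly.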

    • peterpans01 85 days ago
      The word "only" sounds quite expensive for most of us.
      • Kiboneu 85 days ago
        He started a whole business to help pay the installments.
      • FridgeSeal 85 days ago
        “I have successfully destabilised many countries with only a few tanks”.
  • jsemrau 85 days ago
    My favorite thing to do with Flux is create images with a white background for my substack[1] because the text following is amazing and I can communicate something visually through the artwork as well.

    [1]https://substackcdn.com/image/fetch/w_1456,c_limit,f_webp,q_...

    • ruthmarx 85 days ago
      That example you gave is a good reason why artists get pissed off, IMO. The model is clearly aping some artist's specific style, and that artist is now missing out on paid work as a result.

      Not sure I have an opinion on that, technology marches on etc, but it is interesting.

      • jsemrau 85 days ago
        I understand your point, but in 0% of all cases would I hire an artist to create imagery for my personal blog. Therefore, I would think that market doesn't exist.
        • earthnail 85 days ago
          However, the blogs, newspapers, or print outlets that used to hire them did so because you couldn't; it was a differentiator.

          That differentiator is gone, and as such they won't pay for it anymore. They'll just use the same AI as you.

          This destroys the existing market of the artist.

          To be clear, my comment isn’t meant as a judgment, just as market analysis.

          • jsemrau 85 days ago
            I think that doesn't take into consideration how much thought and expertise goes into design work. Have a look at the recent controversy about the live-service shooter Concord, which failed spectacularly, mainly due to bad character design.

            Here are two videos that explain that well. I don't think I would ever be capable of designing with that degree of purpose given a generative AI tool.

            [1] https://www.youtube.com/watch?v=mVyXUMJLzE0 [2] https://www.youtube.com/watch?v=5eymH15AfAU

            • Mklomoto 85 days ago
              There is plenty of space between black and white.

              I have taste, just no skill in drawing. I don't need an artist, I need a graphic designer, and now I can replace a graphic designer with GenAI.

              Plenty of artists can draw very well, but what they learn in the industry is to draw for someone else in an aligned art style etc. That has nothing to do with art.

              Very few people earn their living as artists.

              It's the same with all the other trades. Look at masterpieces by woodworkers etc. They look interesting and nice, but woodworkers normally just work for someone else, doing their craft, not their art.

            • ionwake 85 days ago
              Thanks for the links; I'm glad there are people who are experts at character design. To my untrained eyes it just looks like all of the characters are muddy-coloured (washed-out greens, browns, etc.) AND they are pretty much all incredibly ugly. I think I saw one that at least looked fashionable, the black female sniper.

              The older I get, the more concerned I am that the larger the team that makes decisions, the worse the decisions are. What's the word for this? Is there any escape? Team Fortress 2 took years and teams to build, but it was just perfect.

              I heard they had a flat structure which is even more confusing as to how they attained such an excellent product.

              • hansvm 85 days ago
                Bureaucracy and hierarchy are much more damaging to good products than a large team. The flat structure and long timelines are how they overcame the limitations of a large team.
                • ionwake 84 days ago
                  Thanks for the reply, I guess that's the secret.
          • smrtinsert 85 days ago
            This is about as realistic as replacing coders with AI tools today. High-level content organizations demand creative precision that even models like Flux can ape but not replace. Maybe to a non-artist it would be comparable, but to a creative team it's not close.
        • ruthmarx 85 days ago
          Yeah, I get that completely, I'm the same way. I just think it's interesting. It's kind of the same argument as piracy, since most people wouldn't pay for what they download if it wasn't free.
          • ilkke 85 days ago
            What is different in this case is that large companies are very likely looking to replace artists with ai, which is a huge potential impact. Piracy never had such risks
            • jsemrau 85 days ago
              I think this will only happen if you can replace parts within an image selectively and reliably. There are still major problems even in Photoshop's GenAI features. For example, it is not possible to select the head of a person in a picture and then type "smile" to make the face smile. We might get there eventually.
          • jsemrau 85 days ago
            I'd rather think it's the same argument as open-source and public domain. Currently, I am researching an agent that ReActs its way through a game of TicTacToe. I am using a derivative of the open-source transformer's prompt.
            • ruthmarx 85 days ago
              > I'd rather think it's the same argument as open-source and public domain.

              In the context of the point I made, it's definitely more similar to piracy, since the point was about taking advantage of something that if not free people would not pay for.

      • pajeets 85 days ago
        Don't care about artists' opinions on using AI tools instead of paying them, because I couldn't and wouldn't pay anyway, so there's no demand in the first place.

        All I wanna know is the prompt that was used to generate the art. Speaking of which, I wanna know how to create cartoony images like that, OP.

    • slig 85 days ago
      Could you share the prompt? Thanks.
      • jsemrau 85 days ago
        The prompt is actually not that interesting.

        "A hand-drawing of a scientific middle-aged man in front of a white background. The man is wearing jeans and a t-shirt. He is thinking a bubble stating "What's in a ReAct JSON prompt?" In the style of European comic book artists of the 1970s and 1980s."

        Finding the right seed and model configuration is the more difficult part.

        • slig 85 days ago
          Thank you!
        • pajeets 85 days ago
          Just tried it out and it struggled with the bubble caption and with adopting other drawing styles, but oh god yes, this is awesome, because an image like this would take forever for me to do, and even arranging a commission is expensive.

          starving artists are going to famish now, not sure how to feel about it

  • vunderba 85 days ago
    Flux is the leading contender among locally hosted generative systems in terms of prompt adherence, but the omnipresent shallow depth of field is irritatingly hard to get rid of.
    • cranium 85 days ago
      I guess it's optimized for artsy images?
      • AuryGlenz 85 days ago
        They almost certainly did DPO it, so that would have an effect. It was also probably just trained more on professional photography than cell phone pics.

        I’ve found it odd how there’s a segment of the population that hates a shallow depth of field now, as they’re so used to their phone pictures. I got in an argument on Reddit (sigh) with someone who insisted that the somewhat shallow depth of field that SDXL liked to do by default was “fake.”

        As in, he was only ever exposed to it through portrait mode and the like on phones and didn’t comprehend that larger sensors simply looked like that. The images he was posting that looked “fake” to him looked to be about a 50mm lens at f/4 on a full frame camera at a normal portrait distance, so nothing super shallow either.
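        The standard thin-lens depth-of-field formulas make the "50mm at f/4 on full frame" case concrete. The 2 m subject distance and the 0.03 mm circle of confusion are assumptions (the latter is a common full-frame convention), not values from the comment.

```python
import math


def depth_of_field(focal_mm: float, f_number: float, subject_mm: float,
                   coc_mm: float = 0.03) -> tuple[float, float]:
    """Near/far limits of acceptable sharpness (thin-lens approximation), in mm.

    H    = f^2 / (N * c) + f            (hyperfocal distance)
    near = s (H - f) / (H + s - 2f)
    far  = s (H - f) / (H - s), or infinity when s >= H
    coc_mm = 0.03 is an assumed full-frame circle of confusion.
    """
    hyperfocal = focal_mm ** 2 / (f_number * coc_mm) + focal_mm
    near = subject_mm * (hyperfocal - focal_mm) / (hyperfocal + subject_mm - 2 * focal_mm)
    if subject_mm >= hyperfocal:
        far = math.inf
    else:
        far = subject_mm * (hyperfocal - focal_mm) / (hyperfocal - subject_mm)
    return near, far


# 50mm lens, subject assumed to be 2 m away, a few apertures for comparison
for n in (1.2, 4, 8, 16):
    near, far = depth_of_field(50, n, 2000)
    print(f"f/{n}: {near / 1000:.2f} m to {far / 1000:.2f} m")
# the f/4 line reads: f/4: 1.83 m to 2.21 m
```

With these assumptions, f/4 keeps roughly 1.83 m to 2.21 m acceptably sharp, about 38 cm around the subject. Stopping down to f/16 widens that considerably.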

        • Adverblessly 85 days ago
          As a DoF "hater", my problem with it is that DoF is just the result of a sensor limitation (when not used artistically etc.), not some requirement of generating images. If I can get around that limitation, there's very little motivation to maintain that flaw.

          In the real world, if I see a person at the beach, I can look at the person and see them in perfect focus, I can then look at the ocean behind them and it is also in perfect focus. If you are an AI generating an image for me, I certainly don't need you to tell me on which parts of that image I'm allowed to focus, just let me see both the person and the ocean (unless I tell you to give me something artsy :)).

          • tcrenshaw 85 days ago
            While you could look at DoF as a sensor limitation, most photographers use it as an artistic choice. Sure, I could take a pic at f/16 and have everything within the frame in focus, but maybe the background is distracting and takes away from the subject. I can choose how much background separation I want; maybe just a touch at f/8, maybe full-on blur at f/1.2.
          • AuryGlenz 82 days ago
            If you have the camera focus on the person, they'll be in perfect focus. If you then have the camera focus on the ocean, it'll be in focus.

            Our eyes work the same way. Of course, just like the camera's aperture can be set our pupils will be pretty contracted on a beach.

            Of course, you should be able to tell the AI to generate it how you want - that's the goal, after all. Having at least a somewhat shallow depth of field by default makes sense though.

        • vunderba 85 days ago
          That's pretty funny. It reminds me of how, if you grew up watching movies at the standard 24 fps, watching films at 60 fps later felt unnatural and fake.

          I'll say I'm okay with DOF; it just feels (subjectively, to me) like it's incredibly exaggerated in Flux. The workarounds have mostly been prompt-based, adding everything from "gopro capture" to "on flickr in 2007", but this approach feels like borderline alchemy in terms of how reliable it is.

      • llm_trw 85 days ago
        Give it another month and it will be porn, just like sdxl.
        • Zopieux 85 days ago
          What are you talking about, the model is months old, it's already all porn - and that's okay.
  • CosmicShadow 85 days ago
    I just cancelled my Midjourney subscription, it feels like it's fallen too far behind for the stuff I'd like to do. Spent a lot of time considering using Replicate as well as Ideogram.
    • simonjgreen 85 days ago
      I have been questioning the value beyond novelty as well recently. I’m curious if you replaced it with another tool or simply don’t derive value from those things?
    • pajeets 85 days ago
      never used midjourney because it had that signature look and bad with hands, feet, letters

      crazy that not even a year has passed since Emad's downfall and a local, open-source, superior model drops

      which just shows how little moat these companies have and are just lighting cash on fire which we benefit from

      • rolux 85 days ago
        > crazy not even a year has past since Emad's downfall a local open source and superior model drops

        > which just shows how little moat these companies have

        Flux was developed by the same people that made Stable Diffusion.

      • aqme28 85 days ago
        Flux has a signature look too, it’s just a different one.
      • keiferski 85 days ago
        It’s very easy to turn off the default Midjourney look.
  • 112233 85 days ago
    Does someone know what FLUX 1.1 has been trained on? I generated almost a hundred images on the pro model using two-word "camera filename + simple word" prompts, and they all look like photos from someone's phone. Like, unless it has text I would not even stop to consider any of these images AI. They sometimes look cropped. A lot of food pictures, messy tables and apartments etc.

    Did they scrape public facebook posts? Snapchat? Vkontakte? Buy private images from onedrive/dropbox? If I put as the second word a female name, it almost always triggers nsfw filter. So I assume images in the training set are quite private.

    See for yourself (autoplay music warning):

    people: https://vm.tiktok.com/ZGdeXEhMg/

    food and stuff: https://vm.tiktok.com/ZGdeXEBDK/

    signs: https://vm.tiktok.com/ZGdeXoAgy/

    [edit] Looking at these images feels uneasy, like I am looking at someone's private photos. There is not enough "guidance" in a prompt like "IMG00012.JPG forbid" to account for these images, so it must all come from the training data.

    I do not believe FLUX 1.1 pro has radically different training set than these previous open models, even if it is more prone to such generation.

    It feels really off, so, again, is there any info on training data used for these models?

    • smusamashah 85 days ago
      It's not just flux, you can do the same with other models including Stable Diffusion.

      These two reddit threads [1][2] explore this convention a bit.

          DSC_0001-9999.JPG - Nikon Default
          DSCF0001-9999.JPG - Fujifilm Default
          IMG_0001-9999.JPG - Generic Image
          P0001-9999.JPG - Panasonic Default
          CIMG0001-9999.JPG - Casio Default
          PICT0001-9999.JPG - Sony Default
          Photo_0001-9999.JPG - Android Photo
          VID_0001-9999.mp4 - Generic Video
          
          Edit: Also created a version for 3D Software Filenames (all of them tested, only a few had some effects)
          
          Autodesk Filmbox (FBX): my_model0001-9999.fbx
          Stereolithography (STL): Model0001-9999.stl
          3ds Max: 3ds_Scene0001-9999.max
          Cinema 4D: Project0001-9999.c4d
          Maya (ASCII): Animation0001-9999.ma
          SketchUp: SketchUp0001-9999.skp
      
      
      [1]: https://www.reddit.com/r/StableDiffusion/comments/1fxkt3p/co...

      [2]: https://www.reddit.com/r/StableDiffusion/comments/1fxdm1n/i_...
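      A small sketch for reproducing the "camera filename + simple word" experiment with the prefixes listed above. The prefixes are used as reported in those threads, without independent verification. The word "kitchen" and the fixed seed are arbitrary choices, and no particular model API is assumed; the prompts would be pasted into whatever front end you use.

```python
import random

# Default filename prefixes as listed in the Reddit threads linked above
# (as reported there; not checked against camera documentation).
CAMERA_PREFIXES = {
    "Nikon": "DSC_",
    "Fujifilm": "DSCF",
    "Generic": "IMG_",
    "Panasonic": "P",
    "Casio": "CIMG",
    "Sony": "PICT",
    "Android": "Photo_",
}


def filename_prompt(prefix: str, word: str, rng: random.Random) -> str:
    """Build a two-part "camera filename + word" prompt such as 'DSC_0042.JPG kitchen'."""
    number = rng.randint(1, 9999)  # the threads use the 0001-9999 range
    return f"{prefix}{number:04d}.JPG {word}"


rng = random.Random(0)  # fixed seed so the generated prompt list is reproducible
for brand, prefix in CAMERA_PREFIXES.items():
    print(brand, filename_prompt(prefix, "kitchen", rng))
```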

    • jncfhnb 85 days ago
      I highly doubt it’s a product of the raw training dataset because I had the opposite problem. The token for “background” introduced intense blur on the whole image almost regardless of how it was used in the prompt, which is interesting because their prompt interpretation is much better.

      It seems likely that they did heavy calibration of text as well as a lot of tuning efforts to make the model prefer images that are “flux-y”.

      Whatever process they’re following, they’ve inadvertently made the model overly sensitive to certain terms to the point at which their mere inclusion is stronger than a LoRA.

      The photos you’re showing aren’t especially noteworthy in the scheme of things. It doesn’t take a lot of effort to “escape” the basic image formatting and get something hyper realistic. Personally I don’t think they’re trying to hide the hyper realism so much as trying to default to imagery that people want.

    • pajeets 85 days ago
      I experienced the same thing. It was so weird: I got good results in the beginning and then it "craps out".

      Don't know why all the critical comments about Flux are being downvoted or flagged; sure is weird.

  • thierryzoller 85 days ago
    They point to their comparison page to claim similar quality. First off, it's very clear that there is far less detail, but worse, look at the example "Three-quarters front view of a yellow 2017 Corvette coming around a curve in a mountain road and looking over a green valley on a cloudy day."

    The Original model shows the FRONT, the speed version shows the BACK of the corvette. It's a completely different picture. This is not similar but strikingly different.

    https://flux-quality-comparison.vercel.app/

  • Palmik 85 days ago
    Every time there's a thread about models from Meta, there's a flood of comments clarifying that they aren't really open source.

    So let's also set the record straight for FLUX: only one of the models released is open source -- FLUX schnell -- it's a distillation from the proprietary model that's much harder to work with.

    Meta's Llama models ironically have a much more permissive license for all practical intents and purposes, and they are also incredibly easy to fine-tune (using Meta's own open source framework, or several third-party ones), while FLUX schnell isn't.

    I think the open source community should rally behind OpenFLUX or a similar project, which tries to fix the artificial limitations of Schnell: https://huggingface.co/ostris/OpenFLUX.1

  • swyx 85 days ago
    > We added a new synchronous HTTP API that makes all image models much faster on Replicate.

    ooh why is synchronous fast? i click thru to https://replicate.com/changelog/2024-10-09-synchronous-api

    > Our client libraries and API are now much faster at running models, particularly if a file is being returned.

    ... thanks?

    just sharing my frustration as a developer. try to explain things a little better if you'd like it to stick/for us to become your advocates.

    • weird-eye-issue 85 days ago
      I mean it literally explains why in the second paragraph. It returns the actual file data in the response rather than a URL where you have to make a second request to get the file data
      • swyx 85 days ago
        that's not "making the image models much faster", that's just making getting the image back slightly faster
        • weird-eye-issue 85 days ago
          In all practical senses it is the same thing
        • popalchemist 85 days ago
          The "making the image models much faster" part is model optimizations that are also explained in the post.
          • ErikBjare 85 days ago
            Where? I don't see any explanation of model optimizations in the linked post.
    • bfirsh 84 days ago
      You're right -- this wasn't clear. Added another paragraph to explain what you had to do before.
  • jncfhnb 85 days ago
    Does this translate to gains on local with comfy
  • LeicaLatte 85 days ago
    Flux is awesome and improving all the time.
  • marginalia_nu 85 days ago
    Given the HN exposure, feels like a huge missed opportunity to write anywhere in the article what FLUX even is and what it's for. A single sentence would help so much. The way it's written, you can read the entire thing and still have no clue.
  • swyx 85 days ago
    this comparison for the quantization effect is very nice https://flux-quality-comparison.vercel.app/

    however i do have to ask.. ~2x faster for fp16->fp8 is expected, right? it's still not as good as the "realtime" or "lightning" options that basically have to be 5-10x faster. what's the ideal product use case for just ~2x faster?

    • sroussey 85 days ago
      Funny, sometimes I like the fast one better.
  • sandos 84 days ago
    Well, hands are still super-funny in these for sure. How come that's still not a solved problem?
    • troyvit 84 days ago
      I can't speak for AIs but whenever I try to draw a hand I quickly look for an excuse to add mittens.
  • dvrp 85 days ago
    i think we (krea) are faster at the time of writing this comment (but i’ll have to double-check on our infra)
  • ionwake 85 days ago
    How long does Flux take to generate an image if it runs on an M1 MacBook Pro? Can anyone estimate?
  • chmaynard 85 days ago
    Tastes great, too!
  • mvdtnz 85 days ago
    Ok? What is it?
  • lolinder 85 days ago
    • dang 85 days ago
      "Please don't complain about tangential annoyances—e.g. article or website formats, name collisions, or back-button breakage. They're too common to be interesting."

      https://news.ycombinator.com/newsguidelines.html

      • lolinder 85 days ago
        I just posted a reply to another person quoting this guideline:

        > In general I think that's true and agree that minor name collision commentary is uninteresting, but in this case we're talking about 11 collisions (and counting) in tech alone, 3 of those in AI/ML and 1 of those specifically in image generation.

        > When it's that bad I think that the frequency of collisions for this name is an interesting topic in its own right.

        I'll respect your judgement on this and not push it further, but this is my thought process here.

        • dang 85 days ago
          That makes sense and you're not wrong - it's just that there's a clear tradeoff in terms of more vs. less interesting conversation. Having that simple rule is a net win for HN.
    • CGamesPlay 85 days ago
      Well, the word refers to "continuous change", so I guess it's pretty appropriate.
    • dig1 85 days ago
      Also https://github.com/influxdata/flux - "a lightweight scripting language for querying databases and working with data"
    • Conscat 85 days ago
      The first thing that comes to mind when I think "flux" is none of the above, either. There's an extremely cool alternative iterator library for C++20 by Tristan Brindle named flux.
      • bigiain 85 days ago
        /me glances across my desk to see my soldering station...
    • roenxi 85 days ago
      And then you can branch out of AI - https://en.wikipedia.org/wiki/The_Flux_Foundation works on public art.
    • worstspotgain 85 days ago
    • artificialLimbs 85 days ago
      Don't forget Caleb Porzio's new Laravel UI kit.

      https://fluxui.dev/

    • Vt71fcAqt7 85 days ago
      >Please don't complain about tangential annoyances—e.g. article or website formats, name collisions, or back-button breakage. They're too common to be interesting.
      • lolinder 85 days ago
        In general I think that's true and agree that minor name collision commentary is uninteresting, but in this case we're talking about 11 collisions (and counting) in tech alone, 3 of those in AI/ML and 1 of those specifically in image generation.

        When it's that bad I think that the frequency of collisions for this name is an interesting topic in its own right.

    • swyx 85 days ago
      there are just some names that technology brothers gravitate to like moths to a flame. Orion, Voltron, Galactus...
  • pajeets 85 days ago
    [flagged]
    • Fauntleroy 85 days ago
      What prompt did you give it? Is it capable of animation?