The future of Deep Learning frameworks

(neel04.github.io)

209 points | by lairv 251 days ago

29 comments

  • srush 251 days ago
    PyTorch is a generationally important project. I've never seen a tool that is so in line with how researchers learn and internalize a subject. Teaching Machine Learning before and after its adoption has been a completely different experience. It can never be said enough how cool it is that Meta fosters and supports it.

    Viva PyTorch! (Jax rocks too)

    • deepsquirrelnet 250 days ago
      This is exactly why I gravitated to it so quickly. The first time I looked at pytorch code it was immediately obvious what the abstractions meant and how to use them to write a model architecture.

      Jax looks like something completely different to me. Maybe I’m dumb and probably not the target audience, but it occurs to me that very few people are. When I read about using Jax, I find recommendations for a handful of other libraries that make it more usable. Which of those I choose to learn is not entirely obvious, because they all seem to create a very fragmented ecosystem with code that isn’t portable.

      I’m still not sure why I’d spend my time learning Jax, especially when it seems like most of the complaints from the author don’t really separate out training and inference, which don’t necessarily need to occur from the same framework.

      • 6gvONxR4sf7o 250 days ago
        Honestly, when I turn to JAX, I generally do it without a framework. It’s like asking for a framework to wrap numpy to me. Just JAX plus optax is sufficient for me in the cases I turn to it.
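
        For anyone curious what that framework-free style looks like, here's a minimal sketch: raw JAX for the model and gradients, optax only for the optimizer. The toy linear model and data are illustrative, not from any particular project.

          import jax
          import jax.numpy as jnp
          import optax

          def loss_fn(params, x, y):
              pred = x @ params["w"] + params["b"]
              return jnp.mean((pred - y) ** 2)

          params = {"w": jnp.zeros((3, 1)), "b": jnp.zeros((1,))}
          opt = optax.adam(1e-3)
          opt_state = opt.init(params)

          @jax.jit
          def step(params, opt_state, x, y):
              # plain function transformations: no Module class, no trainer loop
              loss, grads = jax.value_and_grad(loss_fn)(params, x, y)
              updates, opt_state = opt.update(grads, opt_state)
              return optax.apply_updates(params, updates), opt_state, loss

          params, opt_state, loss = step(params, opt_state, jnp.ones((8, 3)), jnp.ones((8, 1)))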
    • PostOnce 250 days ago
      Torch was originally a Lua project, hence why pytorch is called pytorch and not just torch.

      In another timeline AI would have made Lua popular.

      The best part is it trampled TensorFlow which I personally find obtuse.

      • n7g 250 days ago
        > In another timeline AI would have made Lua popular.

        I wonder if it'd have been hated more than Python is - especially with the 1-based indexing...

        • goatlover 249 days ago
          Scientific computing tends to be 1-based. Thus R, Julia, Fortran, Matlab.
        • CuriouslyC 250 days ago
          Python isn't hated AFAICT, though people will profess to hating building large projects in it (myself included), but many of those people also love it for shorter programs and scripts.
          • dartos 249 days ago
            Everything is hated.

            Python has always gotten hate for being super super slow and having an ugly syntax (subjective ofc, but I happen to agree)

    • pjmlp 247 days ago
      Additionally, nowadays it also has Java and C++ bindings to the same native libraries, so others can enjoy performance without having to rewrite their research afterwards.
  • smhx 251 days ago
    The author got a couple of things wrong that are worth pointing out:

    1. PyTorch is going all-in on torch.compile -- Dynamo is the frontend, Inductor is the backend -- with a strong default Inductor codegen powered by OpenAI Triton (which now has CPU, NVIDIA GPU and AMD GPU backends). The author's view that PyTorch is building towards a multi-backend future isn't really where things are going. PyTorch supports extensibility of backends (including XLA), but a disproportionate share of the effort goes into the default path. torch.compile is 2 years old, XLA is 7 years old. Compilers take a few years to mature. torch.compile will get there (and we have reasonable measures that the compiler is on track to maturity). (A minimal sketch of this default path follows these points.)

    2. PyTorch/XLA exists, mainly to drive a TPU backend for PyTorch, as Google gives no other real way to access the TPU. It's not great to try to shoehorn XLA into PyTorch as a backend -- XLA fundamentally doesn't have the flexibility that PyTorch supports by default (especially dynamic shapes). PyTorch on TPUs is unlikely to ever match the experience of JAX on TPUs, almost by definition.

    3. JAX was developed at Google, not at Deepmind.
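
    A minimal sketch of that default torch.compile path from point 1, assuming PyTorch 2.x (the toy model is illustrative):

      import torch

      model = torch.nn.Sequential(
          torch.nn.Linear(16, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1)
      )

      # Dynamo captures the Python frontend; Inductor generates Triton kernels.
      compiled = torch.compile(model)
      out = compiled(torch.randn(4, 16))  # first call triggers compilation

      # Backends stay pluggable, e.g. for debugging:
      debug = torch.compile(model, backend="eager")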

    • n7g 250 days ago
      Hey, thanks for actually engaging with the blog's points instead of "Google kills everything it touches" :)

      1. I'm well aware of the PyTorch stack, but this point:

      > PyTorch is building towards a multi-backend future isn't really where things are going

      > PyTorch supports extensibility of backends (including XLA)

      Is my problem. Those backends just never integrate well, as I mentioned in the blog post. I'm not sure if you've ever gone into the weeds, but there are so many (often undocumented) sharp edges when using different backends that they never really work well -- for example, how bad Torch/XLA is, and the nightmare-inducing bugs and errors that come with it.

      > torch.compile is 2 years old, XLA is 7 years old. Compilers take a few years to mature

      That was one of my major points - I don't think leaning on torch.compile is the best idea. A compiler inherently places restrictions that you have to work around.

      This is not dynamic, nor flexible - and it flies in the face of torch's core philosophies just so they can offer more performance to the big labs using PyTorch. For various reasons, I dislike pandering to the rich guy instead of being an independent, open-source entity.

      2. Torch/XLA is indeed primarily meant for TPUs - as in the quoted announcement, where they declare they're ditching TF:XLA in favour of OpenXLA. But there's still a very real effort to get it working on GPUs - in fact, a lab on twitter declared that they're using Torch/XLA on GPUs and will soon™ release details.

      XLA's GPU support is great: it's compatible across different hardware, and it's optimized and mature. In short, it's a great alternative to the often buggy torch.compile stack - if you fix the torch integration.

      So I won't be surprised if in the long term they lean on XLA. Whether that's a good direction or not is up to the devs to decide unfortunately - not the community.

      3. Thank you for pointing that out. I'm not sure about the history of JAX (it might make for a good blog post for the JAX devs to write someday), but it seems it was indeed developed at Google Research, though also heavily supported and maintained by DeepMind.

      Appreciate you giving the time to comment here though :)

      • smhx 248 days ago
        If you're the author, unfortunately I have to say that the blog is not well-written -- it's misinformed about some of the claims and has a repugnant, click-baity title. You're getting the attention and clicks, but probably losing a lot of trust among people. I didn't engage out of choice, but because of a duty to respond to FUD.

        > > torch.compile is 2 years old, XLA is 7 years old. Compilers take a few years to mature

        > That was one of my major points - I don't think leaning on torch.compile is the best idea. A compiler inherently places restrictions that you have to work around.

        There are plenty of compilers that place restrictions that you barely notice. gcc, clang, nvcc -- they're fairly flexible, and "dynamic". Adding constraints doesn't mean you have to give up on important flexibility.

        > This is not dynamic, nor flexible - and it flies in the face of torch's core philosophies just so they can offer more performance to the big labs using PyTorch. For various reasons, I dislike pandering to the rich guy instead of being an independent, open-source entity.

        I think this is an assumption you've made largely without evidence. I'm not entirely sure what your point is. The way torch.compile is measured for success publicly (even in the announcement blog post and conference keynote, link https://pytorch.org/get-started/pytorch-2.0/ ) is by benchmarking a bunch of popular PyTorch-based GitHub repos in the wild + popular HuggingFace models + the TIMM vision benchmark. They're curated here https://github.com/pytorch/benchmark . Your claim that it's mainly to favor large labs is pretty puzzling.

        torch.compile is both dynamic and flexible because: 1. it supports dynamic shapes, 2. it allows incremental compilation (you don't need to compile the parts that you wish to keep in uncompilable Python -- code using random arbitrary Python packages, etc.). There is a trade-off between dynamism, flexibility, and performance -- more dynamic and flexible means we don't have enough information to extract better performance -- but that's an acceptable trade-off when you need the flexibility to express your ideas more than you need the speed.
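
        A hedged sketch of those two points, assuming PyTorch 2.x (exact recompile behavior varies by version):

          import torch

          # 1. dynamic shapes: mark shapes symbolic so new sizes
          # don't force a recompile per shape
          @torch.compile(dynamic=True)
          def scale(x):
              return x * 2.0

          scale(torch.randn(4, 8))
          scale(torch.randn(9, 3))  # different shape, ideally the same compiled code

          # 2. incremental compilation: compile only the hot module,
          # keep arbitrary Python around it eager
          inner = torch.compile(torch.nn.Linear(8, 8))

          def outer(x):
              # any Python here (logging, third-party packages, ...) stays uncompiled
              return inner(x).sum().item()

          outer(torch.randn(2, 8))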

        > XLA's GPU support is great: it's compatible across different hardware, and it's optimized and mature. In short, it's a great alternative to the often buggy torch.compile stack - if you fix the torch integration.

        If you are an XLA maximalist, that's fine. I am not. There isn't evidence to prove out either opinion. PyTorch will never be nicely compatible with XLA as long as XLA has significant constraints that are incompatible with PyTorch's user-experience model. The PyTorch devs have given clear written-down feedback to the XLA project on what it takes for XLA+PyTorch to get better, and it's been a few years and the XLA project prioritizes other things.

        • n7g 247 days ago
          > There are plenty of compilers that place restrictions that you barely notice. gcc, clang, nvcc -- they're fairly flexible, and "dynamic"

          In the context of scientific computing - this is completely, blatantly false. We're not lowering low-level IR to machine code. We want to perform certain mathematical processes, often distributed across a large number of nodes. There's a difference between ensuring optimization (i.e. no I/O bottlenecks, adequate synchronization between processes, overlapping computation with comms) vs. simply transforming a program into a different representation.

          This is classic [false analogy](https://simple.wikipedia.org/wiki/False_analogy)

          Adding constraints does mean that you give up on flexibility, precisely because you have to work around them. For example, XLA is intentionally constrained against dynamic loops, because you lose a lot of performance and suffer a huge overhead. So the API forces you to think about it statically (though you can work around it with fancier methods, like using checkpointing and leveraging the treeverse algorithm).
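
          To make the constraint concrete, here's a minimal sketch in standard JAX: under jit, loop structure is expressed statically via lax.fori_loop with a fixed-shape carry, and jax.checkpoint trades memory for recompute in the backward pass.

            import jax
            import jax.numpy as jnp

            @jax.jit
            def power_iteration(a, v):
                def body(_, v):
                    v = a @ v
                    return v / jnp.linalg.norm(v)  # carry must keep a fixed shape
                return jax.lax.fori_loop(0, 50, body, v)

            power_iteration(jnp.eye(3), jnp.ones((3,)))

            # rematerialization: recompute activations in the backward pass
            f = jax.checkpoint(lambda x: jnp.tanh(x) ** 3)
            jax.grad(lambda x: f(x).sum())(jnp.ones((4,)))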

          I'll need more clarification regarding this point, because I don't know what dev in which universe would not regard "constraints" as flying in the face of flexibility.

          > popular HuggingFace models + the TIMM vision benchmark

          Ah yes, benchmark it on models that are entirely static LLMs or convnet hybrids. Clearly a high requirement for dynamism and flexibility there.

          (I'm sorry but that statement alone has lost you any credibility for me.)

          > Your claim that it's mainly to favor large labs is pretty puzzling.

          Because large labs often play with the safest models, which often involves scaling them up (OAI, FAIR, GDM etc.), and those tend to be self-attention/transformer-like workloads. The devs have been pretty transparent about this - you can DM them if you want - but their entire stack is optimized for these use cases.

          And of course, that doesn't account for research workloads, which tend to be highly non-standard, dynamic, rather complex, and much, much harder to optimize for.

          This is where the "favouring big labs" comes from.

          > 1. it supports dynamic shapes

          I agree that in the specifically narrow respect of dynamic shapes, it's better than XLA.

          But then it also misses a lot of the optimization features XLA has, such as its new cost model and the Latency Hiding Scheduler (LHS) stack, which is far better at async overlapping of comms, computation, and even IO (as it's lazy).
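
          (For reference, that scheduler is typically toggled through XLA_FLAGS; the exact flag name below is my assumption and can vary across XLA versions.)

            import os
            # enable the latency-hiding scheduler before XLA initializes
            os.environ["XLA_FLAGS"] = "--xla_gpu_enable_latency_hiding_scheduler=true"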

          > there is a trade-off between dynamism, flexibility, and performance

          Exactly. Similarly, there's a difference in the features offered by each particular compiler. Torch's compiler's strengths may be XLA's weaknesses, and vice versa.

          But it's not perfect - no software can be, and compilers certainly aren't exceptions. My issue is that torch is leaning on a compiler at all.

          There are use cases where the torch.compile stack fails completely (not sure how much you hang around more research-oriented forums) - some features simply do not work with torch.compile. I cited FSDP as the most egregious one because it's so common in everyone's workflow.

          That's the problem. Torch is optimizing their compiler stack for certain workloads, with a lot of new features relying on it (look at the newly proposed DTensor API, for example).

          If I'm a researcher with a non-standard workload, I should be able to enjoy those new features without relying on the compiler - because otherwise, it'd be painful for me to fix/restrict my code for that stack.

          In short, I'm being bottlenecked by the compiler's capabilities, preventing me from fully utilizing all features. This is what I don't like. This is why torch should never be leaning on a compiler at all.

          It 'looks' like a mere tradeoff, but reality is just not as simple as that.

          > XLA:GPU

          I don't particularly care if torch uses whatever compiler stack the devs choose - that's beside the point. Really, I just don't like the compiler-integrated approach at all. The choice of the specific stack doesn't matter.

    • lunaticd 250 days ago
      3. The project started under a Harvard-affiliated GitHub org during the course of the authors' PhDs. These same people later joined Google, where it continued to be developed and was over time adopted more and more in place of TensorFlow.
  • logicchains 251 days ago
    PyTorch beat TensorFlow because it was much easier to use for research. Jax is much harder to use for exploratory research than PyTorch, due to requiring a fixed-shape computation graph, which makes implementing many custom model architectures very difficult.
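
    A small sketch of what that means in standard JAX: jit specializes on input shapes, so every new shape triggers a retrace (and recompile), which is what makes shape-polymorphic research code painful.

      import jax
      import jax.numpy as jnp

      @jax.jit
      def f(x):
          print("tracing for shape", x.shape)  # runs only while tracing
          return x.sum()

      f(jnp.ones((4,)))  # traces and compiles for (4,)
      f(jnp.ones((4,)))  # cache hit, no retrace
      f(jnp.ones((5,)))  # new shape -> traces and compiles again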

    Jax's advantages shine when it comes to parallelizing a new architecture across multiple GPUs/TPUs, which it makes much easier than PyTorch (no need for custom CUDA/networking code). Needing to scale up a new architecture across many GPUs is, however, not a common use case, and most teams that have the resources for large-scale multi-GPU training also have the resources for specialised engineers to do it in PyTorch.
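
    For illustration, data parallelism over all local devices can be a one-liner with jax.pmap (newer APIs like shard_map exist too); this toy example assumes nothing beyond stock JAX.

      import jax
      import jax.numpy as jnp

      @jax.pmap
      def parallel_step(x):
          return jnp.tanh(x) * 2.0  # any per-shard computation

      n = jax.local_device_count()
      x = jnp.ones((n, 128))   # leading axis is mapped over devices
      out = parallel_step(x)   # runs on all devices, no custom CUDA/networking code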

  • ianbutler 250 days ago
    From an eng/industry perspective, back in 2016/2017 I watched the real-time decline of TensorFlow relative to PyTorch.

    The issue was that TF had too many interfaces to accomplish the same thing, and each one was rough in its own way. There was also some complexity around serving and experiment logging via TensorBoard, but this wasn’t as bad, at least for me.

    Keras was integrated in an attempt to help, but ultimately it wasn’t enough, and people started using Torch more and more, even against the perception that TF was for prod workloads and Torch was for research.

    TFA mentions the interface complexity as starting to be a problem with Torch, but I don’t think we’re anywhere near the critical point that would cause people to abandon it in favor of JAX.

    Additionally with JAX you’re just shoving the portability problems mentioned down to XLA which brings its own issues and gotchas even if it hides the immediate reality of said problems from the end user.

    I think the Torch maintainers should take care not to repeat the mistakes of TF, but I think there’s a long way to go before JAX is a serious contender. It’s been years, and JAX has stayed in relatively small usage.

    • markeroon 250 days ago
      Imo the biggest issue (from memory) was that Tensorflow used a static computation graph. PyTorch was so much easier to work with.
      • lostmsu 250 days ago
        I honestly think the static graph was much better, in a similar way to how Vulkan/DX12 are better than OpenGL/DX9. It is harder to program, but gives more explicit control of important things. E.g. who would use PyTorch if they knew that for optimal performance they'd need to record CUDA graphs?
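
        For reference, this is roughly what manually recording a CUDA graph looks like in PyTorch (a sketch following the documented capture pattern; requires a CUDA device):

          import torch

          model = torch.nn.Linear(64, 64).cuda()
          static_in = torch.randn(8, 64, device="cuda")

          # warm up on a side stream before capture
          s = torch.cuda.Stream()
          s.wait_stream(torch.cuda.current_stream())
          with torch.cuda.stream(s):
              model(static_in)
          torch.cuda.current_stream().wait_stream(s)

          g = torch.cuda.CUDAGraph()
          with torch.cuda.graph(g):
              static_out = model(static_in)  # captured, not executed eagerly

          static_in.copy_(torch.randn(8, 64, device="cuda"))
          g.replay()  # re-runs the captured kernels on the new data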
      • ianbutler 250 days ago
        Yup that was also definitely one among the many issues
  • wafngar 250 days ago
    PyTorch is developed by multiple companies/stakeholders, while JAX is Google-only, with internal tooling they don’t share with the world. This alone is a major reason not to use JAX. Also, I think it is more the other way around: with torch.compile, the main advantage of JAX is disappearing.
    • dauertewigkeit 250 days ago
      It's the age-old question in programming: do you use a highly constrained paradigm that allows easy automatic optimization, or do you use a very flexible and more intuitive paradigm that makes automatic optimization harder?

      If the future is going to be better, more intelligent compilers, then that settles the question, in my opinion.

    • n7g 250 days ago
      > with torch.compile, the main advantage of JAX is disappearing.

      Interesting take - I agree here somewhat.

      But also, wouldn't you think a framework that has been designed from the ground up around a specific, mature compiler stack would be better able to integrate compilers in a stable fashion than one that shoehorns static compilers into a very dynamic framework? ;)

      • wafngar 250 days ago
        Depends. PyTorch, on the other hand, has a large user base and a well-defined, tested API. So it should be doable - and it's already progressing at rapid speed.
      • anon389r58r58 250 days ago
        So the answer is not Jax?

        Because JAX is not designed around a mature compiler stack. The history of JAX is more that it matured alongside the compiler...

  • 0x19241217 250 days ago
    Pushback notwithstanding, this article is 100% correct in all PyTorch criticisms. PyTorch was a platform for fast experimentation with eager evaluation, now they shoehorn "compilers" into it. "compilers", because a lot of the work is done by g++ and Triton.

    It is a messy and quickly expanding codebase with many surprises like segfaults and leaks.

    Is scientific experimentation really sped up by these frameworks? Everyone uses the Transformer model and uses the same algorithms over and over again.

    If researchers wrote directly in C or Fortran, perhaps they'd get new ideas. The core inference code (see Karpathy's llama2.c) is ridiculously small. Core training does not seem much larger either.

    • dunefox 250 days ago
      > If researchers wrote directly in C or Fortran...

      ... then they would get nothing done.

      • TheRealKing 249 days ago
        Fortran cannot be placed in the same category of low programming productivity as C.
      • pjmlp 247 days ago
        Apparently we could do research work before Python was invented.
      • pklausler 250 days ago
        Why not?
        • dunefox 250 days ago
          1. data cleaning and other tasks for which Python excels, 2. C and Fortran don't come close to the developer speed of Python. Ideas don't come from using languages like C, but are rather enabled by implementing, prototyping, and iterating quickly and with little resistance from the language, using existing libraries, etc. Python is simply much better in these regards.
  • sundarurfriend 250 days ago
    Can we get the title changed to the actual title of the post? "The future of Deep Learning frameworks" sounds like a neutral and far wider-reaching article, and ends up being clickbait here (even if unintentionally).

    "PyTorch is dead. Long live JAX." conveys exactly what the article about, and is a much better title.

  • funks_ 251 days ago
    I wish dex-lang [1] had gotten more traction. It’s JAX without the limitations that come from being a Python DSL. But ML researchers apparently don’t want to touch anything that doesn’t look exactly like Python.

    [1]: https://github.com/google-research/dex-lang

    • cherryteastain 251 days ago
      It's very rare that an ML project is _only_ the ML parts. A significant chunk of the engineering effort goes into data pipelines and other plumbing. Having access to a widely used general purpose language with plenty of libraries in addition to all the ML libraries is the real reason why everyone goes for Python for ML.
    • hatmatrix 251 days ago
      It seems like an experimental research language.

      Julia also competes in this domain from a more practical standpoint, and has fewer limitations than JAX as I understand it, but is less mature and still working on getting wider traction.

      • funks_ 251 days ago
        The Julia AD ecosystem is very interesting in that the community is trying to make the entire language differentiable, which is much broader in scope than what Torch and JAX are doing. But unlike Dex, Julia is not a language built from the ground up for automatic differentiation.

        Shameless plug for one of my talks at JuliaCon 2024: https://www.youtube.com/live/ZKt0tiG5ajw?t=19747s. The comparison between Python and Julia starts at 5:31:44.

        • hatmatrix 251 days ago
          Ah, I had not realized I was corresponding with the author of that talk - I'd followed it back when it was happening, as I'm particularly interested in adapting AD.

          Where do you feel Julia is at this point in time (compared to say, JAX or PyTorch) from a practitioner's standpoint?

          • funks_ 248 days ago
            When it comes to general deep learning, Julia is much less mature than the JAX ecosystem. I think deep learning will be the hardest nut for Julia to crack. The field is moving incredibly fast, and network effects are strong. Julia's strength lies in scientific computing, so I think adoption will come through novel applications of AD/ML in the sciences, rather than trying to catch up with the latest LLM developments.

            I'm positive about Julia's future because the developer experience just feels so fun and productive. I always find it impressive how much a small group of self-organized volunteers has been able to achieve. Amazing things could happen if a company like Google or Meta paid a team of full-time engineers to advance the deep learning ecosystem. Fun fact: Julia strongly influenced PyTorch's recent design decisions [1].

            [1]: https://dev-discuss.pytorch.org/t/where-we-are-headed-and-wh...

    • mccoyb 251 days ago
      Dex is also missing user-authored composable program transformations, which is one of JAX’s hidden superpowers.

      So not quite “JAX without limitations” — but certainly without some of the limitations.

      • 6gvONxR4sf7o 250 days ago
        This is both its strength and its weakness. As soon as you write a jaxpr interpreter, you lose all the tooling that makes the Python interpreter so mature. For example, stack traces and debugging become black holes. If JAX made it easy to write these transformations without losing Python’s benefits, it would be incredible.
      • funks_ 251 days ago
        Are you talking about custom VJPs/JVPs?
        • mccoyb 251 days ago
          No, I'm talking about custom `Jaxpr` interpreters which can modify programs to do things.
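
          A minimal sketch of the idea: stage a function out to a jaxpr with jax.make_jaxpr, then walk the equations yourself. A real transformation rebinds each primitive to new semantics; this one just inspects the program.

            import jax
            import jax.numpy as jnp

            def f(x):
                return jnp.sin(x) + x * 2.0

            closed = jax.make_jaxpr(f)(jnp.ones((3,)))
            for eqn in closed.jaxpr.eqns:
                # each equation is one primitive application, e.g. sin, mul, add
                print(eqn.primitive, eqn.invars, "->", eqn.outvars)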
    • hedgehog 251 days ago
      It's not about the syntax, it's all the knowledge, tools, existing code, etc that make Python so attractive.
      • funks_ 251 days ago
        I don't doubt that, but I'm specifically talking about new languages. I've seen far more enthusiasm from ML researchers for Mojo, which doesn't even do automatic differentiation, than for Dex. And to recycle an old HN comment of mine, people are much more eager to learn a functional programming language if it looks like NumPy (I'm talking about JAX here).
        • hedgehog 250 days ago
          Is Mojo actually getting significant uptake in research? I haven't been following closely but new tooling at that layer seems much more useful when cost-optimizing deployment.
        • ZeroCool2u 250 days ago
          Mojo is interesting, because I get to keep all my existing Python code and libraries for free. Then when I need to speed things up I can use Mojo syntax.
          • funks_ 248 days ago
            As far as I understand, you will only be able to speed up code that was previously written in pure Python. This excludes JAX, PyTorch, NumPy and any other Python package written in C/C++/Rust/Fortran.
  • semiinfinitely 251 days ago
    PyTorch is the JavaScript of ML. Sadly, "worse is better" software has better survival characteristics, even when there is consensus that technology X is theoretically better.
    • sva_ 251 days ago
      I don't think the comparison is fair. Imo PyTorch has the cleanest abstractions, which is the reason it is so popular. People can do quick prototyping without having to spend too much time figuring out the engineering details that make their hardware run it.
    • pineapple_sauce 250 days ago
      How is Jax theoretically better than PyTorch? The author is ignorant of torch.compile and biased as other commenters have pointed out.
      • n7g 250 days ago
        Curious to know how the OP is biased when they have no conflict of interest, and they explicitly mention the `torch.compile` stack like... a few dozen times in the blog?
    • bob020202 251 days ago
      Nothing is even theoretically better than JavaScript for its intended use cases, web frontends and backends. Mainly because it went all-in on event-loop parallelism early on, which isn't just for usability but also performance. And it didn't go all-in on OOP, unlike Java, and has easy imports/packages, unlike Python. It has some quirks like the "0 trinity," but that doesn't really matter. No matter how good you are with something else, it still takes more dev time ($$$) than JS.

      Now it's been forever since I used PyTorch or TF, but I only remember TF 1.x being more like "why TF isn't this working." At some point I didn't blame myself, I blamed the tooling, which TF2 later admitted. It seemed like no matter how skilled I got with TF1, it'd always take much longer than developing with PyTorch, so I switched early.

      • josephg 251 days ago
        You don't think it's possible to (even theoretically) improve on JavaScript for its intended use case? What a terrific lack of imagination.

        Typescript and Elm would like a word

        • bob020202 251 days ago
          No, I said there's nothing that exists right now that's theoretically better. TypeScript isn't. It'd be great if TS's type inference were smart enough that it took basically no additional dev input vs JS, but until then, it's expensive to use. It's also bolted on awkwardly, but that's changing soon. Could also imagine JS getting some nice Py features like list comp.

          Also, generally when people complain that JS won the web, it's not because they prefer TS, it's cause they wanted to use something else and can't.

          Never used Elm, but... no variables, kinda like Erlang, which I've used. That has its appeal, but you're not going to find a consensus that this is better for web.

          • dwattttt 251 days ago
            I see this a lot, and I want to find the right words for it: I don't want the types automatically determined for what I write, because I write mistakes.

            I want to write the type, and for that to reveal the mistake.

            • bob020202 251 days ago
              If you explicitly mark types at lower levels, or just use typed libs, it's not very easy to pass the wrong type somewhere and have it still accidentally work. The most automatic type inference I can think of today is in Rust, which is all about safety, or maybe C++ templates count too.
              • dwattttt 251 days ago
                I actually write a fair bit of Rust, and there I'll still explicitly add types. Not always (in particular where I have an IDE that'll show me the inferred type & it's trivial), but in particular if any more complicated inference is happening I'll constrain it.
              • kaoD 250 days ago
                TS has type inference on function return types while Rust forces you to type the whole function signature (for a good reason, TS's return type inference is a footgun).

                Other than that, I haven't noticed their inference capabilities being any different.

                So I don't get your point.

                Can you give a single example where Rust has more automatic type inference compared with TS? (honest question, maybe I'm missing something)

          • josephg 251 days ago
            > No, I said there's nothing that exists right now that's theoretically better.

            That might be what you meant, but it's not what you said. Thanks for clarifying.

  • patrickkidger 250 days ago
    I think a lot of the commenters here are being rather unfair.

    PyTorch has better adoption / network effects. JAX has stronger underlying abstractions.

    I use both. I like both :)

    • n7g 250 days ago
      Hey patrick, love your work!

      I think the biggest, well "con" I've seen is non-technical - the fear of JAX being killed by Google.

      I also mention in the blog [here](https://neel04.github.io/my-website/blog/pytorch_rant/#gover...) how important having an independent governance structure is. I'm sure that for many big companies and labs, the lack of a promise of long-term, stable support is a huge dealbreaker.

      I'm not sure how much Google bureaucracy would limit this, but have you raised the subject of forming an independent entity to govern JAX, much like PyTorch's? I believe XLA is protected, as it's under TF governance. But perhaps there could be one for JAX's ecosystem as well, encompassing optax, equinox, flax, etc.

      • Eridrus 250 days ago
        I can personally say, I am not super concerned about it being killed. Google supported TF1 for quite a long time and all these projects have a shelf life.

        What concerned me about JAX, at a small company, is that it doesn't benefit from the network effects of almost everyone developing for it. E.g. There is no Llama 3.1 implementation in JAX afaict.

        So as long as there is a need to pull from the rest of the world, the ecosystem will trump the framework.

        Activity in the LLM space is slowing down though, so there is an opportunity to take the small set of what worked and port it to JAX and show people how good that world is.

    • wafngar 250 days ago
      Probably unfair as a reaction to the unfair statements in the blog…
  • deisteve 251 days ago
    i like pytorch because all the academia release their code with it

    ive never even heard of jax nor will i have the skills to use it

    i literally just want to know two things: 1) how much vram 2) how to run it on pytorch

    • sva_ 251 days ago
      Jax is a competing computational framework that does something similar to PyTorch, so both of your questions don't really make sense.
      • etiam 251 days ago
        Maybe deisteve will answer for himself, but I don't think that's meant to mean how to run Jax on Pytorch, but rather the questions he's interested in for any published model.
        • deisteve 251 days ago
          if a paper releases code in PyTorch im not going to sit there and complain Jax is more efficient

          its like fretting about how everything should be written in C++ instead of Python/Javascript

          I don't care.

      • deisteve 251 days ago
        can i run pytorch code on jax?

        if not im not interested. i'll keep using pytorch.

        i hope this answer makes more sense for you.

  • 6gvONxR4sf7o 250 days ago
    One aspect of jax that’s rarely touched on is browser stuff. Completely aside from deep learning, it’s straightforward to compile jax to a graphics shader you can call in js, which in this insane world is actually my preferred way to put numerical computing or linear algebra code on a web page.
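
    One hedged path for this (not necessarily the commenter's exact setup): lower the JAX function to TensorFlow with jax2tf, save it, and convert the SavedModel for TensorFlow.js, which executes via WebGL/WebGPU shaders in the browser.

      import jax.numpy as jnp
      import tensorflow as tf
      from jax.experimental import jax2tf

      def f(x):
          return jnp.fft.fft(x).real  # any numerical/linear-algebra kernel

      tf_f = tf.function(jax2tf.convert(f), autograph=False,
                         input_signature=[tf.TensorSpec([1024], tf.float32)])
      module = tf.Module()
      module.f = tf_f
      tf.saved_model.save(module, "/tmp/f_savedmodel")
      # then: tensorflowjs_converter --input_format=tf_saved_model \
      #           /tmp/f_savedmodel /tmp/f_tfjs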
    • mccoyb 250 days ago
      Can you share any code or references for this? Several people I work with are interested in this.
      • 6gvONxR4sf7o 249 days ago
        Sure. I’ll have to dig it up (shit's boxed up as I move), but I’ll ping you or this comment when I get to it.
  • ein0p 251 days ago
    Jax is dead, long live PyTorch. PyTorch has _twenty times_ as many users as Jax. Any rumors of its death are highly exaggerated
    • tripplyons 251 days ago
      It's definitely exaggerated, but I personally prefer JAX and have found it easier to use than PyTorch for almost everything. If you haven't already, I would give JAX a good try.
    • melling 251 days ago
      They used to say the same thing about Perl and Python

      Downvoted. Hmmm. I’m a little tired so I don’t want to go into detail. However, I was a Perl programmer when Python was rising. So, needless to say, having a big lead doesn’t matter.

      Please learn from history. A big lead means nothing.

      • ein0p 251 days ago
        It’s been years and Jax is just where it was, no growth whatsoever. And that’s with all of Google forced internally to use only Jax. Look, I like the technical side of Jax for the most part, but it’s years too late to the party and it’s harder to use than PyTorch. It just isn’t going to ever take off at this point.
      • pjmlp 247 days ago
        Had the whole Parrot VM and Perl 6 distraction never happened, I doubt Python would have taken over the UNIX scripting space.

        So a good lesson is not to get distracted by foolish endeavours.

    • deisteve 251 days ago
      A lot of contrarian takes are popular but rarely implemented in reality.
  • pjmlp 250 days ago
    The best thing about PyTorch is that we aren't stuck with Python, rather we can enjoy it alongside proper performance, by using the Java and C++ API surfaces instead.
  • FL33TW00D 250 days ago
    Are modern NNs really just static functions, and are they going to continue to be in the future?

    KV caching is directly in conflict with a purely functional approach.

    • spott 250 days ago
      Is it? The same inputs give the same outputs; the only thing that differs is the performance.

      It is basically the same as memoization.

    • visarga 250 days ago
      You could send the KV cache and even the LoRA weights as input args to a static model.
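
      A sketch of that idea in JAX: keep the network a pure function by threading the KV cache through as an explicit argument and returning the updated cache. Shapes and the attention body here are illustrative (masking omitted).

        import jax
        import jax.numpy as jnp

        def decode_step(params, token_emb, cache, pos):
            # write the new token's key/value into the preallocated cache
            cache = {
                "k": cache["k"].at[pos].set(token_emb @ params["wk"]),
                "v": cache["v"].at[pos].set(token_emb @ params["wv"]),
            }
            q = token_emb @ params["wq"]
            attn = jax.nn.softmax(cache["k"] @ q)  # attend over cached keys
            out = attn @ cache["v"]
            return out, cache  # pure: same (params, token, cache, pos) -> same outputs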
  • astromaniak 249 days ago
    The article misses the multi-modal thing, which is the future. Sure, the modalities can be considered separate things, like today, but that's probably not the best approach. Support from the framework may include partial training, easy component swapping, intermediate data caching, dynamic architecture, and automatic work balancing and scaling.
  • lostmsu 250 days ago
    From the article it seems that JAX is a non-starter for me, as it has no support for any kind of acceleration on Windows proper, and only experimental support in WSL.
  • auggierose 250 days ago
    So if multi-backend is doomed, is tinygrad then doomed, too?
    • n7g 250 days ago
      I haven't used TinyGrad, but I'm not really sure what its goal is. To be the best autograd framework? To be a minimal one?

      I'm glad they've removed the (rather arbitrary, and admittedly stupid) lines-of-code cap. And from the little I know, geohot is focusing on TinyGrad having its own internal compiler stack.

      As much as I admire geohot, I don't think rolling your own compiler is the best way. It's not that the TinyGrad team isn't smart enough, but a compiler is a huge undertaking that you have to support and maintain for a long time. I'm sure he's well aware of this, but no big labs would touch TG seriously because of this limitation.

      XLA, on the other hand, is under governance separate from Google, and is far more mature - so people trust it.

      That said, I don't know much about Tinygrad, so I would appreciate it if someone more knowledgeable could jump in here and outline the differences and key features ¯\_(ツ)_/¯

  • hprotagonist 251 days ago
    > I believe that all infrastructure built on Torch is just a huge pile of technical debt, that will haunt the field for a long, long time.

    ... from the company that pioneered the approach with TensorFlow. I've worked with worse ML frameworks, but they're by now pretty obscure; I cannot remember (and I am very happy about it) the last time I saw MXNet in the wild, for example. You'll still find Caffe on some embedded systems, but you can mostly sidestep it.

  • BaculumMeumEst 251 days ago
    Jax is well designed? That's nice. The only thing that matters is adoption. You can run this title when Jax's adoption surpasses PyTorch's. How does someone using _python_ not understand this?
  • sva_ 251 days ago
    > I’ve personally known researchers who set the seeds in the wrong file at the wrong place and they weren’t even used by torch at all - instead, were just silently ignored, thus invalidating all their experiments. (That researcher was me)

    Some assert-ing won't hurt you. Seriously. It might even help you keep your sanity.
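
    The cheap insurance being suggested, as a sketch: set the seed and assert that torch actually picked it up before launching experiments.

      import torch

      SEED = 1234
      torch.manual_seed(SEED)
      # fails loudly instead of silently invalidating every run
      assert torch.initial_seed() == SEED, "torch seed was not applied"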

  • cs702 251 days ago
    A more accurate title for the OP would be "I hope and wish PyTorch were dead, so Jax could become the standard."

    Leaving aside the fact that PyTorch's ecosystem is 10x to 100x larger, depending on how one measures it, PyTorch's biggest advantage, in my experience, is that it can be picked up quickly by developers who are new to it. Jax, despite its superiority, or maybe because of it, can not be picked up quickly.

    Equinox does a great job of making Jax accessible, but Jax's functional approach is in practice more difficult to learn than PyTorch's object-oriented one.
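
    A small sketch of the Equinox style, for the curious (assumes the equinox package): modules look object-oriented but are pytrees, so they still compose with functional jax.jit/grad.

      import equinox as eqx
      import jax
      import jax.numpy as jnp

      class MLP(eqx.Module):
          l1: eqx.nn.Linear
          l2: eqx.nn.Linear

          def __init__(self, key):
              k1, k2 = jax.random.split(key)
              self.l1 = eqx.nn.Linear(4, 16, key=k1)
              self.l2 = eqx.nn.Linear(16, 1, key=k2)

          def __call__(self, x):
              return self.l2(jax.nn.relu(self.l1(x)))

      model = MLP(jax.random.PRNGKey(0))
      # filter_grad differentiates w.r.t. the module's array leaves
      grads = eqx.filter_grad(lambda m, x: m(x).sum())(model, jnp.ones((4,)))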

  • munchler 251 days ago
    This is such a hyperbolic headline it's hard to be interested in reading the actual article.
    • dang 250 days ago
      We've unhyperbolized it via the subtitle.
      • sundarurfriend 250 days ago
        I think that change was for the worse. (Just posted a comment asking for the original title: https://news.ycombinator.com/item?id=41275918)

        The subtitle doesn't convey the content of the article nearly as well as the title does. Perhaps you could take a sentence like "PyTorch has been a net negative for scientific computing efforts", which the article does say, or some toned-down version of the original title, but the current title makes it sound like a very different article and felt like clickbait to me.

        • dang 248 days ago
          Those are good points. Probably too late to make a difference now, but I hear you.
    • crancher 251 days ago
      All blanket statements are false.
  • marcinzm 251 days ago
    My main reason to avoid Jax is Google. Google doesn't provide good support even for things you pay them for. They do things because they want to, to get their internal promotions, irrespective of their customers or the impact on them.
  • casualscience 251 days ago
    > Multi-backend is doomed

    Finally! Someone says it! This is why the C programming language will never have wide adoption. /s

  • 0cf8612b2e1e 251 days ago
    As we all know, the technically best solution always wins.