While `uv` works amazingly well, I think a lot of people don't realize that installing packages through conda (or, let's say, the conda-forge ecosystem) has technical advantages compared to wheels/PyPI.
When you install the numpy wheel through `uv`, you are likely installing a pre-compiled binary that bundles OpenBLAS inside it. When you install numpy through conda-forge, it dynamically links against a placeholder BLAS metapackage that can be substituted with MKL, OpenBLAS, Accelerate, whatever you prefer on your system. It's a much better solution to be able to rely on a separate package rather than having to bundle every dependency.
Then let's say you install scipy. Scipy also has to bundle OpenBLAS in its wheel, and now you have two copies of OpenBLAS sitting around. They don't conflict, but it quickly becomes an odd thing to have to do.
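To make that concrete, here is a rough sketch of the two routes (a Linux venv layout and the conda-forge `libblas` pinning syntax as I understand them, so treat the details as illustrative):

```
# Wheel route: each manylinux wheel vendors its own copy of OpenBLAS
pip install numpy scipy
python -c "import numpy; numpy.show_config()"      # reports the bundled BLAS
du -sh .venv/lib/python3.*/site-packages/*.libs     # numpy.libs/, scipy.libs/, ...

# conda-forge route: both packages link against one libblas metapackage,
# whose backing implementation can be swapped without rebuilding anything
conda install numpy scipy
conda install "libblas=*=*mkl"        # switch the whole env to MKL
conda install "libblas=*=*openblas"   # or back to OpenBLAS
```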
For this reason I personally prefer pixi. It is uv-like in feel, but it resolves using conda channels like conda does, and, similar to conda, it supports PyPI packages via uv.
With a background in scientific computing, where many of the dependencies I manage are compiled, conda packages give me much more control.
P.S. I'd like to point out to others the difference between package indexes and package managers. PyPI is an index (it hosts packages in a predefined format), while pip, poetry, and uv are package managers that resolve and build your environments using the index.
Similarly, but a bit more confusingly, conda can be understood as the index, hosted by Anaconda but also hostable elsewhere, with different "channels" (kind of like a GitHub organization), of which conda-forge is a popular community-built one. Conda is also the reference implementation of a package manager that uses Anaconda channels to resolve. Mamba is an independent, performant, drop-in replacement for conda. And pixi is a different package manager with a different interface, by the author of mamba.
Even more confusingly, there are distributions. Distributions come with a set of predefined packages together with the package manager, such that you can just start running things immediately (sort of like a TeX Live distribution in relation to the package manager tlmgr). There is the Anaconda distribution (if you installed Anaconda instead of installing conda, that's what you got), but also Intel's Distribution for Python, Miniforge, Mambaforge, etc.
My guess is that the difference is more that PyPI intends to be a Python package repository, and thus I don’t think you can just upload say a binary copy of MKL without accompanying Python code. It’s originally a source-based repository with binary wheels being an afterthought. (I still remember the pre-wheel nightmare `pip install numpy` used to give, when it required compiling the C/C++/Fortran pieces which often failed and was often hard to debug…)
But Anaconda and conda-forge are general package repositories: they are not Python-specific but are happy to be used for R, Julia, C/C++/Fortran binaries, etc. It's primarily a binary-based repository. For example, you can `conda install python` but you can't `pip install python`.
I don’t know if there is any technical barrier or just a philosophical barrier. Clearly, Pip handles binary blobs inside of Python packages fine, so I would guess the latter but am happy to be corrected :).
The problem conda solved that nothing had solved before was installing binary dependencies on MS Windows.
Before conda, getting a usable scipy install up and running on MS Windows was a harrowing experience. And having two independent installations was basically impossible.
The real hard work that went into conda was reverse engineering all the nooks and crannies of the DLL loading heuristics, to allow it to ensure that you loaded what you intended.
If you are working on macOS and deploying to some *nix in the cloud, you are unlikely to find any value in this. But in ten years as lead on a large tool that was deployed to personal (Windows) laptops in a corporate environment, I did not find anything that beat conda.
Oh right, recently I started learning classic ML and "just" tried to install tensorflow, which, itself or through one of its dependencies, stopped providing Windows binaries after version x.y.z, so my Python had to be downgraded to 3.a, and then other dependencies stopped installing. Eventually I managed to find a working version intersection for everything, together with some shady repo, but it felt like one more requirement and I'd be overconstrained.
You said it. I was working with an official Google library that used TF and it didn’t work at all with 3.12. I spent a day building the wheels for 3.12 only to find there was a bug with dataclasses. :|
I can’t recall the library, but there was another major project that just deprecated TF because it was the cause of so many build problems.
As someone with admittedly no formal CS education, I've been using conda for all of my grad school and never managed to break it.
I create a virtual environment for every project. I install almost all packages with pip, except for any binaries or CUDA related things from conda. I always exported the conda yaml file and managed to reproduce the code/environment including the Python version. I've seen a lot of posts over time praising poetry and other tools and complaining about conda but I could never relate to any of them.
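For anyone curious, that workflow is roughly the following (environment and package names are just examples):

```
conda create -n myproject python=3.11
conda activate myproject
conda install -c conda-forge cudatoolkit     # binaries / CUDA bits from conda
pip install requests some-pure-python-lib    # everything else from pip (placeholder names)
conda env export > environment.yml           # snapshot, pip-installed packages included
conda env create -f environment.yml          # recreate it on another machine
```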
My experience with conda is that it's fine if you're the original author of whatever you're using it for and never share it with anyone else. But as a professional I usually have to pull in someone else's work and make it function on a completely different machine/environment. I've only had negative experiences with conda for that reason. IME the hard job of package management is not getting software to work in one location, but allowing that software to be moved somewhere else and used in the same way. Poetry solves that problem; conda doesn't.
Poetry isn't perfect, but it's working in an imperfect universe and at least gets the basics (lockfiles) correct to where packages can be semi-reproducible.
There's another rant to be had about the very existence of venvs as part of the solution, but that's neither poetry's nor anaconda's fault.
It is entirely possible to use poetry to determine the precise set of packages to install, write out a requirements.txt, and then shotgun-install those packages in parallel. I used a stupidly simple fish shell for loop that ran every requirements line as a pip install with an "&" to background the job and a "wait" after the loop (iirc); see the sketch below. Could use xargs or parallel too.
This is possible at least. Maybe it breaks in some circumstances but I haven’t hit it.
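For reference, the trick translated to plain bash looks something like this (assumes simple `pkg==version` lines, with no hashes or environment markers):

```
# export the fully resolved pins (newer Poetry may need the poetry-plugin-export plugin)
poetry export -f requirements.txt --without-hashes -o requirements.txt

# closest to the fish loop described above: background every install, then wait
while read -r req; do pip install --no-deps "$req" & done < requirements.txt; wait

# or the xargs equivalent, with a bounded number of parallel jobs
xargs -n 1 -P 8 pip install --no-deps < requirements.txt
```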
Not as an excuse for bad behavior but rather to consider infrastructure and expectations:
The packages might be cached locally.
There might be many servers – a CDN and/or mirrors.
Each server might have connection limits.
(The machine downloading the packages miiiiiight be able to serve as a mirror for others.)
If these are true, then it’s altruistically self-interested for everyone that the downloader gets all the packages as quickly as possible to be able to get stuff done.
I don’t know if they are true. I’d hope that local caching, CDNs and mirrors as well as reasonable connection limits were a self-evident and obviously minimal requirement for package distribution in something as arguably nation-sized as Python.
How is uv so much faster? My understanding is Poetry is slow sometimes because PyPI doesn't have all the metadata required to solve things, so it needs to download packages and then figure it out.
Can you recommend any good article / comparison of uv vs poetry vs conda?
We've used different combinations of pipx+lockfiles or poetry, which has been so far OK'ish. But recently discovered uv and are wondering about existing experience so far across the industry.
From my experience, uv is way better, and it's also PEP-compliant in terms of pyproject.toml, which means that in case uv isn't a big player in the future, migrating away isn't too difficult.
At the same time, poetry still uses a custom format and is pretty slow.
+1. On top of that, even with the new resolver it still takes ages to resolve dependencies for me, so sometimes I end up just using pip directly. Not sure if I am doing something wrong (maybe you have to manually tweak something in the configs?) but it's pretty common for me to experience this.
Like sibling comments,
after using poetry for years (and pipx for tools), I tried uv a few months ago
I was so amazed by the speed that I moved all my projects to uv and have not yet looked back.
uv replaces all of pip, pipx and poetry for me. It does not do more than these tools, but it does it right and fast.
If you're at liberty to try uv, you should try it someday, you might like it.
(nothing wrong with staying with poetry or pyenv though, they get the job done)
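For anyone wondering what that replacement looks like in practice, roughly (from my own usage; check the uv docs for the current commands and flags):

```
# pip-style workflow
uv venv
uv pip install -r requirements.txt

# poetry-style project workflow (pyproject.toml + uv.lock)
uv add requests
uv sync
uv run pytest

# pipx-style global tools
uv tool install ruff
```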
I believe the problem is the lack of proper dependency metadata at PyPI. The SAT solvers used by poetry or pdm or uv often have to download multiple versions of the same dependencies just to discover their requirements and find a solution.
imagine being a beginner to programming and being told "use venvs"
or worse, imagine being a longtime user of shells but not python and then being presented a venv as a solution to the problem that for some reason python doesn't stash deps in a subdirectory of your project
You don't need to stash deps in a subdirectory; IMHO that's a node.js design flaw that leads to tons of duplication. I don't think there's any other package manager for a popular language that works like this by default (Bundler does allow you to vendor dependencies into the project, which can be useful for deployment, but you still only ever get one version of any dependency, unlike node).
You just need to have some sort of wrapper/program that knows how to figure out which dependencies to use for a project. With bundler, you just wrap everything in "bundle exec" (or use binstubs).
What was unique to node.js was the decision to not only store the dependencies in a sub-folder, but also to apply that rule, recursively, for every one of the projects you add as a dependency.
There are many dependency managers that use a project-local flat storage, and a global storage was really frowned upon until immutable versions and reliable identifiers became popular some 10 years ago.
Oh that's nice. When I last looked (quite a long time ago), local::lib seemed to be the recommended way, and that seemed a bit more fiddly than python's virtualenv.
Carton uses local::lib under the covers. I found local::lib far less fiddly than virtualenv myself, but it just doesn't try to do as much as virtualenv. These days I do PHP for a living, and for all the awfulness in php, they did nail it with composer.
Julia just stores the analogue of a requirements.txt (Project.toml) and the lock file (Manifest.toml). It has its own package issues, including packages regularly breaking on every minor release (although I enjoy the language and will keep using it).
All those came after Python/C/C++ etc., which are all from the wild-west "what is package management?" dark ages. The designers of those newer languages almost certainly asked themselves "how can we do package management better than existing technology like pip?"
I have imagined this, because I've worked on products where our first time user had never used a CLI tool or REPL before. It's a nightmare. That said, it's no less a nightmare than every other CLI tool, because even our most basic conventions are tribal knowledge that are not taught outside of our communities and it's always an uphill battle teaching ones that may be unfamiliar to someone from a different tribe.
It is true that every field (honestly every corner of most fields) has certain specific knowledge that is both incredibly necessary to get anything done, and completely arbitrary. These are usually historical reactions to problems solved in a very particular way usually without a lot of thought, simply because it was an expedient option at the time.
I feel like venv is one such solution. A workaround that doesn’t solve the problem at its root, so much as make the symptoms manageable. But there is (at least for me) a big difference between things like that and the cool ideas that underlie shell tooling like Unix pipes. Things like jq or fzf are awesome examples of new tooling that fit beautifully in the existing paradigm but make it more powerful and useful.
Beginners in Python typically don't need venvs. They can just install a few libraries (or no libraries even) to get started. If you truly need venvs then you're either past the initial learning phase or you're learning how to run Python apps instead of learning Python itself.
For some libraries, it is not acceptable to stash the dependencies for every single toy app you use. I don't know how much space TensorFlow or PyQt use but I'm guessing most people don't want to install those in many venvs.
i remember reading somewhere (on twitter iirc) an amateur sex survey statistician who decided she needed to use python to analyze her dataset, being guided toward setting up venvs pretty early on by her programmer friends and getting extremely frustrated.
Was it aella? I don't know of any other sex survey statisticians so I'm assuming you mean aella. She has a pretty funny thread here but no mention of venvs: (non-musk-link https://xcancel.com/Aella_Girl/status/1522633160483385345)
> Every google for help I do is useless. Each page is full of terms I don't understand at *all*. They're like "Oh solving that error is simple, just take the library and shove it into the jenga package loader so you can execute the lab function with a pasta variation".
She probably would have been better off being pointed towards jupyter, but that's neither here nor there
Good grief there seems to be no getting away from that woman. One of my ex girlfriends was fascinated by her but to me she is quite boring. If she wasn't fairly attractive, nobody would care about her banal ramblings.
If you install a package in a fresh environment then it does actually get installed. It can be inherited from the global environment but I don't think disparate venvs that separately install a package actually share the package files. If they did, then a command executed in one tree could destroy the files in another tree. I have not done an investigation to look into this today but I think I'm right about this.
Python's venv design is not obviously unintelligent. It must work on all sorts of filesystems, which limits how many copies can be stored and how they can be associated. More advanced filesystems can support saving space explicitly for software that exploits them, and implicitly for everyone, but there is a cost to everything.
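A quick check seems to confirm this (Linux/macOS paths shown; Windows layouts differ):

```
python -m venv env-a && ./env-a/bin/pip install requests
python -m venv env-b && ./env-b/bin/pip install requests
ls env-a/lib/python3.*/site-packages/requests   # its own copy
ls env-b/lib/python3.*/site-packages/requests   # a separate, independent copy

# what *is* shared is pip's wheel cache, so the second install avoids a re-download
pip cache dir

# global site-packages is only visible from a venv created with this flag
python -m venv --system-site-packages env-c
```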
You are doing something right; the post's author does some pretty unusual things:
- Set up custom kernels in Jupyter Notebook
- Hard-link the environments, then install the same packages via pip in one and conda in the others
- Install conda inside conda (!!!) and enter the nested environment
- Use tox within conda
I believe as long as you treat the environments as "cattle" (if one goes bad, remove it and re-create it from the yaml file), you should not have any problems. That's clearly not the case for the post's author, though.
Yep, nuke the bad env and start over. Conda is great; the only problems are when a package is not available on conda-forge or you have to compile and install with setup.py. But then you can blow the env away and start over.
As someone with a formal computer science education, half of my friends who work in other sciences have asked me to help them fix their broken conda environments.
This is exactly the kind of thing that causes Python package nightmares. Pip is barely aware of packages it's installed itself, let alone packages from other package managers, and especially other package repositories. Mixing conda and pip is 100% doing it wrong (not that there's an easy way to do it right, but stick to one or the other; I would generally recommend just using pip, since the reasons for conda's existence are mostly irrelevant now).
I still run into cases where a pip install that fails due to some compile issue works fine via conda. It's still very relevant. It's pip that should be switched out for something like poetry.
poetry vs pip does very little for compilation-related install failures. Most likely the difference is whether you are getting a binary package or not, and conda's repository may have a binary package that pypi does not (but also vice-versa: nowadays pypi has decent binary packages, previously conda gained a lot of popularity because it had them while pypi generally did not, especially on windows). But the main badness comes from mixing them in the same environment (or mixing pypi packages with linux distribution packages, i.e. pip installing outside of a virtual environment).
(I do agree pip is still pretty lackluster, but the proposed replacements don't really get to the heart of the problem and seem to lack staying power. I'm in 'wait and see' mode on most of them)
`uv` is not a drop-in replacement for `conda` in the sense that `conda` also handles non-python dependencies, has its own distinct api server for packages, and has its own packaging yaml standard.
`pixi` basically covers `conda` while using the same solver as `uv` and is written in Rust like `uv`.
Now, is it a good idea to have Python's package management tool handle non-Python packages? I think that's debatable.
I personally am in favor of a world where `uv` is simply the final python package management solution.
Pixi uses uv for resolving pypi deps:
https://prefix.dev/blog/uv_in_pixi
If you look closely, pixi used `resolvo` to power `rip`, then switched from the `rip` solver to the `uv` solver.
Bookmarking. Thanks for sharing the link, looks like a great overview of that particular tragic landscape. :)
Also crossing fingers that uv ends up being the last one standing when the comprehensive amounts of dust here settle. But until then, I'll look into pixi, on the off chance it minimizes some of my workplace sorrows.
Same, but I try to use conda to install everything first, and only use pip as a last resort. If pip only installs the package and no dependencies, it's fine.
God forbid you should require conda-forge and more than three packages lest the dependency resolver take longer than the heat death of the planet to complete.
i think you got lucky and fell into best practices on your first go
> except for any binaries or CUDA related things from conda
doing the default thing with cuda related python packages used to often result in "fuck it, reinstall linux". admittedly, i dont know how it is now. i have one machine that runs python with a gpu and it runs only one python program.
> doing the default thing with cuda related python packages used to often result in "fuck it, reinstall linux"
From about 2014-17 you are correct, but it appears (on ubuntu at least), that it mostly works now. Maybe I've just gotten better at dealing with the pain though...
1. You need to run the export manually, while the other tools you mentioned create the lock file automatically.
2. They distinguish between direct dependencies (packages you added yourself) and indirect dependencies (the packages your packages depend on).
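To be fair, conda can produce both views, though you do have to ask for them explicitly, which is exactly point 1:

```
conda env export > environment.lock.yml              # everything, pinned, incl. transitive deps
conda env export --from-history > environment.yml    # only the packages you asked for
conda env export --no-builds > environment.yml       # pinned, but without build strings
```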
Really the issue is Python itself; it shouldn't be treating its installs and packages as something that's linked and intertwined with the base operating system.
People like to complain about node packages, but I've never seen people have the trouble with them that they have with Python.
What do you do though if you want to import code written in C++? Especially complex, dependency-heavy like CUDA/ML stuff?
You can just give up and say that "The proper way to do this is to use the Nvidia CUDA toolkit to write your cuda app in C++ and then invoke it as a separate process from node" [0]. That apparently works for node, but Python wants much more.
If you actually want to use high-performance native code from your slow interpreted language, then no solution is going to be very good, because the problem is inherently hard.
You can rely on the host OS as much as possible - if the OS is known, provide binaries; if it's unknown, provide source code and hope the user has C/C++/Rust/Fortran compilers to build it. That's what uv, pip, etc. do.
You can create your own parallel OS, bringing your own copy of every math library, as well as CUDA, even if there are perfectly good versions installed on the system - that's what conda/miniconda does.
You can implement as much as possible in your own language, so there is much less need to use "high-performance native language" - that's what Rust and Go do. Sadly, that's not an option for Python.
Can somebody please eli5 why it is so unanimously accepted that Python's package management is terrible? For personal projects venv + requirements.txt has never caused problems for me. For work projects we use poetry because of an assumption that we would need something better but I remain unconvinced (nothing was causing a problem for that decision to be made).
In your requirements.txt, do you pin the concrete versions or leave some leeway?
If you aren't precise, you're gonna get different versions of your dependencies on different machines. Oops.
Pinning concrete versions is of course better, but then there isn't a clear and easy way to upgrade all dependencies and check whether ci still passes.
You should use freeze files. Whatever language you are using, you should specify your dependencies in the loosest way possible, and use freeze files to pin them down.
The only difference from one language to another is that some make this mandatory, while in others it's merely something you should really do, and there isn't any real alternative worth considering.
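With plain pip that can be as minimal as this (the file names are only a convention; pip-tools' pip-compile or `uv pip compile` automate the same loop):

```
# requirements.in holds the loose, human-maintained spec, e.g.
#   requests>=2.28
#   pandas
pip install -r requirements.in
pip freeze > requirements.txt    # the freeze file: exact pins for everything installed

# CI and other machines install from the frozen file only
pip install -r requirements.txt
```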
For using packages, venv + requirements.txt works, but is a bit clunky and confusing. Virtual environments are very easy to break by moving them or by updating your OS (and getting a new Python with it). Poetry is one alternative, but there are far too many options and choices to make. For building packages, there are similarly many competing options with different qualities and issues.
- Anyone with dependencies on native/non-Python libraries.
Conda definitely helps with 2 and 3 above, and uv is at least a nice, fast API over pip (which is better since it started doing dependency checking and binary wheels).
More generally, lots of the issues come from the nature of python as a glue language over compiled libraries, which is a relatively harder problem in general.
There are no Windows-specific issues in venv + pip. Windows can be more painful if you need to compile C extensions, but you usually don't, since most commonly used packages have had binary wheels for Windows on PyPI for many years.
i think there might be merit to gdiamos's point that python is a popular language with a large number of users, and this might mean that python package management isn't unusually bad, but more users implies more complaints.
i think there was a significant step change improvement in python packaging around 2012, when the wheel format was introduced, which standardised distributing prebuilt platform-specific binary packages. for packages with gnarly native library dependencies / build toolchains (e.g. typical C/fortran numeric or scientific library wrapped in a layer of python bindings), once someone sets up a build server to bake wheels for target platforms, it becomes very easy to pip install them without dragging in that project's native build-from-source toolchain.
venv + pip (+ perhaps maintaining a stack of pre-built wheels for your target platform, for a commercial project where you want to be able to reproduce builds) gets most of the job done, and those ingredients have been in place for over 10 years.
around the time wheel was introduced, i was working at a company that shipped desktop software to windows machines, we used python for some of the application components. between venv + pip + wheels, it was OK.
where there were rough edges were things like: we have a dep on python wrapper library pywhatever, which requires a native library libwhatever.dll, built from the c++ whatever project, to be installed -- but libwhatever.dll has nothing to do with python. maybe its maintainers kindly provide an msi installer, and if you install that on a machine it goes into the windows system folder, so venv isn't able to manage it & offer isolation if you need to install multiple versions for different projects / product lines, as venv only manages python packages, not arbitrary library dependencies from other ecosystems
but it's a bit much blame python for such difficulties: if you have a python library that has a native dependency on something that isnt a python package, you need to do something else to manage that dep. that's life. if you're trying to do it on windows, which doesn't have an O/S level package manager.. well, that's life.
Try building a package and you will get hundreds of little paper cuts. Need a different index for some packages? It works from the CLI (`pip install --extra-index-url ...`), but pip will not let you tie a specific package to a specific index in a requirements.txt or in pyproject.toml, for... security reasons. That means good luck trying to "enforce" the CUDA version of pytorch without third-party tooling. So you either hard-code a direct link (which breaks platform portability), since that does work, or give up trying to make your project installable with `pip install` or buildable with `python -m build`. Remember, pytorch basically has no CUDA builds anymore on PyPI and no way to get CUDA torch from there (but I think this might have changed recently?)
Oh, and if some package you are using has a bug or something that requires you to vendor it in your repo, well then good luck, because again, PEP 508 does not support installing another package from a relative link. You either need to put all the code inside the same package, vendored dependency included, and do some weird stuff to make sure that the module you wanted to vendor is used first, or... you just have to use the broken package, again for some sort of security reasons, apparently.
Again, all of that might even work when using pip from the CLI, but good luck trying to make a requirements.txt or define dependencies in a standard way that is even slightly outside of a certain workflow.
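For the CUDA torch case specifically, the closest you can get with plain pip is an extra index that applies to the whole requirements file rather than to one package (the index URL shown is the commonly documented PyTorch one, so double-check it for your setup):

```
# requirements.txt
#   --extra-index-url https://download.pytorch.org/whl/cu121
#   torch==2.1.*
pip install -r requirements.txt

# there is no standard way to express the same thing in [project.dependencies]
# in pyproject.toml; you are left with direct URLs or tool-specific config
# (for example uv's [tool.uv.sources] / index settings).
```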
I can sort of see the argument, if you really, really need to lock down your dependencies to very specific versions, which I don't recommend you do.
For development I use venv and pip, sometimes pyenv if I need a specific Python version. For production, I install Python packages with apt. The operating system can deal with upgrading minor library versions.
I really hate most other package managers; they are all too confusing and too hard to use. You need to remember to pull in library updates, rebuild and release. Poetry sucks too, it's way too complicated to use.
The technical arguments against Python package managers are completely valid, but when people bring up Maven, NPM or even Go as role models I check out. The ergonomics of those tools are worse than venv and pip. I also think that's why we put up with pip and venv: they are so much easier to use than the alternatives (maybe excluding uv). If a project uses Poetry, I just know that I'm going to be spending half a day upgrading dependencies, because someone locked them down a year ago and there are now 15 security holes that need to be plugged.
No, what Python needs is to pull in requests and a web framework into the standard library and then we can start build 50% of our projects without any dependencies at all. They could pull in Django, it only has two or three dependencies anyway.
venv + requirements.txt has worked for every single python project I made for the last 2 years (I'm new to python). Only issue I had was when using a newish python version and not having a specific library released yet for this new version, but downgrading python solved this.
Being new to the ecosystem I have no clue why people would use Conda and why it matters. I tried it, but was left bewildered, not understanding the benefits.
The big thing to realise is that when Conda first was released it was the only packaging solution that truly treated Windows as a first class citizen and for a long time was really the only way to easily install python packages on Windows. This got it a huge following in the scientific community where many people don't have a solid programming/computer background and generally still ran Windows on their desktops.
Conda also not only manages your python interpreter and python libraries, it manages your entire dependency chain down to the C level in a cross platform way. If a python library is a wrapper around a C library then pip generally won't also install the C library, Conda (often) will. If you have two different projects that need two different versions of GDAL or one needs OpenBLAS and one that needs MKL, or two different versions of CUDA then Conda (attempts to) solve that in a way that transparently works on Windows, Linux and MacOS. Using venv + requirements.txt you're out of luck and will have to fall back on doing everything in its own docker container.
Conda lets you mix private and public repos, as well as mirroring public packages on-prem, in a transparent way that is much smoother than pip, and it has tools for things like audit logging, fine-grained access control, package signing and centralised controls and policy management.
Conda also has support for managing multi-language projects. Does your python project need nodejs installed to build the front-end? Conda can also manage your nodejs install. Using R for some statistical analysis in some part of your data pipeline? Conda will manage your R install. Using a Java library for something? Conda will make sure everybody has the right version of Java installed.
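A sketch of what that looks like in a single environment.yml (package names and versions are illustrative):

```
# environment.yml
name: data-pipeline
channels:
  - conda-forge
dependencies:
  - python=3.11
  - nodejs=20        # front-end build tooling
  - r-base=4.3       # the statistical analysis step
  - openjdk=17       # runtime for the Java library
  - gdal             # C/C++ library plus its Python bindings
  - pip:
      - some-pure-python-package   # placeholder for anything pip-only
```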
Also, it at least used to be common for people writing numeric and scientific libraries to release Conda packages first and then only eventually publish on PyPI once the library was 'done' (which could very well be never). So if you wanted the latest cutting-edge packages in many fields, you needed Conda.
Now, there is obviously a huge class of projects where none of these features are needed and mean nothing. If you don't need Conda, then Conda is no longer the best answer. But there are still a lot of niche things Conda does better than any other tool.
> it manages your entire dependency chain down to the C level in a cross platform way.
I love conda, but this isn't true. You need to opt-in to a bunch of optional compiler flags to get a portable yml file, and then it can often fail on different OS's/versions anyway.
I haven't done too much of this since 2021 (gave up and used containers instead) but it was a nightmare getting windows/mac builds to work correctly with conda back then.
Conda used to be a lifesaver years and years ago, when compiled extensions were hard to install because you had to compile them yourself.
Nowadays, thanks to wheels being numerous and robust, the appeal of anaconda is disappearing for most users except for some exotic mixes.
conda itself now causes more trouble than it solves as it's slow, and lives in its own incompatible world.
But anaconda solves a different problem now that nobody else solves, and that's managing Python for big corporations. This is worth a lot of money to big structures that need to control package origins, permissions, updates, and so on, at scale.
Nothing in the "article" seems to support the title. A lot of it is just about Python packaging in general, or about problems when mixing conda- and pip-installed packages.
In my experience conda is enormously superior to the standard Python packaging tools.
If we're doing anecdotal evidence, then mine is that conda is by far the worst of the main Python packaging solutions in use. The absurd slowness and incompatibility with the entire rest of the Python world are only the visible tip of that iceberg. To the best of my ability to tell, conda largely exists to make up for endemic deficiencies in Windows software distribution toolchains (not Python specific) and sadly it's not even good at that either.
Mind you, glad it works for you. Warms my grey heart to know there's some balance in this universe. :)
I think Pixi mostly solves the main issues of conda by forcing users to have project-specific environments. It also solves environments incredibly fast, so it’s really quick to create new projects/environments. https://pixi.sh/
Conda is the only package manager I've used on Ubuntu that intermittently and inexplicably gets stuck when installing or uninstalling. It will sometimes resolve itself if left alone for hours, but often won't.
It's because of the SAT solver for dependencies. Unlike Pip, it keeps track of every package you installed and goes out of its way to avoid installing incompatible packages.
Why go through all this trouble? Because originally it was meant to be a basic "scientific Python" distribution, and needed to be strict around what's installed for reproducibility reasons.
It's IMO overkill for most users, and I suspect most scientific users don't care either - most of the time I see grads and researchers just say "fuck it" and use Pip whenever Conda refuses to get done in a timely fashion.
And the ones who do care about reproducibility are using R anyway, since there's a perception those libraries are "more correct" (read: more faithful to the original publication) than Pythonland. And TBH I can't blame them when the poster child of it is Sklearn's RandomForestRegressor not even being correctly named - it's bagged trees under the default settings, and you don't get any indication of this unless you look at that specific kwarg in the docs.
Personally, I use Conda not for reproducibility, but so all of my projects have independent environments without having to mess with containers
> And the ones who do care about reproducibility are using R anyway
I worked in a pharma company with lots of R code and this comment is bringing up some PTSD. One time we spent weeks trying to recreate an "environment" to reproduce a set of results. Try installing a specific version of a package, and all the dependencies it pulls in are the latest version, whether or not they are compatible. Nobody actually records the package versions they used.
The R community are only now realising that reproducible environments are a good thing, and not everybody simply wants the latest version of a package. Packrat was a disaster, renv is slightly better.
> Personally, I use Conda not for reproducibility, but so all of my projects have independent environments without having to mess with containers
A perfectly reasonable goal, yup! Thankfully not one that, in fact, requires conda. Automated per-project environments are increasingly the default way of doing things in Python, thank goodness. It's been a long time coming.
As far as the solver is concerned, there should be no difference, as it has been upstreamed. But I personally can't see a reason to go back, as mamba is supposed to be a drop-in replacement for conda. I default to mamba and switch to conda only when necessary. There are some cases mamba can't handle correctly, such as when you want to roll back to an earlier revision: https://github.com/mamba-org/mamba/issues/803
I feel like a major selling point of Nix is "solving the Python dependency-hell problem" (as well as that of pretty much every other stack)
I've seen so many issues with different Python venvs from different Python project directories stepping on each others' dependencies somehow (probably because there are some global ones) that the fact that I can now just stick a basic and barely-modified-per-project Python flake.nix file in each one and always be guaranteed to have the entirety of the same dependencies available when I run it 6 months later is a win.
This seems to be an aggregation of some posts on python-list. Basically, extra-random opinions.
I'll offer mine: I won't say that Python packaging is generally excellent, but it's gotten much better over the years. The pyproject.toml is a godsend, there's the venv module built into Python, and pip will by default no longer install packages outside of a venv (on distros that mark the system environment as externally managed). Dependency groups are being added, meaning that requirements.txt files can also be specified in pyproject.toml. Documentation is pretty good, especially if you avoid blog posts from 5+ years ago.
People here focus on Python, but to me, a bioinformatician, conda is much more, it provides 99.99% of the tools I need. Like bwa, samtools, rsem, salmon, fastqc, R. And many, many obscure tools.
I wish you luck with tracking down versions of software used when you're writing papers... especially if you're using multiple conda environments. This is pretty much the example used in the article -- version mismatches.
But, I think this illustrates the problem very well.
Conda isn't just used for Python. It's used for general tools and libraries that Python scripts depend on. They could be C/C++ that needs to be compiled. It could be a Cython library. It could be...
When you're trying to be a package manager that operates on top of the operating system's package manager, you're always going to have issues. And that is why Conda is such a mess: it's trying to do too much. Installation issues are one of the reasons why I stopped writing so many projects in Python. For now, I'm only doing smaller scripts in Python. Anything larger than a module gets written in something else.
People here have mentioned Rust as an example of a language with a solid dependency toolchain. I've used more Go, which similarly has had dependency management tooling from the beginning. By and large, these languages aren't trying to bring in C libraries that need to be compiled and linked into language-accessible code the way Python packages do (it's probably possible, but not the main use-case).
For Python code though, when I do need to import a package, I always start with a fresh venv virtual environment, install whatever libraries are needed in that venv, and then always run the python from that absolute path (ex: `venv/bin/python3 script.py`). This has solved 99% of my dependency issues. If you can separate yourself from the system python as much as possible, you're 90% of the way there.
Side rant: Which, is why I think there is a problem with Python to begin with -- *nix OSes all include a system level Python install. Dependencies only become a problem when you're installing libraries in a global path. If you can have separate dependency trees for individual projects, you're largely safe. It's not very storage efficient, but that's a different issue.
> I wish you luck with tracking down versions of software used when you're writing papers... especially if you're using multiple conda environments.
How would you do this otherwise? I find `conda list` to be terribly helpful.
As a tool developer for bioinformaticians, I can't imagine trying to work with OS package managers, so that would leave vendoring multiple languages and libraries in a home-grown scheme slightly worse and more brittle than conda.
I also don't think it's realistic to imagine that any single language (and thus language-specific build tools or pkg manager) is sufficient. Since we're still using fortran deep in the guts of many higher level libraries (recent tensor stuff is disrupting this a bit, but it's not like openBLAS isn't still there as a default backend).
> home-grown scheme slightly worse and more brittle than conda
I think you might be surprised as to how long this has been going on (or maybe you already know...). When I started with HPC and bioinformatics, Modules were already well established as a mechanism for keeping track of versioning and multiple libraries and tools. And this was over 20 years ago.
The trick to all of this is to be meticulous in how data and programs are organized. If you're organized, then all of the tracking and trails are easy. It's just soooo easy to be disorganized. This is especially true with non-devs who are trying to use a Conda installed tool. You certainly can be organized and use Conda, but more often than not, for me, tools published with Conda have been a $WORKSFORME situation. If it works, great. If it doesn't... well, good luck trying to figure out what went wrong.
I generally try to keep my dependency trees light and if I need to install a tool, I'll manually install the version I need. If I need multiple versions, modules are still a thing. I generally am hesitant to trust most academic code and pipelines, so blindly installing with Conda is usually my last resort.
I'm far more comfortable with Docker-ized pipelines though. At least then you know when the dev says $WORKSFORME, it will also $WORKFORYOU.
Besides the horrendous formatting, some stuff in this article seems incorrect or irrelevant. Like, is this even possible?
> A single Anaconda distribution may have multiple NumPy versions installed at the same time, although only one will be available to the Python process (note that this means that sub-processes created in this Python process won’t necessarily have the same version of NumPy!).
I'm pretty sure there's not, but maybe there is some insane way to cause subprocesses to do this. Besides that, under the author's definition, different Python virtualenvs also install multiple copies of libraries in the same way conda does.
The comments about Jupyter also seem very confused. It’s hard to make heads or tails of exactly what the author is saying. There might be some misunderstandings of how Jupyter kernels select environments.
> Final warning: no matter how ridiculous this is: the current directory in Python is added to the module lookup path, and it precedes every other lookup location. If, accidentally, you placed a numpy.py in the current directory of your Python process – that is going to be the numpy module you import.
Conda: a package manager disaster that started requiring a paid license for companies with over 200 employees. It worked 5 years ago; we can no longer legally use it.
I honestly have no idea why anyone still uses Conda, it's a right pain in the ass. Python package management in general is a nightmare, but whenever I run up a project that uses Conda I immediately disregard it and use uv / pyenv.
conda was for scientific python, but had to solve for everything below python to make that work. There was no generic binary distribution solution below python for multiple architectures and operating systems.
I tried Conda a number of times over the years, and regretted it every time.
These days, when I absolutely have to use it because some obscure piece of software can't run unless Conda, I install it in a VM so that:
- I protect my working system from the damage of installing Conda on it
- I can throw the whole garbage fire away without long term brain damage to my system once I'm done
> The traditional setup.py install command may install multiple versions of the same package into the same directory
Wait, what? In what situation would that ever happen? Especially given the directories for packages are not versioned, so setuptools should never do two different versions in any way.
I think Python had a pretty good idea in standardizing a packaging protocol and then allowing competing implementations, but I would have preferred a single "blessed" solution. More than one package management option in an ecosystem always adds some kind of "can't get there from here" friction and an additional maintenance burden on package maintainers.
poetry has been working well enough for me as of late, but it'd be nice if I didn't have to pick.
It's rare to see something as systematically broken as Python package/dependencies ecosystem.
What I don't understand is: what makes this so difficult to solve in Python? It seems that many other platforms solved this a long time ago - Maven 2.0 was released almost 20 years ago. While it wasn't / isn't perfect by any means, its fundamentals were decent already back then.
One thing which I think messed this up from the beginning was applying the Unix philosophy with several/many individual tools as opposed to one cohesive system - requirements.txt, setuptools, pip, pipx, pipenv, venv... were always woefully inadequate, but produced a myriad of possible combinations to support. It seems like simplicity was the main motivation for such a design, but these certainly seem like examples of being too simplistic for the job.
I recently tried to run a Python app (after having a couple of years' break from Python) which used conda, and I got lost there quickly. The project README described using conda, mamba, anaconda, conda-forge, mini-forge, mini-conda... In the end, nothing I tried worked.
> what makes this so difficult to solve in Python?
Python creates the perfect storm for package management hell:
- Most of the valuable libraries are natively compiled (so you get all the fun of distributing binaries for every platform without any of the traditional benefits of native compilation)
- The dynamic nature makes it challenging to understand the non-local impacts of changes without a full integration test suite (library developers break each other all the time without realizing it, semantic versioning is a farce)
- Too many fractured packaging solutions, not a single one well designed. And they all conflict.
- A bifurcated culture of interactive use vs production code - while they both ostensibly use the same language, they have wildly different sub-cultures and best practices.
- Churn: a culture that largely disavows strong backwards compatibility guarantees, in favor of the "move fast and break things" approach. (Consequence: you have to move fast too just to keep up with all the breakage)
- A culture that values ease of use above simplicity of implementation. Python developers would rather save 1 line of code in the moment, even if it pushes the complexity off to another part of the system. The quite obvious consequence is an ever-growing backlog of complexity.
Some of the issues are technical. But I'd argue that the final bullet is why all of the above problems are getting worse, not better.
> Too many fractured packaging solutions, not a single one well designed. And they all conflict.
100% this.
For the last 4 years, one of the most frustrating parts of SWE that I need to deal with on a daily basis is packaging data science & machine learning applications and APIs in Python.
Maybe this is a very mid solution, but one that I found was to use dockerized local environments with all dependencies pinned via Poetry [1]. The initial setup is not easy, but now, with a Makefile on top, it's something that takes me only 4 hours with a DS to explain and run together, and it saves tons of hours of debugging and dependency conflicts.
> Python developers would rather save 1 line of code in the moment, even if it pushes the complexity off to another part of the system.
It sounds odd, but in several projects that I worked on, folks brought in the entire Scikit-Learn dependency just for the train_test_split function [2], because the team thought that it would be simpler and easier than writing a function that splits the dataset.
I'm trying to do the same but with uv instead of poetry. So far so good, and it helps that for me delivering as a docker container is a requirement, but I have no idea what's going to happen if I need to run "real" ML stuff. (Just doing a lot of plotting so far.)
I agree with all of these and it makes me wonder as I do from time to time,
has anyone managed to make a viable P#, a clean break which retains most of what most people love about the language and environment; and cheerfully asserts new and immutable change in things like <the technical parts of the above>.
When I have looked into this it seems people can't help but improve one-more-thing or one-other-thing and end up just enjoying vaguely-pythonic language design.
IronPython? The problem with that is compatibility with, and easy access to, existing libraries which is the main reason to use Python in the first place.
I also think some of the criticisms in the GP comment are not accurate. most of the valuable libraries are native compiled? Some important ones are, but not all.
I think a lot of the problem is that Python's usage has changed. Its great for a wide range of uses (scripting, web apps and other server stuff, even GUIs) but its really not a great match for scientific computing and the like but has become widely used there because it is easy to learn (and has lots of libraries for that now!).
The problem is that Python refuses to take responsibility for the whole ecosystem. One of the biggest success stories in programming language development has been Rust's realization that all of it matters: language, version management, package management, and build tools. To have a truly outstanding experience you need to take responsibility for the whole ecosystem. Python and many other older languages just focus on one part of the ecosystem, while letting others take care of different parts.
If Python leadership had true visionaries they would sit down, analyze every publicly available Python project and build a single set of tools that could gradually and seamlessly replace the existing clusterfuck.
Python developers will pretend the language is all about simplicity and then hand you over to the most deranged ecosystem imaginable. It sure is easy to pretend that you have a really simple ecosystem when you cover your eyes and focus on a small segment of the overall experience.
You can kind of see this in golang. Originally it came with stuff to download dependencies, but it had major issues with more complex projects and some community-made tools became popular instead. But it meant that multiple tools were used in different places and it was kind of a mess. Later on a new system was done in the default toolchain and even though it has problems it's good enough that it's now surprising for somebody to use non-default tools.
I don't know, but are we going to pretend that it would be particularly difficult to get funding for drastically simplifying and improving the tooling for one of the world's most popular programming languages?
I'm not sure how Rust is doing it, but the problem is hardly insurmountable.
> What I don't understand - what makes this so difficult to solve in Python?
I think there are many answers to this, and there are many factors contributing to it, but if I had to pick one: the setup.py file. It needs to be executed to determine the dependencies of a project. Since it's a script, any maintainer of any package you are using can do arbitrarily complex/dumb stuff in it, like conditionally adding dependencies based on host-system-specific environment markers, or introducing assumptions about the environment it is being installed into. That makes trying to achieve all the things you'd want from a modern package manager so much harder.
This also means that the problem isn't just concentrated in 1-2 central package management projects, but scattered throughout the ecosystem (and some of the worst offenders are some of Python's most popular sub-ecosystems).
There is some light with the introduction of the pyproject.toml, and now uv as a tool taking advantage of it.
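For comparison, here is a minimal sketch of the declarative pyproject.toml route, where the conditional-dependency tricks become static environment markers instead of code (the project name and pins are placeholders):

```
# pyproject.toml
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "example-pkg"
version = "0.1.0"
dependencies = [
    "requests>=2.28",
    'tomli>=2.0; python_version < "3.11"',   # conditional dep as a static marker
]
```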
setup.py allowed arbitrary things, but at least it always went through setuptools (or closely related predecessors, such as distribute or distlib). There is now pyproject.toml, but at the same time, there are tons of build backends that can do different things. And one of the most popular modern packaging tools, poetry, uses a non-standard section for the package data.
I think at least part of it is that there are so many solutions for Python packaging, which are often intermixed or only half-supported by developers. It's a tough ask to provide dedicated support for pip, conda, poetry and what else is there plus a couple different ways to create virtual environments. Of course if you do everything right, you set it up once (if even that) and it just keeps working forever, but it is never like that. Someone will use a tool you haven't and it will not work correctly and they will find a workaround and the mess starts.
Also, I think the fact that Python packages are sometimes distributed as shared libraries is a problem. When I think about conan or vcpkg (package managers for C and C++), they usually suck because some dependencies are available on some platforms and not on others, or even in one version on one platform and in another version on another, and you get messes all around if you need to support multiple platforms.
I think generally binary package managers are almost always bad* and source based package managers almost always work well (I think those are essentially easy mode).
*: unless they maintain a source package of their own that they actually support and have a fixed set of well-supported platforms (like system package managers on most Linux distros do).
The problem is a lot of Python source is actually a C/C++ file, so simply having "source based package manager for Python" is very annoying, as you'd have to manage your C/C++ sources with some other mechanisms.
This is exactly the reason I've moved from pip to conda for some projects: "pip" was acting as a source-based package manager, and thus asking for C tools, libraries and dev headers to be installed - but not providing them, as they were non-Python and thus declared out of scope. Especially on older Linux distributions, getting dependencies right can be quite a task.
This used to be a big headache for me, especially having developers on Windows but deployment targets in Linux, but a lot of the libraries I commonly use these days are either pure python or ship wheels for the platforms I use.
Were your issues recent or from several years ago?
The issues were recent (as of few months ago), but the OS's were pretty old - Ubuntu 20.04 and even 18.04. Those are still officially supported with Ubuntu Pro (free for individuals), but have ancient libraries and Python versions.
1. A good python solution needs to support native extensions. Few other languages solve this well, especially across unix + windows.
2. Python itself does not include a package manager.
I am not sure solving 2 alone is enough, because it will be hard to fix 1 then. And of course 2 would need to have a solution for older Python versions as well.
My guess is that we're stuck in a local maximum for a while, with uv looking like a decent contender.
PHP and composer do. You can specify native extensions in the composer.json file, along with an optional version requirement, and install them using composer just fine. Dependencies can in turn depend on specific extensions, or just recommend them without mandating an installation.
This works across UNIX and Windows, as far as I’m aware.
Is that a new feature? Pretty sure it didn't a few years ago. If the thing I need needed the libfoo C library then I first had to install libfoo on my computer using apt/brew/etc. If a new version of the PHP extension comes out that uses libfoo 2.0, then it was up to me to update libfoo first. There was no way for composer to install and manage libfoo.
> Php-yaml can be installed using PHP's PECL package manager. This extension requires the LibYAML C library version 0.1.0 or higher to be installed.
$ sudo apt-get install libyaml-dev
This is basically how "pip" works, and while it's fine for basic stuff, it gets pretty bad if you want to install a fancy numerical or cryptography package on an LTS Linux system that's at the end of its support period.
I am guessing that PHP might simply have less need for native packages, being more web-oriented.
Nix solves it for me. Takes a bit more effort upfront, but the payoff is "Python dependency determinism," which is pretty much unachievable in any other way, so...
The answer is not Yet Another Tool In The Chain. Python community itself needs to address this. Because if they don’t then you’ll have requirements.txt, setuptools, pyproject, pip, pipx, pipenv, pyenv, venv, nix.
Agreed. Often there's quite a tight coupling between the core platform devs and package management - node.js has its npm, Rust has cargo, Go has one as well, and for the most part it seems to have worked out fine for them. Java and .NET (and I think PHP) are different in the sense that the package management systems have no relation to the platform developers, but industry standards (Maven, Gradle, NuGet, Composer) still appeared and are widely accepted.
But with Python it seems completely fractured - everyone tries to solve it their own way, with nothing becoming a truly widely used solution. More involvement from the Python project could make a difference. From my perspective, this mess is currently Python's biggest problem and should be prioritized accordingly.
Nix isn't 'yet another tool in the chain'; Nix demands to run the whole show, and in the Nix world native dependencies in all programming languages are first-class citizens that the ecosystem is already committed to handling.
> Python community itself needs to address this.
The Python community can't address it, really, because that would make the Python community responsible for a general-purpose package management system not at all limited to Python, but including packages written in C, C++, and Rust to start, and also Fortran, maybe Haskell and Go, too.
The only role the Python community can realistically play in such a solution is making Python packages well-behaved (i.e., no more arbitrary code at build time or install time) and standardizing a source format rich with metadata about all dependencies (including non-Python dependencies). There seems to be some interest in this in the Python community, but not much.
The truth, perhaps bitter, is that for languages whose most important packages all have dependencies foreign to the ecosystem, the only sane package management strategy is slotting yourself into polyglot software distributions like Nix, Guix, Spack, Conda, Pkgsrc, MacPorts, MSYS2, your favorite Linux distro, whatever. Python doesn't need a grand, unifying Python package manager so much as a limited, unified source package format.
Well, there is no way to address it then, no magic will eliminate everything from the list.
So another tool isn't meaningfully different (and it can be the answer): if "the community" migrates to the new tool it wouldn't matter that there's a dozen of other unused tools.
Same thing if "the community" fixes an existing tool and migrates to it: other unused tools will still exist.
Docker is kinda the opposite of Nix in this respect— Docker is fundamentally parasitic on other tools for dependency management, and Nix handles dependencies itself.
That parasitism is also Docker's strength: bring along whatever knowledge you have of your favorite language ecosystem's toolchain; it'll not only apply but it'll likely be largely sufficient.
Build systems like Buck and Bazel are more like Nix in this respect: they take over the responsibilities of some tools in your language's toolchain (usually high-level build tools, sometimes also dependency managers) so they can impose a certain discipline and yield certain benefits (crucially, fine-grained incremental compilation).
Anyway, Docker doesn't fetch or resolve the dependencies of Python packages. It leaves that to other tools (Nix, apt-get, whatever) and just does you the favor of freezing the result as a binary artifact. Immensely useful, but solves a different problem than the main one here, even if it eases some of the same burdens.
I'm enjoying uv but I wouldn't say the problem is "fully" solved -- for starters it's not uncommon to do `uv add foo` and then 5K lines of gobbledygook later it says "missing foo-esoterica.dll" and I have to go back to the multiplatform drawing board.
It is not a new discovery that Python is terrible for packaging and distribution. Unfortunately, very little has been done about this. The fact that Python is used in particular environments controlled by the developers, mainly machine learning, makes this even more difficult to fix.
It's not really true to say "very little has been done." Thousands of person-hours have been invested into this problem! But the results have been mixed.
Time was spent, but on what? Creating 15+ different, competing tools? That won’t improve things. Blessing one tool and adopting something equivalent to node_modules could, but the core team is not interested in improving things this way.
> what makes this so difficult to solve in Python?
I think the answer is the same thing that makes it difficult to make a good package manager for C++.
When a language doesn't start with decent package management, it becomes really hard to retrofit a good one later in the lifespan of that language. Everyone can see "this sucks" but there's simply no good route to change the status quo.
I think Java is the one language I've seen that has successfully done the switch.
Java, C#, JavaScript (node) all disagree. If the Python core team wanted good packaging, they could have done it ages ago. Sure, a good solution might not be applicable for past Python versions, but they aren’t doing anything to make it any better.
PyPI was always broken due to weird ideas for problems that were long solved in other languages or distributions. They had/have the backing of fastly.net, which created an arrogant and incompetent environment where people listened to no one.
Conda suffers from the virtual environment syndrome. Virtual environments are always imperfect and confusing. System libraries sometimes leak through. The "scientific" Python stack has horrible mixtures of C/C++/Cython etc., all poorly written and difficult to build.
Projects deteriorated in their ability to build from source due to the availability of binary wheels and the explosion of build systems. In 2010 there was a good chance that building a C project worked. Now you fight with meson versions, meson-python, cython versions, libc versions and so forth.
There is no longer any culture of correctness and code cleanliness in the Python ecosystem. A lot of good developers have left. Some current developers work for the companies who sell solutions for the chaos in the ecosystem.
Python packaging's complexities are difficult to attribute to any single cause. But path dependency, extremely broad adoption, and social conventions within the Python community (which has historically preferred standards over picking single tools) are all contributing factors.
Most of these aspects have significantly improved over the last decade, at least for the standard packaging ecosystem. I don’t know about Conda, which has always been its own separate thing.
Python packaging is broken mostly because bootstrapping is broken, and it cascades to packaging but people don't know the bootstrapping is responsible and blame packaging.
Not saying packaging doesn't have faults, but on its own, on a good Python setup, it's actually better than average. But few people have a good setup. In fact most people don't know what a good setup looks like.
Yes, that's one of the most important successes of the tool. Being written in Rust, it is completely independent from the Python setup, and therefore it doesn't care if you botched it. And with the indygreg standalone builds, it can even avoid the pyenv pitfall of compiling Python on your machine on Linux.
My single setup routine has served me well for years, with little to no change: pipx as the tools manager, miniconda for env bootstrap and management, poetry (installed with pipx) for project management (works great with conda envs) and autoenv to ensure the correct env is always active for any project I'm currently in. The only issue I may potentially have is if I install anything apart from Python via conda, as that won't be reflected in the pyproject file.
>> One thing which I think messed this up from the beginning was applying the Unix philosophy with several/many individual tools as opposed to one cohesive system
I don’t think the Python community has a culture of thinking about software engineering in a principled and systematic way like you would see in places like Haskell, Rust or Clojure communities.
Python's strength (and weakness) is an emphasis on quick scripts, data science and statistics.
There’s simply not the right people with the right mindset.
No, it's wrong because of the mess it makes, which makes even the things that crowd wants to focus on, like quick scripts or data science, harder.
my approach is to ignore all the *conda stuff and:
yay -S python-virtualenv # I'm on arch, do not confuse with 12 similarly named alternatives
pyenv virtualenv 3.10 random-python-crap
pyenv local 3.10.6/envs/random-python-crap
pip install -r requirements.txt
and it works (sometimes deps are in some other places, or you have to pass -c constraints.txt or there is no file and you need to create it in various ways)
At least by not using local .env directories, I always know where to find them.
I install a lot of AI projects so I have around 1TB just for the same Python dependencies installed over and over again.
Sometimes I can get away with trying to use the same venv for two different projects but 8/10 deps get broken.
Seconded. How about we don't write a blog trashing an implementation of something when our own design is missing some very basic accessibility and ux features.
Though I agree with the premise, Conda is an absolute pest when you start customising an environment with a number of packages. Dependency resolution hell.
I see this on Firefox on desktop too. I usually try not to criticize the presentation of things posted here, but this is completely unreadable. I've tried several times to get through the first couple paragraphs, but it's just not worth the extra mental effort.
The blog author explicitly requested it, with `word-break: break-all`.
Now why you would do that … IDK.
Weirdly, it's the second post on HN this quarter to do it, from a completely different site. Makes me wonder if there's some viral piece of CSS advice out there …? (And nobody looks at their site…?) Bad LLM output?
The author put "word-break:break-all" on a parent element of the text content, which itself is a <ul> containing <p>, probably for "layout" purposes. Methinks some CSS education is desperately needed.
Is this beyond what the pyproject.toml spec supports?
The real hard work that went into conda was reverse engineering all the nooks and crannies of the DLL loading heuristics, to allow it to ensure that you loaded what you intended.
If you are working on macOS and deploying to some *nix in the cloud, you are unlikely to find any value in this. But in ten years as lead on a large tool that was deployed to personal (Windows) laptops in a corporate environment, I did not find anything that beat conda.
Today you can just "pip install scipy" on Windows and it will just work.
I can’t recall the library, but there was another major project that just deprecated TF because it was the cause of so many build problems.
I create a virtual environment for every project. I install almost all packages with pip, except for any binaries or CUDA related things from conda. I always exported the conda yaml file and managed to reproduce the code/environment including the Python version. I've seen a lot of posts over time praising poetry and other tools and complaining about conda but I could never relate to any of them.
Am I doing something wrong? Or something right?
Poetry isn't perfect, but it's working in an imperfect universe and at least gets the basics (lockfiles) correct to where packages can be semi-reproducible.
There's another rant to be had at the very existence of venvs as part of the solution, but that's neither poetry's nor anaconda's fault.
It is entirely possible to use poetry to determine the precise set of packages to install and write a requirements.txt, and then shotgun install those packages in parallel. I used a stupidly simple fish shell for loop that ran every requirements line as a pip install with an “&” to background the job and a “wait” after the loop. (iirc) Could use xargs or parallel too.
This is possible at least. Maybe it breaks in some circumstances but I haven’t hit it.
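For the curious, a rough Python sketch of that idea (not the commenter's actual fish loop): export the pinned set with poetry, then fire off the installs in parallel. It assumes `poetry export` is available (newer Poetry versions may need the export plugin), skips hash and environment-marker handling, and accepts that parallel installs into one environment can race on shared files.

import subprocess
from concurrent.futures import ThreadPoolExecutor

# Write out the fully pinned requirement set that poetry already resolved.
subprocess.run(["poetry", "export", "-f", "requirements.txt",
                "-o", "requirements.txt", "--without-hashes"], check=True)

with open("requirements.txt") as f:
    reqs = [line.strip() for line in f if line.strip() and not line.startswith("#")]

def install(req):
    # --no-deps: everything is already pinned, so don't let pip re-resolve anything.
    return subprocess.run(["pip", "install", "--no-deps", req]).returncode

with ThreadPoolExecutor(max_workers=8) as pool:
    codes = list(pool.map(install, reqs))
print("failed:", [r for r, c in zip(reqs, codes) if c] or "none")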
Not as an excuse for bad behavior but rather to consider infrastructure and expectations:
The packages might be cached locally.
There might be many servers – a CDN and/or mirrors.
Each server might have connection limits.
(The machine downloading the packages miiiiiight be able to serve as a mirror for others.)
If these are true, then it’s altruistically self-interested for everyone that the downloader gets all the packages as quickly as possible to be able to get stuff done.
I don’t know if they are true. I’d hope that local caching, CDNs and mirrors as well as reasonable connection limits were a self-evident and obviously minimal requirement for package distribution in something as arguably nation-sized as Python.
And… just… everywhere, really.
We've used different combinations of pipx+lockfiles or poetry, which has been so far OK'ish. But recently discovered uv and are wondering about existing experience so far across the industry.
At the same time, poetry still uses a custom format and is pretty slow.
I was so amazed by the speed that I moved all my projects to uv and have not looked back.
uv replaces all of pip, pipx and poetry for me. It does not do more than these tools, but it does it right and fast.
If you're at liberty to try uv, you should try it someday, you might like it. (nothing wrong with staying with poetry or pyenv though, they get the job done)
Or worse, imagine being a longtime user of shells but not Python, and then being presented with a venv as a solution to the problem that, for some reason, Python doesn't stash deps in a subdirectory of your project.
You just need to have some sort of wrapper/program that knows how to figure out which dependencies to use for a project. With bundler, you just wrap everything in "bundle exec" (or use binstubs).
There are many dependency managers that use a project-local flat storage, and a global storage was really frowned upon until immutable versions and reliable identifiers became popular some 10 years ago.
Ruby and Perl certainly didn't have it - although Ruby did subsequently add Bundler to gems and gems supported multiversioning.
I feel like venv is one such solution. A workaround that doesn’t solve the problem at its root, so much as make the symptoms manageable. But there is (at least for me) a big difference between things like that and the cool ideas that underlie shell tooling like Unix pipes. Things like jq or fzf are awesome examples of new tooling that fit beautifully in the existing paradigm but make it more powerful and useful.
For some libraries, it is not acceptable to stash the dependencies for every single toy app you use. I don't know how much space TensorFlow or PyQt use but I'm guessing most people don't want to install those in many venvs.
Also, installing everything with pip is a great way to enjoy unexplainable breakage when A doesn't work with v1 and B doesn't work with v2.
It also leads to breaking Linux systems where a large part of the system is Python code, especially when a user upgrades the system Python for no reason.
- Setup custom kernels in Jupyter Notebook
- Hardlink the environments, then install same packages via pip in one and conda in others
- install conda inside conda (!!!) and enter nested environment
- Use tox within conda
I believe as long as you treat the environments as "cattle" (if it goes bad, remove it and re-create it from the yaml file), you should not have any problems. It's clearly not the case for the post's author though.
(I do agree pip is still pretty lackluster, but the proposed replacements don't really get to the heart of the problem and seem to lack staying power. I'm in 'wait and see' mode on most of them)
`pixi` basically covers `conda` while using the same solver as `uv` and is written in Rust like `uv`.
Now, is it a good idea to have Python's package management tool handle non-Python packages? I think that's debatable. I personally am in favor of a world where `uv` is simply the final Python package management solution.
Wrote an article on it here: https://dublog.net/blog/so-many-python-package-managers/
It's fast, takes yml files as input (which is super convenient) and is super intuitive.
Quite surprised it isn't more popular.
Also crossing fingers that uv ends up being the last one standing when the comprehensive amounts of dust here settle. But until then, I'll look into pixi, on the off chance it minimizes some of my workplace sorrows.
> except for any binaries or CUDA related things from conda
doing the default thing with cuda related python packages used to often result in "fuck it, reinstall linux". admittedly, i dont know how it is now. i have one machine that runs python with a gpu and it runs only one python program.
From about 2014-17 you are correct, but it appears (on ubuntu at least), that it mostly works now. Maybe I've just gotten better at dealing with the pain though...
People like to complain about node packages, but I've never seen people have the trouble with them that they have with Python.
You can just give up and say that "The proper way to do this is to use the Nvidia CUDA toolkit to write your cuda app in C++ and then invoke it as a separate process from node" [0]. That apparently works for node, but Python wants much more.
If you actually want to use high-performance native code from your slow interpreted language, then no solution is going to be very good; the problem is inherently hard.
You can rely on the host OS as much as possible - if the OS is known, provide binaries; if it's unknown, provide source code and hope the user has C/C++/Rust/Fortran compilers to build it. That's what uv, pip, etc. do.
You can create your own parallel OS, bringing your own copy of every math library, as well as CUDA, even if there are perfectly good versions installed on the system - that's what conda/miniconda does.
You can implement as much as possible in your own language, so there is much less need to use "high-performance native language" - that's what Rust and Go do. Sadly, that's not an option for Python.
[0] https://stackoverflow.com/questions/20875456/how-can-i-use-c...
If you aren't precise, you're gonna get different versions of your dependencies on different machines. Oops.
Pinning concrete versions is of course better, but then there isn't a clear and easy way to upgrade all dependencies and check whether CI still passes.
The only difference from one language to another is that some make this mandatory, while in others it's only something that you should really do and there isn't any other real option you should consider.
That means you don't use Windows.
Which is great. Keep not using it. But most people will have a different experience.
- DS/compiled libs users (mostly Fortran/Cuda/C++)
- Anyone with dependencies on native/non python libraries.
Conda definitely helps with 2 and 3 above, and uv is at least a nice, fast API over pip (which is better since it started doing dependency checking and binary wheels).
More generally, lots of the issues come from the nature of python as a glue language over compiled libraries, which is a relatively harder problem in general.
i think there was a significant step change improvement in python packaging around 2012, when the wheel format was introduced, which standardised distributing prebuilt platform-specific binary packages. for packages with gnarly native library dependencies / build toolchains (e.g. typical C/fortran numeric or scientific library wrapped in a layer of python bindings), once someone sets up a build server to bake wheels for target platforms, it becomes very easy to pip install them without dragging in that project's native build-from-source toolchain.
venv + pip (+ perhaps maintaining a stack of pre-built wheels for your target platform, for a commercial project where you want to be able to reproduce builds) gets most of the job done, and those ingredients have been in place for over 10 years.
around the time wheel was introduced, i was working at a company that shipped desktop software to windows machines, we used python for some of the application components. between venv + pip + wheels, it was OK.
where there were rough edges were things like: we have a dep on python wrapper library pywhatever, which requires a native library libwhatever.dll built from the c++ whatever project to be installed -- but libwhatever.dll has nothing to do with python, maybe its maintainers kindly provide an msi installer, so if you install it into a machine, it gets installed into the windows system folder, so venv isn't able to manage it & offer isolation if you need to install multiple versions for different projects / product lines, as venv only manages python packages, not arbitrary library dependencies from other ecosystems
but it's a bit much blame python for such difficulties: if you have a python library that has a native dependency on something that isnt a python package, you need to do something else to manage that dep. that's life. if you're trying to do it on windows, which doesn't have an O/S level package manager.. well, that's life.
Oh, and if some package you are using has a bug or something that requires you to vendor it in your repo, well then good luck because again, PEP 508 does not support installing another package from a relative link. You either need to put all the code inside the same package, vendored dependency included, and do some weird stuff to make sure that the module you wanted to vendor is used first, or... you just have to use the broken package, again for some sort of security reasons apparently.
Again, all of that might even work when using pip from the cli, but good luck trying to make a requirements.txt or define dependencies in a standard way that is even slightly outside of a certain workflow.
For development I use venv and pip, sometimes pyenv if I need a specific Python version. For production, I install Python packages with apt. The operating system can deal with upgrading minor library versions.
I really hate most other package managers; they are all too confusing and too hard to use. You need to remember to pull in library updates, rebuild and release. Poetry sucks too, it's way too complicated to use.
The technical arguments against Python package managers are completely valid, but when people bring up Maven, NPM or even Go as role models I check out. The ergonomics of those tools are worse than venv and pip. I also think that's why we put up with pip and venv: they are so much easier to use than the alternatives (maybe excluding uv). If a project uses Poetry, I just know that I'm going to be spending half a day upgrading dependencies, because someone locked them down a year ago and there are now 15 security holes that need to be plugged.
No, what Python needs is to pull in requests and a web framework into the standard library and then we can start build 50% of our projects without any dependencies at all. They could pull in Django, it only has two or three dependencies anyway.
Being new to the ecosystem I have no clue why people would use Conda and why it matters. I tried it, but was left bewildered, not understanding the benefits.
The big thing to realise is that when Conda first was released it was the only packaging solution that truly treated Windows as a first class citizen and for a long time was really the only way to easily install python packages on Windows. This got it a huge following in the scientific community where many people don't have a solid programming/computer background and generally still ran Windows on their desktops.
Conda also not only manages your python interpreter and python libraries, it manages your entire dependency chain down to the C level in a cross platform way. If a python library is a wrapper around a C library then pip generally won't also install the C library, Conda (often) will. If you have two different projects that need two different versions of GDAL or one needs OpenBLAS and one that needs MKL, or two different versions of CUDA then Conda (attempts to) solve that in a way that transparently works on Windows, Linux and MacOS. Using venv + requirements.txt you're out of luck and will have to fall back on doing everything in its own docker container.
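(Aside: if you want to check which BLAS your own NumPy ended up linked against - handy when comparing a wheel install with a conda one - NumPy can report its build configuration:)

import numpy as np
np.show_config()   # prints the BLAS/LAPACK libraries this NumPy build was linked against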
Conda lets you mix private and public repos as well as mirror public packages on-prem in a transparent way, much smoother than pip, and has tools for things like audit logging, fine-grained access control, package signing and centralised controls and policy management.
Conda also has support for managing multi-language projects. Does your Python project need nodejs installed to build the front-end? Conda can also manage your nodejs install. Using R for some statistical analysis in some part of your data pipeline? Conda will manage your R install. Using a Java library for something? Conda will make sure everybody has the right version of Java installed.
Also, it at least used to be common for people writing numeric and scientific libraries to release Conda packages first and then only eventually publish on PyPI once the library was 'done' (which could very well be never). So if you wanted the latest cutting-edge packages in many fields, you needed Conda.
Now there are obviously a huge class of projects where none of these features are needed and mean nothing. If you don't need Conda, then Conda is no longer the best answer. But there are still a lot of niche things Conda still does better than any other tool.
I love conda, but this isn't true. You need to opt-in to a bunch of optional compiler flags to get a portable yml file, and then it can often fail on different OS's/versions anyway.
I haven't done too much of this since 2021 (gave up and used containers instead) but it was a nightmare getting windows/mac builds to work correctly with conda back then.
As a user of the modules, venv is sufficient.
Coming from C++, IMO, it is vastly better.
Nowadays, thanks to wheels being numerous and robust, the appeal of anaconda is disappearing for most users except for some exotic mixes.
conda itself now causes more trouble than it solves as it's slow, and lives in its own incompatible world.
But anaconda solves a different problem now that nobody else solves, and that's managing Python for big corporations. This is worth a lot of money to big structures that need to control package origins, permissions, updates, and so on, at scale.
So it thrives there.
In my experience conda is enormously superior to the standard Python packaging tools.
Mind you, glad it works for you. Warms my grey heart to know there's some balance in this universe. :)
I avoid it as much as possible.
Why go through all this trouble? Because originally it was meant to be a basic "scientific Python" distribution, and needed to be strict around what's installed for reproducibility reasons.
It's IMO overkill for most users, and I suspect most scientific users don't care either - most of the time I see grads and researchers just say "fuck it" and use Pip whenever Conda refuses to get done in a timely fashion.
And the ones who do care about reproducibility are using R anyway, since there's a perception those libraries are "more correct" (read: more faithful to the original publication) than Pythonland. And TBH I can't blame them when the poster child of it is Sklearn's RandomForestRegressor not even being correctly named - it's bagged trees under the default settings, and you don't get any indication of this unless you look at that specific kwarg in the docs.
Personally, I use Conda not for reproducibility, but so all of my projects have independent environments without having to mess with containers
I worked in a pharma company with lots of R code and this comment is bringing up some PTSD. One time we spent weeks trying to recreate an "environment" to reproduce a set of results. Try installing a specific version of a package, and all the dependencies it pulls in are the latest version, whether or not they are compatible. Nobody actually records the package versions they used.
The R community are only now realising that reproducible environments are a good thing, and not everybody simply wants the latest version of a package. Packrat was a disaster, renv is slightly better.
A perfectly reasonable goal, yup! Thankfully not one that, in fact, requires conda. Automated per-project environments are increasingly the default way of doing things in Python, thank goodness. It's been a long time coming.
Neat idea, but sounds like a lot of work.
I've seen so many issues with different Python venvs from different Python project directories stepping on each others' dependencies somehow (probably because there are some global ones) that the fact that I can now just stick a basic and barely-modified-per-project Python flake.nix file in each one and be always guaranteed to have the entirety of the same dependencies available when I run it 6 months later is a win.
https://devenv.sh/
I'll offer mine: I won't say that Python packaging is generally excellent, but it's gotten much better over the years. The pyproject.toml is a godsend, there's the venv module built into Python, and pip will by default no longer install packages outside of a venv. Dependency groups are being added, meaning that what used to go in requirements.txt files can also be specified in pyproject.toml. Documentation is pretty good, especially if you avoid blog posts from 5+ years ago.
I don’t mind conda. It has a lot of caveats and weird quirks
But, I think this illustrates the problem very well.
Conda isn't just used for Python. It's used for general tools and libraries that Python scripts depend on. They could be C/C++ that needs to be compiled. It could be a Cython library. It could be...
When you're trying to be a package manager that operates on top of the operating system's package manager, you're always going to have issues. And that is why Conda is such a mess: it's trying to do too much. Installation issues are one of the reasons why I stopped writing so many projects in Python. For now, I'm only doing smaller scripts in Python. Anything larger than a module gets written in something else.
People here have mentioned Rust as an example of a language with a solid dependency toolchain. I've used more Go, which similarly has had dependency management tooling from the beginning. By and large, these languages aren't trying to bring in C libraries that need to be compiled and linked into Python-accessible code (it's probably possible, but not the main use-case).
For Python code though, when I do need to import a package, I always start with a fresh venv virtual environment, install whatever libraries are needed in that venv, and then always run the python from that absolute path (ex: `venv/bin/python3 script.py`). This has solved 99% of my dependency issues. If you can separate yourself from the system python as much as possible, you're 90% of the way there.
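A scripted version of that habit, using only the standard library, might look roughly like this (Unix-style paths; "requests" and "script.py" are just placeholders):

import os
import subprocess
import venv

venv.create("venv", with_pip=True)           # fresh, project-local environment

py = os.path.join("venv", "bin", "python3")  # on Windows: venv\Scripts\python.exe

# Install into the environment and always run through its interpreter,
# never through the system Python.
subprocess.run([py, "-m", "pip", "install", "requests"], check=True)
subprocess.run([py, "script.py"], check=True)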
Side rant: Which, is why I think there is a problem with Python to begin with -- *nix OSes all include a system level Python install. Dependencies only become a problem when you're installing libraries in a global path. If you can have separate dependency trees for individual projects, you're largely safe. It's not very storage efficient, but that's a different issue.
How would you do this otherwise? I find `conda list` to be terribly helpful.
As a tool developer for bioinformaticians, I can't imagine trying to work with OS package managers, so that would leave vendoring multiple languages and libraries in a home-grown scheme slightly worse and more brittle than conda.
I also don't think it's realistic to imagine that any single language (and thus language-specific build tools or package manager) is sufficient. We're still using Fortran deep in the guts of many higher-level libraries (recent tensor stuff is disrupting this a bit, but it's not like OpenBLAS isn't still there as a default backend).
I think you might be surprised as to how long this has been going on (or maybe you already know...). When I started with HPC and bioinformatics, Modules were already well established as a mechanism for keeping track of versioning and multiple libraries and tools. And this was over 20 years ago.
The trick to all of this is to be meticulous in how data and programs are organized. If you're organized, then all of the tracking and trails are easy. It's just soooo easy to be disorganized. This is especially true with non-devs who are trying to use a Conda installed tool. You certainly can be organized and use Conda, but more often than not, for me, tools published with Conda have been a $WORKSFORME situation. If it works, great. If it doesn't... well, good luck trying to figure out what went wrong.
I generally try to keep my dependency trees light and if I need to install a tool, I'll manually install the version I need. If I need multiple versions, modules are still a thing. I generally am hesitant to trust most academic code and pipelines, so blindly installing with Conda is usually my last resort.
I'm far more comfortable with Docker-ized pipelines though. At least then you know when the dev says $WORKSFORME, it will also $WORKSFORYOU.
> A single Anaconda distribution may have multiple NumPy versions installed at the same time, although only one will be available to the Python process (note that this means that sub-processes created in this Python process won’t necessarily have the same version of NumPy!).
I'm pretty sure there's not, but maybe there is some insane way to cause subprocesses to do this. Besides that, under the author's definition, different Python virtualenvs also install multiple copies of libraries in the same way conda does.
The comments about Jupyter also seem very confused. It’s hard to make heads or tails of exactly what the author is saying. There might be some misunderstandings of how Jupyter kernels select environments.
> Final warning: no matter how ridiculous this is: the current directory in Python is added to the module lookup path, and it precedes every other lookup location. If, accidentally, you placed a numpy.py in the current directory of your Python process – that is going to be the numpy module you import.
This has nothing to do with conda.
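(For anyone who hasn't been bitten by this, the shadowing takes two files to reproduce; the names below are just an illustration.)

# numpy.py, sitting in the same directory as main.py
print("this is not the real numpy")

# main.py
import numpy          # runs ./numpy.py instead of the installed package,
                      # because the script's directory comes first on sys.path
import sys
print(sys.path[0])    # the script's directory ('' for -c or interactive use)
# Python 3.11+ adds -P / PYTHONSAFEPATH to drop that entry and avoid the trap.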
uv is here to kick ass and chew bubblegum. And it’s all out of gum.
These days, when I absolutely have to use it because some obscure piece of software can't run without Conda, I install it in a VM so that:
Wait, what? In what situation would that ever happen? Especially given the directories for packages are not versioned, so setuptools should never do two different versions in any way.
poetry has been working well enough for me as of late, but it'd be nice if I didn't have to pick.
What I don't understand - what makes this so difficult to solve in Python? It seems that many other platforms solved this a long time ago - Maven 2.0 was released almost 20 years ago. While it wasn't / isn't perfect by any means, its fundamentals were decent already back then.
One thing which I think messed this up from the beginning was applying the Unix philosophy with several/many individual tools as opposed to one cohesive system - requirements.txt, setuptools, pip, pipx, pipenv, venv... were always woefully inadequate, but produced a myriad of possible combinations to support. It seems like simplicity was the main motivation for such a design, but these certainly seem like examples of being too simplistic for the job.
I recently tried to run a Python app (after having a couple of years break from Python) which used conda and I got lost there quickly. Project README described using conda, mamba, anaconda, conda-forge, mini-forge, mini-conda ... In the end, nothing I tried worked.
Python creates the perfect storm for package management hell:
- Most of the valuable libraries are natively compiled (so you get all the fun of distributing binaries for every platform without any of the traditional benefits of native compilation)
- The dynamic nature makes it challenging to understand the non-local impacts of changes without a full integration test suite (library developers break each other all the time without realizing it, semantic versioning is a farce)
- Too many fractured packaging solutions, not a single one well designed. And they all conflict.
- A bifurcated culture of interactive use vs production code - while they both ostensibly use the same language, they have wildly different sub-cultures and best practices.
- Churn: a culture that largely disavows strong backwards compatibility guarantees, in favor of the "move fast and break things" approach. (Consequence: you have to move fast too just to keep up with all the breakage)
- A culture that values ease of use above simplicity of implementation. Python developers would rather save 1 line of code in the moment, even if it pushes the complexity off to another part of the system. The quite obvious consequence is an ever-growing backlog of complexity.
Some of the issues are technical. But I'd argue that the final bullet is why all of the above problems are getting worse, not better.
100% this.
Last 4 years, one of the most frustrating parts of SWE that I need to deal with on a daily basis is packaging data science & machine learning applications and APIs in Python.
Maybe this is a very mid solution, but one approach I found was to use dockerized local environments with all dependencies pinned via Poetry [1]. The initial setup is not easy, but now, with a Makefile on top, it takes only about 4 hours with a DS to explain and run together, and it saves tons of hours in debugging and dependency conflicts.
> Python developers would rather save 1 line of code in the moment, even if it pushes the complexity off to another part of the system.
It sounds odd, but in several projects that I worked on, folks brought in the entire Scikit-Learn dependency just for the train_test_split function [2], because the team thought that was simpler and easier than writing a function that splits the dataset.
[1] - https://github.com/orgs/python-poetry/discussions/1879 [2] - https://scikit-learn.org/1.5/modules/generated/sklearn.model...
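(For what it's worth, a dependency-free version of that particular function is only a few lines - a rough sketch, not a drop-in replacement for sklearn's API:)

import random

def train_test_split(rows, test_size=0.25, seed=None):
    # Shuffle a copy of the data and cut it at the requested fraction.
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_size))
    return rows[:cut], rows[cut:]

train, test = train_test_split(range(100), test_size=0.2, seed=42)
print(len(train), len(test))   # 80 20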
Has anyone managed to make a viable P#, a clean break which retains most of what most people love about the language and environment, and cheerfully asserts new and immutable change in things like <the technical parts of the above>?
When I have looked into this it seems people can't help but improve one-more-thing or one-other-thing and end up just enjoying vaguely-pythonic language design.
I also think some of the criticisms in the GP comment are not accurate. most of the valuable libraries are native compiled? Some important ones are, but not all.
I think a lot of the problem is that Python's usage has changed. It's great for a wide range of uses (scripting, web apps and other server stuff, even GUIs) but it's really not a great match for scientific computing and the like, yet it has become widely used there because it is easy to learn (and has lots of libraries for that now!).
https://couragetotremble.blog/2007/08/09/p-language/
If Python leadership had true visionaries they would sit down, analyze every publicly available Python project and build a single set of tools that could gradually and seamlessly replace the existing clusterfuck.
Python developers will pretend the language is all about simplicity and then hand you over to the most deranged ecosystem imaginable. It sure is easy to pretend that you have a really simple ecosystem when you cover your eyes and focus on a small segment of the overall experience.
I'm not sure how Rust is doing it, but the problem is hardly insurmountable.
I think there are many answers to this, and there are many factors contributing to it, but if I had to pick one: the setup.py file. It needs to be executed to determine the dependencies of a project. Since it's a script, that allows any maintainer of any package you are using to do arbitrarily complex/dumb stuff in it, like conditionally adding dependencies based on host-system-specific environment markers, or introducing assumptions about the environment it is being installed into. That makes trying to achieve all the things you'd want from a modern package manager so much harder.
This also means that the problem isn't just concentrated in 1-2 central package management projects, but scattered throughout the ecosystem (and some of the worst offenders are some of Python's most popular sub-ecosystems).
There is some light with the introduction of the pyproject.toml, and now uv as a tool taking advantage of it.
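To make that concrete, here is a made-up setup.py (not taken from any real package) whose dependency list can only be known by actually running it:

import os
import platform
import sys
from setuptools import setup

deps = ["requests"]
if sys.version_info < (3, 8):
    deps.append("importlib-metadata")   # backport needed only on old Pythons
if platform.system() == "Windows":
    deps.append("pywin32")              # platform-specific dependency
if os.environ.get("CI"):
    deps.append("coverage")             # depends on the machine doing the install!

setup(name="example-pkg", version="0.1", install_requires=deps)

No static resolver can know what this package depends on without executing it, which is part of why lockfiles and cross-platform resolution were so painful for so long.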
Yes, this should never have been allowed. It solved a problem in the short term but in the long term has caused no end of pain.
Also, I think the fact that Python packages are sometimes distributed as shared libraries is a problem. When I think about conan or vcpkg (package managers for C and C++), they usually suck because some dependencies are available on some platforms and not on others, or even in one version on one platform and another version on another, and you get messes all around if you need to support multiple platforms.
I think generally binary package managers are almost always bad* and source based package managers almost always work well (I think those are essentially easy mode).
*: unless they maintain a source package of their own that they actually support and have a fixed set of well-supported platforms (like system package managers on most Linux distros do).
Even the CLI workflow is identical: dotnet add package / cargo add (.NET had it earlier too, it's nice that Cargo now also has it).
Also it doesn't always work; I got stuck with some dependencies. When it works it's amazing.
At least uv is nice! https://docs.astral.sh/uv/
Don't forget a whole lot of FORTRAN :)
And here is why bootstrapping is broken: https://www.bitecode.dev/p/why-is-the-python-installation-pr...
Well, Unix IS the cohesive system.
https://www.youtube.com/watch?v=zOY9mc-zRxk
When 30% of your page is junk, it makes you wonder about the other 30%...
https://mail.python.org/pipermail/python-list/2024-May/91230...
[oof, never mind, it's worse in some ways]