Package Management Is a Wicked Problem

(nesbitt.io)

77 points | by zdw 4 days ago

12 comments

nacozarina 4 days ago
Naming things, cache invalidation, and off-by-one errors: package management heavily emphasizes the hardest ‘blue-collar’ problems in CS.
- bradgessler 2 hours ago
  Today, sales and marketing are the two hardest problems in computer science.
- dizhn 4 hours ago
  Feature creep and not invented here too. (Bikeshedding?)
  - taeric 3 hours ago
    I confess "not invented here" is a problem I think too many people focus on. Lots of things are redone all of the time.
    That said, feature creep is absolutely a killer. And it is easy to see how these will stack on each other where people will insist that for this project, they need to try and reinvent the state of the art in solvers to get a product out the door.
- iberator 3 hours ago
  This is a stupid and unproven quote. Citation needed. I hate that HN keeps repeating this over and over, and it's not even a real, funny, or new joke.
  Try to say that at job interview if you don't believe
  - pixl97 3 hours ago
    And to add further to the joke here, the full saying goes more like:
    >There are only two hard things in Computer Science: cache invalidation, naming things, and off-by-one errors.
    And if you actually work in software, a very large portion of your hard-to-troubleshoot/fix issues are going to be the above.
    - troupo 2 hours ago
      It's not DNS
      It can't be DNS
      There's no chance in hell it's DNS
      ...
      It was DNS
      - swiftcoder 1 hour ago
        DNS is a special hell: naming things and caching rolled into one!
  - AlotOfReading 3 hours ago
    It's not to be taken as a serious assessment of actual "hardest problems", but they're all difficult. Naming things is obviously impossible. Everyone gets cache invalidation wrong at first, from Intel/AMD to your build system.
  - anyonecancode 2 hours ago
    Well, there's the variation I heard recently:
    There are only two problems in computer science. We only have one joke, and it's not very funny.
  - swiftcoder 3 hours ago
    > Try to say that at job interview if you don't believe
    If your interviewer doesn't at least crack a smile when you make the off-by-one joke, run, do not walk, to the nearest exit. You don't want to work with that dude.
  - bena 3 hours ago
    Naming things is one of the hardest problems we have. In general. Taxonomy is incredibly difficult because it is essentially classification.
    And things never fit neatly into boxes. Giving us such bangers as: Tomatoes are fruit; Everything is a fish or nothing is a fish; and Trees aren't real.
  - lo_zamoyski 3 hours ago
    To spell it out for you...
    1. It's a joke. The hyperbole is intentional, but it does communicate something relatable.
    2. You don't need a citation. Probably anyone with enough software development experience understands the substance of the claim and understands that it is (1).
  - razingeden 2 hours ago
    In case you need to hear this again,
    > “Sarcasm is difficult to grasp on the internet, but some people apparently have more visceral reactions to their misunderstanding than others.”
  - antonvs 1 hour ago
    Yes, and we also need a citation about that quote about a horse and a duck walking into a bar. It doesn't sound very likely to me.
    Martin Fowler has some history of this joke: https://martinfowler.com/bliki/TwoHardThings.html
8organicbits 4 hours ago
Andrew has been writing a ton of interesting blog posts related to package management (https://nesbitt.io/posts/). He's had some great ideas, like testing package managers similar to database Jepsen testing.
- cbsmith 2 hours ago
  Not to take credit away from Andrew for his ideas and writing, because at least he came up with the idea and wrote about it, but I don't understand how that idea of Jepsen style testing of package managers is a novel idea. Like... what testing would you want to do if you were building a package manager?
finally7394 4 hours ago
I like that the author calls out the naming overloading, because when I hear package management I think `pacman`, `winget`, and `apt`.
- pxc 4 hours ago
  All three of those are "system package managers" (if you count winget as a package manager at all, which I would not). Pacman and APT are binary package managers while Homebrew is a source-based package manager. Cargo and NPM are language-specific package managers, which is a name I've settled on but don't love.
  Imo there's an identifiable core common to all of these kinds of package managers, and it's not terribly hard to work out a reasonably good hierarchical ontology. I think OP's greater insight in this section is that internally, every package manager has its own ontology with its own semantics and lexicon:
  > Even within a single ecosystem, the naming is contested: is the unit a package, a module, a crate, a distribution? These aren’t synonyms. They encode different assumptions about what gets versioned, what gets published, and what gets installed.
  - morpheuskafka 3 hours ago
    The confusing part is that in many cases, end users are using NPM, pip, Go packaging, and to a lesser extent cargo etc to install finished end-user software. I've never written a line of JS but have installed all kinds of command line utilities with npm/npx.
    Normally with a system package manager you would have a -lib package for use in your own code (or simply required by another package), a -src package, and then a package without these suffixes would be some kind of executable binary.
    But with npm and pip, I'm never sure whether a package installs binaries or not, and if it does, is it also usable as a library for other code or is it compiled? (Homebrew as you mentioned is source based but typically uses precompiled "bottles" in most cases, I think?) And then there is some stuff that's installed with npm but is not even javascript like font packages for webdev.
    The other interesting thing about these language package managers is that they completely eliminate the role of the distribution in packaging a lot of end-user software. Ironically, in the oldest days you would download a source tarball and compile it yourself, so I guess it's just a return to that approach but with go or cargo replacing wget and make.
    - cozzyd 3 hours ago
      And plenty of people use pip for programs not even written in python!
  - RetroTechie 3 hours ago
    > Imo there's an identifiable core common to all of these kinds of package managers (..)
    Indeed. It's hard to see why e.g. a programming language would need its own package management system.
    Separate mechanics from policy. Different groups of software components in a system could have different policies for when to update what, what repositories are allowed etc. But then use the same (1, the system's) package manager to do the work.
mooracle 4 hours ago
cargo works because rust was young enough to be opinionated. try that with npm and enjoy your mass exodus to the next thing that will also betray you
"but bun!" — faster shovel, same hole
- skrebbel 2 hours ago
  NPM is plenty opinionated. For all its mistakes, it got lots of things uniquely right too. For example it’s very uncommon in JS land to have version conflicts (“dependency hell”). If two deps both need SuperFoo but different versions, NPM just installs both and things Generally Just Work. Exceptions are gross libraries with lots of global state (such as React) but fortunately those are very uncommon in JS land.
  People love to complain about node_modules being a black hole but that size bought JS land an advantage that’s not very common among popular languages.
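  A toy sketch of the nested install layout described above, in Python for brevity. The package names (`dep-a`, `dep-b`, `SuperFoo`), the `registry` data, and the `layout` helper are all hypothetical, and the sketch assumes an acyclic dependency graph. Real npm also hoists and deduplicates compatible versions, which this ignores.

```python
from pathlib import PurePosixPath

def layout(deps, registry, base=PurePosixPath("node_modules")):
    """Return (path, version) pairs for a naive, fully nested node_modules tree.

    deps:     {name: version} requested at this level.
    registry: {(name, version): {dep_name: dep_version}} (hypothetical data).
    """
    paths = []
    for name, version in sorted(deps.items()):
        here = base / name
        paths.append((str(here), version))
        # Each package gets its own private node_modules directory, so two
        # dependents can hold different versions of the same library.
        child_deps = registry.get((name, version), {})
        paths.extend(layout(child_deps, registry, here / "node_modules"))
    return paths

registry = {
    ("dep-a", "1.0.0"): {"SuperFoo": "1.2.0"},
    ("dep-b", "1.0.0"): {"SuperFoo": "2.0.1"},
}
tree = layout({"dep-a": "1.0.0", "dep-b": "1.0.0"}, registry)
for path, version in tree:
    print(f"{path}@{version}")
```

  Both copies of `SuperFoo` end up on disk under different parents, which is the storage cost the comment refers to.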
  - spankalee 2 hours ago
    Yeah, npm never has "version lock" where it can't figure out a valid solution to the version constraints.
    This is mostly good, but version lock does encourage packages to accept wide ranges of dependencies, and to update their dependency ranges frequently, instead of just sitting there on old versions.
- pjmlp 3 hours ago
  And only to the extent it is a pure Rust codebase, add a few other languages to the mix, and it becomes a build.rs mess as well.
- ragall 2 hours ago
  Cargo doesn't work. I'm trying to use it in a monorepo and its caching story is horrible. The devs refused when I proposed to switch it to Bazel years ago and now they're regretting it.
DarkNova6 4 hours ago
Is it not curious that languages known for their rigor have solid package managers and build tools while the remaining languages do not?
This is not a technical problem. It’s a cultural one.
- no_wizard 4 hours ago
  I don’t think those have much to do with it.
  Certainly Go is a more rigorous language than, say, JavaScript, but its package management was abysmal for years. It’s not even all that great now.
  C/C++ is the same deal. The way it handles anything resembling packages is quite dated (though I think Conan has attempted to solve at least some of this).
  I think Cargo and others have the benefit of hindsight from their peers, rather than it being due to any rigorous attribute of the language.
  - the__alchemist 1 hour ago
    Concur: C and C++ are great examples of languages used for rigorous work whose building and packaging are a mess. And I think the big advantage Cargo/Rust has is learning from past mistakes, taking the good ideas that have come up and discarding the bad.
  - pjmlp 3 hours ago
    And vcpkg, not only Conan.
- AnthonyMouse 2 hours ago
  Tacking package management onto a language is feature creep to begin with. You can pretty obviously have a program in one language that uses a library or other dependency written in a different one.
  The real problem is that system package managers need to be made easier to use and have better documentation, so that everyone stops trying to reinvent the wheel.
- bee_rider 3 hours ago
  Yes, we can even see—the languages with the best culture and superior rigor have the best package manager: C and Fortran, which just use the filesystem and the user to manage their packages.
  - pklausler 2 hours ago
    https://fpm.fortran-lang.org
meisel 2 hours ago
This all just sounds like problems we see when making new features, of any sort, for customers. A feature is never objectively done, there are many opinions on its goodness or badness, once it’s released its mistakes can last with it, etc.
If this is a wicked problem, then so is much of other real-world engineering.
fridder 2 hours ago
Honestly just look at the dismal history of Python and package management. easy_install, setuptools, pip(x), conda, poetry, uv. Hell I might even be missing one.
- the__alchemist 2 hours ago
  uv (and a similar tool I built earlier) does solve it, with an important note: this was made feasible by standardizing on pyproject.toml and wheel files, and by being able to compile a different wheel for each OS/arch combo and have the correct one downloaded and installed automatically (and, in the case of Linux, the manylinux target). I think the old Python libs that did arbitrary things in setup.py were a lost cause.
  - fridder 1 hour ago
    I hope it solves it, but I've seen that stated before
    - the__alchemist 1 hour ago
      Hah yea I agree with that mindset. Poetry, Pipenv, pyenv, venv and Conda were all fakers for me!
pxc 4 hours ago
All this, and yet package management is still so much better than managing software any other way, and there are continually real advancements both in foundations and in UX. It is indeed full of wicked problems in a way that suggests there can be no clear "endgame". But it's also a space where the tools and improvements to them regularly make huge positive differences in people's computing experiences.
The uneven terrain also makes package managers more interesting to compare to each other than many other kinds of software, imo.
mystraline 4 hours ago
It is and isn't.
Version hell is a thing. But Nix's solution is to trade storage space for solving the version problem.
And I think it's probably the right way to go.
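A rough sketch of that trade, assuming a simplified scheme in which each install directory is named after a hash of the package and everything it depends on. The `store_path` helper and the `/store` prefix are hypothetical, and this is not Nix's actual hashing algorithm, only the general idea.

```python
import hashlib

def store_path(name, version, inputs=()):
    """Derive an install directory from a package and its dependency paths.

    NOT Nix's real hashing scheme, only the idea: different inputs give a
    different path, so versions never overwrite each other.
    """
    digest = hashlib.sha256()
    for part in (name, version, *sorted(inputs)):
        digest.update(part.encode())
        digest.update(b"\0")  # separator so ("ab", "c") differs from ("a", "bc")
    return f"/store/{digest.hexdigest()[:16]}-{name}-{version}"

ssl_old = store_path("openssl", "1.1.1")
ssl_new = store_path("openssl", "3.0.0")
# Two apps that need different versions each point at their own copy.
app_a = store_path("app-a", "1.0", [ssl_old])
app_b = store_path("app-b", "1.0", [ssl_new])
print(ssl_old, ssl_new, app_a, app_b, sep="\n")
```

Both library versions sit side by side on disk, so neither app can break the other by upgrading; the cost is the extra storage.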
- nitwit-se 3 hours ago
  Agreed - Nix feels very well thought through.
  I found Eelco Dolstra's doctoral thesis (https://edolstra.github.io/pubs/phd-thesis.pdf) to be a great read and it certainly doesn’t paint the picture of a wicked problem.
pydry 4 hours ago
I don't really agree. Package management has a number of pretty well-defined patterns (e.g. lockfiles, isolation, semver, transactionality, etc.) which solve use cases that are largely common across package managers.
It is unfortunately one of the most thankless tasks in software engineering, so these are not applied consistently.
This was symbolized quite nicely by Google pushing out a steaming turd of a version 1 golang package management system while simultaneously putting the creator of brew in the no-hire pile because he couldn't reverse a binary tree.
In this respect it is a bit like QA - neglected because it is disrespected.
What makes it seem like a wicked problem is probably that it is the tip of the software iceberg.
It is the front line for every security issue and/or bug, especially the nastiest class of bug - "no man's land" bugs where package A blames B for using it incorrectly and vice versa.
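Two of the patterns named above, semver ranges and lockfiles, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `superfoo` package and its version list are made up, only plain `MAJOR.MINOR.PATCH` versions are handled, and the caret rule is simplified (real tools treat `0.x` versions more strictly).

```python
def parse(version):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of ints (no pre-release tags)."""
    major, minor, patch = (int(p) for p in version.split("."))
    return (major, minor, patch)

def satisfies_caret(version, base):
    """True if version is >= base and shares its major version (simplified)."""
    v, b = parse(version), parse(base)
    return v[0] == b[0] and v >= b

def resolve(ranges, available, lock=None):
    """Pick one version per package; a lock entry wins if it is still in range."""
    lock = lock or {}
    chosen = {}
    for name, base in ranges.items():
        candidates = [v for v in available[name] if satisfies_caret(v, base)]
        if not candidates:
            raise ValueError(f"no version of {name} satisfies ^{base}")
        pinned = lock.get(name)
        chosen[name] = pinned if pinned in candidates else max(candidates, key=parse)
    return chosen

available = {"superfoo": ["1.0.0", "1.2.0", "1.3.0", "2.0.0"]}
fresh = resolve({"superfoo": "1.1.0"}, available)
locked = resolve({"superfoo": "1.1.0"}, available, lock={"superfoo": "1.2.0"})
print(fresh, locked)
```

Without a lock the resolver drifts to the newest matching release; with one, every machine gets the same version until the lock is deliberately updated.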
- cxr 3 hours ago
  Every package manager lock file format or requirements file is an inferior, ad hoc, informally-specified, error-prone, incompatible reimplementation of half of Git.
  Supply chain vulnerabilities are a choice. It's a problem you have to opt in to.
  <https://news.ycombinator.com/item?id=46008744>
  - spankalee 1 hour ago
    There is actually a huge difference between checking in all of your dependencies and checking in a lock-file. Some people work with hundreds of repositories on their local machine and checking in dependencies would lead to massive bloat. It really only works if you primarily work in a single monorepo.
    - cxr 1 hour ago
      > It really only works if you primarily work in a single monorepo.
      That's simply not true; it doesn't come down to "monorepo-or-not?"
      It comes down to whether the code size of an app's dependencies and transitive dependencies is still reasonable or has gotten out of control.
      The trend of language package managers to store stuff out of repo (and their recent, reluctant adoption of lockfiles to mitigate the obvious problems this causes) is and always has been designed to paper over the dependency-size-is-out-of-control problem. That's _the_ reason that this package management strategy exists.
      You can work on dozens of projects (unrelated; from disjoint domains) that you maintain or contribute to while having all the source for the libraries/routines needed to be able to build an app all right there checked into source control—but it means actually having a handle on things instead of just throwing caution to the wind and sucking down a hundred megabytes or more of simultaneously over- and under-engineered third-party dependencies right before build time.
      It's no different from, "Our app consumes way too much RAM", or, "We don't have a way to build the app aside from installing a monstrously large IDE". (Both in the category of, "We could do something about it if we cared to, but we don't.")
      > There is actually a huge difference between checking in all of your dependencies and checking in a lock-file.
      Yes, huge difference indeed: the hugeness of YOLO maintainers' dependency trees.
- hansvm 3 hours ago
  Assuming the binary tree thing is the whole story, that still doesn't sound like a terrible choice on Google's part. Your first few years at Google you won't have enough leeway to do something like "make homebrew," and you will have to interact with an arcane codebase.
  For tree reversal in particular, it shouldn't be any harder than:
  1. If you don't know what a binary tree is then ask the interviewer (you probably _ought_ to know that Google asks you questions about those since their interview packet tells you as much, but let's assume you wanted to wing it instead).
  2. Spend 5-10min exploring what that means with some small trees.
  3. Then start somewhere and ask what needs to change. Clearly the bigger data needs to go left, and the smaller data needs to go right (using an ascending tree as whatever small example you're working on).
  4. Examine what's left, and see what's out of order. Oh, interesting, I again need to swap left and right on this node. And this one. And this one.
  5. Wait, does that actually work? Do I just swap left/right at every node? <5-10min of frantically trying to prove that to yourself in an interview>
  6. Throw together the 1-5 lines of code implementing the algorithm.
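  For reference, the steps above boil down to something like the following Python sketch. The `Node` class and the `in_order` helper are illustrative names, not anything from the interview in question, and "reversal" is taken here to mean mirroring the tree.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def invert(node):
    """Mirror a binary tree in place by swapping the children of every node."""
    if node is None:
        return None
    node.left, node.right = invert(node.right), invert(node.left)
    return node

def in_order(node):
    """Left-root-right traversal, used here only to check the result."""
    if node is None:
        return []
    return in_order(node.left) + [node.value] + in_order(node.right)

tree = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
before = in_order(tree)
after = in_order(invert(tree))
print(before, after)
```

  On an ascending search tree, the in-order traversal comes out descending after the swap, which is the check described in step 5.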
  It's a fizzbuzz problem, not a LeetCode Hard. Even with significant evidence to the contrary, I'd be skeptical of their potential next 1-3 years of SWE performance with just that interview to go off of.
  That said, do they actually know that was the issue? With 4+ interviews I wouldn't ordinarily reject somebody just because of one algorithms brain-fart. As the interviewer I'd pivot to another question to try to get evidence of positive abilities, and as the hiring manager I'd consider strong evidence of positive abilities from other interviews much more highly than this one lack of evidence. My understanding is that Google (at least from their published performance research) behaves similarly.
tonyhart7 2 hours ago
So what is the "best" package manager humankind has right now?
- the__alchemist 2 hours ago
  GPOS software: Static-linked executables
  Programming languages: Cargo
iberator 3 hours ago
apt-get solved this 'problem' like 25 years ago.
- EvanAnderson 3 hours ago
  RPM "solved" it too.
  I hate package management so much. I hate installing unnecessary cruft to get a box with what I want on it.
  It makes me pine for tarballs built on boxes w/ compilers installed and deployed directly onto the filesystem of the target machines.
  Edit: I'd love to see package management abstracted to a set of interfaces so I could use my OS package manager for all of the bespoke package management that every programming language seems hell-bent on re-implementing.
  - dzr0001 2 hours ago
    I think there's a fundamental difference between programming language repos and package repositories like the official RPM, deb, and ports trees.
    These (typically) operating system repos have oversight and are tested to work within a set of versions. Repositories with public contribution and publishing don't have any compatibility guarantees, so the cruft described in the article must be kept indefinitely.
    Unfortunately, I don't think abstracting those repositories to work within the OS package ecosystem would solve that problem and I suspect the package manager SAT solvers would have a hard time calculating dependencies.
    - EvanAnderson 1 hour ago
      I agree re: the fundamental difference when it comes to compiled languages. I wrote rashly and out of frustration without thinking about it too deeply.
      re: interpreted languages, though, I think it's still a shit show. I don't want to run "composer" or "npm" or whatever the Ruby and Python equivalents are on my production environment. I just want packages analogous to binaries that I can cleanly deploy / remove with OS package management functionality.
- Am4TIfIsER0ppos 3 hours ago
  Isn't it `apt` these days?
  - droopyEyelids 2 hours ago
    Your parent comment is referring to its inception, 25 years ago.