I confess "not invented here" is a problem I think too many people focus on. Lots of things are redone all of the time.
That said, feature creep is absolutely a killer. And it's easy to see how these problems stack: people will insist that for this project, they need to reinvent the state of the art in solvers to get a product out the door.
It's not to be taken as a serious assessment of actual "hardest problems", but they're all difficult. Naming things is obviously impossible. Everyone gets cache invalidation wrong at first, from Intel/AMD to your build system.
> Try to say that at a job interview if you don't believe it
If your interviewer doesn't at least crack a smile when you make the off-by-one joke, run, do not walk, to the nearest exit. You don't want to work with that dude.
Naming things is one of the hardest problems we have. In general. Taxonomy is incredibly difficult because it is essentially classification.
And things never fit neatly into boxes. Giving us such bangers as: Tomatoes are fruit; Everything is a fish or nothing is a fish; and Trees aren't real.
1. It's a joke. The hyperbole is intentional, but it does communicate something relatable.
2. You don't need a citation. Probably anyone with enough software development experience understands the substance of the claim and understands that it is (1).
Andrew has been writing a ton of interesting blog posts related to package management (https://nesbitt.io/posts/). He's had some great ideas, like testing package managers similar to database Jepsen testing.
Not to take credit away from Andrew for his ideas and writing, because at least he came up with the idea and wrote about it, but I don't understand how that idea of Jepsen style testing of package managers is a novel idea. Like... what testing would you want to do if you were building a package manager?
All three of those are "system package managers" (if you count winget as a package manager at all, which I would not). Pacman and APT are binary package managers while Homebrew is a source-based package manager. Cargo and NPM are language-specific package managers, which is a name I've settled on but don't love.
Imo there's an identifiable core common to all of these kinds of package managers, and it's not terribly hard to work out a reasonably good hierarchical ontology. I think OP's greater insight in this section is that internally, every package manager has its own ontology with its own semantics and lexicon:
> Even within a single ecosystem, the naming is contested: is the unit a package, a module, a crate, a distribution? These aren’t synonyms. They encode different assumptions about what gets versioned, what gets published, and what gets installed.
The confusing part is that in many cases, end users are using NPM, pip, Go packaging, and to a lesser extent cargo etc to install finished end-user software. I've never written a line of JS but have installed all kinds of command line utilities with npm/npx.
Normally with a system package manager you would have a -lib package for use in your own code (or simply required by another package), a -src package for the sources, and then a package without those suffixes would be some kind of executable binary.
But with npm and pip, I'm never sure whether a package installs binaries or not, and if it does, whether it's also usable as a library for other code or is compiled. (Homebrew, as you mentioned, is source-based but typically uses precompiled "bottles" in most cases, I think?) And then there is some stuff that's installed with npm but isn't even JavaScript, like font packages for webdev.
The other interesting thing about these language package managers is that they completely eliminate the role of the distribution in packaging a lot of end-user software. Ironically, in the oldest days you would download a source tarball and compile it yourself, so I guess it's just a return to that approach, but with go or cargo replacing wget and make.
> Imo there's an identifiable core common to all of these kinds of package managers (..)
Indeed. It's hard to see why e.g. a programming language would need its own package management system.
Separate mechanics from policy. Different groups of software components in a system could have different policies for when to update what, what repositories are allowed etc. But then use the same (1, the system's) package manager to do the work.
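The mechanics/policy split described above can be sketched as a toy in Python. Everything here (the group names, the policy fields, the data shapes) is hypothetical, not any real package manager's API; the point is just one shared update mechanism driven by per-group policies:

```python
# Toy sketch of "separate mechanics from policy": one update engine,
# per-group policies deciding what it's allowed to do. All names and
# structures here are made up for illustration.
POLICIES = {
    "base-system": {"repos": ["stable"],          "auto_update": True},
    "dev-tools":   {"repos": ["stable", "extra"], "auto_update": True},
    "pinned-apps": {"repos": ["stable"],          "auto_update": False},
}

def plan_updates(installed, available):
    """The shared 'mechanic': compute updates, but let each package's
    group policy decide whether the update is permitted at all."""
    plan = []
    for pkg, (group, version) in installed.items():
        policy = POLICIES[group]
        candidate = available.get(pkg)
        if (candidate
                and policy["auto_update"]
                and candidate["repo"] in policy["repos"]
                and candidate["version"] != version):
            plan.append((pkg, version, candidate["version"]))
    return plan
```

A pinned app is skipped even when a newer version exists, while a dev tool picks up updates from extra repos, all through the same code path.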
NPM is plenty opinionated. For all its mistakes, it got lots of things uniquely right too. For example it’s very uncommon in JS land to have version conflicts (“dependency hell”). If two deps both need SuperFoo but different versions, NPM just installs both and things Generally Just Work. Exceptions are gross libraries with lots of global state (such as React) but fortunately those are very uncommon in JS land.
People love to complain about node_modules being a black hole but that size bought JS land an advantage that’s not very common among popular languages.
Yeah, npm never has "version lock" where it can't figure out a valid solution to the version constraints.
This is mostly good, but version lock does encourage packages to accept wide ranges of dependencies, and to update their dependency ranges frequently, instead of just sitting there on old versions.
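The difference the comments above are describing can be shown with a toy resolver in Python (the package data is entirely made up): a "flat" one-version-per-package model must find a single version acceptable to every requester and can fail, while an npm-style nested model just gives each requester its own copy.

```python
# Toy illustration (hypothetical packages): why npm-style nested installs
# avoid "dependency hell" in a way a flat, one-shared-version model cannot.

def flat_resolve(requirements):
    """One shared version: intersect everyone's acceptable sets; may fail."""
    acceptable = None
    for _, versions in requirements:
        acceptable = set(versions) if acceptable is None else acceptable & set(versions)
    return max(acceptable) if acceptable else None  # None == version conflict

def nested_resolve(requirements):
    """npm-style: every requester gets its own copy at its own best version."""
    return {requester: max(versions) for requester, versions in requirements}

# App depends on A and B; both need SuperFoo, but with disjoint version ranges.
reqs = [("A", ["1.0", "1.1"]), ("B", ["2.0"])]
print(flat_resolve(reqs))    # no single version satisfies both
print(nested_resolve(reqs))  # both versions installed, side by side
```

The cost, as noted, is disk space: every requester can drag in its own copy of the tree.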
Cargo doesn't work. I'm trying to use it in a monorepo and its caching story is horrible. The devs refused when I proposed switching to Bazel years ago, and now they're regretting it.
Certainly Go is a more rigorous language than, say, JavaScript, but its package management was abysmal for years. It's not even all that great now.
C/C++ is the same deal. The way it handles anything resembling packages is quite dated (though I think Conan has attempted to solve at least some of this)
I think Cargo and others have the hindsight of their peers, rather than it being due to any rigorous attribute of the language.
Concur: C and C++ are a great example of being used for rigorous purposes while their building/packaging is a mess. And I think the big advantage Cargo/Rust has is learning from past mistakes, taking the good ideas that have come up and discarding the bad.
Tacking package management onto a language is feature creep to begin with. You can pretty obviously have a program in one language that uses a library or other dependency written in a different one.
The real problem is that system package managers need to be made easier to use and have better documentation, so that everyone stops trying to reinvent the wheel.
Yes, we can even see—the languages with the best culture and superior rigor have the best package manager: C and Fortran, which just use the filesystem and the user to manage their packages.
This all just sounds like problems we see when making new features, of any sort, for customers. A feature is never objectively done, there are many opinions on its goodness or badness, once it’s released its mistakes can last with it, etc.
If this is a wicked problem, then so is much of other real-world engineering.
Honestly, just look at the dismal history of Python package management: easy_install, setuptools, pip(x), conda, poetry, uv. Hell, I might even be missing one.
uv (and a similar tool I built earlier) does solve it, with an important note: this was made feasible by standardizing on pyproject.toml and wheel files, by being able to compile a different wheel for each OS/arch combo and have the correct one downloaded and installed automatically, and, in the case of Linux, by the manylinux target. I think the old Python libs that did arbitrary things in setup.py were a lost cause.
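The per-OS/arch selection works because a wheel's filename encodes its target tags. A simplified sketch of picking a compatible wheel by parsing those tags (the filenames below are made up, and real installers handle many more tag combinations than this):

```python
# Simplified sketch of wheel tag matching. Real resolvers (pip, uv) support
# compressed tag sets, ranked platform compatibility, etc.; this just shows
# the idea that the filename itself carries the OS/arch information.

def wheel_tags(filename):
    """Split a wheel filename into (name, version, python, abi, platform)."""
    parts = filename.removesuffix(".whl").split("-")
    return parts[0], parts[1], parts[-3], parts[-2], parts[-1]

def pick_wheel(filenames, python, platform):
    """Pick the first wheel matching the interpreter/platform, accepting a
    pure-Python 'py3-none-any' wheel as a fallback."""
    for fn in filenames:
        _, _, py, _, plat = wheel_tags(fn)
        if py in (python, "py3") and plat in (platform, "any"):
            return fn
    return None

wheels = [
    "demo-1.0-cp312-cp312-manylinux_2_17_x86_64.whl",  # hypothetical files
    "demo-1.0-cp312-cp312-win_amd64.whl",
    "demo-1.0-py3-none-any.whl",
]
print(pick_wheel(wheels, "cp312", "manylinux_2_17_x86_64"))
```

Because the matching is purely mechanical, nothing arbitrary has to run at install time, which is exactly what setup.py-era packaging couldn't guarantee.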
All this, and yet package management is still so much better than managing software any other way, and there are continually real advancements both in foundations and in UX. It is indeed full of wicked problems in a way that suggests there can be no clear "endgame". But it's also a space where the tools and improvements to them regularly make huge positive differences in people's computing experiences.
The uneven terrain also makes package managers more interesting to compare to each other than many other kinds of software, imo.
I don't really agree. Package management has a number of pretty well-defined patterns (e.g. lockfiles, isolation, semver, transactionality) which solve use cases that are largely common across package management.
It is unfortunately one of the most thankless tasks in software engineering, so these are not applied consistently.
This was symbolized quite nicely by Google pushing out a steaming turd of a version 1 golang package manager while simultaneously putting the creator of brew in the no-hire pile because he couldn't reverse a binary tree.
In this respect it is a bit like QA - neglected because it is disrespected.
What makes it seem like a wicked problem is probably that it is the tip of the software iceberg.
It is the front line for every security issue and/or bug, especially the nastiest class of bug - "no man's land" bugs where package A blames B for using it incorrectly and vice versa.
Every package manager lock file format or requirements file is an inferior, ad hoc, informally-specified, error-prone, incompatible reimplementation of half of Git.
Supply chain vulnerabilities are a choice. It's a problem you have to opt in to.
There is actually a huge difference between checking in all of your dependencies and checking in a lock-file. Some people work with hundreds of repositories on their local machine and checking in dependencies would lead to massive bloat. It really only works if you primarily work in a single monorepo.
> It really only works if you primarily work in a single monorepo.
That's simply not true; it doesn't come down to "monorepo-or-not?"
It comes down to whether or not the code size of an app's dependencies and transitive dependencies are still reasonable or have gotten out of control.
The trend of language package managers to store stuff out of repo (and their recent, reluctant adoption of lockfiles to mitigate the obvious problems this causes) is and always has been designed to paper over the dependency-size-is-out-of-control problem. That's _the_ reason that this package management strategy exists.
You can work on dozens of projects (unrelated; from disjoint domains) that you maintain or contribute to while having all the source for the libraries/routines needed to be able to build an app all right there checked into source control—but it means actually having a handle on things instead of just throwing caution to the wind and sucking down a hundred megabytes or more of simultaneously over- and under-engineered third-party dependencies right before build time.
It's no different from, "Our app consumes way too much RAM", or, "We don't have a way to build the app aside from installing a monstrously large IDE". (Both in the category of, "We could do something about it if we cared to, but we don't.")
> There is actually a huge difference between checking in all of your dependencies and checking in a lock-file.
Yes, huge difference indeed: the hugeness of YOLO maintainers' dependency trees.
Assuming the binary tree thing is the whole story, that still doesn't sound like a terrible choice on Google's part. Your first few years at Google you won't have enough leeway to do something like "make homebrew," and you will have to interact with an arcane codebase.
For tree reversal in particular, it shouldn't be any harder than:
1. If you don't know what a binary tree is then ask the interviewer (you probably _ought_ to know that Google asks you questions about those since their interview packet tells you as much, but let's assume you wanted to wing it instead).
2. Spend 5-10min exploring what that means with some small trees.
3. Then start somewhere and ask what needs to change. Clearly the bigger data needs to go left, and the smaller data needs to go right (using an ascending tree as whatever small example you're working on).
4. Examine what's left, and see what's out of order. Oh, interesting, I again need to swap left and right on this node. And this one. And this one.
5. Wait, does that actually work? Do I just swap left/right at every node? <5-10min of frantically trying to prove that to yourself in an interview>
6. Throw together the 1-5 lines of code implementing the algorithm.
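The steps above really do land on a few lines. A sketch in Python (the names are mine, not anything from the interview in question):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    value: int
    left: "Optional[Node]" = None
    right: "Optional[Node]" = None

def invert(node: Optional[Node]) -> Optional[Node]:
    """Reverse ('invert') a binary tree: swap left/right at every node,
    recursively. That's the whole trick discovered in steps 4-5."""
    if node is not None:
        node.left, node.right = invert(node.right), invert(node.left)
    return node
```

Inverting twice gets you back the original tree, which is a quick sanity check you can do in the interview itself.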
It's a fizzbuzz problem, not a LeetCode Hard. Even with significant evidence to the contrary, I'd be skeptical of their potential next 1-3 years of SWE performance with just that interview to go off of.
That said, do they actually know that was the issue? With 4+ interviews I wouldn't ordinarily reject somebody just because of one algorithms brain-fart. As the interviewer I'd pivot to another question to try to get evidence of positive abilities, and as the hiring manager I'd consider strong evidence of positive abilities from other interviews much more highly than this one lack of evidence. My understanding is that Google (at least from their published performance research) behaves similarly.
I hate package management so much. I hate installing unnecessary cruft to get a box with what I want on it.
It makes me pine for tarballs built on boxes w/ compilers installed and deployed directly onto the filesystem of the target machines.
Edit: I'd love to see package management abstracted to a set of interfaces so I could use my OS package manager for all of the bespoke package management that every programming language seems hell-bent on re-implementing.
I think there's a fundamental difference between programming language repos and package repositories like the official RPM, deb, and ports trees.
These (typically) operating system repos have oversight and are tested to work within a set of versions. Repositories with public contribution and publishing don't have any compatibility guarantees, so the cruft described in the article must be kept indefinitely.
Unfortunately, I don't think abstracting those repositories to work within the OS package ecosystem would solve that problem and I suspect the package manager SAT solvers would have a hard time calculating dependencies.
I agree re: the fundamental difference when it comes to compiled languages. I wrote rashly and out of frustration without thinking about it too deeply.
re: interpreted languages, though, I think it's still a shit show. I don't want to run "composer" or "npm" or whatever the Ruby and Python equivalents are on my production environment. I just want packages analogous to binaries that I can cleanly deploy / remove with OS package management functionality.
Try to say that at a job interview if you don't believe it
> There are only two hard things in Computer Science: cache invalidation, naming things, and off-by-one errors.

And if you actually work in software, a very large portion of your hard-to-troubleshoot issues are going to be the above.
It can't be DNS
There's no chance in hell it's DNS
...
It was DNS
There are only two problems in computer science. We only have one joke, and it's not very funny.
> “Sarcasm is difficult to grasp on the internet, but some people apparently have more visceral reactions to their misunderstanding than others.”
Martin Fowler has some history of this joke: https://martinfowler.com/bliki/TwoHardThings.html
"but bun!" — faster shovel, same hole
This is not a technical problem. It’s a cultural one.
Version hell is a thing. But Nix's solution is to trade storage space for solving the version problem.
And I think it's probably the right way to go.
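The storage-for-correctness trade can be sketched in a few lines of Python. This is a heavy simplification of what Nix actually does (real Nix hashes the entire build recipe and dependency closure, not just a name/version/deps tuple), but it shows why versions stop conflicting:

```python
import hashlib

def store_path(name, version, deps):
    """Derive a content-addressed install path from a package's inputs.
    A different version or a different dependency set yields a different
    path, so many versions coexist on disk instead of clashing.
    (Simplified: real Nix hashes the full derivation, not this tuple.)"""
    key = f"{name}-{version}-{sorted(deps)}".encode()
    digest = hashlib.sha256(key).hexdigest()[:12]
    return f"/nix/store/{digest}-{name}-{version}"

print(store_path("openssl", "3.0.1", ["glibc-2.35"]))
print(store_path("openssl", "1.1.1", ["glibc-2.35"]))  # distinct path, no clash
```

Each consumer then references the exact path it was built against, which is precisely where the extra storage goes.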
I found Eelco Dolstra‘a doctoral thesis (https://edolstra.github.io/pubs/phd-thesis.pdf) to be a great read and it certainly doesn’t paint the picture of a wicked problem.
<https://news.ycombinator.com/item?id=46008744>