Do you have staged rollouts and rollbacks for dotfiles changes? Also do you support publishing Prometheus metrics and health probes for dotfiles? I have been looking for an enterprise ready dotfiles setup.
I've used variations for the last ~10 years at a number of orgs. Some were $500+ million companies with enterprise protocols, others were smaller shops. Most were relatively small dev teams (< 30) where everyone upgraded at their own discretion. We never had an incident or a need to roll back.
The apps developers were working on were running in Docker so all dependencies and things were handled in those projects, not the dotfiles. From a dotfiles perspective, we're talking about installing various packages and modifying either system or home dir config files, it wasn't complete device management. Device management was always handled by teams outside of our engineering team's control.
Keep in mind, it was a mixture of macOS and Windows with WSL 2. My dotfiles approach worked well, but I didn't use them directly since the companies I did work for didn't want to directly depend on my open source work but I used the same design principles and patterns.
No one used Arch in WSL 2, but for my own stuff, if I need to lock a package I just use Mise instead of Arch's repo for that package. For example, this lets me have 3 different versions of Ansible available for different client work; same goes for Terraform, kubectl, etc.
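For the uninitiated, the per-project pinning looks something like this: a minimal `.mise.toml` sketch, where the tool versions are hypothetical examples:

```toml
# .mise.toml in a client project's directory (versions are made-up examples).
# mise activates these pinned versions whenever you cd into the project.
[tools]
ansible = "9.5.1"
terraform = "1.7.5"
kubectl = "1.29.3"
```

Running `mise install` in that directory fetches the pinned versions, and a different client's project can pin different ones without conflict.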
At one org, pre-Mise, I just rolled a tiny curl based solution that downloaded a release directly from GitHub, and we locked versions to what we wanted so we controlled upgrade cadence since it was important to keep a few CLI tools in sync.
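A minimal sketch of that pattern; the tool name, repo, and URL layout are hypothetical placeholders, and nothing is actually downloaded here:

```shell
# Sketch of the pinned-release idea. Bumping VERSION is the upgrade cadence,
# so everyone on the team stays on the same build of the CLI tool.
TOOL="sometool"
VERSION="1.2.3"
URL="https://github.com/example/${TOOL}/releases/download/v${VERSION}/${TOOL}-linux-amd64"

# The real script would follow with something like:
#   curl -fsSL "$URL" -o "/usr/local/bin/${TOOL}" && chmod +x "/usr/local/bin/${TOOL}"
echo "$URL"
```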
I always tried to pick OS agnostic approaches so it all works on macOS and WSL 2 / native Linux (including CI). Whenever I rolled out these solutions for companies, it was always a thing to do on the side where I allocated maybe a week to come up with the solution; it wasn't my full time role to work on it. I'd just develop it and own the project, keeping it in a workable state and making ad hoc adjustments as needed. It never got to the point where things like Prometheus or health check metrics were thought about.
I've worked at places that are SOC 2 Type 2 compliant with similar ways of installing tools on dev boxes. I would say yes but like anything SOC 2 related, "it depends". The compliance requirement is on the org being compliant.
Sure, no problem. If you have any questions or issues let me know.
> I didn't know I needed this until now.
Haha yeah I know the feeling. My main workstation is still a desktop computer I built in 2014, I do all of my dev work from it.
Around 8 years ago I thought to myself if I ever upgrade my hardware, it can't be a painful experience to set everything up again so I started the dotfiles project. That evolved into its current state.
I've always used rsync to back up my user files, but I recently open sourced https://github.com/nickjj/bmsu, which is based on a script from 2018 that I've made more robust. Long story short, this fully handles offline backups and restores (and, as a side topic, syncing files between my desktop, laptop and phone). All it does is help you directly call rsync.
Between that and the dotfiles project, if my computer blew up tomorrow I'd be really upset for having to spend a lot of money on new parts but I could get everything up and running really quickly with zero dependence on cloud storage for any data.
I've written over ~10k lines of Ansible playbooks and roles to fully automate setting up servers to deploy Docker based web apps, so I do like the concept of declaring the state of a system in configuration and then having that become a reality. I know NixOS is not directly comparable to Ansible but in general I think IaC is a good idea.
It was important to me that my dotfiles work on a number of systems so I avoided NixOS. For example, the command line version works on Arch, Debian and Ubuntu based distros along with WSL 2 support and macOS too. The desktop version works on Arch and Arch based distros.
Beyond that, I also use my dotfiles on 2 different Linux systems so I wanted a way to differentiate certain configs for certain things. I also have a company issued laptop running macOS where I want everything to work, but it's a managed device so I can't go hog wild with full system level management.
Beyond that, since I make video courses I wanted to make it easy for anyone to replicate my set up if they wanted but also make it super easy for them to personalize any part of the set up without forking my repo (but they can still fork it if they want).
All of the above was achievable with shell scripts and symlinks. I might be wrong since I didn't research it in depth but I'm not sure NixOS can handle all of the above use cases in an easy to configure manner.
To have your Nix-based setup reproducible across different OSes (Arch, Debian, Ubuntu, WSL 2, macOS, and NixOS), with an extensible base config that can be customized to different situations, the go-to framework is home-manager (not NixOS itself, which only works on NixOS, or NixOS on WSL 2).
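For a sense of what that looks like, a minimal home-manager module sketch; the username, package, and option choices are placeholder examples:

```nix
# Minimal home-manager sketch (values are hypothetical examples).
# The same module works on Linux and macOS; per-machine differences can be
# split into separate modules imported conditionally.
{ pkgs, ... }: {
  home.username = "demo";
  home.homeDirectory = "/home/demo";
  home.stateVersion = "24.05";

  home.packages = [ pkgs.ripgrep ];

  programs.git = {
    enable = true;
    userName = "Demo User";
  };
}
```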
Nix offers a trade-off: near-perfect reproducibility in exchange for longer builds. Sometimes it's nice to just build a new .so for some library and let the rest of your binaries link to it without recompiling everything.
I'm not convinced about building whole systems around it. I can't remember the last time I ran into a reproducibility issue in practice, but I upgrade my system packages every day and that's definitely faster without Nix.
Migrated from archlinux to nixos. I don't think I can use anything else now...
I have a CI at home that builds my NixOS config on a weekly basis with the latest flake inputs. The artifacts are pushed to atticd. With this setup, when I actually need to update my machines, it's almost instantaneous.
edit: Using NixOS ofc, otherwise I would never do this.
Care to share some scripts on how you do it? I'm in a similar position, maintaining multiple desktops, laptops, and servers, but I do not know how to share the build artifacts.
I have never been more stress-free than when I was running NixOS as a daily driver. Had to return to macOS as primary, unfortunately, but I still use Nix as much as possible.
reproducible images are one of those features where the payoff is mostly emotional until the day it isn't. we had an incident where two supposedly identical images on two machines had a three byte delta in a timestamp and it cost us an afternoon to bisect from the wrong end. boring win, but a real one.
Gill probably already knows this, but for the uninitiated: something logged in, did a thing to potentially every container, and then deleted any sign of it doing the thing.
all that's left is a single timestamp of a log or something getting deleted
You are screwed either way. If you don't update, your container has a ton of known security issues; if you do, the container is not reproducible. Reproducible is neat with some useful security benefits, but it is somewhat of a non-goal if the container is more than a month old - a day might even be a better max age.
So if I have a docker container which needs a handful of packages, how would you handle it?
I'm handling it by using a slim Debian or Ubuntu, then using apt to install these packages with the necessary dependencies.
For everything easy, like one basic binary, I use the most minimal image, but as soon as it gets just a little bit annoying to set it up and keep it maintained, I start using apt and a nightly build of the image.
IMO: package manager outside the container. You just want the packages inside the container; the manager can sit outside and install packages into the container.
For the package management, it depends on the package manager, but most have some mechanism for installing into a root other than the currently running system.
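For example, hedged sketches of a few real mechanisms; the commands are assembled as strings and printed rather than executed, since they need root and each distro's tooling:

```shell
# Sketch: several package managers can populate a root other than the
# currently running system. Nothing here is executed for real.
ROOTFS="/tmp/rootfs-sketch"

# Debian/Ubuntu: debootstrap seeds a minimal root that apt can then target
DEBIAN_CMD="debootstrap --variant=minbase stable $ROOTFS"

# Fedora/RHEL: dnf has first-class support via --installroot
FEDORA_CMD="dnf --installroot=$ROOTFS --releasever=40 install -y coreutils"

# Arch: pacstrap (a pacman wrapper) populates a new root
ARCH_CMD="pacstrap $ROOTFS base"

printf '%s\n' "$DEBIAN_CMD" "$FEDORA_CMD" "$ARCH_CMD"
```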
Even without explicit support in the package manager, you could also roll your own solution by running the package manager in a chroot environment, which would then need to be seeded with the package manager's own dependencies, of course (and use user-mode QEMU to run pre- and post-installation scripts within the chroot in the case of cross-architecture builds).
Whether this yields a minimal container when pointed at a repository intended to be used to deploy a full OS is another question, but using a package manager to build a root filesystem offline isn't hard to pull off.
As for how to do this in the context of building an OCI container, tools like Buildah[1] exist to support container workflows beyond the conventional Dockerfile approach, providing straightforward command line tools to create containers, work with layers, mount and unmount container filesystems, etc.
There have got to be a million ways to do this by now. Some of the more principled approaches are tools like Nix (https://xeiaso.net/talks/2024/nix-docker-build/) and Bazel (https://github.com/bazel-contrib/rules_oci). But if you want to use an existing package manager like apt, you can pick it apart. Apt calls dpkg, and dpkg extracts files and runs post-install scripts. Only the post-install script needs to run inside the container.
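A sketch of that picking-apart, with a hypothetical package file; the commands are printed rather than run:

```shell
# A .deb's filesystem payload and control files can be extracted with
# dpkg-deb, entirely outside any container. Only the post-install script
# (rootfs/DEBIAN/postinst) would then need to run inside the container.
PKG="somepackage.deb"   # hypothetical placeholder

EXTRACT_FILES="dpkg-deb -x $PKG rootfs/"        # unpack files only, no scripts
EXTRACT_CTRL="dpkg-deb -e $PKG rootfs/DEBIAN"   # control files, incl. postinst

printf '%s\n' "$EXTRACT_FILES" "$EXTRACT_CTRL"
```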
I may be a little out of touch here, because the last time I did this, we used a wholly custom package manager.
The same way you may require something like cmake as a build dependency but not have it be part of the resulting binary - separate build time and run time dependencies so you only distribute the relevant ones.
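In Docker terms that separation is usually done with a multi-stage build; a sketch, where the image names, toolchain, and app path are hypothetical:

```dockerfile
# Build stage: carries cmake/g++ and any other build-time-only deps.
FROM debian:bookworm-slim AS build
RUN apt-get update && apt-get install -y --no-install-recommends cmake g++ make
COPY . /src
RUN cmake -S /src -B /build && cmake --build /build

# Runtime stage: only the built artifact ships; the toolchain stays behind.
FROM debian:bookworm-slim
COPY --from=build /build/app /usr/local/bin/app
CMD ["app"]
```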
Your question feels insane to me for production environments. Why aren't you doing a version cutoff of your packages and either pulling them from some network/local cache or baking them into your images?
I don't just run a java spring boot application. I run other things on my production system.
It doesn't matter much where I pull them from, though; I only do this with packages which have plenty of dependencies, where I don't want to assemble my own minimal image.
Friend, considering the supply chain attacks going on these days, automatically updating everything, immediately, probably isn't the perfect move either.
That local cache is often implemented as a drop-in replacement for the upstream package repository, and packages are still installed with the same package manager (yum, apt, pip, npm).
Minimal might or might not be your goal. A large container is sometimes correct - at that point you have to ask whether reusing one container twice, so you only need to download it once, and then installing the one missing part makes more sense.
If you are on github/gitlab, renovate bot is a good option for automating dependency updates via PRs while still maintaining pinned versions in your source.
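A minimal Renovate config sketch; the preset names are real Renovate presets, but treat the exact combination as an example to adapt rather than a recommendation:

```json
{
  "$schema": "https://docs.renovatebot.com/renovate-schema.json",
  "extends": ["config:recommended", "docker:pinDigests"]
}
```

With something like this in the repo, versions stay pinned in source and the bot opens PRs when upstream updates, so upgrades remain an explicit, reviewable change.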
I know it's an anti-pattern, but what is the alternative if you need to install some software? Pulling its tagged source code, gcc and compile everything?
> the old snapshot has security holes attackers know how to exploit.
So is running `docker build` with a `RUN apt update` line that hits the cache - except the latter is silent.
The problem solved by pinning to the snapshot is not to magically be secure, it's knowing what a given image is made of so you can trivially assert which ones are safe and which ones aren't.
In both cases you have to rebuild an image anyway so updating the snapshot is just a step that makes it explicit in code instead of implicit.
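One way this looks in practice on Debian, as a sketch: the snapshot timestamp is a made-up example, and old snapshots may also need validity checking disabled since their repository signatures expire:

```dockerfile
FROM debian:bookworm-slim
# Point apt at a point-in-time snapshot; bumping the timestamp below is the
# explicit, in-code upgrade step. [check-valid-until=no] tolerates expired
# Release files on older snapshots.
RUN rm -f /etc/apt/sources.list.d/* \
 && echo 'deb [check-valid-until=no] https://snapshot.debian.org/archive/debian/20240401T000000Z/ bookworm main' \
      > /etc/apt/sources.list \
 && apt-get update \
 && apt-get install -y --no-install-recommends curl
```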
Where does the apt update connect to? If it is an up-to-date package repo, you get fixes. However, there are lots of reasons it might not be. You'd better know if this is your plan.
You get fixes that were current at docker build time, but I think GP is referring to fixes that appear in the apt repo after your docker container is deployed.
If you've pulled in a dependency from outside the base image, there will be no new base image version to alert you to an update of that external dependency. Unless your container regularly runs something like apt update && apt list --upgradable, you will be unaware of security fixes newly available from apt.
Run “nix flake update”. Commit the lockfile. Build a docker image from that; the software you need is almost certainly there, and there’s a handy docker helper.
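The helper being referred to is presumably `pkgs.dockerTools`; a minimal sketch, assuming `pkgs` comes from the nixpkgs revision pinned in your flake.lock (the image name and package list are examples):

```nix
# Sketch, not a drop-in flake: everything in the image comes from the
# lockfile-pinned nixpkgs, so rebuilding from the same lock reproduces it.
pkgs.dockerTools.buildLayeredImage {
  name = "pinned-tools";
  contents = [ pkgs.curl pkgs.jq ];
  config.Cmd = [ "${pkgs.bash}/bin/bash" ];
}
```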
Recently I’ve been noticing that Nix software has been falling behind. So “the software you need is almost certainly there” is less true these days. Recently = April 2026.
That's been an issue for years from my impression of the state of NixOS. There are other problems too, like a lot of open source packages doing straight binary downloads instead of actually building the software.
Are you referring to how the nixpkgs-unstable branch hasn't been updated in the past five days? Or do you have some specific software in mind? (not arguing, just curious)
It’s a variety of different software that just isn’t updated very often.
I don’t mind being somewhat behind, but it seems like there are a lot of packages that don’t get regular updates. It’s okay to have packages that aren’t updated, but those packages should be clearly distinguishable.
I don't really see how that's different from a normal binary install of a reproducible package. Especially with the lacking quality of a lot of Nix packages.
The problem is distros often remove older versions from the repo as soon as the new version is available. Granted there is an archive that you can pull from.
I disagree with that as a hard rule and with the opinion that it's an anti-pattern. Reproducible containers are fine, but not always necessary. There's enough times when I do want to run apt-get in a container and don't care about reproducibility.
It is to solve such issues that I am using and running StableBuild.
It is a managed service that keeps a cached copy of your dependencies at a specific time.
You can pin your dependencies within a Dockerfile and have reproducible docker images.
I wonder if well designed "mutable" operating systems like Arch and Alpine are going to beat NixOS etc. in the long run. An install script is strictly more powerful than a declarative config language, and typically less verbose.
I've been long fascinated by the rolling release model. But aren't you guys worried about supply chain attacks? Seems those on the bleeding edge serve as canaries in the coalmine for the rest of us.
That's the purpose of reproducible build initiatives like TFA. The idea is to ensure that identical source produces bit-for-bit identical builds on multiple machines when the packages are built.
Sure, if the source itself gets got, then it does nothing. But it at least puts up one more barrier against tampering with the artifacts.
A totally unrelated comment, but: there is an animation on that page that moves practically everything on the page about 20 pixels down over the course of 1 second.
I thought that would completely trash the Cumulative Layout Shift core web vital. Because, hey! the layout is shifting in front of my very eyes. But no, the CLS on the page is 0.
It's happening as a result of a deliberate animation. The CLS metric relates to initial render. So yes, there is layout shift, but it's not CLS per se.
It's just that the spirit of Google's core web vitals has been to measure the properties of a web page that have the most impact on users. How quickly content appears on a page, how visually stable the content is, and how long it takes the page to respond to an interaction.
In the case of this page, I don't think it can be considered visually stable at all in the first second after it's loaded.
This is a really interesting accomplishment - I am also working heavily on reproducible builds for my firmware projects, and .. lo and behold .. the package manager key administrivia is the final bone to be broken.
I wonder if Arch leading the way on this will prompt other distros to attempt the same feat. Reproducible builds are important for certification, security, and safety-critical applications .. it'd be great to see Linux distros become more conformant to this method.
This is a huge accomplishment! But it wouldn't be so huge if compilers were trivially deterministic. It took 5 decades of development for compilers to get here. I'm sure ChatGPT in 2073 is going to be more deterministic than it was in 2023.
I ran Arch Linux for almost a year in WSL 2, it was really good.
Then I ran Arch natively for ~5 months, it's really good.
Now I still run Arch natively, but I also use the Arch Docker image to test my dotfiles[0] with a fresh file system.
Also, for when I want to run end to end tests for my dotfiles that set up a complete desktop environment I run Arch in a VM.
I have 99 problems but running Arch isn't one of them.
[0]: https://github.com/nickjj/dotfiles
https://github.com/nix-community/home-manager
Build your container/vm image elsewhere and deploy updates as entirely new images or snapshots or whatever you want.
Personally I prefer buildroot and consider VM as another target for embedded o/s images.
[1] https://github.com/containers/buildah/blob/main/README.md
Most Makefiles allow you to specify an alternate DESTDIR on install.
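As a sketch with hypothetical paths: DESTDIR is prepended to the install prefix, so files stage into an alternate root ready to copy into a container:

```shell
# Sketch: with a conventional Makefile, this stages the install under
# /tmp/stage/usr/... instead of touching the running system. Printed, not run.
STAGE="/tmp/stage"
CMD="make install DESTDIR=$STAGE PREFIX=/usr"

echo "$CMD"
```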
For example, I run a GCS FUSE driver; it has other dependencies that apt 'just' resolves.
FROM ubuntu:24.04
COPY --from=ghcr.io/owner/image:latest /usr/local/bin/somebinary /usr/local/bin/somebinary
CMD ["somebinary"]
Not as simple when you need shared dependencies
Also I'm tired of doing these hacks:
Pinning to a snapshot just makes so many things easier.
Both the software component and the image should be version pinned for auditing.
Reproducible can sometimes be a goal, but repeatable is always important.
I do think for this case specifically (base images for a specific distro), they should be reproducible.
NIX FIXES THIS.
(and, also presumably, that you do Crossfit, etc.)
You meet a vegan crossfitter that uses Arch, what does it tell you about first?
https://reproducible-builds.org/
Closely related is the Bootstrappable Builds community:
https://bootstrappable.org/
They have a tracker for what percent of the distro is reproducible: https://reproducible.archlinux.org/
Is CLS a misleading metric then?
The CLS measures the total sum of layout shifts over the entire lifespan of a page, not just during initial render.
And it's not unexpected, because it comes from a css transition.
And yet, core web vitals cannot demonstrate this.