Do you have staged rollouts and rollbacks for dotfiles changes? Also do you support publishing Prometheus metrics and health probes for dotfiles? I have been looking for an enterprise ready dotfiles setup.
I've used variations for the last ~10 years at a number of orgs. Some were $500+ million companies with enterprise protocols, others were smaller shops. Most were relatively small dev teams (< 30) where everyone upgraded at their own discretion. We never had an incident or a need to roll back.
The apps developers were working on were running in Docker so all dependencies and things were handled in those projects, not the dotfiles. From a dotfiles perspective, we're talking about installing various packages and modifying either system or home dir config files, it wasn't complete device management. Device management was always handled by teams outside of our engineering team's control.
Keep in mind, it was a mixture of macOS and Windows with WSL 2. My dotfiles approach worked well, but I didn't use them directly since the companies I did work for didn't want to directly depend on my open source work but I used the same design principles and patterns.
No one used Arch in WSL 2, but for my own stuff, if I need to lock a package I just use Mise instead of Arch's repo for that package. For example, this lets me have 3 different versions of Ansible available for different client work; same goes for Terraform, kubectl, etc.
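For the uninitiated, the per-project pinning looks something like this: a minimal `.mise.toml` sketch, where the tool versions are hypothetical examples:

```toml
# .mise.toml in a client project's directory (versions are made-up examples).
# mise activates these pinned versions whenever you cd into the project.
[tools]
ansible = "9.5.1"
terraform = "1.7.5"
kubectl = "1.29.3"
```

Running `mise install` in that directory fetches the pinned versions, and a different client's project can pin different ones without conflict.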
At one org, pre-Mise, I just rolled a tiny curl based solution that downloaded a release directly from GitHub, and we locked versions to what we wanted so we controlled upgrade cadence since it was important to keep a few CLI tools in sync.
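A minimal sketch of that pattern; the tool name, repo, and URL layout are hypothetical placeholders, and nothing is actually downloaded here:

```shell
# Sketch of the pinned-release idea. Bumping VERSION is the upgrade cadence,
# so everyone on the team stays on the same build of the CLI tool.
TOOL="sometool"
VERSION="1.2.3"
URL="https://github.com/example/${TOOL}/releases/download/v${VERSION}/${TOOL}-linux-amd64"

# The real script would follow with something like:
#   curl -fsSL "$URL" -o "/usr/local/bin/${TOOL}" && chmod +x "/usr/local/bin/${TOOL}"
echo "$URL"
```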
I always tried to pick OS agnostic approaches so it all works on macOS and WSL 2 / native Linux (including CI). Whenever I rolled out these solutions for companies, it was always a thing to do on the side where I allocated maybe a week to come up with the solution; it wasn't my full time role to work on it. I'd just develop it and own the project, keeping it in a workable state and making ad hoc adjustments as needed. It never got to the point where things like Prometheus or health check metrics were thought about.
I've worked at places that are SOC 2 Type 2 compliant with similar ways of installing tools on dev boxes. I would say yes but like anything SOC 2 related, "it depends". The compliance requirement is on the org being compliant.
Sure, no problem. If you have any questions or issues let me know.
> I didn't know I needed this until now.
Haha yeah I know the feeling. My main workstation is still a desktop computer I built in 2014, I do all of my dev work from it.
Around 8 years ago I thought to myself if I ever upgrade my hardware, it can't be a painful experience to set everything up again so I started the dotfiles project. That evolved into its current state.
I've always used rsync to back up my user files, but I recently open sourced https://github.com/nickjj/bmsu, which is based on a script from 2018 that I've made more robust. Long story short, this fully handles offline backups and restores (and, as a side topic, syncing files between my desktop, laptop and phone). All it does is help you directly call rsync.
Between that and the dotfiles project, if my computer blew up tomorrow I'd be really upset for having to spend a lot of money on new parts but I could get everything up and running really quickly with zero dependence on cloud storage for any data.
I've written over ~10k lines of Ansible playbooks and roles to fully automate setting up servers to deploy Docker based web apps, so I do like the concept of declaring the state of a system in configuration and then having that become a reality. I know NixOS is not directly comparable to Ansible but in general I think IaC is a good idea.
It was important to me that my dotfiles work on a number of systems so I avoided NixOS. For example, the command line version works on Arch, Debian and Ubuntu based distros along with WSL 2 support and macOS too. The desktop version works on Arch and Arch based distros.
Beyond that, I also use my dotfiles on 2 different Linux systems so I wanted a way to differentiate certain configs for certain things. I also have a company issued laptop running macOS where I want everything to work, but it's a managed device so I can't go hog wild with full system level management.
Beyond that, since I make video courses I wanted to make it easy for anyone to replicate my set up if they wanted but also make it super easy for them to personalize any part of the set up without forking my repo (but they can still fork it if they want).
All of the above was achievable with shell scripts and symlinks. I might be wrong since I didn't research it in depth but I'm not sure NixOS can handle all of the above use cases in an easy to configure manner.
To have your Nix-based setup reproducible across different OSes (Arch, Debian, Ubuntu, WSL 2, macOS, and NixOS), with an extensible base config that can be customized to different situations, the go-to framework is home-manager (not NixOS itself, which only works on NixOS, or NixOS on WSL 2).
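For a sense of what that looks like, a minimal home-manager module sketch; the username, package, and option choices are placeholder examples:

```nix
# Minimal home-manager sketch (values are hypothetical examples).
# The same module works on Linux and macOS; per-machine differences can be
# split into separate modules imported conditionally.
{ pkgs, ... }: {
  home.username = "demo";
  home.homeDirectory = "/home/demo";
  home.stateVersion = "24.05";

  home.packages = [ pkgs.ripgrep ];

  programs.git = {
    enable = true;
    userName = "Demo User";
  };
}
```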
Nix offers a trade-off: near-perfect reproducibility in exchange for longer builds. Sometimes it's nice to just build a new .so for some library and let the rest of your binaries link to it without recompiling everything.
I'm not convinced about building whole systems around it. I can't remember the last time I ran into a reproducibility issue in practice, but I upgrade my system packages every day and that's definitely faster without Nix.
Migrated from archlinux to nixos. I don't think I can use anything else now...
I have a CI at home that builds my NixOS config on a weekly basis with the latest flake inputs. The artifacts are pushed to atticd. With this setup, when I actually need to update my machines, it's almost instantaneous.
edit: Using NixOS ofc, otherwise I would never do this.
Care to share some scripts on how you do it? I'm in a similar position, maintaining multiple desktops, laptops, and servers, but I do not know how to share the build artifacts.
I have never been more stress-free than when I was running NixOS as a daily driver. Had to return to macOS as primary, unfortunately, but I still use Nix as much as possible.
reproducible images are one of those features where the payoff is mostly emotional until the day it isn't. we had an incident where two supposedly identical images on two machines had a three byte delta in a timestamp and it cost us an afternoon to bisect from the wrong end. boring win, but a real one.
Gill probably already knows this, but for the uninitiated: something logged in, did a thing to potentially every container, and then deleted any sign of it doing the thing.
all that's left is a single timestamp of a log or something getting deleted
You are screwed either way. If you don't update, your container has a ton of known security issues; if you do, the container is not reproducible. Reproducible is neat with some useful security benefits, but it is somewhat of a non-goal if the container is more than a month old - a day might even be a better max age.
So if I have a docker container which needs a handful of packages, how would you handle it?
I'm handling it by using a slim Debian or Ubuntu, then using apt to install these packages with the necessary dependencies.
For everything easy, like one basic binary, I use the most minimal image, but as soon as it gets just a little bit annoying to set it up and keep it maintained, I start using apt and a nightly build of the image.
IMO: package manager outside the container. You just want the packages inside the container; the manager can sit outside and install packages into the container.
For the package management, it depends on the package manager, but most have some mechanism for installing into a root other than the currently running system.
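For example, hedged sketches of a few real mechanisms; the commands are assembled as strings and printed rather than executed, since they need root and each distro's tooling:

```shell
# Sketch: several package managers can populate a root other than the
# currently running system. Nothing here is executed for real.
ROOTFS="/tmp/rootfs-sketch"

# Debian/Ubuntu: debootstrap seeds a minimal root that apt can then target
DEBIAN_CMD="debootstrap --variant=minbase stable $ROOTFS"

# Fedora/RHEL: dnf has first-class support via --installroot
FEDORA_CMD="dnf --installroot=$ROOTFS --releasever=40 install -y coreutils"

# Arch: pacstrap (a pacman wrapper) populates a new root
ARCH_CMD="pacstrap $ROOTFS base"

printf '%s\n' "$DEBIAN_CMD" "$FEDORA_CMD" "$ARCH_CMD"
```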
Even without explicit support in the package manager, you could also roll your own solution by running the package manager in a chroot environment, which would then need to be seeded with the package manager's own dependencies, of course (and use user-mode QEMU to run pre- and post-installation scripts within the chroot in the case of cross-architecture builds).
Whether this yields a minimal container when pointed at a repository intended to be used to deploy a full OS is another question, but using a package manager to build a root filesystem offline isn't hard to pull off.
As for how to do this in the context of building an OCI container, tools like Buildah[1] exist to support container workflows beyond the conventional Dockerfile approach, providing straightforward command line tools to create containers, work with layers, mount and unmount container filesystems, etc.
There have got to be a million ways to do this by now. Some of the more principled approaches are tools like Nix (https://xeiaso.net/talks/2024/nix-docker-build/) and Bazel (https://github.com/bazel-contrib/rules_oci). But if you want to use an existing package manager like apt, you can pick it apart. Apt calls dpkg, and dpkg extracts files and runs post-install scripts. Only the post-install script needs to run inside the container.
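A sketch of that picking-apart, with a hypothetical package file; the commands are printed rather than run:

```shell
# A .deb's filesystem payload and control files can be extracted with
# dpkg-deb, entirely outside any container. Only the post-install script
# (rootfs/DEBIAN/postinst) would then need to run inside the container.
PKG="somepackage.deb"   # hypothetical placeholder

EXTRACT_FILES="dpkg-deb -x $PKG rootfs/"        # unpack files only, no scripts
EXTRACT_CTRL="dpkg-deb -e $PKG rootfs/DEBIAN"   # control files, incl. postinst

printf '%s\n' "$EXTRACT_FILES" "$EXTRACT_CTRL"
```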
I may be a little out of touch here, because the last time I did this, we used a wholly custom package manager.
The same way you may require something like cmake as a build dependency but not have it be part of the resulting binary - separate build time and run time dependencies so you only distribute the relevant ones.
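In Docker terms that separation is usually done with a multi-stage build; a sketch, where the image names, toolchain, and app path are hypothetical:

```dockerfile
# Build stage: carries cmake/g++ and any other build-time-only deps.
FROM debian:bookworm-slim AS build
RUN apt-get update && apt-get install -y --no-install-recommends cmake g++ make
COPY . /src
RUN cmake -S /src -B /build && cmake --build /build

# Runtime stage: only the built artifact ships; the toolchain stays behind.
FROM debian:bookworm-slim
COPY --from=build /build/app /usr/local/bin/app
CMD ["app"]
```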
Your question feels insane to me for production environments. Why aren't you doing a version cutoff of your packages and either pulling them from some network/local cache or baking them into your images?
I don't just run a java spring boot application. I run other things on my production system.
It doesn't matter much where I pull them from, though; I only do this with packages which have plenty of dependencies, where I don't want to assemble my own minimal image.
Friend, considering the supply chain attacks going on these days, automatically updating everything, immediately, probably isn't the perfect move either.
That local cache is often implemented as a drop-in replacement for the upstream package repository, and packages are still installed with the same package manager (yum, apt, pip, npm).
Minimal might or might not be your goal. A large container is sometimes correct - at that point you have to ask whether reusing one container twice, so you only need to download it once, and then installing the one missing part makes more sense.
If you are on github/gitlab, renovate bot is a good option for automating dependency updates via PRs while still maintaining pinned versions in your source.
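A minimal Renovate config sketch; the preset names are real Renovate presets, but treat the exact combination as an example to adapt rather than a recommendation:

```json
{
  "$schema": "https://docs.renovatebot.com/renovate-schema.json",
  "extends": ["config:recommended", "docker:pinDigests"]
}
```

With something like this in the repo, versions stay pinned in source and the bot opens PRs when upstream updates, so upgrades remain an explicit, reviewable change.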
I know it's an anti-pattern, but what is the alternative if you need to install some software? Pulling its tagged source code, gcc and compile everything?
> the old snapshot has security holes attackers know how to exploit.
So is running `docker build` with a `RUN apt update` line that hits the cache - except the latter is silent.
The problem solved by pinning to the snapshot is not to magically be secure, it's knowing what a given image is made of so you can trivially assert which ones are safe and which ones aren't.
In both cases you have to rebuild an image anyway so updating the snapshot is just a step that makes it explicit in code instead of implicit.
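One way this looks in practice on Debian, as a sketch: the snapshot timestamp is a made-up example, and old snapshots may also need validity checking disabled since their repository signatures expire:

```dockerfile
FROM debian:bookworm-slim
# Point apt at a point-in-time snapshot; bumping the timestamp below is the
# explicit, in-code upgrade step. [check-valid-until=no] tolerates expired
# Release files on older snapshots.
RUN rm -f /etc/apt/sources.list.d/* \
 && echo 'deb [check-valid-until=no] https://snapshot.debian.org/archive/debian/20240401T000000Z/ bookworm main' \
      > /etc/apt/sources.list \
 && apt-get update \
 && apt-get install -y --no-install-recommends curl
```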
Where does the apt update connect to? If it is an up-to-date package repo, you get fixes. However, there are lots of reasons it might not be. You'd better know if this is your plan.
You get fixes that were current at docker build time, but I think GP is referring to fixes that appear in the apt repo after your docker container is deployed.
If you've pulled in a dependency from outside the base image, there will be no new base image version to alert you to an update of that external dependency. Unless your container regularly runs something like apt update && apt list --upgradable, you will be unaware of security fixes newly available from apt.
Run “nix flake update”. Commit the lockfile. Build a docker image from that; the software you need is almost certainly there, and there’s a handy docker helper.
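The helper being referred to is presumably `pkgs.dockerTools`; a minimal sketch, assuming `pkgs` comes from the nixpkgs revision pinned in your flake.lock (the image name and package list are examples):

```nix
# Sketch, not a drop-in flake: everything in the image comes from the
# lockfile-pinned nixpkgs, so rebuilding from the same lock reproduces it.
pkgs.dockerTools.buildLayeredImage {
  name = "pinned-tools";
  contents = [ pkgs.curl pkgs.jq ];
  config.Cmd = [ "${pkgs.bash}/bin/bash" ];
}
```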
Recently I’ve been noticing that Nix software has been falling behind. So “the software you need is almost certainly there” is less true these days. Recently = April 2026.
That's been an issue for years from my impression of the state of NixOS. There are other problems too, like a lot of open source packages doing straight binary downloads instead of actually building the software.
Are you referring to how the nixpkgs-unstable branch hasn't been updated in the past five days? Or do you have some specific software in mind? (not arguing, just curious)
It’s a variety of different software that just isn’t updated very often.
I don’t mind being somewhat behind, but it seems like there are a lot of packages that don’t get regular updates. It’s okay to have packages that aren’t updated, but those packages should be clearly distinguishable.
I don't really see how that's different from a normal binary install of a reproducible package. Especially with the lacking quality of a lot of Nix packages.
The problem is distros often remove older versions from the repo as soon as the new version is available. Granted there is an archive that you can pull from.
I disagree with that as a hard rule and with the opinion that it's an anti-pattern. Reproducible containers are fine, but not always necessary. There's enough times when I do want to run apt-get in a container and don't care about reproducibility.
It is to solve such issues that I am using and running StableBuild.
It is a managed service that keeps a cached copy of your dependencies at a specific time.
You can pin your dependencies within a Dockerfile and have reproducible docker images.
I wonder if well designed "mutable" operating systems like Arch and Alpine are going to beat NixOS etc. in the long run. An install script is strictly more powerful than a declarative config language, and typically less verbose.
I've been long fascinated by the rolling release model. But aren't you guys worried about supply chain attacks? Seems those on the bleeding edge serve as canaries in the coalmine for the rest of us.
That's the purpose of reproducible build initiatives like TFA. The idea is to ensure that identical source produces bit-for-bit identical builds on multiple machines when the packages are built.
Sure, if the source itself gets got, then it does nothing. But it at least puts up one more barrier against tampering with the artifacts.
A totally unrelated comment, but: there is an animation on that page that moves practically everything on the page about 20 pixels down over the course of 1 second.
I thought that would completely trash the Cumulative Layout Shift core web vital. Because, hey! the layout is shifting in front of my very eyes. But no, the CLS on the page is 0.
It's happening as a result of a deliberate animation. The CLS metric relates to initial render. So yes, there is layout shift, but it's not CLS per se.
It's just that the spirit of Google's core web vitals has been to measure the properties of a web page that have the most impact on users. How quickly content appears on a page, how visually stable the content is, and how long it takes the page to respond to an interaction.
In the case of this page, I don't think it can be considered visually stable at all in the first second after it's loaded.
This is a really interesting accomplishment - I am also working heavily on reproducible builds for my firmware projects, and .. lo and behold .. the package manager key administrivia is the final bone to be broken.
I wonder if Arch leading the way on this will prompt other distros to attempt the same feat. Reproducible builds are important for certification, security, and safety-critical applications .. it'd be great to see Linux distros become more conformant to this method.
This is a huge accomplishment! But it wouldn't be so huge if compilers were trivially deterministic. It took 5 decades of development for compilers to get here. I'm sure ChatGPT in 2073 is going to be more deterministic than it was in 2023.
I ran Arch Linux for almost a year in WSL 2, it was really good.
Then I ran Arch natively for ~5 months, it's really good.
Now I still run Arch natively, but I also use the Arch Docker image to test my dotfiles[0] with a fresh file system.
Also, for when I want to run end to end tests for my dotfiles that set up a complete desktop environment I run Arch in a VM.
I have 99 problems but running Arch isn't one of them.
[0]: https://github.com/nickjj/dotfiles
https://github.com/nix-community/home-manager
Build your container/vm image elsewhere and deploy updates as entirely new images or snapshots or whatever you want.
Personally I prefer buildroot and consider VM as another target for embedded o/s images.
[1] https://github.com/containers/buildah/blob/main/README.md
Most Makefiles allow you to specify an alternate DESTDIR on install.
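As a sketch with hypothetical paths: DESTDIR is prepended to the install prefix, so files stage into an alternate root ready to copy into a container:

```shell
# Sketch: with a conventional Makefile, this stages the install under
# /tmp/stage/usr/... instead of touching the running system. Printed, not run.
STAGE="/tmp/stage"
CMD="make install DESTDIR=$STAGE PREFIX=/usr"

echo "$CMD"
```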
For example, I run a GCS FUSE driver; it has other dependencies that apt 'just' resolves.
FROM ubuntu:24.04
COPY --from=ghcr.io/owner/image:latest /usr/local/bin/somebinary /usr/local/bin/somebinary
CMD ["somebinary"]
Not as simple when you need shared dependencies
Also I'm tired of doing these hacks:
Pinning to a snapshot just makes so many things easier.
Both the software component and the image should be version pinned for auditing.
Reproducible can sometimes be a goal, but repeatable is always important.
I do think for this case specifically (base images for a specific distro), they should be reproducible.
NIX FIXES THIS.
(and, also presumably, that you do Crossfit, etc.)
You meet a vegan crossfitter that uses Arch, what does it tell you about first?
https://reproducible-builds.org/
Closely related is the Bootstrappable Builds community:
https://bootstrappable.org/
They have a tracker for what percent of the distro is reproducible: https://reproducible.archlinux.org/
Is CLS a misleading metric then?
The CLS measures the total sum of layout shifts over the entire lifespan of a page, not just during initial render.
And it's not unexpected, because it comes from a css transition.
And yet, core web vitals cannot demonstrate this.