From Rust to reality: The hidden journey of fetch_max

(questdb.com)

71 points | by bluestreak 2 hours ago

4 comments

tux3 17 minutes ago
This blog sent me into a memory models rabbit hole again. Each time I end up feeling like I'm finally starting to get it, only for a 6 line litmus test with 4 loads and 2 stores to send me crashing back down.
It makes me feel a little better reading about the history of memory models in CPUs. If this stuff wasn't intuitive to Intel either, I'm at least in good company in being confused (https://research.swtch.com/hwmm#path_to_x86-tso)
I actually knew about fetch_max from "implementing" the corresponding instruction (risc-v amomax), but I haven't done any of the fun parts yet since my soft-CPU still only has a single core.
jerrinot 1 hour ago
Hi, author here. My superpower is spending unreasonable amounts of time researching things with no practical purpose. Occasionally I blog about it - as a warning to others.
[-]
- trws 26 minutes ago
  I liked the article. I saw your PS that we added it to the working draft for c++26, we also made it part of OpenMP as of 5.0 I think. It’s sometimes a hardware atomic like on arm, but what made the case was that it’s common to implement it sub-optimally even on x86 or LL-SC architectures. Often the generic cas loop gets used, like in your lambda example, but it lacks an early cutout since you can ignore any input value that’s on the wrong side of the op by doing a cheap atomic read or just cutting out of the loop after the first failed CAS if the read back shows it can’t matter. Also can benefit from using slightly different memory orders than the default on architectures like ppc64. It’s a surprisingly useful op to support that way.
  If this kind of thing floats your boat, you might be interested in the non-reading variants of these as well. Mostly for things like add, max, etc but some recent architectures actually offer alternate operations to skip the read-back. The paper calls them “atomic reduction operations” https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p31...
- Ethee 59 minutes ago
  It's these kinds of posts that I appreciate reading the most, so thank you for sharing!
- owls-on-wires 38 minutes ago
  “…no practical purpose” Nonsense, I learned something about compilation today. Thank you for sharing.
yshui 1 hour ago
That's a cool find. I wonder if LLVM also does the other way around operation, where it pattern matches handwritten CAS loops and transform them into native ARM64 instructions.
[-]
- jerrinot 57 minutes ago
  That's a very good question. A proper compiler engineer would know, but I will do my best to find something and report back.
  Edit: I could not find any pass with a pattern matching to replace CAS loops. The closest thing I could find is this pass: https://github.com/llvm/llvm-project/blob/06fb26c3a4ede66755... I reckon one could write a similar pass to recognize CAS idioms, but its usefulness would be probably rather limited and not worth the effort/risks.
IshKebab 1 hour ago
Yeah this comes from ARM and AXI, which has atomic max (and min, add, set, clear and xor). I assume ARM has all the corresponding instructions. RISC-V also has all of these in Zaamo.