This suggests the checksum is used to identify whether the binary is known to BOT, and thus whether BOT can optimize the binary.
I do wonder what this "optimize" step actually entails; does it just replace the binary with one that Intel themselves carefully decompiled and then hand-optimised? If it's a general "decompile-analyse-optimise-recompile" pipeline (perhaps something similar to what the Transmeta Crusoe did: https://en.wikipedia.org/wiki/Transmeta_Crusoe), why restrict it?
Post-link optimization (PLO) tools have been around for quite a while. In particular, Meta’s BOLT (fully upstream in LLVM) and Google’s Propeller (somewhat upstream in LLVM, but fully open source) have been around for 5+ years at this point.
It doesn’t seem like Intel’s BOT delivers more performance gains than those tools, and it is closed source.
Intel BOT seems to be patches for specific binaries (which is why they didn't see a difference for Geekbench 6.7), unlike BOLT/Propeller, which work on arbitrary programs. The second image from their help page [1] showcases this.
Propeller can’t really do many instruction-level modifications due to how it works: it constructs a layout file that then gets passed to the linker.
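To make the "layout file" point concrete, here is a rough sketch of the Propeller-style round trip using clang's `-fbasic-block-sections` flag (the source filename and the contents of the cluster file are illustrative assumptions, not real tool output):

```shell
# Sketch of a Propeller-style flow; assumes a clang/lld toolchain with
# basic-block-sections support. File names here are made up.

# 1. Build with every basic block in its own section, so the linker
#    is free to reorder blocks later.
clang -O2 -fbasic-block-sections=all -fuse-ld=lld app.c -o app

# 2. Profile the binary (e.g. perf with LBR samples) and produce a
#    cluster file describing the hot block layout, roughly:
#      !main        <- function name
#      !!0 2 3      <- one cluster of basic-block IDs
#      !!1

# 3. Relink, feeding the layout back in. Everything happens at the
#    section/layout level -- individual instructions are untouched,
#    which is why Propeller can't rewrite them.
clang -O2 -fbasic-block-sections=list=clusters.txt -fuse-ld=lld app.c -o app.opt
```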
BOLT could do this, but does not as far as I’m aware.
Most vectorization like this is also probably better done in a compiler middle end. At least in LLVM, the loop vectorizer and especially the SLP Vectorizer do a decent job of picking up most of the gains.
You might be able to pick up some gains by doing it post-link at the MC level, but even writing an IR-level SLP Vectorizer is already quite difficult.
Well, in this case there are a number of CPUs which are almost exactly instruction-set compatible. So if anyone can extract the binary, it might help your competitors too.
Would be neat to see what makes the Intel-optimized Geekbench incompatible with the 265K, 14900K and 9950X but work on the 270K.
> BOT optimizations are poorly documented, aggressive in scope, and damage comparability with other CPUs. For example, BOT allows Intel processors to run vector instructions while other processors continue to run scalar instructions. This provides an unfair advantage to Intel
[1] https://www.intel.com/content/www/us/en/support/articles/000...
I could have sworn Intel had their own PLO tool, but I can only find https://github.com/clearlinux/distribution/issues/2996.
It was open source, but has since been deprecated.
Wait until they hear about branch predictors.