Video encoding. To get the best quality for a given bitrate you need to use a software encoder, and to get the best encode times you need to give it CPU resources.
It's been impressive how much SVT-AV1 has increased performance between releases. SVT-AV1 2.2 is a significant step up from 2.1:
https://www.phoronix.com/news/SVT-AV1-2.2-Released
https://gitlab.com/AOMediaCodec/SVT-AV1
I’m fascinated by the idea of my future music making DAW computer having one of those manycore monsters.
That’s also because my favourite software synthesizers are increasingly modelling instruments rather than sample-based instruments. And that reduces the need for RAM and storage, but increases the thirst for CPU cycles.
I compile most of my OS and I would like it faster. I also like being able to compile and game at the same time. Or run many OSs in VMs at the same time.
So many developer workloads can be sped up significantly by throwing more cores at the problem. Classic examples are compilation and linting, but even unit/integration/end-to-end testing can be made feasible locally if you have a strong machine and architect your tests properly.
I’ve done this before with the testcontainers library and it enables some really nice workflows.
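For a concrete flavour of that pattern, here's a minimal sketch of my own (not the poster's actual setup), assuming testcontainers-python, pytest, pytest-xdist, SQLAlchemy and psycopg2 are installed: each pytest-xdist worker spins up its own throwaway Postgres, so integration tests can fan out across however many cores the machine has.

# Hypothetical example: one disposable Postgres per pytest-xdist worker.
import pytest
import sqlalchemy
from testcontainers.postgres import PostgresContainer

@pytest.fixture(scope="session")
def pg_url():
    # Session scope is per xdist worker, so each worker gets its own container.
    with PostgresContainer("postgres:16-alpine") as pg:
        yield pg.get_connection_url()

def test_roundtrip(pg_url):
    engine = sqlalchemy.create_engine(pg_url)
    with engine.connect() as conn:
        assert conn.execute(sqlalchemy.text("SELECT 1")).scalar() == 1

Run with "pytest -n auto" and the number of parallel test processes scales with the core count.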
I want to get this out first: an OS is not a single unit. It is a collection of parts. Operating... System. With that out of the way, modification!
It actually becomes my OS, not at the mercy of The Build or provider. Licensing gets tricky here. You have to be careful if intending to redistribute the work.
Open source allows you to compile and not care about much else for your own use. The code lets you do what you want or need. Compiled binaries limit your options; one can't as readily change them or find out what they do.
Imagine BigCo gives you something to support a widget or tool you use. Nay, need. It stops working, the job is no longer being done. What do you do? What you can or what they allow.
While everyone may not do this, they absolutely benefit from the ability.
Many compile simply to get the CPU optimizations that best match their hardware. See '-march=native', for example. Binary distributions have to make assumptions. Compiling from source lets you correct them.
On the Linux side I prefer Fedora. Binary distribution with excellent packaging tools. I can rebuild anything with the same commands, but rarely have to. Good defaults with build options and patches.
I’m not asking what’s the benefit of open source. I’m asking what’s the benefit of compiling it yourself versus using distro-provided binary packages, and based on this answer you’re confirming what I learned from two years of Gentoo twentyish years ago: pretty much not a damn thing.
Spending dozens of hours compiling packages from source over and over is a comically poor tradeoff in time, energy, and dollars for getting a few minor CPU-hyperspecific optimizations that are utterly unnoticeable for everything outside of extremely specific circumstances. And in those cases compiling one or two things yourself can get you 98% of the sought-after benefits.
To each their own but… I’ll pass.
I cover both, but you're right about choice. Not looking to sway. I explained background, my position, and some reasoning. I'll try again.
I don't compile the whole OS, or as you say, spend dozens of hours compiling packages from source.
In the small cases where I do compile something, it's this process:
fedpkg clone -a -b ... packagename  # anonymous clone of the package's dist-git, '-b' picks the release branch
cd packagename
# toy around, make customizations that justify the compilation
fedpkg mockbuild  # build the modified package in a clean mock chroot
dnf up ./results*/*/*.rpm  # upgrade to the freshly built RPMs
You're making a strawman with this literal position of compiling everything, bathing it in hyperbole. I agree, that's a waste of time, so I don't run Gentoo. I run Fedora where I can compile what I need to. Reliably. Or I'd still be on Arch.
I apply patches, test them, suggest fixes upstream. That's what I accomplish. I can't speak for everyone but I can try to help. You're welcome; it's literal maintenance. Again, not everyone does it. Remember when I said this?
> While everyone may not do this, they absolutely benefit from the ability.
Another reminder, this was the question:
> what do you actually accomplish by compiling your OS?
I do it [for components] so others don't have to. Christ. I'm explaining how it's used, not why you should change anything. I feel we mostly agree, I rambled - like I did here. It's rarely worth it. I found a decent middle ground. Build/test what I have to, most easily.
Getting back on topic: more CPU cores help with the time this requires. Machine time is spent so human time isn't.
Sorry, but I wasn’t making a straw man: the GP I responded to explicitly made the point that they wanted a manycore monster CPU to speed up compiling their entire OS from scratch.
If all you’re talking about is compiling a few targeted packages for customization or performance reasons, that’s absolutely a reasonable and sensible approach to things.
My question was targeted at the GP’s use case which I wholeheartedly believe is a largely senseless waste of time, effort, CPU cycles, and money.
> Sorry, but I wasn’t making a straw man: the GP I responded to explicitly made the point that they wanted a manycore monster CPU to speed up compiling their entire OS from scratch.
This is what they said, emphasis mine:
> **most** of my OS
Want to try again? It's so arbitrary, I'm good. "49% would be acceptable, 51% - hah!". Nobody dare modify/test 'glibc' or any of the precious libraries.
Later, test police :) No hard feelings, I'll admit I don't communicate well and like this stuff a little too much
Every continuum has arbitrary points, a la “when is a pile a heap”.
In this case it becomes insensible somewhere between selecting a few packages which are genuine performance bottlenecks for your workloads or need specific compiler flags for necessary features, and spec’ing out hardware to fit a $5,000+ 192-core CPU so you can feel good about your kernel being able to take advantage of AVX512 for your Linux desktop box.
Imagine one of these 128-core CPUs utilizing several TB of RAM connected via CXL and several PB of storage, combined with real-time Linux [3], and I'm in software-defined-X (SD-x, for SDN, SDR, etc.) workstation wonderland [1], [2].
[1] Samsung Unveils CXL Memory Module Box: Up to 16 TB at 60 GB/s:
https://www.anandtech.com/show/21333/samsung-unveils-cxl-mem...
[2] Huawei unveils its OceanStor A800 AI-specific storage solution; announces 128TB high-capacity SSD:
https://www.datacenterdynamics.com/en/news/huawei-unveils-it...
[3] Real-time Linux is officially part of the kernel:
https://news.ycombinator.com/item?id=41594862
Distributed memory parallel computing library. Most common in supercomputers with a high performance interconnect where you want to run one program that coordinates separate instances of itself via message passing (“SPMD” - single program multiple data parallelism). Slightly different than distributed computing that generally doesn’t assume a tightly coupled system with a low latency, high bandwidth interconnect. Allows you to scale up beyond what you can do within shared memory spaces where you are limited to low core counts (hundreds) and low memory (<1TB) - MPI is what you use when you’re in the many thousands of cores, many TB of RAM regime. In practice, supercomputers require a mixture: shared memory for many core parallelism, some hybrid host programming model for bringing accelerators in, and then MPI to coordinate between these shared memory/accelerator enabled hosts.
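To make the SPMD idea concrete, here is a minimal sketch of my own (assuming mpi4py and an MPI implementation such as Open MPI are installed): every rank runs the same script, each sums its own slice of a range, and a reduction combines the partial results on rank 0.

# Hypothetical SPMD example: the same program on every rank, coordinated via MPI.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()  # this process's id, 0..size-1
size = comm.Get_size()  # total number of processes launched

N = 1_000_000
local = sum(range(rank, N, size))  # each rank sums a strided slice
total = comm.reduce(local, op=MPI.SUM, root=0)  # combine partial sums on rank 0

if rank == 0:
    print(f"{size} ranks computed {total}")

The script runs unchanged whether you launch it as "mpirun -n 8 python partial_sum.py" on one node or across many nodes via the cluster scheduler; only the launch command changes.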
As a fresh example, we (a university) have just this month taken delivery of about 9000 cores (and about 80 GPUs) of machines; these are being added to our existing 6000 cores of roughly five-year-old machines. This is a modest HPC system.
Supercomputers, yes, but also genomics workloads. The vast majority of those workloads are effectively tied to one machine, since they don’t support cross-machine scaling, nor things like MPI.
Do they not support MPI and scaling across machines because of the features of the workloads or because of the features of the systems performing the work?
If the latter, then maybe they can use multiple cores using things like OpenMP and the like to get some "free" scaling, but they could be made to work with MPI or across machines. However, if it's the former, more cores won't help them.
It looks to me like Intel will cannibalize its older, 5th-generation Xeon processors. Here's a comparison of two 64-core processors, one 5th generation, launched in Q4 '23 [1], and one 6th generation, launched in Q2 '24 [2]. The first one has an MSRP of $12,400 (not a mistake) and the second $2,749. It is true that the 5th-generation CPU has 128 threads, versus only 64 for the 6th generation, but is that worth a price premium of 350% (roughly 4.5x the price)?
[1] https://www.intel.com/content/www/us/en/products/sku/237252/...
[2] https://www.intel.com/content/www/us/en/products/sku/240363/...
Nvidia should be afraid. The same GB/s on a CPU goes much further than on a GPU. The problem here will be the price of the whole system. Anyway, it's a step in the right direction if we want something like Copilot on premises or at home.
Don't worry, 20 years back most people had no clue how to use the second core. They believed in GHz, and when it became obvious that the theoretical limit was close, they thought that was the end of evolution.
People knew how to use the second core 20 years ago; SMP machines had been around for decades by that point. For the most part, the second core was taken care of by the OS, which already had support for those SMP systems, and by multithreaded or multiprocess workloads.
Yeah, they are mostly for cloud providers. They will be running all of our Docker containers, Kubernetes clusters, and function-as-a-service workflows, as well as our databases.
It really should mean that cloud data centres should be able to greatly increase capacity without getting larger in terms of physical size. That is a huge net win for the cloud providers.
I assume you end up getting bottlenecks to RAM, disk, and network when there’s this much parallelism in CPU. So even for cloud providers it’s probably most suited to some fairly specific CPU-heavy workloads.
You're underestimating the power of 512 MB of cache, 12-channel DDR5, PCIe 5 NVMe drives, and 880 GB/s networking cards. The giant L3 cache means that memory pressure is significantly reduced (especially if many of the cores use similar data). RAM is the weakest link here: 12 channels of DDR5-6000 is a ton from a consumer standpoint, but scaling up from dual channel on a 16-core CPU at the same per-core bandwidth would only bring you up to 96 cores. The 128 lanes of PCIe, though, mean that you don't have much bottleneck outside of RAM (if you're doing things properly). If you use 64 of those lanes for storage, that can be hundreds of TB of ridiculously fast storage, and you still have another 64 left over for ridiculously fast networking.
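The 96-core figure falls out of a quick back-of-envelope (my arithmetic, assuming DDR5-6000 and the usual 8 bytes per transfer on a 64-bit channel):

# Rough per-core memory bandwidth comparison, desktop vs. 12-channel server.
per_channel = 6000e6 * 8           # DDR5-6000: about 48 GB/s per channel
desktop = 2 * per_channel          # dual channel: about 96 GB/s
server = 12 * per_channel          # twelve channels: about 576 GB/s
per_core_desktop = desktop / 16    # about 6 GB/s per core on a 16-core desktop
print(server / per_core_desktop)   # -> 96.0 cores at the same per-core bandwidth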
In the cloud vendor space (and really virtualization in general) local disk is practically non-existent. The vast majority of apps these days are stateless and generate very little IO, and even if they do generate IO, it's more than likely going to some kind of network share so that the VM can fail over. If the VM's data is on local disk, you can't migrate it to a new host if the host it's on dies (along with its local disks).
Presuming the other responder is correct about there being 880GB/s of network cards (and that the units are correct at GB/s instead of Gb/s or Gbps), that should be more than plenty. At the largest CPUs, 192 cores would be 4.5 GB/s per core (2.3 GB/s if dual socket). That's more than enough; I see a lot of dual-socket servers with 24 or so total cores running on 10 Gbps or maybe bonded 10 Gbps. That's about 100 MB/s per core.
Usually DCs are limited by the power and/or cooling per rack; you end up having to run more power and HVAC, which can be prohibitively expensive. Alternatively, you half-fill racks when you upgrade.
I suspect it's a sign of the global downturn; cloud adoption in general seems to have stalled. Hence, no new CPU models being deployed at scale.
When the downturn lifts, I fully expect the big cloud providers to start deploying huge numbers of these, but that seems to be at least a year away.