I just ran some extensive tests on our own CI. I use AMD Turin instances on GCP for this, which were noted as among the fastest in the article.
The most insane part is that the AMD EPYC 4565P can beat the Turin chips the cloud providers use, by as much as 2x in single-core performance.
Our tests took 2 minutes on GCP and 1 minute flat on the 4565P, with its 5.1 GHz boost holding steady vs. only 4.1 GHz on the GCP machines.
GCP charges $130 a month for 8 vCPUs. And that's the SPOT price, for an instance that can be killed at any moment.
My 4565P is a $500 CPU... 32 vCPUs... racked in a datacenter. The whole machine cost under $2k.
I am trying hard to convince more people to rack their own hardware, especially for CI runners. With the cloud provider charging $130/mo for 3x fewer vCPUs, you break even in a couple of months, so it doesn't matter if the machine dies a few months later. On top of that you're getting fully dedicated hardware and 2x the performance. Anyway... glad to see I chose the right CPU type for GCP, even though nothing comes close to the cost/perf of self-racking.
> My 4565P is a $500 CPU... 32 vCPUs... racked in a datacenter. The whole machine cost under $2k.
> With the cloud provider charging $140/mo for 3x fewer vCPUs, you break even in a couple of months; it doesn't matter if it dies a few months later
How do you calculate breaking even in a couple of months if the machine costs $2,000 and you still have to pay colo fees?
If your colo fees were $100/month, you wouldn't break even for over 4 years. You could try to find cheaper colocation, but even with free colocation your example doesn't break even for over a year.
The $140/mo is for 3x fewer vCPUs, so that's ~$420/mo in savings if you actually use all those cores. Sorry for the poor comparison wording there. In a few months you're already $1,300+ ahead, and by 6 months you've paid off the machine.
Colo fees are cheap even if you need more than just 1U, and even with a $50-100/mo fee you easily get way more performance and come out ahead within a year.
You originally said “a couple months,” but now it's 6 months plus an assumption of $0 colocation fees, which isn't realistic.
In my experience, situations rarely call for precisely 32 cores for a fixed period of 3 years, which is what calculations like this assume. We start with a small set of cloud servers and scale them up as traffic grows. Today's tooling even makes it easy to auto-scale throughout the day.
When racking a server, everyone aims higher, because it sucks to run into limits unexpectedly and be stuck on a server that isn't big enough to handle the load. Then you have to start considering at least two servers in case one starts failing.
Racking a single self-built server is great for hobby projects, but it's always more complicated when serving real business workloads.
Don't nit-pick the "couple". It was used casually, to mean "not a terribly long time", so the 2-6 spread, while technically big, is still just a trifle. And while I'm nit-picking: up-thread is talking about a limited box for CI, and you're talking about scaling up real business workloads. That's just like the difference between 2 and 6. Give it a rest.
Everyone: run your scenarios and expectations through a spreadsheet, then use real data to run your CBA (cost-benefit analysis). Your case will be unique(ish), so make the case for your own situation.
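As a starting point, here's a minimal break-even sketch in Python. The dollar figures are just the assumptions thrown around in this thread ($2,000 machine, $50-100/mo colo, ~$420/mo of equivalent cloud vCPUs); plug in your own quotes.

```python
def breakeven_months(machine_cost, colo_per_month, cloud_per_month):
    """Months until the up-front capex is recouped by monthly savings vs. cloud."""
    monthly_savings = cloud_per_month - colo_per_month
    if monthly_savings <= 0:
        return None  # colo never recoups the hardware at these rates
    return machine_cost / monthly_savings

# Optimist's numbers: $2k machine, $75/mo colo, ~$420/mo of equivalent cloud vCPUs
print(breakeven_months(2000, 75, 420))   # ~5.8 months

# Skeptic's numbers: same machine compared against a single $140/mo cloud instance
print(breakeven_months(2000, 100, 140))  # 50 months, i.e. ~4 years
```

Which scenario applies hinges entirely on whether you actually use all the cores you rack, which is the crux of the disagreement above.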
I used to run a site that compares prices (https://baremetalsavings.com/). Not only is the ecosystem pull toward the cloud strong, but many developers today look at bare metal as downright daunting.
Not sure where that fear comes from. Cloud challenges can be just as complex as bare-metal ones, or more so.
The fragmentation and friction! Comparing prices usually requires 10 open browser tabs and a spreadsheet, which is what keeps people locked into their default cloud. I built a tool to solve this called BlueDot (i.e., Earth, where all the clouds are): https://tui.bluedot.ink. It's a TUI that aggregates 58,000+ server configurations across 6 clouds (including Hetzner). It lets you view side-by-side price comparisons and deploy instantly from the terminal, making grabbing a cheap Hetzner box just as easy as spinning up something on AWS/GCP.
Self-racking lets you rack a bunch of gear you'd never find in VM/dedicated rentals, like consumer parts or older, still very good parts. Overclocking options are available as well if you DIY.
If you need single-threaded performance, colo is really the only way to go anyway.
We have two full racks and we're super happy with them.
This is basically the premise of https://www.blacksmith.sh/ as far as I know, though without the need to host the hardware yourself and the potential complexity that comes with it.
You can go on OVH and get a dedicated server with 384 threads and a Turin CPU for $1,147 a month. You have to pay another $1,147 for installation, and the default config has low RAM and network speeds, but even after upgrading those it's going to be about 1/5 of what it would cost on the public clouds.
A 16-core 4565P is of course faster in peak single-thread speed than a 96-core part that GCP runs at an economically optimal base clock.
A year ago I gave a talk about optimizing cloud cost efficiency that included a comparison of colocation vs. cloud over time. You might find it interesting; this links to the relevant part: https://youtu.be/UEjMr5aUbbM?si=4QFSXKTBFJa2WrRm&t=1236
TL;DR: colocation broke even in 6 to 18 months against on-demand and 3-year reserved cloud respectively. But spot instances can actually be quite a bit cheaper than colocation.
You generally don't go to the cloud for the price (unless we're talking Hetzner, etc.).
They're a typical hardware maker that can't focus on software, which is why NVIDIA is now a multi-trillion-dollar corporation and AMD is "just" a few hundred billion.
They've focused too much on CPUs and completely dropped the ball on AI and compute accelerators.
It's especially sad considering that the MI300 and related accelerators are, on paper, competitive with NVIDIA hardware; they just have nowhere near the same software stack, so nobody cares.
Yeah, remember when 4 cores / 8 threads was the high end for CPUs until AMD Ryzen came out? If AMD hadn't done its best work, I imagine we'd still be stuck with 4 cores as the norm for years more.
Anyone have experience with Oracle Cloud and ease of moving away?
This benchmark seems to recommend Oracle Cloud, but I’ve heard that Oracle has historically used aggressive licenses and legal terms to keep customers locked in.
I wrote the article. I would NEVER tell anyone to use Oracle, as the vendor lock-in, strong-arming, and pricing are ridiculous. That said, I am hosting small projects on Oracle Cloud due to the super-low cost. I can just move them whenever they decide to be naughty; I'm not using an Oracle DB or anything proprietary, just Linux VMs with my own MySQL setup.
I signed up for an Oracle Cloud trial. They closed my trial a few days later and shut down my one trial VM without warning.
Weirdly, they didn’t allow me to add payment info to continue. Even weirder, their salespeople kept contacting me, asking me to come back. When I explained the situation, they would all try to fix it and then go radio silent, until the next sales rep came along to try to convince me to stay.
I searched Reddit at the time and a lot of other people had had the same experience, while plenty of others were bragging about abusing the free tier and trials without consequences. I still don’t know how they decided to permanently close my account (without informing the sales team).
You can't compare a VPS with VMs from a major cloud provider; a VPS doesn't offer anything besides basic compute.
Also, virtualization from the big cloud providers is way better because they run custom hardware and software, so you don't suffer from noisy neighbours, for example.
I am still running Rome EPYC CPUs that I picked up for a couple hundred dollars each, and they're doing great. Power usage is not the best and single-thread performance is awful (-50%), but multithreaded performance kicks the 9950X's ass, at around 90k vs. 70k.
The main thrust of the economic argument has been the cost of the system administrators who maintain the hardware. Electricity and cooling are big ongoing costs too, and back when AWS launched it wasn't uncommon to order a server and have it take 3 months to arrive.
I think in practice the system administrators are still around, now as AWS engineers: they still keep all that platform stuff running, and you're paying AWS for their engineers as well as the electricity. The cloud has the advantage of being very quick to spin up another box, but machines these days can come with 288 cores; it's not a big stretch to maintain sufficient surplus capacity and the tooling to let teams self-serve.
Things are in a different place from when AWS first launched. AWS ought to be charging a lot less for compute, memory, and storage; their business is wildly profitable at current rates because machines got cheaper per core.
I've moved two clients to colo, with dramatic cost savings. So many systems only use VMs and a few basic cloud features. Everyone knows this, but just to make the point: you can still use certain cloud products (cloud storage, for example) just fine while running your primary workloads on your own hardware. Sometimes it makes perfect sense, and you just need someone to nudge you and tell you it's going to be OK.
Going to the cloud can't possibly be as cheap as owning your own hardware, for the obvious reason that the provider has to make money somehow. Well, unless you use spot instances, which use spare capacity.
In any case, you move to the cloud despite the cost when you need multi-region redundancy, the managed features, and so on. More commonly it's because the higher-ups heard everybody's doing it, but oh well :D
Maintaining and updating your own hardware comes with so much operational overhead compared to magically spinning up and down resources as needed. I don’t think this really needs to be said.
You can extract a lot of value from Hetzner's bare-metal servers, but you need to put in some effort initially to get them going. That being said, it is not really that difficult. And, frankly, it is a lot more fun.
GCP (near the top of the list): 16 ARM cores + 120 GB RAM + 1 TB local SSD on a 3-year reserved commitment costs > $1k/month. It doesn't even match the specs of a Ryzen AI 395 Framework mini PC that goes for ~$4k, AFAIK.
"Unfortunately, the [Hetzner] CPX22 is available only in eu-central and ap-southeast, but if that’s OK with you it is the best value and fastest overall."
Great analysis! The cloud vs. colo debate is fascinating, but it often misses the operational overhead discussion.
While zackify's math on raw compute cost is compelling, there's hidden complexity: how much time does your team spend on hardware failures, network issues, IPMI troubleshooting, firmware updates, etc.? For CI/build workloads specifically, colo makes sense because downtime is just an inconvenience.
But for production workloads, I've seen too many "we'll just rack a few servers" projects turn into full-time infrastructure jobs. Cloud's value isn't just compute; it's shifting the operational burden.
That said, hybrid approaches work well: use cloud for production and autoscaling, colo for predictable batch workloads like CI. The benchmark shows AMD Turin's strong performance across providers, and that consistency is valuable even if you pay a premium.
We were stuck with Intel; it's nice that we have better CPUs.
7-Zip benchmark:
9800X3D: 130 GIPS compression, 134 GIPS decompression.
C8A: 21,577 MIPS (21.5 GIPS) compression, 9,868 MIPS (9.9 GIPS) decompression.
Geekbench 5:
9800X3D: 16,975 multi-thread, 2,474 single-thread.
C8A: 4,049 multi-thread, 2,240 single-thread.
A desktop-class CPU is definitely quicker single-threaded and multithreaded; no surprises there, since most of these instances are dual-core. The single-threaded performance of the C8A is actually pretty good, but it's also the best of the bunch by a wide margin; most of the other CPUs are far behind. Memory performance appears to be atrocious all around.
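To put the 7-Zip numbers above on one scale (the C8A figures are reported in MIPS, the 9800X3D's in GIPS), here's a quick normalization, assuming 1 GIPS = 1,000 MIPS:

```python
# 7-Zip ratings from the comment above, normalized to GIPS (1 GIPS = 1000 MIPS)
scores_gips = {
    "9800X3D": {"compress": 130.0,        "decompress": 134.0},
    "C8A":     {"compress": 21577 / 1000, "decompress": 9868 / 1000},
}

for op in ("compress", "decompress"):
    ratio = scores_gips["9800X3D"][op] / scores_gips["C8A"][op]
    print(f"{op}: 9800X3D is ~{ratio:.1f}x faster")  # ~6.0x and ~13.6x
```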
The account creation process was really confusing, and they kept turning off my instance because usage was not high enough.
It seemed quite outdated and confusing to use the last time I tried it, a few years ago.
Every big corporation I have worked at has a lower cost of capital than Amazon, and yet they all want to move to AWS. I just don't understand it.
"We Moved from AWS to Hetzner. Cut Costs 89%. Here’s the Catch."
https://medium.com/lets-code-future/we-moved-from-aws-to-het...
ChatGPT tells me "no theory, no fluff" all the time :D
You're never just paying for the hardware.
Really? How many?