Cloudspecs: Cloud Hardware Evolution Through the Looking Glass

(muratbuffalo.blogspot.com)

52 points | by speckx 1 day ago

3 comments

donavanm 1 day ago
> The first NVMe-backed instance family, i3, appeared in 2016. As of 2025, AWS offers 36 NVMe instance families. Yet the i3 still delivers the best I/O performance per dollar by nearly 2x.
Article should probably explicitly call out the difference between directly attached nvme storage (good ol i3) and “nitro nvme” (m6id and friends). The later is provided via an embedded card which emulates/provides a virtual nvme device directly to the host/instance. Without digging in to the specifics Im oretty sure thats accounting for the $/perf numbers being relatively flat. And “i” series being local storage cost/perf optimized compared to other families.
Edit: see https://d1.awsstatic.com/events/reinvent/2021/Powering_nextg... and similar talks. And notice the language around benefits of more consistent performance due to the better mediation of resources.
[-]
- huntaub 1 day ago
  Yeah, I would keep in mind that not everyone is optimmizing for $/perf, some use cases (where data is colder) are optimizing for $/GiB stored.
dweekly 1 day ago
I think one interesting context to consider in this is cloud repatriation. Economics that didn't really pencil out half a decade ago may be worth revisiting for a lot of organizations who now find that their actual bare metal needs are quite modest and can be well met by a few modern servers. The IOPS/$ graph here contrasting on-prem w/cloud in particular is quite telling.
[-]
- pixl97 23 hours ago
  I've seen a lot of workloads that had multiple servers or large RAID'ed NAS devices get shrank down to a single server after a single NVMe could provide more than enough random IOPS.
- roughly 21 hours ago
  I’m not disagreeing with this necessarily, but I do think a lot of people underestimate the costs of actually doing on-prem to a professional standard. You’ll almost certainly have to hire a dedicated team to manage your hardware, and you’re off in the woods as far as most of the rest of the world’s operating stack - an awful lot assumes you’re on EKS with infinite S3 and ECR available. It’s doable, but it’s not drag & drop - the cloud providers are expensive, but they are providing a lot.
mad44 1 day ago
Does anyone have any explanation or theories about the NVME SSDs pricing anomaly?
[-]
- till-tum 1 day ago
  I don't think this can be definitively answered without working for one of the hyperscalers. But here are some speculations: 1. Device speeds are intentionally capped to increase device lifetime (but this would only make sense for writes) 2. Networked storage services like EBS are more profitable, and AWS would like to phase out instance-attached storage. 3. Technical limitations/virtualization overhead (See comment above). I don’t have enough insight of how AWS SSDs work under the hood, but high network throughput (600 Gbit/s) is possible even in virtualized instances. Then again, we have certainly seen some weird noisy neighbor effects on cloud SSDs. However, it's worth mentioning that the same throughput limitations also apply to bare metal instances, where users don't benefit from virtualization (https://docs.aws.amazon.com/ec2/latest/instancetypes/so.html...). 4. There’s too little customer demand for fast SSDs, and optimization is not worth the effort.
  [-]
  - neerajsi 1 day ago
    Speculating: local ssds aren't as valuable in the cloud since they're effectively ephemeral. If the instance restarts, it would lose its storage. Trying to keep a workload affinitized to an SSD or to migrate data to a different SSD when an instance moves increases cost prohibitively.
    [-]
    - till-tum 1 day ago
      For a lot of use cases such as caching (e.g., the ephemeral caching layer in Snowflake), ephemeral storage is good enough. If you really want to, you could also achieve persistence by replicating to multiple instances (afaik this is what DynamoDB does)
      [-]
      - justincormack 10 hours ago
        Thats difficult for most people to implement in their applciations, and it increases latency to be closer to networked SSD anyway. So it remains fairly niche.
  - huntaub 1 day ago
    I think that number 4 is the big one. AWS only has so much capacity to work on new hardware types, and the number of companies who want to work with on-device NVMe are WAY smaller than then number of companies who just want to slap Kubernetes on some instances with EBS.
- gpapilion 15 hours ago
  Nvme pricing is pretty volatile in the past 2 years I’ve seen it move between 2-3x from its low post Covid.
  I don’t think the prices have adjusted because of that. Additional during Covid the prices were very high and this is baked into the pricing.