This probably has a very simple answer, but I always wonder how they provide load on these sorts of tests. Can you get by with 2-4 other servers with 400Gb/s links and just tons and tons of simulated IPs/ports to activate LACP balancing? Because you probably want to simulate simultaneous clients that stream at varying rates, probably in the range of 0.3 - 10 Mbps, which means hundreds of thousands of clients to saturate at 800 Gbps, right?
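For a rough sense of the numbers in that last question, here is a back-of-the-envelope sketch. It assumes every client streams at one constant rate and ignores protocol overhead; the function name is made up for illustration.

```python
def clients_to_saturate(link_gbps: int, client_kbps: int) -> int:
    """Clients needed to fill link_gbps if each one streams at client_kbps.

    Assumption: constant per-client bitrate, no protocol overhead.
    """
    link_kbps = link_gbps * 1_000_000  # 1 Gb/s = 1,000,000 kb/s
    return -(-link_kbps // client_kbps)  # integer ceiling division


for rate_kbps in (300, 3_000, 10_000):
    needed = clients_to_saturate(800, rate_kbps)
    print(f"{rate_kbps / 1000:g} Mb/s per client -> {needed:,} clients")
```

So "hundreds of thousands" holds if the average is a few Mb/s. At the 0.3 Mb/s end it is closer to 2.7 million clients, and at 10 Mb/s it is about 80,000.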
Just an interesting observation I had about this once when I noticed that kernel quic implementations weren't very fast.
KTLS is mostly useful if paired with sendfile (I'm ignoring io_uring because I'm not as up to date on that). Otherwise you have to context switch back to userspace constantly.
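To make the sendfile half of that concrete, here is a minimal sketch using Python's `socket.sendfile()`, which hands the copy to the kernel via `os.sendfile()` where the platform supports it. It does not set up kTLS at all (plain socket only), and the helper names, the socketpair, and the temp file are just stand-ins for a real client connection and a file on disk.

```python
import socket
import tempfile


def send_whole_file(sock: socket.socket, fileobj) -> int:
    """Send a binary file over a connected stream socket; returns bytes sent.

    socket.sendfile() uses os.sendfile() where available, so the payload is
    not read into a userspace buffer first. It falls back to send() otherwise.
    """
    return sock.sendfile(fileobj)


def demo() -> bytes:
    payload = b"x" * 4096  # kept small so it fits in the socket buffer
    with tempfile.TemporaryFile() as tmp:
        tmp.write(payload)
        tmp.seek(0)
        # Hypothetical stand-in for a real client connection.
        sender, receiver = socket.socketpair()
        with sender, receiver:
            sent = send_whole_file(sender, tmp)
            sender.shutdown(socket.SHUT_WR)  # signal EOF to the reader
            received = b""
            while chunk := receiver.recv(65536):
                received += chunk
    assert sent == len(received)
    return received


if __name__ == "__main__":
    print(len(demo()), "bytes received")
```

With userspace TLS the server would instead have to read each chunk, encrypt it, and write it back out, which is the constant round trip through userspace mentioned above.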
Assuming the files are encrypted anyway for DRM reasons: why should static content like movies be TLSed? I know I know, "TLS all the things", but it sounds like a high cost at Netflix scale.
I would have thought to prevent a browser mixed content warning (~15% of Netflix viewing happens in browsers).
@drewg123 starts discussing this section at 4:21 in the presentation: https://www.youtube.com/watch?v=WzfADu1qyAM&t=261 ("we had this mandate that we had to start encrypting communications between our servers and our clients")
However, it looks like in 2015 (at iOS 9.0 / macOS 10.11) Apple began requiring TLS for apps. While exceptions are allowed, including for media streaming, they are discouraged and require a justification for App Store review: https://developer.apple.com/documentation/security/preventin...
I refused to connect my TV to the internet and use a Vero V for all of my watching needs. The Vero V is absolutely worse than most other experiences, but I'm happy.
It seems like it took engineering work, but TLS isn't their bottleneck when the data flow is structured correctly for the hardware (which is kind of the thesis of a lot of the Netflix CDN node optimization stuff).
I have a few questions. A lot of this went over my head, of course, but here they are.
1. Netflix is using these specialized NICs, but doesn't Netflix use AWS? Would that mean they can add their own specialized hardware in AWS DCs (so is it co-location?), or does AWS natively support these NICs?
2. Considering this is Netflix, whose whole architecture is optimized for video, is this the correct architecture stack for video CDNs? If so, do YouTube, Cloudflare, or other platforms with video CDNs at scale do something similar to what Netflix is doing?
3. Seeing the amount of architectural optimization, why doesn't Netflix have their own DCs instead of using Amazon? Saturating 400 Gb/s would lead to some massive bills (I have heard that Amazon makes more from Netflix than from their own video service). I understand that there is lock-in with AWS, that AWS offered the scaling Netflix needed back then, and that it's a more symbiotic relationship where both parties benefit from one another. But seeing this level of optimization, wouldn't Netflix also benefit from leaving AWS and having more freedom overall? I would love to know more of the reasoning.
4. Does anybody have more resources like these PDFs that I can read about how companies optimize things? I am interested in almost anything about optimization. For example, I would be interested in reading about Google's architecture decisions, but also about the fact that Jane Street uses custom FPGAs for their high-frequency trading.
5. Let's say I am interested in finding the jobs/contracts to be the guy who fixes these problems. How do I establish myself in this kind of optimization work to be "the guy"? To gain the expertise, I suppose I would need to test things out, which might require specialized hardware (which would be capital intensive). Are there things I can test without too much capital and still gain some skills in this area? It just fascinates me!
Thanks for reading and I would love to get answers, Thanks and have a nice day!
I'm not a Netflix staff member but I work in the networking realm and can answer some of these questions (also gives me the chance to say something wrong where someone with the real answer can step in :)
1. Netflix does use AWS, but it's far more economical for them to embed content caches/servers within ISP networks so that delivery relies solely on the ISP's network. All major CDN-like providers (Apple with their Edge Cache, Google with their GGC) offer embedded caches, which tend to make a lot of sense at sufficient ISP scale (# of users). It's a misconception, or just journalistic misunderstanding, that everything Netflix runs is served from AWS. These embedded caches carry the bulk of Netflix's outbound traffic. They also remove the need for Netflix to run an inordinately large backbone to serve content.
I'm not qualified to comment too heavily on Netflix's infra, but I'm fairly sure that they don't _exclusively_ use AWS. There are things they run there, sure, but I understand that their actual content distribution is run on their own metal, and on FreeBSD. AWS hosts other stuff (auth, recommendation algos, etc).
Netflix apparently began encrypting traffic between its servers and clients in 2016, citing viewer privacy from eavesdropping: https://netflixtechblog.com/protecting-netflix-viewing-priva...
2021 https://news.ycombinator.com/item?id=28584738
2022 https://news.ycombinator.com/item?id=32519881