Ask HN: Stanford CS 153 help

hi hn - i'm volunteering at Stanford next quarter to co-teach cs 153 (infrastructure at scale) - a course i wish had existed during my undergrad years. rather than pure theory, it's focused on how large-scale systems actually work in production

the format combines hands-on projects with a speaker series. we've confirmed some solid speakers (Jensen Huang from NVIDIA, Matthew Prince from Cloudflare, etc.), but i'm also keen to bring in perspectives from folks who don't fit the standard mold. tbh, many of the best systems eng/devs/infra ppl i've worked with are pretty weird - they think differently, take unconventional paths, and often learn by obsessively building and breaking things rather than following traditional routes. i think it would be cool for the students to realize it's a feature, not a bug, to be weirdly obsessive

if you're interested in this kind of stuff, i'd value your thoughts on:

1/ who are the fascinating/unsung heroes in infra/systems eng that students should learn from? especially interested in people who've solved hard scaling problems through unconventional thinking or unique approaches

2/ what kind of projects do you think would be fun and would meaningfully demonstrate real-world infrastructure challenges while still being achievable in an academic quarter?

prerequisites are CS106/CS111 level programming. draft syllabus here: https://explorecourses.stanford.edu/search?view=catalog&filt...

email: anjney at alumni dot stanford edu if you prefer to share thoughts privately. thank you in advance for any and all help

by anjneymidha 19 hours ago

  • spenczar5 18 hours ago
    Rachel by the Bay (https://rachelbythebay.com/) has long impressed me as someone who clearly is deep in the actual work of systems, day in and day out, and can write well about it.

    Julia Evans has a wonderful approach as well, and has an amazing talent for teaching: https://jvns.ca/

    Kellan Elliott-McCrea (https://laughingmeme.org/) has given the world some of the better advice on the hardest parts of software scaling, which is of course scaling the human organizations. New grads are virtually always underestimating that part of the work; eventually you realize the hard problems are usually social and not technical.

    • anjneymidha 17 hours ago
      i've followed Rachel and Julia for a long time, but didn't know about Kellan - thanks so much for that.

      re: human org scaling - true and this was the most surprising thing for me when i was running the platform org at discord. companies ship their org charts whether they like it or not. and refactoring org charts correctly, at scale, is essentially untested in the modern era

  • JohnMakin 18 hours ago
    2/

    Build a multi-cloud architecture. And by this, I mean connect two clouds' networks without traversing the public internet, so that an application running in one cloud can talk to an application running in the other. And then, put that into IaC. It doesn't sound like much, but the issues you uncover are pretty illuminating, and it's a fantastic interview question to give to senior-ish infra guys to see how they approach it and what challenges they expect.
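A small, concrete example of the kind of issue this project surfaces: whatever links the two networks privately (a site-to-site VPN or a dedicated interconnect, say), routing between them becomes ambiguous if the two address plans overlap, unless you add NAT on top. Below is a minimal pre-flight check, using only Python's standard `ipaddress` module, that a student could run against their IaC variables before applying anything. The subnet names and CIDR blocks are hypothetical.

```python
import ipaddress
from itertools import product


def find_overlaps(cloud_a: dict, cloud_b: dict) -> list:
    """Return (subnet_a, subnet_b) name pairs whose CIDR ranges overlap.

    Each argument maps a subnet name to a CIDR string, e.g. {"app": "10.0.0.0/16"}.
    """
    overlaps = []
    for (name_a, cidr_a), (name_b, cidr_b) in product(cloud_a.items(), cloud_b.items()):
        if ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b)):
            overlaps.append((name_a, name_b))
    return overlaps


# Hypothetical address plans for the two clouds.
cloud_a = {"a-app": "10.0.0.0/16", "a-db": "10.1.0.0/16"}
cloud_b = {"b-app": "10.1.128.0/17", "b-cache": "172.16.0.0/20"}

print(find_overlaps(cloud_a, cloud_b))  # [('a-db', 'b-app')]
```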

    And you're right, we're all weird.

    • revskill 6 hours ago
      I am curious how to connect without the public internet. You mean a VPN?
    • anjneymidha 18 hours ago
      this is exactly the type of pointer i was hoping for, thank you
    • orionblastar 18 hours ago
      We are all nerds because we love the technology, science, and math behind it.
  • slaucon 18 hours ago
    A progression of projects that comes to mind:

    1) CI and IaC that deploy a web app running in a container

    2) Add horizontal scaling and load balancer

    3) Add long running tasks / scheduled task support

    4) Deploys will likely break long running tasks. Implement blue/green or rolling deploys or some other sort of advanced deployment scheme

    5) Implement rollbacks
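A toy model of steps 4 and 5, assuming a blue/green scheme (rolling deploys would look different): the new version goes onto the idle environment, traffic only moves after a health check passes, and a rollback just moves traffic back to the side that still runs the old version. `BlueGreenRouter` and the `health_check` callback are hypothetical stand-ins for a real load balancer and probe.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class BlueGreenRouter:
    """Toy model: two identical environments, only one of which receives live traffic."""
    versions: dict = field(default_factory=lambda: {"blue": "v1", "green": None})
    live: str = "blue"
    previous: str | None = None  # side we can roll back to, if any

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def deploy(self, version: str, health_check: Callable[[str], bool]) -> bool:
        """Install `version` on the idle side; switch traffic only if it passes the health check."""
        target = self.idle
        # The idle side is the old rollback target; overwriting it removes that option.
        self.previous = None
        self.versions[target] = version
        if not health_check(version):
            return False  # the live side was never touched
        self.previous, self.live = self.live, target
        return True

    def rollback(self) -> None:
        """Send traffic back to the previous side, which still has the old version installed."""
        if self.previous is None:
            raise RuntimeError("nothing to roll back to")
        self.live, self.previous = self.previous, self.live

    def serving(self) -> str:
        return self.versions[self.live]


router = BlueGreenRouter()
router.deploy("v2", health_check=lambda v: True)
print(router.serving())  # v2
router.rollback()
print(router.serving())  # v1
```

The cost of this scheme is that it needs two full environments, and the old side has to stay up until its long-running tasks finish, which is exactly the tension step 4 points at.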

    • dirtbag__dad 14 hours ago
      This! This is what I’ve seen at my companies and is super salient to today’s real-life work ~
    • anjneymidha 16 hours ago
      Love this. Easy to Advanced, with 5 for extra credit. Thank you
    • lizzas 13 hours ago
      6) Feature flags, telemetry, soaking

      7) Alarms
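Steps 6 and 7 combine naturally: during a soak, the new version takes real traffic for a while, and an alarm decides whether to promote it or roll it back. A minimal sketch, assuming per-minute `(errors, requests)` samples from the canary and an alarm that fires when the error rate breaches a threshold in M of the last N samples, so one noisy minute doesn't trigger a rollback. The class name and default numbers are made up for illustration.

```python
from collections import deque


class ErrorRateAlarm:
    """Fires when the error rate exceeds `threshold` in at least
    `breaches_to_alarm` of the last `window` samples."""

    def __init__(self, threshold: float, window: int = 5, breaches_to_alarm: int = 3):
        self.threshold = threshold
        self.breaches_to_alarm = breaches_to_alarm
        self.recent = deque(maxlen=window)  # True where a sample breached the threshold

    def record(self, errors: int, requests: int) -> bool:
        # A minute with no traffic counts as healthy here; a real system might alarm on that too.
        rate = errors / requests if requests else 0.0
        self.recent.append(rate > self.threshold)
        return sum(self.recent) >= self.breaches_to_alarm


def soak(samples, alarm: ErrorRateAlarm) -> str:
    """Feed canary samples to the alarm; roll back on the first alarm, otherwise promote."""
    for errors, requests in samples:
        if alarm.record(errors, requests):
            return "rollback"
    return "promote"


print(soak([(0, 1000), (500, 1000), (0, 1000), (0, 1000)], ErrorRateAlarm(0.01)))  # promote
print(soak([(1, 1000), (50, 1000), (60, 1000), (2, 1000), (70, 1000)], ErrorRateAlarm(0.01)))  # rollback
```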

  • joschi03 18 hours ago
    At multiple points in my career I stumbled upon stuff from Brendan Gregg. He is highly skilled in large-scale distributed computing but also goes down to the nitty-gritty details (bits).
  • WobblyTyre 14 hours ago
    I don't have recommendations like others here. But as a junior engineer still coming up to speed with real engineering, I'd really appreciate it if this course were made open (in terms of lectures, assignments, etc.) to help folks like me audit & learn
  • qm2crossing 19 hours ago
    Kyle Kingsbury/aphyr of Jepsen seems like an exemplar of #1
    • anjneymidha 18 hours ago
      this is an awesome rec thank you
  • jdenning 15 hours ago
    1) in addition to the excellent recommendations already mentioned:

    Brendan Gregg has a lot of good stuff about monitoring and performance analysis https://brendangregg.com/ https://github.com/brendangregg

    Also Jess Frazelle (lots of good stuff, esp around containerization): https://blog.jessfraz.com/ https://github.com/jessfraz

  • huevosabio 18 hours ago
    1) you should reach out to the Convex.dev folks. They have built a solid infra platform, and their backend is open source(-ish). They are ex-Dropbox as well. And finally, they love to share!

    2) I think multiplayer games could be interesting! Lots of meat while still having a lot of space to calibrate the scope.
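One way to scope the multiplayer idea for a quarter: start from an authoritative server that owns the game state, advances in discrete ticks, and treats client messages as inputs to validate rather than facts. The sketch below is a single-process toy with hypothetical names and a made-up speed limit; the interesting infrastructure problems (latency, client-side prediction, sharding players across servers) would be layered on top.

```python
from dataclasses import dataclass

MAX_SPEED = 5.0  # hypothetical per-axis, per-tick movement limit enforced by the server


@dataclass
class Player:
    x: float = 0.0
    y: float = 0.0


def clamp(value: float, limit: float) -> float:
    return max(-limit, min(limit, value))


class GameServer:
    """Authoritative server: clients send inputs, and only the server mutates game state."""

    def __init__(self) -> None:
        self.tick = 0
        self.players = {}  # player_id -> Player
        self.pending = []  # (player_id, dx, dy) inputs received since the last tick

    def join(self, player_id: str) -> None:
        self.players[player_id] = Player()

    def submit_input(self, player_id: str, dx: float, dy: float) -> None:
        # Queued, not applied immediately: state only changes at tick boundaries.
        self.pending.append((player_id, dx, dy))

    def step(self) -> dict:
        """Apply queued inputs, advance one tick, and return a snapshot to broadcast."""
        for player_id, dx, dy in self.pending:
            player = self.players.get(player_id)
            if player is None:
                continue  # drop inputs from unknown or disconnected players
            # Validate rather than trust: clamp movement so a modified client can't teleport.
            player.x += clamp(dx, MAX_SPEED)
            player.y += clamp(dy, MAX_SPEED)
        self.pending.clear()
        self.tick += 1
        return {"tick": self.tick, "players": {pid: (p.x, p.y) for pid, p in self.players.items()}}


server = GameServer()
server.join("alice")
server.join("bob")
server.submit_input("alice", 3.0, 0.0)
server.submit_input("bob", 100.0, -2.0)  # out-of-range input gets clamped
print(server.step())  # {'tick': 1, 'players': {'alice': (3.0, 0.0), 'bob': (5.0, -2.0)}}
```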

    • anjneymidha 18 hours ago
      convex is really elegant, and now that you mention it, multiplayer games like their ai-town agent sim are such a great fit for the class - thank you
  • mavelikara 16 hours ago
    Not unsung, but Jay Kreps has made original contributions to the practice of building large scale systems. He also built a big business around it, so that perspective might also be interesting to students.
  • romanhn 18 hours ago
    Charity Majors (https://charity.wtf) is a great writer and speaker, and her work on observability is directly relevant to infra at scale.
  • majke 18 hours ago
    Quite a strong cast of presenters back in Jan 2024 https://cs153.stanford.edu/syllabus.html
    • anjneymidha 17 hours ago
      thanks for noticing! this is the first time we're expanding it from 'security at scale' to 'infra at scale', but we've taught this course 2 yrs in a row now
      • kyawzazaw 13 hours ago
        curious to learn how many undergrads took this?
  • mad44 14 hours ago
  • jjoe 18 hours ago
    Maybe reach out to Netflix's live streaming dept. since we all learn so much more from our own failures.

    Cheers!

  • randomcatuser 18 hours ago
    i didn't know you could do that! how does one volunteer to teach a course?
  • dirtbag__dad 14 hours ago
    Infrastructure for gov cloud is another beast and might make a fun case study
    • dirtbag__dad 14 hours ago
      Also the folks at a company like render, railway, or even supabase might be fascinating - what it takes to write an infra abstraction at scale
  • tayo42 18 hours ago
    couldn't find the syllabus

    deploy something like cassandra and make a system that can update the kernel on the servers running the databases without downtime or losing data
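A sketch of the control loop at the heart of that project, assuming Cassandra-style replication where reads and writes use QUORUM (a majority of the replicas for each key). It pessimistically assumes any node could hold a replica of any key, so it never has more than RF - quorum(RF) nodes down at once, and it stops rather than pushes on if a node doesn't come back. The `reboot` callback is a hypothetical placeholder for the real work: draining the node (for example with `nodetool drain`), installing the kernel, rebooting, and waiting for it to rejoin the ring.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    name: str
    kernel: str
    up: bool = True


def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond for a QUORUM read or write."""
    return replication_factor // 2 + 1


def rolling_kernel_update(
    nodes: List[Node],
    target_kernel: str,
    replication_factor: int,
    reboot: Callable[[Node, str], bool],
) -> List[str]:
    """Update nodes one at a time and stop rather than risk losing quorum.

    Pessimistic assumption: any node may hold a replica of any key, so at most
    RF - quorum(RF) nodes may be down at once (one node when RF=3).
    """
    max_down = replication_factor - quorum(replication_factor)
    updated: List[str] = []
    for node in nodes:
        if node.kernel == target_kernel:
            continue  # already updated, so re-running the rollout is safe
        down = sum(not n.up for n in nodes)
        if down + 1 > max_down:
            raise RuntimeError(f"halting before {node.name}: {down} node(s) already down")
        node.up = False
        # Hypothetical hook: drain the node, install the kernel, reboot, wait for it to rejoin.
        if not reboot(node, target_kernel):
            raise RuntimeError(f"{node.name} did not come back healthy; stopping the rollout")
        node.kernel, node.up = target_kernel, True
        updated.append(node.name)
    return updated


cluster = [Node(f"cass-{i}", kernel="5.15") for i in range(3)]
print(rolling_kernel_update(cluster, "6.1", replication_factor=3, reboot=lambda node, kernel: True))
# ['cass-0', 'cass-1', 'cass-2']
```

With RF=1 the function refuses to touch anything, which matches reality: there is no way to reboot the only replica of some data without downtime.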

    or come up with some distributed blob store thing/CDN for worldwide users
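For the blob store/CDN idea, one common building block is consistent hashing: objects map to storage nodes or edge caches in a way that only moves a small share of keys when a node joins or leaves. Below is a minimal hash ring with virtual nodes, using only the standard library; it's one technique among several (rendezvous hashing is another), and the node and key names are hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hashing: each node owns `vnodes` points on a ring of 64-bit hashes,
    and a key belongs to the first point clockwise from the key's own hash."""

    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) pairs
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def lookup(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring is empty")
        # (h, "") sorts before any (h, node), so this finds the first point with hash >= h.
        idx = bisect.bisect(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]  # wrap around past the last point


ring = HashRing(["edge-a", "edge-b", "edge-c"])
keys = [f"blob-{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.add("edge-d")
moved = [k for k in keys if ring.lookup(k) != before[k]]
# Only keys that now belong to edge-d move; every other key stays where it was.
print(len(moved), all(ring.lookup(k) == "edge-d" for k in moved))
```

Giving each node many points on the ring smooths out how much of the keyspace each one owns; with a single point per node the split can be very uneven.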

    my whole career has been automating updates for software or operating systems lol