Writing an LLM from scratch, part 22 – training our LLM

(gilesthomas.com)

176 points | by gpjt 9 hours ago

4 comments

mettamage 8 hours ago
Here's part 1 [1]. Since his archive goes by date, it makes it a bit easier to guestimate which part is made in which month.
[1] https://www.gilesthomas.com/2024/12/llm-from-scratch-1
mrasong 2 hours ago
The cost comparison between local RTX 3090 and cloud A100 clusters is useful, but I wonder if the author accounted for hidden overhead—like data transfer time for large datasets or the time spent debugging CUDA compatibility issues on local hardware.
js8 2 hours ago
It's based on a book https://www.manning.com/books/build-a-large-language-model-f..., is it a good book?
[-]
- checker659 1 hour ago
  I have done a little bit of DL stuff (with keras) before this. I'm currently in the attention chapter. The book gives you the code, but I feel like there is very little in the way of building intuition. Thankfully, there are tons of videos online to help with that.
  I think it is a great guide. An extended tutorial if you will (at least until this point in my reading). Also having the code right in front of you helps a lot. For example, I was under the impression that embedding vectors were static like in word2vec. Turns out, they are learnable parameters too. I wouldn't have been able to tell for sure if I didn't have the code right in front of me.
roschdal 3 hours ago
Nice, this is a recipe for making an evil AI which will destroy humanity.