What's Next?
Congratulations
You have built a GPT from scratch. You now understand every component of a transformer language model and its training loop: the autograd engine, the tokenizer, linear layers, attention, and the Adam optimizer.
What to Explore Next
- Andrej Karpathy's micrograd -- The autograd engine in this course is inspired by micrograd. The original ~100-line implementation is worth reading.
- Andrej Karpathy's makemore -- A character-level language model built step by step, leading up to a full GPT. The YouTube series is excellent.
- nanoGPT -- A clean, minimal GPT-2 implementation in PyTorch by Karpathy. Around 300 lines of model code.
- Attention Is All You Need -- The 2017 paper that introduced the transformer architecture. Short and readable.
- The Illustrated Transformer -- Jay Alammar's visual walkthrough of the transformer. The best introduction for visual learners.
Tools and Libraries
- PyTorch -- The standard deep learning framework. Now that you understand what autograd does, PyTorch will feel familiar; see the short sketch after this list.
- Hugging Face Transformers -- Pre-trained models, including GPT-2.
- tiktoken -- OpenAI's fast BPE tokenizer (the one GPT-4 uses); a brief usage example also follows this list.
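
To see how familiar PyTorch will feel, here is a minimal, illustrative autograd sketch. The tensor values are arbitrary, but `backward()` and `.grad` compute exactly the kind of gradients you derived by hand in this course:

```python
# Minimal autograd sketch (illustrative values): PyTorch builds the
# computation graph during the forward pass and backpropagates in one call.
import torch

x = torch.tensor(2.0, requires_grad=True)
w = torch.tensor(3.0, requires_grad=True)

y = w * x + 1.0      # forward pass builds the graph: y = 7
loss = y ** 2        # loss = 49
loss.backward()      # backpropagate through the graph

print(x.grad)        # dloss/dx = 2 * y * w = 42.0
print(w.grad)        # dloss/dw = 2 * y * x = 28.0
```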
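
And for comparison with the tokenizer built in this course, a small tiktoken sketch (assuming `pip install tiktoken`; "cl100k_base" is the encoding GPT-4 uses):

```python
# Small tiktoken sketch: encode text to token ids and decode back.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")   # the encoding GPT-4 uses
ids = enc.encode("Hello, transformer!")      # text -> list of token ids
print(ids)
print(enc.decode(ids))                       # token ids -> original text
```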
Further Reading
- Deep Learning by Goodfellow, Bengio & Courville -- The standard textbook.
- The Little Book of Deep Learning by François Fleuret -- A concise, free introduction.