What's Next?

Congratulations

You have built a GPT from scratch. You now understand every component of a transformer language model and its training pipeline: the autograd engine, the tokenizer, linear layers, attention, and the Adam optimizer.

What to Explore Next

  • Andrej Karpathy's micrograd -- The autograd engine in this course is inspired by micrograd. The original ~100-line implementation is worth reading.
  • Andrej Karpathy's makemore -- A character-level language model built step by step, leading up to a full GPT. The YouTube series is excellent.
  • nanoGPT -- A clean, minimal GPT-2 implementation in PyTorch by Karpathy. Around 300 lines of model code.
  • Attention Is All You Need -- The 2017 paper that introduced the transformer architecture. Short and readable.
  • The Illustrated Transformer -- Jay Alammar's visual walkthrough of the transformer. The best introduction for visual learners.

Tools and Libraries

  • PyTorch -- The standard deep learning framework. Now that you understand what autograd does, PyTorch will feel familiar; see the short autograd sketch after this list.
  • Hugging Face Transformers -- Pre-trained models including GPT-2.
  • tiktoken -- OpenAI's fast BPE tokenizer (what GPT-4 uses); a usage sketch follows below.
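
To see how the autograd ideas from this course carry over, here is a minimal PyTorch sketch (the variables are just for illustration): tensors created with requires_grad=True record a computation graph as you use them, and backward() fills in .grad, much as in the engine you built.

    import torch

    # Scalars that require gradients build a computation graph as they are used,
    # much like the values in a from-scratch autograd engine.
    x = torch.tensor(3.0, requires_grad=True)
    w = torch.tensor(2.0, requires_grad=True)

    y = w * x + x**2   # forward pass records the graph
    y.backward()       # backward pass populates .grad

    print(x.grad)      # dy/dx = w + 2*x = tensor(8.)
    print(w.grad)      # dy/dw = x       = tensor(3.)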

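Similarly, a minimal tiktoken sketch (assuming the cl100k_base encoding, the one GPT-4 uses) shows how little code a production BPE tokenizer needs at the call site:

    import tiktoken

    # Load the BPE encoding used by GPT-4-era models.
    enc = tiktoken.get_encoding("cl100k_base")

    tokens = enc.encode("You have built a GPT from scratch.")
    print(tokens)              # a list of integer token ids
    print(enc.decode(tokens))  # round-trips back to the original string
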
Further Reading

  • Deep Learning by Goodfellow, Bengio & Courville -- The standard textbook.
  • The Little Book of Deep Learning by François Fleuret -- A concise, free introduction.