What's Next?
Congratulations
You have built a GPT from scratch. You now understand every component of a transformer language model and its training loop: the autograd engine, the tokenizer, linear layers, attention, and the Adam optimizer.
What to Explore Next
- Andrej Karpathy's micrograd -- The autograd engine in this course is inspired by micrograd. The original ~100-line implementation is worth reading.
- Andrej Karpathy's makemore -- A character-level language model built step by step, leading up to a full GPT. The YouTube series is excellent.
- nanoGPT -- A clean, minimal GPT-2 implementation in PyTorch by Karpathy. Around 300 lines of model code.
- Attention Is All You Need -- The 2017 paper that introduced the transformer architecture. Short and readable.
- The Illustrated Transformer -- Jay Alammar's visual walkthrough of the transformer. The best introduction for visual learners.
Tools and Libraries
- PyTorch -- The standard deep learning framework. Now that you understand what autograd does, PyTorch will feel familiar; see the short sketch after this list.
- Hugging Face Transformers -- Pre-trained models, including GPT-2.
- tiktoken -- OpenAI's fast BPE tokenizer (the one GPT-4 uses); a brief usage example also follows this list.
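
To see how familiar PyTorch will feel, here is a minimal, illustrative autograd sketch. The tensor values are arbitrary, but `backward()` and `.grad` compute exactly the kind of gradients you derived by hand in this course:

```python
# Minimal autograd sketch (illustrative values): PyTorch builds the
# computation graph during the forward pass and backpropagates in one call.
import torch

x = torch.tensor(2.0, requires_grad=True)
w = torch.tensor(3.0, requires_grad=True)

y = w * x + 1.0      # forward pass builds the graph: y = 7
loss = y ** 2        # loss = 49
loss.backward()      # backpropagate through the graph

print(x.grad)        # dloss/dx = 2 * y * w = 42.0
print(w.grad)        # dloss/dw = 2 * y * x = 28.0
```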
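
And for comparison with the tokenizer built in this course, a small tiktoken sketch (assuming `pip install tiktoken`; "cl100k_base" is the encoding GPT-4 uses):

```python
# Small tiktoken sketch: encode text to token ids and decode back.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")   # the encoding GPT-4 uses
ids = enc.encode("Hello, transformer!")      # text -> list of token ids
print(ids)
print(enc.decode(ids))                       # token ids -> original text
```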
Further Reading
- Deep Learning by Goodfellow, Bengio & Courville -- The standard textbook.
- The Little Book of Deep Learning by François Fleuret -- A concise, free introduction.