Lesson 9 of 17
The Linear Layer
A linear layer computes y = Wx — a matrix-vector product. It is the fundamental building block of every neural network.
Matrix-Vector Multiplication
Given weight matrix W (rows × cols) and input vector x (cols):
y[i] = sum(W[i][j] * x[j] for j in range(cols))
Each output element is the dot product of one row of W with x.
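The formula above can be written out as an explicit double loop before compressing it into a comprehension; this sketch makes the row-by-row dot products visible (the function name `matvec` is just for illustration):

```python
def matvec(w, x):
    """Matrix-vector product: one dot product per row of w."""
    rows = len(w)
    cols = len(x)
    y = [0.0] * rows
    for i in range(rows):
        for j in range(cols):
            y[i] += w[i][j] * x[j]  # accumulate row i's dot product with x
    return y

print(matvec([[1.0, 2.0], [3.0, 4.0]], [1.0, 0.0]))  # [1.0, 3.0]
```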
The Two-Line Implementation
In MicroGPT, this is written as a list comprehension:
def linear(x, w):
    return [sum(wi * xi for wi, xi in zip(wo, x)) for wo in w]
This works with both plain floats and Value objects — the arithmetic operations are the same either way.
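One way to sanity-check the float path is to compare the comprehension against NumPy's matrix-vector product (NumPy is assumed here for testing only; it is not part of MicroGPT itself):

```python
import numpy as np

def linear(x, w):
    return [sum(wi * xi for wi, xi in zip(wo, x)) for wo in w]

x = [0.5, -1.0, 2.0]
w = [[1.0, 0.0, 2.0],
     [0.0, 3.0, 1.0]]

# Both paths compute the same matrix-vector product.
assert np.allclose(linear(x, w), np.array(w) @ np.array(x))
```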
Example
x = [1.0, 0.0]
w = [[1.0, 2.0],
     [3.0, 4.0]]
result = linear(x, w)
# row 0: 1*1 + 2*0 = 1.0
# row 1: 3*1 + 4*0 = 3.0
# result = [1.0, 3.0]
The GPT uses seven linear layers:

- attn_wq, attn_wk, attn_wv: project tokens to queries, keys, values
- attn_wo: project the attention output back
- mlp_fc1, mlp_fc2: the two-layer feedforward block
- lm_head: project the hidden state to vocabulary logits
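To see how these layers compose, here is a minimal sketch of the feedforward block built from two `linear` calls, with a nonlinearity in between. The shapes and the ReLU activation are illustrative assumptions; the actual MicroGPT activation and dimensions may differ:

```python
def linear(x, w):
    return [sum(wi * xi for wi, xi in zip(wo, x)) for wo in w]

def mlp(x, fc1, fc2):
    """Feedforward block sketch: linear -> nonlinearity -> linear.
    ReLU is used here for illustration only."""
    h = linear(x, fc1)                 # expand: hidden -> wider intermediate
    h = [max(0.0, hi) for hi in h]     # elementwise ReLU (assumed activation)
    return linear(h, fc2)              # contract: intermediate -> hidden

# Illustrative shapes: hidden size 2 -> intermediate size 3 -> hidden size 2.
fc1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fc2 = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
print(mlp([2.0, -1.0], fc1, fc2))  # [2.0, 1.0]
```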
Your Task
Implement linear(x, w).