Lesson 14 of 18
# Linear Regression

### Fitting a Line
Linear regression finds the best-fit line through data points:

$$\hat{y} = \beta_1 x + \beta_0$$

where $\beta_1$ is the slope and $\beta_0$ is the intercept.
```python
def linear_regression(x, y):
    n = len(x)
    # Sums needed by the closed-form least-squares solution
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi**2 for xi in x)
    slope = (n * sxy - sx * sy) / (n * sxx - sx**2)
    intercept = (sy - slope * sx) / n
    return slope, intercept

x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 9]  # y = 2x + 1
slope, intercept = linear_regression(x, y)
print(round(slope, 4))      # 2.0
print(round(intercept, 4))  # 1.0
```
### Least Squares
The formula minimizes the sum of squared residuals:

$$S(\beta_0, \beta_1) = \sum_{i=1}^{n} \left(y_i - \beta_1 x_i - \beta_0\right)^2$$

The closed-form solution is:

$$\beta_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}, \qquad \beta_0 = \frac{\sum y_i - \beta_1 \sum x_i}{n}$$

### $R^2$ (Coefficient of Determination)

$R^2$ measures how much variance in $y$ is explained by $x$:

$$R^2 = 1 - \frac{\text{SSR}}{\text{SST}} = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}$$

- $R^2 = 1.0$: perfect fit, all points on the line
- $R^2 = 0.0$: the line explains nothing
- $R^2 = 0.8$: the line explains 80% of the variance

```python
mean_y = sy / n
yhat = [slope * xi + intercept for xi in x]
ss_res = sum((yi - yhi)**2 for yi, yhi in zip(y, yhat))
ss_tot = sum((yi - mean_y)**2 for yi in y)
r_sq = 1 - ss_res / ss_tot
```

### Assumptions

Linear regression assumes:

1. A linear relationship between $x$ and $y$
2. Homoscedasticity (equal variance of residuals)
3. Independent observations
4. Approximately normal residuals

### Your Task

Implement `linear_regression(x, y)` that prints the **slope**, **intercept**, and $R^2$ (coefficient of determination), each rounded to 4 decimal places.
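One way to put the pieces together is the sketch below, which combines the closed-form fit with the $R^2$ computation (this is one possible solution, not the only valid one; it also returns the values so they can be inspected):

```python
def linear_regression(x, y):
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi**2 for xi in x)

    # Closed-form least-squares estimates
    slope = (n * sxy - sx * sy) / (n * sxx - sx**2)
    intercept = (sy - slope * sx) / n

    # R^2 = 1 - SSR/SST
    mean_y = sy / n
    yhat = [slope * xi + intercept for xi in x]
    ss_res = sum((yi - yhi)**2 for yi, yhi in zip(y, yhat))
    ss_tot = sum((yi - mean_y)**2 for yi in y)
    r_sq = 1 - ss_res / ss_tot

    print(round(slope, 4))
    print(round(intercept, 4))
    print(round(r_sq, 4))
    return slope, intercept, r_sq

slope, intercept, r_sq = linear_regression([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
# prints 2.0, 1.0, 1.0 (the sample data lies exactly on y = 2x + 1)
```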