Lesson 13 of 18
Correlation
Measuring Linear Relationships
Pearson's r measures the strength and direction of a linear relationship between two variables. It ranges from $-1$ to $+1$.
$$r = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2 \sum_{i=1}^n (y_i - \bar{y})^2}}$$
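def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n  # mean of x
    my = sum(y) / n  # mean of y
    # numerator: sum of products of deviations from the means
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    # denominator: square root of the product of the sums of squared deviations
    den = (sum((xi - mx) ** 2 for xi in x) * sum((yi - my) ** 2 for yi in y)) ** 0.5
    if den == 0:
        raise ValueError("r is undefined when either input is constant")
    return num / den

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]  # perfect positive relationship
r = pearson_r(x, y)
print(round(r, 4))  # 1.0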
Interpreting r
| $r$ value | Interpretation |
|---|---|
| $r = 1$ | Perfect positive linear relationship |
| $r$ close to $1$ | Strong positive |
| $r$ slightly above $0$ | Weak positive |
| $r = 0$ | No linear relationship |
| $r$ close to $-1$ | Strong negative |
| $r = -1$ | Perfect negative linear relationship |
Correlation Does Not Imply Causation
A high correlation between $x$ and $y$ does not mean that $x$ causes $y$. There may be a confounding variable (a third factor that drives both), or the relationship may be coincidental.
Significance
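The $t$-statistic
$$t = r\sqrt{\frac{n-2}{1-r^2}}$$
follows a $t$-distribution with $n-2$ degrees of freedom (under the null hypothesis that the population correlation is zero), allowing us to test whether $r$ is significantly different from 0.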
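The test can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: the helper names `t_statistic`, `t_pdf` and `two_sided_p` are hypothetical, the two-sided p-value is approximated by integrating the $t$ density numerically with Simpson's rule (with SciPy available, `scipy.stats.t.sf` would be the usual route), and the significance level of 0.05 is an assumed convention.

```python
import math

def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), compared against n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if abs(r) >= 1:
        # a perfect correlation sends t to +/- infinity
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Approximate P(|T| >= |t|) by integrating the density over [0, |t|] (Simpson's rule)."""
    a = abs(t)
    if math.isinf(a):
        return 0.0
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * central)

# Example: r = 0.8 from n = 10 pairs, tested at an assumed alpha of 0.05
r, n = 0.8, 10
t = t_statistic(r, n)
p = two_sided_p(t, n - 2)
print(p < 0.05)  # True
```

Because the $t$ density is symmetric, the two-sided tail probability is one minus twice the area between 0 and $|t|$, which keeps the integration range finite.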
Your Task
Implement pearson_r(x, y) so that it prints the correlation coefficient (rounded to 4 decimal places) and whether the relationship is statistically significant.