Lesson 13 of 18

Correlation

Measuring Linear Relationships

Pearson's r measures the strength and direction of a linear relationship between two variables. It ranges from -1 to +1.

r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}
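
A worked instance of the formula, using the same data as the example code that follows: x = (1, 2, 3, 4, 5) and y = (2, 4, 6, 8, 10). The deviations from the means are x_i - \bar{x} = (-2, -1, 0, 1, 2) and y_i - \bar{y} = (-4, -2, 0, 2, 4).

\bar{x} = 3, \quad \bar{y} = 6
\sum_{i=1}^{5} (x_i - \bar{x})(y_i - \bar{y}) = (-2)(-4) + (-1)(-2) + (0)(0) + (1)(2) + (2)(4) = 20
\sum_{i=1}^{5} (x_i - \bar{x})^2 = 10, \quad \sum_{i=1}^{5} (y_i - \bar{y})^2 = 40
r = \frac{20}{\sqrt{10 \cdot 40}} = \frac{20}{20} = 1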

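def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n   # mean of x
    my = sum(y) / n   # mean of y
    # Numerator: sum of co-deviations (n times the covariance)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the sums of squared deviations
    den = (sum((xi - mx)**2 for xi in x) * sum((yi - my)**2 for yi in y)) ** 0.5
    if den == 0:
        raise ValueError("r is undefined when x or y is constant")
    return num / den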

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]   # perfect positive relationship

r = pearson_r(x, y)
print(round(r, 4))   # 1.0

Interpreting r

The labels below are rough rules of thumb; what counts as "strong" or "weak" varies between fields.

r value     Interpretation
r = 1.0     Perfect positive linear relationship
r = 0.7     Strong positive
r = 0.3     Weak positive
r = 0.0     No linear relationship
r = -0.7    Strong negative
r = -1.0    Perfect negative linear relationship
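
Two rows of the table can be reproduced with small made-up datasets. Reversing y gives r = -1. A relationship that is perfect but curved (y = x², with x symmetric around 0) gives r = 0, because r only picks up the linear part of a relationship.

```python
def pearson_r(x, y):
    """Pearson's r, computed as in the lesson's function."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = (sum((xi - mx) ** 2 for xi in x) * sum((yi - my) ** 2 for yi in y)) ** 0.5
    return num / den


# Perfect negative linear relationship: y falls by 2 for each unit of x
x = [1, 2, 3, 4, 5]
y_neg = [10, 8, 6, 4, 2]
r_neg = pearson_r(x, y_neg)
print(round(r_neg, 4))  # -1.0

# y is fully determined by x, but the relationship is U-shaped, not linear
xs = [-2, -1, 0, 1, 2]
ys = [v ** 2 for v in xs]
r_curve = pearson_r(xs, ys)
print(round(r_curve, 4))  # 0.0
```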

Correlation Does Not Imply Causation

A high correlation between X and Y does not mean X causes Y. There may be a confounding variable that drives both, the causation may run the other way (Y causes X), or the relationship may be coincidental.
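
A small simulation makes the confounding case concrete. The setup is entirely hypothetical: a hidden variable z drives both x and y, and x has no effect on y at all. The raw correlation between x and y still comes out strong; once the contribution of z is subtracted (possible here only because the simulation knows it exactly), what remains is essentially uncorrelated noise.

```python
import random


def pearson_r(x, y):
    """Pearson's r, computed as in the lesson's function."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = (sum((xi - mx) ** 2 for xi in x) * sum((yi - my) ** 2 for yi in y)) ** 0.5
    return num / den


random.seed(42)  # fixed seed so the run is reproducible

# Hypothetical confounder z influences both x and y
z = [random.uniform(0, 100) for _ in range(500)]
noise_x = [random.gauss(0, 10) for _ in z]
noise_y = [random.gauss(0, 20) for _ in z]
x = [zi + ex for zi, ex in zip(z, noise_x)]      # x depends only on z
y = [2 * zi + ey for zi, ey in zip(z, noise_y)]  # y depends only on z

print(round(pearson_r(x, y), 4))  # strong positive, around 0.9

# Subtract z's known contribution; the leftovers are independent noise
x_resid = [xi - zi for xi, zi in zip(x, z)]
y_resid = [yi - 2 * zi for yi, zi in zip(y, z)]
print(round(pearson_r(x_resid, y_resid), 4))  # close to 0
```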

Significance

The t-statistic

t = r\sqrt{\frac{n-2}{1-r^2}}

follows a t-distribution with df = n - 2 under the null hypothesis that the population correlation is zero, which lets us test whether r is significantly different from 0.
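
A minimal sketch of the test, assuming only the standard library is available. The lesson does not say how the p-value is obtained; this version uses the standard identity that the two-sided p-value for a t-distribution with df degrees of freedom equals the regularized incomplete beta function I_{df/(df+t²)}(df/2, 1/2), evaluated with a continued fraction in the style of Numerical Recipes (modified Lentz method). The helper names (reg_inc_beta, correlation_t_test) are hypothetical. Where SciPy is installed, 2 * scipy.stats.t.sf(abs(t), df) should give the same p-value.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300  # guards against division by zero
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction directly, or via the symmetry I_x(a,b) = 1 - I_{1-x}(b,a)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def correlation_t_test(r, n):
    """Return (t, two-sided p-value) for H0: population correlation = 0."""
    if n < 3:
        raise ValueError("need at least 3 observations (df = n - 2 >= 1)")
    df = n - 2
    if r * r >= 1.0:
        # |r| = 1: the t-statistic is infinite and the p-value is 0
        return math.copysign(math.inf, r), 0.0
    t = r * math.sqrt(df / (1.0 - r * r))
    # Two-sided p-value: P(|T| >= |t|) = I_{df/(df+t^2)}(df/2, 1/2)
    p = reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))
    return t, p


t, p = correlation_t_test(0.8, 5)  # r = 0.8 from only 5 points
print(p < 0.001)  # False
```

With so few points even a fairly large r is not significant at this threshold; the p-value shrinks quickly as n grows for the same r.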

Your Task

Implement pearson_r(x, y) so that it prints the correlation coefficient r (rounded to 4 decimal places) and whether the relationship is statistically significant (p < 0.001).
