Correlation

Measuring Linear Relationships

Pearson's $r$ measures the strength and direction of a linear relationship between two variables. It ranges from $-1$ to $+1$.

$r = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2 \sum_{i=1}^n (y_i - \bar{y})^2}}$

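def pearson_r(x, y):
    """Return Pearson's r for two equal-length sequences of numbers."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    mx = sum(x) / n   # sample mean of x
    my = sum(y) / n   # sample mean of y
    # Numerator: sum of products of paired deviations from the means
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the sums of squared deviations
    den = (sum((xi - mx)**2 for xi in x) * sum((yi - my)**2 for yi in y)) ** 0.5
    if den == 0:
        raise ValueError("r is undefined when x or y is constant")
    return num / den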

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]   # perfect positive relationship

r = pearson_r(x, y)
print(round(r, 4))   # 1.0

Interpreting r

$r$ value	Interpretation
$r = 1.0$	Perfect positive linear relationship
$r = 0.7$	Strong positive
$r = 0.3$	Weak positive
$r = 0.0$	No linear relationship
$r = -0.7$	Strong negative
$r = -1.0$	Perfect negative linear relationship
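
As a quick check on the rows for $r = -1.0$ and $r = 0.0$, the sketch below applies the same formula to two small made-up datasets (the data are illustrative assumptions, not part of the lesson). The U-shaped case is worth noting: $y$ is completely determined by $x$, yet $r = 0$, because $r$ only captures linear association.

```python
def pearson_r(x, y):
    # Same formula as in the lesson, repeated so this snippet runs on its own
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = (sum((xi - mx) ** 2 for xi in x) * sum((yi - my) ** 2 for yi in y)) ** 0.5
    return num / den

x = [-2, -1, 0, 1, 2]
falling = [10, 8, 6, 4, 2]          # y drops by 2 for every unit of x
u_shaped = [xi ** 2 for xi in x]    # y = x^2, symmetric around x = 0

r_falling = pearson_r(x, falling)
r_u = pearson_r(x, u_shaped)

print(round(r_falling, 4))   # -1.0
print(round(r_u, 4))         # 0.0
```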

Correlation Does Not Imply Causation

A high correlation between $X$ and $Y$ does not mean $X$ causes $Y$. There may be a confounding variable, or the relationship may be coincidental.
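
One way to see the confounding case is a small simulation. The sketch below is only an illustration under stated assumptions: a hypothetical variable `z` drives both `x` and `y`, the noise levels and sample size are arbitrary choices, and `x` never influences `y` directly. Under these assumptions the population correlation between `x` and `y` is $1 / 1.25 = 0.8$, and it should largely vanish once the contribution of `z` is subtracted out.

```python
import random

def pearson_r(x, y):
    # Same formula as in the lesson, repeated so this snippet runs on its own
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = (sum((xi - mx) ** 2 for xi in x) * sum((yi - my) ** 2 for yi in y)) ** 0.5
    return num / den

random.seed(0)   # fixed seed so repeated runs give the same numbers
n = 2000         # arbitrary sample size for the illustration

z = [random.gauss(0, 1) for _ in range(n)]      # hypothetical confounder
x = [zi + random.gauss(0, 0.5) for zi in z]     # x depends on z plus noise
y = [zi + random.gauss(0, 0.5) for zi in z]     # y depends on z plus noise, not on x

r_xy = pearson_r(x, y)

# Subtract the confounder's contribution; only independent noise remains
x_resid = [xi - zi for xi, zi in zip(x, z)]
y_resid = [yi - zi for yi, zi in zip(y, z)]
r_resid = pearson_r(x_resid, y_resid)

print(round(r_xy, 2))      # high, although x does not cause y
print(round(r_resid, 2))   # close to 0 once z is accounted for
```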

Significance

The $t$-statistic

$t = r\sqrt{\frac{n-2}{1-r^2}}$

follows a $t$-distribution with $df = n - 2$ degrees of freedom when the true correlation is 0, which lets us test whether $r$ is significantly different from 0.
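
The test can be sketched with the standard library alone. The function names `t_statistic`, `t_pdf`, and `two_sided_p` are hypothetical helpers introduced here, and the $p$-value is obtained by numerically integrating the $t$ density with Simpson's rule, a stand-in for the CDF a statistics library would provide. Treat it as a rough sketch: very small $p$-values lose precision because they are computed as one minus an area.

```python
import math

def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), to be compared with df = n - 2."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if abs(r) >= 1.0:
        # Perfect correlation: the denominator 1 - r^2 is zero
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t, df):
    """Two-sided p-value: 1 minus the area under the density on [-|t|, |t|]."""
    if math.isinf(t):
        return 0.0
    a = abs(t)
    steps = 2 * max(1000, int(a * 50))   # even step count, step width <= 0.01
    h = a / steps
    # Simpson's rule on [0, |t|]; the density is symmetric around 0
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)

# Illustrative values (assumptions): r = 0.5 observed on n = 10 pairs
r, n = 0.5, 10
t = t_statistic(r, n)
p = two_sided_p(t, n - 2)
print(round(t, 3))   # 1.633
print(p < 0.001)     # False
```

With only 10 observations, a moderate $r$ of 0.5 falls well short of the $p < 0.001$ threshold, which shows that significance depends on $n$ as well as on $r$.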

Your Task

Implement pearson_r(x, y) that prints the correlation coefficient $r$ (rounded to 4 decimal places) and whether the relationship is statistically significant ($p < 0.001$).
